Getting an error when training a new text classifier - gold = GoldParse(doc, **gold)
I am trying to train a text classifier on a custom dataset from Kaggle, after successfully stepping through the spaCy text classifier training example for the IMDB dataset: https://raw.githubusercontent.com/explosion/spaCy/master/examples/training/train_textcat.py
However, I get an error during training. My custom training dataset can be downloaded from here: https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/data
Here is the code I changed:
import pandas as pd

# train_data, _ = thinc.extra.datasets.imdb()  # original IMDB loader, replaced below
# Custom dataset
df = pd.read_csv("~/notebooks/data/ecom_reviews.csv")
train_data = df.apply(lambda row: (row['Review Text'], row['Recommended IND']), axis=1).tolist()
# End custom dataset
As far as I can tell, train_data has exactly the same shape as the IMDB data, yet switching to my custom dataset produces an error that the IMDB dataset does not.
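For context, the IMDB example's `load_data()` helper turns each `(text, label)` pair into a category dict such as `{'POSITIVE': True}`, and the training loop wraps each of those as `{'cats': ...}`. `Language.update()` then expands each annotation with `GoldParse(doc, **gold)`, which is the line in the traceback. The sketch below reproduces that conversion roughly as it appears in the spaCy 2.0 example (treat the details as an assumption, not the script verbatim), with the thinc IMDB loader replaced by a hypothetical `pairs` argument. It suggests the shape of the custom data really is fine: the conversion never looks at the text, so a bad value in the text column travels through unchanged.

```python
import random


def load_data(pairs, limit=0, split=0.8):
    """Shuffle (text, label) pairs and split them into train/dev sets.

    Sketch of the load_data() helper in spaCy 2.0's train_textcat.py, with the
    thinc IMDB loader replaced by the `pairs` argument (an assumption).
    """
    pairs = list(pairs)
    random.shuffle(pairs)
    pairs = pairs[-limit:]  # limit=0 keeps everything, since pairs[-0:] == pairs[0:]
    texts, labels = zip(*pairs)
    # One category dict per example; 'POSITIVE' is the label name the example uses
    cats = [{'POSITIVE': bool(y)} for y in labels]
    split = int(len(pairs) * split)
    return (texts[:split], cats[:split]), (texts[split:], cats[split:])


# Hypothetical (text, label) pairs shaped like the question's train_data
pairs = [("Love this dress", 1), ("Runs small", 0), ("Perfect fit", 1),
         ("Fabric is thin", 0), ("Great quality", 1)]
(train_texts, train_cats), (dev_texts, dev_cats) = load_data(pairs)

# The training loop pairs each text with {'cats': ...}; Language.update() later
# passes each of these dicts to GoldParse(doc, **gold)
annotations = [{'cats': cats} for cats in train_cats]
print(len(train_texts), len(dev_texts))  # 4 1
```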
Here is the full error.
Training the model...
LOSS P R F
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-19-2bef4fe1e613> in <module>()
12 texts, annotations = zip(*batch)
13 nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
---> 14 losses=losses)
15 with textcat.model.use_params(optimizer.averages):
16 # evaluate on the dev data split off in load_data()
~/anaconda2/envs/ipykernel_py3/lib/python3.6/site-packages/spacy/language.py in update(self, docs, golds, drop, sgd, losses)
397 doc = self.make_doc(doc)
398 if not isinstance(gold, GoldParse):
--> 399 gold = GoldParse(doc, **gold)
400 doc_objs.append(doc)
401 gold_objs.append(gold)
gold.pyx in spacy.gold.GoldParse.__init__()
TypeError: 'float' object is not iterable
============================================================================
Your Environment
Info about spaCy
- spaCy version: 2.0.9
- Platform: Linux-4.4.0-21-generic-x86_64-with-debian-stretch-sid
- Python version: 3.6.3
- Models: en
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (2 by maintainers)
My bad, the data needed to be cleaned.
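To find the offending rows, it helps to know that pandas reads an empty CSV cell as NaN, and NaN is a Python `float`. As far as I can tell from the traceback, spaCy 2.0's `Language.update()` only calls `make_doc()` on strings, so a NaN review text reaches `GoldParse` as-is, which then tries to iterate over it. The sketch below uses a hypothetical three-row stand-in for `ecom_reviews.csv` (not the real file) to show the check.

```python
import io

import pandas as pd

# Hypothetical stand-in for ecom_reviews.csv with only the two columns used in
# the question; the second review has an empty "Review Text" cell.
csv_text = (
    "Review Text,Recommended IND\n"
    "Love this dress,1\n"
    ",0\n"
    "Runs small,0\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column (empty cells are read as NaN)
missing = df.isna().sum()
print(missing["Review Text"])  # 1

# The missing review text is NaN, i.e. a float rather than a str
bad_text = df.loc[df["Review Text"].isna(), "Review Text"].iloc[0]
print(type(bad_text).__name__)  # float

# Iterating over it raises the same message as GoldParse in the traceback
try:
    iter(bad_text)
except TypeError as err:
    print(err)  # 'float' object is not iterable
```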
After dropping the null values, I did not run into any issues.
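Dropping the rows with missing values before building `train_data` might look like this with pandas' `DataFrame.dropna(subset=...)`. This is a minimal sketch on a hypothetical three-row stand-in for `ecom_reviews.csv`; it builds the pairs with `zip` and `.tolist()` so the values are plain Python `str`/`int`, though the original `df.apply(...)` line should work the same once the NaN rows are gone.

```python
import io

import pandas as pd

# Hypothetical stand-in for ecom_reviews.csv; the second review has no text
csv_text = (
    "Review Text,Recommended IND\n"
    "Love this dress,1\n"
    ",0\n"
    "Runs small,0\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Drop rows where either the review text or the label is missing
df = df.dropna(subset=["Review Text", "Recommended IND"])

# Build (text, label) pairs as plain Python str/int values
train_data = list(zip(df["Review Text"].tolist(),
                      df["Recommended IND"].astype(int).tolist()))
print(train_data)  # [('Love this dress', 1), ('Runs small', 0)]
```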
I am pretty happy with the results. The model classifies whether a review was recommended or not, so it can in turn be used to predict which reviews are most likely to be recommended.
Training the model...
LOSS    P      R      F
52.900  0.839  0.991  0.909
34.806  0.870  0.964  0.914
23.225  0.895  0.946  0.919
15.268  0.905  0.943  0.923
10.861  0.898  0.931  0.914
8.093   0.905  0.943  0.923
7.451   0.906  0.934  0.920
4.859   0.906  0.934  0.920
4.395   0.906  0.934  0.920
3.798   0.913  0.919  0.916
2.978   0.911  0.928  0.919
2.894   0.912  0.937  0.924
2.514   0.906  0.925  0.915
2.595   0.901  0.931  0.916
1.843   0.901  0.931  0.916
2.508   0.902  0.919  0.910
1.950   0.905  0.916  0.910
1.544   0.905  0.916  0.910
1.742   0.904  0.934  0.919
1.895   0.893  0.934  0.913
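As a sanity check on this log, the F column should be the harmonic mean of P and R, F1 = 2PR/(P+R). As far as I recall, the example's `evaluate()` counts true/false positives on the dev split using a 0.5 score threshold; the `evaluate` helper below is a simplified sketch of that idea, not the script's exact code (which I believe also adds a tiny smoothing constant to the counts). The final loop recomputes F from a few logged rows, with a small tolerance because P and R are printed to three decimals.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(scores, gold, threshold=0.5):
    """Precision, recall and F1 for a single binary label (simplified sketch).

    scores: predicted probabilities for the positive class
    gold:   True/False gold labels, in the same order
    """
    tp = fp = fn = 0
    for score, is_positive in zip(scores, gold):
        predicted = score >= threshold
        if predicted and is_positive:
            tp += 1
        elif predicted:
            fp += 1
        elif is_positive:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


# Toy dev set: 2 true positives, 1 false positive, 1 false negative
prec, rec, f1 = evaluate([0.9, 0.8, 0.3, 0.6], [True, False, True, True])
print(round(prec, 3), round(rec, 3), round(f1, 3))  # 0.667 0.667 0.667

# A few (LOSS, P, R, F) rows copied from the log above
logged = [
    (52.900, 0.839, 0.991, 0.909),
    (15.268, 0.905, 0.943, 0.923),
    (2.894, 0.912, 0.937, 0.924),
    (1.895, 0.893, 0.934, 0.913),
]
for loss, p, r, f in logged:
    # P and R are rounded to 3 decimals, so allow a small tolerance
    assert abs(f_score(p, r) - f) < 0.002
```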
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.