Getting an error when training a new text classifier - gold = GoldParse(doc, **gold)

I am trying to train a text classifier on a custom dataset from Kaggle, after successfully stepping through the spaCy training example for the IMDB dataset: https://raw.githubusercontent.com/explosion/spaCy/master/examples/training/train_textcat.py

However, I am getting an error during training. My custom training dataset can be downloaded here: https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/data

Here is the code that I altered.

#train_data, _ = thinc.extra.datasets.imdb()
#Custom dataset
df = pd.read_csv("~/notebooks/data/ecom_reviews.csv")
train_data = df.apply(lambda row: (row['Review Text'],row['Recommended IND']), axis=1).tolist()
#End custom dataset

It seems to me that train_data has exactly the same shape as before, yet I get an error as soon as I switch from the IMDB dataset to my custom dataset.

Here is the full error.

Training the model...
LOSS 	  P  	  R  	  F  
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-2bef4fe1e613> in <module>()
     12                 texts, annotations = zip(*batch)
     13                 nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
---> 14                            losses=losses)
     15             with textcat.model.use_params(optimizer.averages):
     16                 # evaluate on the dev data split off in load_data()

~/anaconda2/envs/ipykernel_py3/lib/python3.6/site-packages/spacy/language.py in update(self, docs, golds, drop, sgd, losses)
    397                 doc = self.make_doc(doc)
    398             if not isinstance(gold, GoldParse):
--> 399                 gold = GoldParse(doc, **gold)
    400             doc_objs.append(doc)
    401             gold_objs.append(gold)

gold.pyx in spacy.gold.GoldParse.__init__()

TypeError: 'float' object is not iterable
============================================================================

Your Environment

Info about spaCy

  • spaCy version: 2.0.9
  • Platform: Linux-4.4.0-21-generic-x86_64-with-debian-stretch-sid
  • Python version: 3.6.3
  • Models: en

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

sublimotion commented, Apr 6, 2018

My bad, the data needed to be cleaned.

After dropping the null values I did not run into any issues.
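
A minimal sketch of that clean-up, assuming the failure came from rows whose 'Review Text' cell was empty: pandas reads such cells as NaN, which is a float rather than a string, and that would be consistent with the "'float' object is not iterable" message in the traceback. The helper name drop_missing_texts and the sample rows are hypothetical and only illustrate the (text, label) form used for train_data.

```python
import math


def drop_missing_texts(rows):
    """Keep only (text, label) pairs whose text is a non-empty string."""
    cleaned = []
    for text, label in rows:
        # An empty CSV cell arrives as float('nan'), not as a string.
        if not isinstance(text, str) or not text.strip():
            continue
        # A missing label would also be NaN; skip those rows as well.
        if isinstance(label, float) and math.isnan(label):
            continue
        cleaned.append((text, label))
    return cleaned


# Hypothetical sample rows in the same (text, label) form as train_data.
sample = [
    ("Love this dress, fits perfectly.", 1),
    (float("nan"), 1),  # review text missing in the CSV
    ("Runs small and the fabric is thin.", 0),
]
train_data = drop_missing_texts(sample)
```

The same effect can be had one step earlier, at the DataFrame level, by calling df.dropna(subset=['Review Text', 'Recommended IND']) before building train_data.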

I am pretty happy with the results. The model classifies whether or not a review is a recommended one. It can then in turn be used to predict which reviews are most likely to be recommended.

Training the model...
LOSS      P       R       F
52.900    0.839   0.991   0.909
34.806    0.870   0.964   0.914
23.225    0.895   0.946   0.919
15.268    0.905   0.943   0.923
10.861    0.898   0.931   0.914
8.093     0.905   0.943   0.923
7.451     0.906   0.934   0.920
4.859     0.906   0.934   0.920
4.395     0.906   0.934   0.920
3.798     0.913   0.919   0.916
2.978     0.911   0.928   0.919
2.894     0.912   0.937   0.924
2.514     0.906   0.925   0.915
2.595     0.901   0.931   0.916
1.843     0.901   0.931   0.916
2.508     0.902   0.919   0.910
1.950     0.905   0.916   0.910
1.544     0.905   0.916   0.910
1.742     0.904   0.934   0.919
1.895     0.893   0.934   0.913
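
For readers unfamiliar with the columns: P, R and F are presumably precision, recall and F-score on the held-out evaluation split. The sketch below shows how those three numbers relate, assuming F is the balanced F1 score (the harmonic mean of P and R). The function names are illustrative and are not taken from the spaCy example script.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# First row of the output above: P = 0.839, R = 0.991.
first_row_f = round(f_score(0.839, 0.991), 3)
```

With the first row's values, first_row_f comes out as 0.909, which matches the reported F column and supports the harmonic-mean reading.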
