Getting an error when training a new text classifier - gold = GoldParse(doc, **gold)

I am trying to train a text classifier on a custom dataset from Kaggle, after successfully stepping through the spaCy text classifier training example for the IMDB dataset: https://raw.githubusercontent.com/explosion/spaCy/master/examples/training/train_textcat.py

However, I am getting an error during training. My custom training dataset can be downloaded from here: https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/data

Here is the code that I altered.

import pandas as pd

# train_data, _ = thinc.extra.datasets.imdb()
# Custom dataset: one (review text, recommended flag) tuple per row
df = pd.read_csv("~/notebooks/data/ecom_reviews.csv")
train_data = df.apply(lambda row: (row['Review Text'], row['Recommended IND']), axis=1).tolist()
# End custom dataset

As far as I can tell, train_data has exactly the same shape as it does with the IMDB dataset, yet I get an error as soon as I switch to my custom dataset.

Here is the full error.

Training the model...
LOSS 	  P  	  R  	  F  
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-2bef4fe1e613> in <module>()
     12                 texts, annotations = zip(*batch)
     13                 nlp.update(texts, annotations, sgd=optimizer, drop=0.2,
---> 14                            losses=losses)
     15             with textcat.model.use_params(optimizer.averages):
     16                 # evaluate on the dev data split off in load_data()

~/anaconda2/envs/ipykernel_py3/lib/python3.6/site-packages/spacy/language.py in update(self, docs, golds, drop, sgd, losses)
    397                 doc = self.make_doc(doc)
    398             if not isinstance(gold, GoldParse):
--> 399                 gold = GoldParse(doc, **gold)
    400             doc_objs.append(doc)
    401             gold_objs.append(gold)

gold.pyx in spacy.gold.GoldParse.__init__()

TypeError: 'float' object is not iterable
============================================================================

Your Environment

Info about spaCy

spaCy version: 2.0.9
Platform: Linux-4.4.0-21-generic-x86_64-with-debian-stretch-sid
Python version: 3.6.3
Models: en

Top GitHub Comments

sublimotion commented, Apr 6, 2018

My bad, the data needed to be cleaned.

After dropping the null values I did not run into any issues.
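A minimal sketch of that cleanup step, assuming the missing reviews arrive as float NaN values (which is how pandas represents empty cells, and would be consistent with the "'float' object is not iterable" message). The helper name `clean_examples` and the label name `RECOMMENDED` are made up for illustration; the `{"cats": {...}}` annotation shape follows the train_textcat.py example linked in the question.

```python
import math


def clean_examples(rows, label_name="RECOMMENDED"):
    """Drop rows with a missing text or label and build textcat-style annotations.

    `rows` is a list of (text, label) tuples, like train_data in the question.
    `label_name` is a hypothetical category name chosen for this sketch.
    """
    cleaned = []
    for text, label in rows:
        # Missing text shows up as float NaN (or None), not as a string.
        if not isinstance(text, str) or not text.strip():
            continue
        # Skip rows whose label is missing as well.
        if label is None or (isinstance(label, float) and math.isnan(label)):
            continue
        cleaned.append((text, {"cats": {label_name: bool(label)}}))
    return cleaned


# Tiny made-up sample: the second row imitates a review with no text.
rows = [
    ("Love this dress, fits perfectly.", 1),
    (float("nan"), 1),
    ("Fabric felt cheap and it ran small.", 0),
]
train_data = clean_examples(rows)
print(len(train_data), "usable examples")
```

With a DataFrame, the same effect can be had earlier by calling `df.dropna(subset=['Review Text', 'Recommended IND'])` before building the tuples.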

I am pretty happy with the results. The model classifies whether a review is marked as recommended or not. It can in turn be used to predict which reviews are most likely to be recommended.

Training the model...
LOSS      P      R      F
52.900    0.839  0.991  0.909
34.806    0.870  0.964  0.914
23.225    0.895  0.946  0.919
15.268    0.905  0.943  0.923
10.861    0.898  0.931  0.914
8.093     0.905  0.943  0.923
7.451     0.906  0.934  0.920
4.859     0.906  0.934  0.920
4.395     0.906  0.934  0.920
3.798     0.913  0.919  0.916
2.978     0.911  0.928  0.919
2.894     0.912  0.937  0.924
2.514     0.906  0.925  0.915
2.595     0.901  0.931  0.916
1.843     0.901  0.931  0.916
2.508     0.902  0.919  0.910
1.950     0.905  0.916  0.910
1.544     0.905  0.916  0.910
1.742     0.904  0.934  0.919
1.895     0.893  0.934  0.913
