question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Hello, my train csv file looks like

mbploreto:script loretoparisi$ head -n2 /root/spam_dataset.csv 
label	text
HAM	waiting waiting waiting waiting solitude stands by the window as someone said i tried hard to find you i found fake promises instead the thought behind to join the thought before i thought i was blind sometimes i feel i feel the way to live i thought i had strength to overcome these walls i thought i was wonderful memories keep together things now would you like to know how it feels to be always stuck in the past without any rest the thought behind to join the thought before i thought i was blind
SPAM	please every body click cross

so I have my configuration as string

"{input_features: [{name: text, type: text}], output_features: [{name: label, type: category}]}"

and I start training then:

ludwig train --data_csv /root/spam_dataset.csv --model_definition "{input_features: [{name: text, type: text}], output_features: [{name: label, type: category}]}"

Suddenly I get that error about the text field:

 _         _        _      
| |_  _ __| |_ __ _(_)__ _ 
| | || / _` \ V  V / / _` |
|_|\_,_\__,_|\_/\_/|_\__, |
                     |___/ 
ludwig v0.1.0 - Train

Experiment name: experiment
Model name: run
Output path: results/experiment_run_1


ludwig_version: '0.1.0'
command: ('ludwig train '
 '--data_csv /root/spam_dataset.csv --model_definition {input_features: '
 '[{name: text, type: text}], output_features: [{name: label, type: '
 'category}]}')
commit_hash: '98b82b3f56c0'
dataset_type: '/root/spam_dataset.csv'
model_definition: {   'combiner': {'type': 'concat'},
    'input_features': [   {   'encoder': 'parallel_cnn',
                              'level': 'word',
                              'name': 'text',
                              'tied_weights': None,
                              'type': 'text'}],
    'output_features': [   {   'dependencies': [],
                               'loss': {   'class_distance_temperature': 0,
                                           'class_weights': 1,
                                           'confidence_penalty': 0,
                                           'distortion': 1,
                                           'labels_smoothing': 0,
                                           'negative_samples': 0,
                                           'robust_lambda': 0,
                                           'sampler': None,
                                           'type': 'softmax_cross_entropy',
                                           'unique': False,
                                           'weight': 1},
                               'name': 'label',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'top_k': 3,
                               'type': 'category'}],
    'preprocessing': {   'bag': {   'fill_value': '',
                                    'format': 'space',
                                    'lowercase': 10000,
                                    'missing_value_strategy': 'fill_with_const',
                                    'most_common': False},
                         'binary': {   'fill_value': 0,
                                       'missing_value_strategy': 'fill_with_const'},
                         'category': {   'fill_value': '<UNK>',
                                         'lowercase': False,
                                         'missing_value_strategy': 'fill_with_const',
                                         'most_common': 10000},
                         'force_split': False,
                         'image': {'missing_value_strategy': 'backfill'},
                         'numerical': {   'fill_value': 0,
                                          'missing_value_strategy': 'fill_with_const'},
                         'sequence': {   'fill_value': '',
                                         'format': 'space',
                                         'lowercase': False,
                                         'missing_value_strategy': 'fill_with_const',
                                         'most_common': 20000,
                                         'padding': 'right',
                                         'padding_symbol': '<PAD>',
                                         'sequence_length_limit': 256,
                                         'unknown_symbol': '<UNK>'},
                         'set': {   'fill_value': '',
                                    'format': 'space',
                                    'lowercase': False,
                                    'missing_value_strategy': 'fill_with_const',
                                    'most_common': 10000},
                         'split_probabilities': (0.7, 0.1, 0.2),
                         'stratify': None,
                         'text': {   'char_format': 'characters',
                                     'char_most_common': 70,
                                     'char_sequence_length_limit': 1024,
                                     'fill_value': '',
                                     'lowercase': True,
                                     'missing_value_strategy': 'fill_with_const',
                                     'padding': 'right',
                                     'padding_symbol': '<PAD>',
                                     'unknown_symbol': '<UNK>',
                                     'word_format': 'space_punct',
                                     'word_most_common': 20000,
                                     'word_sequence_length_limit': 256},
                         'timeseries': {   'fill_value': '',
                                           'format': 'space',
                                           'missing_value_strategy': 'fill_with_const',
                                           'padding': 'right',
                                           'padding_value': 0,
                                           'timeseries_length_limit': 256}},
    'training': {   'batch_size': 128,
                    'bucketing_field': None,
                    'decay': False,
                    'decay_rate': 0.96,
                    'decay_steps': 10000,
                    'dropout_rate': 0.0,
                    'early_stop': 3,
                    'epochs': 200,
                    'gradient_clipping': None,
                    'increase_batch_size_on_plateau': 0,
                    'increase_batch_size_on_plateau_max': 512,
                    'increase_batch_size_on_plateau_patience': 5,
                    'increase_batch_size_on_plateau_rate': 2,
                    'learning_rate': 0.001,
                    'learning_rate_warmup_epochs': 5,
                    'optimizer': {   'beta1': 0.9,
                                     'beta2': 0.999,
                                     'epsilon': 1e-08,
                                     'type': 'adam'},
                    'reduce_learning_rate_on_plateau': 0,
                    'reduce_learning_rate_on_plateau_patience': 5,
                    'reduce_learning_rate_on_plateau_rate': 0.5,
                    'regularization_lambda': 0,
                    'regularizer': 'l2',
                    'staircase': False,
                    'validation_field': 'combined',
                    'validation_measure': 'loss'}}


Using full raw csv, no hdf5 and json file with the same name have been found
Building dataset (it may take a while)
Traceback (most recent call last):
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2656, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'text'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ludwig", line 11, in <module>
    load_entry_point('ludwig==0.1.0', 'console_scripts', 'ludwig')()
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/cli.py", line 86, in main
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/cli.py", line 64, in __init__
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/cli.py", line 70, in train
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/train.py", line 663, in cli
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/train.py", line 224, in full_train
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/data/preprocessing.py", line 457, in preprocess_for_training
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/data/preprocessing.py", line 62, in build_dataset
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/data/preprocessing.py", line 83, in build_dataset_df
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/ludwig-0.1.0-py3.6.egg/ludwig/data/preprocessing.py", line 123, in build_metadata
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/loretoparisi/Documents/Projects/AI/ludwig/venv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2658, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'text'

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:9

github_iconTop GitHub Comments

1reaction
w4nderlustcommented, Feb 15, 2019

That’s a great suggestion, a good workaround until we implement a better solution for reading TSVs and other file formats.

1reaction
loretoparisicommented, Feb 20, 2019

@w4nderlust ah yeah you were right, that was the question here https://github.com/uber/ludwig/issues/66 Ok I will try to change \t to , 🤕

If anyone else is having the same issue:

When on macOS:

sed -i "" $'s/,/ /g' /root/spam_dataset.csv 
sed -i "" $'s/\t/,/g' /root/spam_dataset.csv

(keep an eye to the "" before the -i for inline replacement and to the ANSI-C style quoting since OSX sed does not recognize \t)

while on linux

sed -i "" $'s/,/ /g' /root/spam_dataset.csv 
sed -i "s/\t/,/g" /root/spam_dataset.csv

We should to replace every , in the dataset to a (as example) before converting tab to commas, otherwise we could have comma in some columns, breaking the resulting CSV file.

and if for some reason you have forget the header:

sed -i '' -e '1i\'$'\n''label,text' /root/spam_dataset.csv
Read more comments on GitHub >

github_iconTop Results From Across the Web

KeyError when Key exists - python - Stack Overflow
I have a file (tweetfile = a .txt file on my computer) with tweets and I'm trying to loop through the objects to...
Read more >
How to Fix KeyError Exceptions in Python - Rollbar
The Python KeyError is an exception that occurs when an attempt is made to access an item in a dictionary that does not...
Read more >
keyerror in Python – How to Fix Dictionary Error
When working with dictionaries in Python, a KeyError gets raised when you try to access an item that doesn't exist in a Python...
Read more >
Python KeyError: A Beginner's Guide % - Career Karma
A KeyError is raised when you try to access a value from a dictionary that does not exist. To solve a key error,...
Read more >
Python KeyError Exception Handling Examples - DigitalOcean
Python KeyError is raised when we try to access a key from dict, which doesn't exist. It's one of the built-in exception classes...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found