Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

KeyError: None of the keys are in index

See original GitHub issue

I am getting this error ,

KeyError: ("None of [Index(['CLASS_1', 'CLASS_2', 'CLASS_3', 'CLASS_4', 'CLASS_5', 'CLASS_6',\n       'CLASS_7', 'CLASS_8', 'CLASS_9', 'CLASS_10', 'CLASS_11', 'CLASS_12',\n       'CLASS_13', 'CLASS_14', 'CLASS_15', 'CLASS_16', 'CLASS_17', 'CLASS_E',\n       'CLASS_V'],\n      dtype='object')] are in the [index]", 'occurred at index 0')

When i run the new_toxic_multilabel.ipynb from sample notebook, I am getting this error for command:

databunch = BertDataBunch(args['data_dir'], LABEL_PATH, args.model_name, train_file='train.csv', val_file='val.csv',
                          test_data='test.csv',
                          text_col="NOTES", label_col=label_cols,
                          batch_size_per_gpu=args['train_batch_size'], max_seq_length=args['max_seq_length'], 
                          multi_gpu=args.multi_gpu, multi_label=True, model_type=args.model_type)

here is my label_col:

label_cols = ['CLASS_1','CLASS_2','CLASS_3','CLASS_4','CLASS_5','CLASS_6','CLASS_7','CLASS_8','CLASS_9','CLASS_10','CLASS_11','CLASS_12','CLASS_13','CLASS_14','CLASS_15','CLASS_16','CLASS_17','CLASS_E','CLASS_V']

my labels.csv contains the same classes but listed one after the another: labels.csv:


'CLASS_1'
'CLASS_2'
'CLASS_3'
'CLASS_4'
'CLASS_5'
'CLASS_6'
'CLASS_7'
'CLASS_8'
'CLASS_9'
'CLASS_10'
'CLASS_11'
'CLASS_12'
'CLASS_13'
'CLASS_14'
'CLASS_15'
'CLASS_16'
'CLASS_17'
'CLASS_E'
'CLASS_V'

Here is the traceback:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-13-c5a2ac3a5e99> in <module>
      3                           text_col="NOTES", label_col=label_cols,
      4                           batch_size_per_gpu=args['train_batch_size'], max_seq_length=args['max_seq_length'],
----> 5                           multi_gpu=args.multi_gpu, multi_label=True, model_type=args.model_type)

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/fast_bert/data_cls.py in __init__(self, data_dir, label_dir, tokenizer, train_file, val_file, test_data, label_file, text_col, label_col, batch_size_per_gpu, max_seq_length, multi_gpu, multi_label, backend, model_type, logger, clear_cache, no_cache)
    352             if os.path.exists(cached_features_file) == False or self.no_cache == True:
    353                 train_examples = processor.get_train_examples(
--> 354                     train_file, text_col=text_col, label_col=label_col)
    355 
    356             train_dataset = self.get_dataset_from_examples(

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/fast_bert/data_cls.py in get_train_examples(self, filename, text_col, label_col, size)
    230             data_df = pd.read_csv(os.path.join(self.data_dir, filename))
    231 
--> 232             return self._create_examples(data_df, "train", text_col=text_col, label_col=label_col)
    233         else:
    234             data_df = pd.read_csv(os.path.join(self.data_dir, filename))

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/fast_bert/data_cls.py in _create_examples(self, df, set_type, text_col, label_col)
    286         else:
    287             return list(df.apply(lambda row: InputExample(guid=row.index, text_a=row[text_col],
--> 288                                                           label=_get_labels(row, label_col)), axis=1))
    289 
    290 

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6926             kwds=kwds,
   6927         )
-> 6928         return op.get_result()
   6929 
   6930     def applymap(self, func):

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/apply.py in get_result(self)
    184             return self.apply_raw()
    185 
--> 186         return self.apply_standard()
    187 
    188     def apply_empty_result(self):

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
    290 
    291         # compute the result using the series generator
--> 292         self.apply_series_generator()
    293 
    294         # wrap results

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/apply.py in apply_series_generator(self)
    319             try:
    320                 for i, v in enumerate(series_gen):
--> 321                     results[i] = self.f(v)
    322                     keys.append(v.name)
    323             except Exception as e:

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/fast_bert/data_cls.py in <lambda>(row)
    286         else:
    287             return list(df.apply(lambda row: InputExample(guid=row.index, text_a=row[text_col],
--> 288                                                           label=_get_labels(row, label_col)), axis=1))
    289 
    290 

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/fast_bert/data_cls.py in _get_labels(row, label_col)
    273         def _get_labels(row, label_col):
    274             if isinstance(label_col, list):
--> 275                 return list(row[label_col])
    276             else:
    277                 # create one hot vector of labels

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
   1111             key = check_bool_indexer(self.index, key)
   1112 
-> 1113         return self._get_with(key)
   1114 
   1115     def _get_with(self, key):

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/series.py in _get_with(self, key)
   1153             # handle the dup indexing case (GH 4246)
   1154             if isinstance(key, (list, tuple)):
-> 1155                 return self.loc[key]
   1156 
   1157             return self.reindex(key)

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1422 
   1423             maybe_callable = com.apply_if_callable(key, self.obj)
-> 1424             return self._getitem_axis(maybe_callable, axis=axis)
   1425 
   1426     def _is_scalar_access(self, key: Tuple):

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1837                     raise ValueError("Cannot index with multidimensional key")
   1838 
-> 1839                 return self._getitem_iterable(key, axis=axis)
   1840 
   1841             # nested tuple slicing

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1131         else:
   1132             # A collection of keys
-> 1133             keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
   1134             return self.obj._reindex_with_indexers(
   1135                 {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1090 
   1091         self._validate_read_indexer(
-> 1092             keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
   1093         )
   1094         return keyarr, indexer

~/virtualenvs/anaconda3/envs/pytorch/lib/python3.7/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1175                 raise KeyError(
   1176                     "None of [{key}] are in the [{axis}]".format(
-> 1177                         key=key, axis=self.obj._get_axis_name(axis)
   1178                     )
   1179                 )

KeyError: ("None of [Index(['CLASS_1', 'CLASS_2', 'CLASS_3', 'CLASS_4', 'CLASS_5', 'CLASS_6',\n       'CLASS_7', 'CLASS_8', 'CLASS_9', 'CLASS_10', 'CLASS_11', 'CLASS_12',\n       'CLASS_13', 'CLASS_14', 'CLASS_15', 'CLASS_16', 'CLASS_17', 'CLASS_E',\n       'CLASS_V'],\n      dtype='object')] are in the [index]", 'occurred at index 0')

What is the issue?

Issue Analytics

State:
Created 4 years ago
Comments:5

Top GitHub Comments

1reaction

greecehalfcommented, Jan 4, 2020

Could you tell me how to match features with label_cols?

0reactions

mohammedayub44commented, Apr 23, 2020

@Habiba05 For me it was a class mismatch, the steps on the main page are a little unclear. So, following through the “sample_notebooks/new-toxic-multilabel.ipynb” jupyter notebook and “sample_data/” made more sense.