ValueError: Found input variables with inconsistent numbers of samples: [27321, 27223]
The number of predictions returned by model.eval_model does not match the number of rows in the input data. The same happens with model.predict (which may be where the error originates). For example, if I pass a pandas DataFrame of length 12 to model.eval_model (or the corresponding text to model.predict), I get back only 10 predictions, which is very strange. I do set classification_report in the model args, but the resulting report is missing the 'O' label, so I wanted to compute the report myself. I am using the most recent version of the library. In addition, passing verbose=True does not make the classification report print in a Jupyter Notebook.

Here is the code:
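Since model.eval_model returns its predictions as one label list per sentence (which is why the code flattens them), the mismatch can be located before anything is flattened. The following is a minimal plain-Python sketch, assuming the gold labels have already been grouped per sentence in the same order as the predictions; find_length_mismatches is a hypothetical helper, not part of the library.

```python
from typing import List, Sequence, Tuple


def find_length_mismatches(
    gold_by_sentence: Sequence[Sequence[str]],
    pred_by_sentence: Sequence[Sequence[str]],
) -> List[Tuple[int, int, int]]:
    """Return (sentence_index, n_gold, n_pred) for every sentence whose
    number of predicted labels differs from its number of gold labels."""
    if len(gold_by_sentence) != len(pred_by_sentence):
        # Different sentence counts mean the two lists are not aligned at all.
        raise ValueError(
            f"{len(gold_by_sentence)} gold sentences vs "
            f"{len(pred_by_sentence)} predicted sentences"
        )
    mismatches = []
    for i, (gold, pred) in enumerate(zip(gold_by_sentence, pred_by_sentence)):
        if len(gold) != len(pred):
            mismatches.append((i, len(gold), len(pred)))
    return mismatches


# Toy data (hypothetical): the second sentence has 12 words but only 10 predictions.
gold = [["O", "B-PER"], ["O"] * 12]
pred = [["O", "B-PER"], ["O"] * 10]
print(find_length_mismatches(gold, pred))
```

Printing the offending sentences (rather than only the total count) makes it easier to see whether the lost words sit at the end of long sentences or in the middle of them.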
import pandas as pd
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from simpletransformers.ner import NERModel

# lis is a list in the form required by the library's NER class: [[101, 'word1', 'label-1'], [...]]
df_prepared = pd.DataFrame(lis, columns=['sentence_id', 'words', 'labels'])
print(len(df_prepared))
df_prepared.head()
#%%
train_df, eval_df = train_test_split(df_prepared, test_size=0.1, shuffle=False)
#%%
# Create a NERModel
model = NERModel('bert', 'bert-base-german-cased',
                 args={'overwrite_output_dir': True, 'reprocess_input_data': True,
                       'num_train_epochs': 5, 'classification_report': True,
                       'use_cached_eval_features': False},
                 labels=list(set(train_df.labels)))
# Train the model
model.train_model(train_df, eval_df=eval_df)
#%%
# Evaluate the model
result, model_outputs, predictions = model.eval_model(eval_df, verbose=True)
print(result)
# predictions holds one label list per sentence; flatten it to line up with the rows of eval_df
predictions_flat = [item for sublist in predictions for item in sublist]
print(classification_report(eval_df['labels'].tolist(), predictions_flat))
#%%
## Another way to build the input for classification_report, which fails the same way.
## The error must therefore be somewhere in model.eval_model or model.predict:
current_id = eval_df.iloc[0].sentence_id
sent_lis = list()
sents = list()
for row in eval_df.itertuples():
    if row.sentence_id == current_id:
        sent_lis.append(row.words)
    else:
        current_id = row.sentence_id
        sents.append(' '.join(sent_lis))
        sent_lis = list()
        sent_lis.append(row.words)
# Flush the final sentence, which the loop itself never appends
sents.append(' '.join(sent_lis))
preds, model_outputs = model.predict(sents)
# preds holds one list per sentence, each containing {word: label} dicts
predictions_flat = [list(item.values())[0] for sublist in preds for item in sublist]
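The manual loop above has to track the current sentence_id by hand and flush the final sentence once the loop ends. As an alternative, here is a sketch that groups consecutive rows with itertools.groupby, which emits the last group automatically. It assumes rows arrive as (sentence_id, word, label) sequences in document order, like the lis list; group_sentences is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Sequence, Tuple


def group_sentences(
    rows: Iterable[Sequence],
) -> List[Tuple[object, List[str], List[str]]]:
    """Group consecutive (sentence_id, word, label) rows into sentences.

    Only *consecutive* rows sharing a sentence_id are merged, which matches
    the row order of the original DataFrame. Returns a list of
    (sentence_id, words, labels) tuples in input order."""
    sentences = []
    for sentence_id, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        words = [str(row[1]) for row in group]
        labels = [row[2] for row in group]
        sentences.append((sentence_id, words, labels))
    return sentences


# Toy rows (hypothetical) in the same shape as lis.
rows = [[101, "Angela", "B-PER"], [101, "Merkel", "I-PER"], [102, "Berlin", "B-LOC"]]
sentences = group_sentences(rows)
texts = [" ".join(words) for _, words, _ in sentences]
print(texts)
```

With the DataFrame, rows could be eval_df[['sentence_id', 'words', 'labels']].itertuples(index=False), since itemgetter(0) works on the namedtuples it yields. The labels lists then give per-sentence gold labels that can be compared with the per-sentence predictions.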
Error:
-------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-19-cc4cd7440094> in <module>
----> 1 print(classification_report(eval_df.labels.tolist(), predictions_flat))
2
3
~/anaconda3/envs/pytorch_1.3/lib/python3.7/site-packages/sklearn/metrics/_classification.py in classification_report(y_true, y_pred, labels, target_names, sample_weight, digits, output_dict, zero_division)
1965 """
1966
-> 1967 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
1968
1969 labels_given = True
~/anaconda3/envs/pytorch_1.3/lib/python3.7/site-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
78 y_pred : array or indicator matrix
79 """
---> 80 check_consistent_length(y_true, y_pred)
81 type_true = type_of_target(y_true)
82 type_pred = type_of_target(y_pred)
~/anaconda3/envs/pytorch_1.3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
210 if len(uniques) > 1:
211 raise ValueError("Found input variables with inconsistent numbers of"
--> 212 " samples: %r" % [int(l) for l in lengths])
213
214
ValueError: Found input variables with inconsistent numbers of samples: [27321, 27223]
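check_consistent_length only compares the lengths of the two flattened lists: 27321 gold labels against 27223 predictions, so 27321 - 27223 = 98 words received no prediction. One way to get a report anyway is sketched below, under the assumption that gold labels and predictions are both available per sentence. It drops every sentence whose lengths differ, rather than guessing which words are missing, and it reports how many gold words were excluded. flatten_matching is a hypothetical helper; one possible cause of the gaps (not confirmed here) is sentences being cut off at the model's max_seq_length.

```python
from typing import List, Sequence, Tuple


def flatten_matching(
    gold_by_sentence: Sequence[Sequence[str]],
    pred_by_sentence: Sequence[Sequence[str]],
) -> Tuple[List[str], List[str], int]:
    """Flatten gold and predicted labels, skipping every sentence whose two
    label lists differ in length.

    Returns (y_true, y_pred, n_skipped_gold_words)."""
    if len(gold_by_sentence) != len(pred_by_sentence):
        # zip() would silently drop trailing sentences, so fail loudly instead.
        raise ValueError("gold and predictions contain different numbers of sentences")
    y_true: List[str] = []
    y_pred: List[str] = []
    skipped = 0
    for gold, pred in zip(gold_by_sentence, pred_by_sentence):
        if len(gold) == len(pred):
            y_true.extend(gold)
            y_pred.extend(pred)
        else:
            skipped += len(gold)
    return y_true, y_pred, skipped


# Toy data (hypothetical): the second sentence lost one prediction.
gold = [["B-PER", "I-PER"], ["O", "B-LOC", "O"]]
pred = [["B-PER", "O"], ["O", "B-LOC"]]
y_true, y_pred, skipped = flatten_matching(gold, pred)
print(len(y_true), len(y_pred), skipped)
```

The two returned lists always have equal length, so they can be passed to sklearn.metrics.classification_report(y_true, y_pred). Keep in mind that the resulting scores cover only the sentences that were kept.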
Issue analytics: created 3 years ago; 8 comments (3 by maintainers).
Top GitHub Comments
There does seem to be an issue with certain characters like these when using the NERModel. It's possibly related to how the tokenization happens. I'll see if I can do something about this.

Same problem, any updates now?
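To check the maintainer's tokenization hypothesis, one plausible mechanism can be tested. It is stated here as an assumption, not confirmed for this library version. BERT's basic tokenizer deletes NUL, U+FFFD and control/format characters (Unicode categories starting with "C", such as the zero-width space U+200B or the soft hyphen U+00AD), and it treats characters such as the no-break space U+00A0 as whitespace. A "word" made only of such characters therefore tokenizes to nothing and may end up with no prediction slot. The sketch below approximates only that cleaning step in plain Python to flag suspicious words in the data. It is not the transformers implementation, and both function names are hypothetical.

```python
import unicodedata
from typing import Iterable, List, Sequence, Tuple


def vanishes_under_bert_cleaning(word: str) -> bool:
    """Approximate check: would `word` be empty after BERT-style text cleaning?

    Drops NUL, U+FFFD and any character whose Unicode category starts with
    "C" (except tab, newline, carriage return), then tests whether only
    whitespace remains. This mimics the cleaning step only, not WordPiece."""
    kept = []
    for ch in word:
        cp = ord(ch)
        if cp == 0 or cp == 0xFFFD:
            continue
        if ch not in "\t\n\r" and unicodedata.category(ch).startswith("C"):
            continue
        kept.append(ch)
    # str.strip() removes Unicode whitespace, including U+00A0.
    return "".join(kept).strip() == ""


def find_vanishing_words(rows: Iterable[Sequence]) -> List[Tuple[object, str]]:
    """Return (sentence_id, word) for rows whose word would vanish."""
    return [(row[0], row[1]) for row in rows if vanishes_under_bert_cleaning(str(row[1]))]


# Toy rows (hypothetical) in the same shape as lis.
rows = [[101, "Haus", "O"], [101, "\u200b", "O"], [102, "\xa0", "O"], [102, "Berlin", "B-LOC"]]
for sentence_id, word in find_vanishing_words(rows):
    print(sentence_id, repr(word))
```

If this flags roughly as many words as are missing (98 in the traceback above), removing or replacing those words before training and evaluation would be a reasonable next experiment.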