
ValueError: Found input variables with inconsistent numbers of samples: [27321, 27223]

See original GitHub issue

The number of predictions returned by `model.eval_model` does not match the length of the input data. The same holds for `model.predict`, which may be where the error originates. That is: I pass a pandas DataFrame of, say, length 12 to `model.eval_model` (or text data to `model.predict`) and receive an output of length 10. This is pretty weird. I do set `classification_report` in the args, but the 'O' label is missing from it, so I wanted to compute the report myself. I am using the most recent version of the library. As an aside, `verbose=True` does not get the classification report printed in a Jupyter notebook. Here is the code:
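A quick way to see where the counts diverge is to compare gold-label counts with prediction counts sentence by sentence instead of only looking at the flattened totals. A minimal sketch (the helper name and toy data below are my own, not part of simpletransformers):

```python
def find_length_mismatches(gold_by_sentence, pred_by_sentence):
    """Return the indices of sentences whose prediction count differs
    from the gold-label count, and the total number of missing tokens."""
    mismatched = []
    missing = 0
    for i, (gold, pred) in enumerate(zip(gold_by_sentence, pred_by_sentence)):
        if len(gold) != len(pred):
            mismatched.append(i)
            missing += len(gold) - len(pred)
    return mismatched, missing

# Toy example: the second sentence lost one token in prediction.
gold = [['O', 'B-PER'], ['O', 'O', 'B-LOC']]
pred = [['O', 'B-PER'], ['O', 'B-LOC']]
print(find_length_mismatches(gold, pred))  # → ([1], 1)
```

Run against the real `eval_df` (grouped by `sentence_id`) and the output of `model.predict`, this pinpoints which sentences lose tokens rather than just reporting the aggregate mismatch (27321 vs. 27223).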

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from simpletransformers.ner import NERModel

# lis is a list in the form required for the NER class of the library: [[101, 'word1', 'label-1'], [...]]
df_prepared = pd.DataFrame(lis, columns=['sentence_id', 'words', 'labels'])
print(len(df_prepared))
df_prepared.head()

#%%

train_df, eval_df = train_test_split(df_prepared, test_size=0.1, shuffle=False)

#%%

# Create a NERModel
model = NERModel('bert', 'bert-base-german-cased', args={'overwrite_output_dir': True, 'reprocess_input_data': True,
                 'num_train_epochs': 5, 'classification_report' : True, 'use_cached_eval_features' : False},
                 labels=list(set(train_df.labels)))

# Train the model
model.train_model(train_df, eval_df=eval_df)

#%%

# Evaluate the model
result, model_outputs, predictions = model.eval_model(eval_df, verbose=True)

print(result)

predictions_flat = [item for sublist in predictions for item in sublist]
print(classification_report(eval_df['labels'].tolist(), predictions_flat))


#%%
## Another way to compute the input for classification_report; it fails the same way,
## so the error must be somewhere in model.eval_model or model.predict:
current_id = eval_df.iloc[0].sentence_id  # avoid shadowing the builtin id()
sent_lis = []
sents = []
for row in eval_df.itertuples():
    if row.sentence_id == current_id:
        sent_lis.append(row.words)
    else:
        current_id = row.sentence_id
        sents.append(' '.join(sent_lis))
        sent_lis = [row.words]
sents.append(' '.join(sent_lis))  # don't drop the final sentence

preds, model_outputs = model.predict(sents)
predictions_flat = [list(item.values())[0] for sublist in preds for item in sublist]
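A hedged workaround (it sidesteps the symptom, not the underlying tokenization bug): score only the sentences where the model returned exactly as many predictions as there are gold labels, so `classification_report` receives equal-length inputs. `align_for_report` is a helper introduced here for illustration, not part of simpletransformers:

```python
def align_for_report(gold_by_sentence, pred_by_sentence):
    """Flatten only the sentences whose gold and predicted label counts
    match, so y_true and y_pred end up the same length. Also report how
    many sentences had to be skipped."""
    y_true, y_pred = [], []
    skipped = 0
    for gold, pred in zip(gold_by_sentence, pred_by_sentence):
        if len(gold) == len(pred):
            y_true.extend(gold)
            y_pred.extend(pred)
        else:
            skipped += 1
    return y_true, y_pred, skipped

# Toy example: the second sentence lost a token and is skipped.
y_true, y_pred, skipped = align_for_report(
    [['O', 'B-PER'], ['O', 'B-LOC', 'O']],
    [['O', 'B-PER'], ['O', 'O']],
)
print(len(y_true), len(y_pred), skipped)  # → 2 2 1
```

With the real data, `gold_by_sentence` would come from grouping `eval_df` by `sentence_id` and `pred_by_sentence` from the per-sentence `preds`; a non-zero `skipped` count also tells you how many sentences are affected.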




Error:

-------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-19-cc4cd7440094> in <module>
----> 1 print(classification_report(eval_df.labels.tolist(), predictions_flat))
      2 
      3 

~/anaconda3/envs/pytorch_1.3/lib/python3.7/site-packages/sklearn/metrics/_classification.py in classification_report(y_true, y_pred, labels, target_names, sample_weight, digits, output_dict, zero_division)
   1965     """
   1966 
-> 1967     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
   1968 
   1969     labels_given = True

~/anaconda3/envs/pytorch_1.3/lib/python3.7/site-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
     78     y_pred : array or indicator matrix
     79     """
---> 80     check_consistent_length(y_true, y_pred)
     81     type_true = type_of_target(y_true)
     82     type_pred = type_of_target(y_pred)

~/anaconda3/envs/pytorch_1.3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    210     if len(uniques) > 1:
    211         raise ValueError("Found input variables with inconsistent numbers of"
--> 212                          " samples: %r" % [int(l) for l in lengths])
    213 
    214 

ValueError: Found input variables with inconsistent numbers of samples: [27321, 27223]


Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
ThilinaRajapakse commented, May 6, 2020

There does seem to be an issue with certain characters like these when using the NERModel. It’s possibly related to how the tokenization happens. I’ll see if I can do something about this.
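The maintainer's hypothesis can be illustrated: wordpiece-style tokenizers can map some characters to an empty token list, and a word that produces zero sub-tokens silently disappears from token-level output, shrinking the prediction count. A toy sketch of that failure mode (the `toy_tokenize` rule below is a deliberately crude stand-in; with a real model you would inspect the actual tokenizer, e.g. something like `model.tokenizer.tokenize(word)`):

```python
def find_dropped_words(words, tokenize):
    """Return the words that a tokenizer maps to zero sub-tokens;
    such words vanish from token-level predictions."""
    return [w for w in words if len(tokenize(w)) == 0]

# Toy tokenizer: only ASCII words get a token; anything else yields
# nothing, mimicking how unusual characters can produce no wordpieces.
# (A real BERT tokenizer handles far more than ASCII; this is only a demo.)
def toy_tokenize(word):
    return [word] if word.isascii() else []

print(find_dropped_words(['Haus', '→', '☃'], toy_tokenize))  # → ['→', '☃']
```

Running a check like this over the evaluation vocabulary would reveal which tokens the model never predicts on, and whether their count matches the 98-token gap reported above.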

0 reactions
Jefffish09 commented, Apr 24, 2021

Same problem, any updates now?

Read more comments on GitHub >

