Passing numpy array to ClassLabel names causes ValueError

Describe the bug

If a numpy array is passed to the names argument of ClassLabel, creating a dataset with those features causes an error.

Steps to reproduce the bug

https://colab.research.google.com/drive/1cV_es1PWZiEuus17n-2C-w0KEoEZ68IX

TLDR:

If I define my classes as:

import numpy as np
my_classes = np.array(['one', 'two', 'three'])

Then this errors:

from datasets import ClassLabel, Dataset, Features, Value
features = Features({'value': Value('string'), 'label': ClassLabel(names=my_classes)})
dataset = Dataset.from_list(my_data, features=features)

ValueError                                Traceback (most recent call last)
<ipython-input-8-a8a9d53ec82f> in <module>
----> 1 dataset = Dataset.from_list(my_data, features=features)

11 frames
/usr/local/lib/python3.8/dist-packages/datasets/utils/py_utils.py in _asdict_inner(obj)
    183             for f in fields(obj):
    184                 value = _asdict_inner(getattr(obj, f.name))
--> 185                 if not f.init or value != f.default or f.metadata.get("include_in_asdict_even_if_is_default", False):
    186                     result[f.name] = value
    187             return result

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
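
The condition on line 185 of the traceback compares each field value with its default using `!=` and then truth-tests the result. With a NumPy array, `!=` is evaluated per element, so the result is another array, and truth-testing an array of more than one element raises. The sketch below reproduces that mechanism with the standard library only. `ElementwiseArray` is a hypothetical stand-in for `np.ndarray`, `Label` is a hypothetical stand-in for a feature class, and the `None` default for `names` is an assumption made for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any


class ElementwiseArray:
    """Hypothetical stand-in for np.ndarray: `!=` compares element by element,
    and truth-testing a multi-element result raises ValueError."""

    def __init__(self, items):
        self.items = list(items)

    def __ne__(self, other):
        # Return one boolean per element instead of a single bool.
        return ElementwiseArray(item != other for item in self.items)

    def __bool__(self):
        if len(self.items) > 1:
            raise ValueError(
                "The truth value of an array with more than one element is ambiguous."
            )
        return bool(self.items[0]) if self.items else False


@dataclass
class Label:
    # Assumption for illustration: the field defaults to None.
    names: Any = None


def non_default_fields(obj):
    """Keep only fields that differ from their default, using the same
    condition as line 185 in the traceback."""
    result = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if (
            not f.init
            or value != f.default  # an array-like result is truth-tested here
            or f.metadata.get("include_in_asdict_even_if_is_default", False)
        ):
            result[f.name] = value
    return result
```

Calling `non_default_fields(Label(names=["one", "two", "three"]))` succeeds because a list compared with `None` gives a single boolean, while passing an `ElementwiseArray` raises `ValueError` at the `or`.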

But this works:

features2 = Features({'value': Value('string'), 'label': ClassLabel(names=list(my_classes))})
dataset2 = Dataset.from_list(my_data, features=features2)
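
Until the library handles this case, the conversion can be done on the caller's side. The helper below is a minimal sketch, not part of `datasets`. `to_name_list` is a hypothetical name, and the helper assumes a one-dimensional input whose elements can be converted with `str`.

```python
from typing import Iterable, List


def to_name_list(names: Iterable) -> List[str]:
    """Return `names` as a plain list of built-in Python strings.

    Assumes a one-dimensional input (list, tuple, or array-like).
    """
    # Array-like objects such as NumPy arrays expose `tolist()`, which also
    # converts NumPy scalar types into built-in Python objects.
    if hasattr(names, "tolist"):
        names = names.tolist()
    return [str(name) for name in names]
```

With this helper, the feature would be declared as `ClassLabel(names=to_name_list(my_classes))`, which has the same effect as the `list(my_classes)` call above but also works unchanged for lists and tuples.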

Expected behavior

If I provide a numpy array of class names, I would expect either an error saying that names has the wrong type, or for the array to be cast to a list internally.

Environment info

  • datasets version: 2.7.1
  • Platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.10
  • Python version: 3.8.15
  • PyArrow version: 10.0.1
  • Pandas version: 1.5.2

Additionally:

  • Numpy version: 1.23.5

Issue Analytics

  • State: open
  • Created 10 months ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

mariosasko commented, Dec 9, 2022 (1 reaction)

Hi! No, I don’t think so. The names parameter is annotated as List[str] (NumPy arrays are not lists), and considering that type checking is not a common practice in Python, I think we can leave the code as-is.

albertvillanova commented, Dec 14, 2022 (0 reactions)

What about checking for Sequence instead? I think users can pass a list or a tuple as well.
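
A sketch of what such a check might look like is shown below. It is an illustration of the suggestion, not code from `datasets`, and `check_names` is a hypothetical name. Two points are worth noting: `str` is itself a `Sequence`, so it has to be excluded explicitly, and NumPy arrays are, as far as I know, not registered as `collections.abc.Sequence`, so this check would reject them with a clear `TypeError` rather than accept them.

```python
from collections.abc import Sequence
from typing import List


def check_names(names) -> List[str]:
    """Validate `names` and return it as a list.

    Accepts lists, tuples, and other non-string sequences of str;
    raises TypeError for anything else.
    """
    # A bare string is a Sequence of characters, which is never what is meant.
    if isinstance(names, str) or not isinstance(names, Sequence):
        raise TypeError(
            f"names must be a sequence of str such as a list or tuple, "
            f"got {type(names).__name__}"
        )
    for name in names:
        if not isinstance(name, str):
            raise TypeError(f"each name must be a str, got {type(name).__name__}")
    return list(names)
```

This covers the first of the two outcomes requested under "Expected behavior": an early, descriptive error instead of the ambiguous truth-value failure deep inside serialization.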
