Passing numpy array to ClassLabel names causes ValueError
Describe the bug
If a numpy array is passed to the names argument of ClassLabel, creating a dataset with those features causes an error.
Steps to reproduce the bug
https://colab.research.google.com/drive/1cV_es1PWZiEuus17n-2C-w0KEoEZ68IX
TLDR:
If I define my classes as:
my_classes = np.array(['one', 'two', 'three'])
Then this errors:
features = Features({'value': Value('string'), 'label': ClassLabel(names=my_classes)})
dataset = Dataset.from_list(my_data, features=features)
ValueError Traceback (most recent call last)
<ipython-input-8-a8a9d53ec82f> in <module>
----> 1 dataset = Dataset.from_list(my_data, features=features)
11 frames
/usr/local/lib/python3.8/dist-packages/datasets/utils/py_utils.py in _asdict_inner(obj)
183 for f in fields(obj):
184 value = _asdict_inner(getattr(obj, f.name))
--> 185 if not f.init or value != f.default or f.metadata.get("include_in_asdict_even_if_is_default", False):
186 result[f.name] = value
187 return result
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
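The failing line in the traceback is the `value != f.default` test. A minimal sketch of why that breaks for arrays, without using datasets at all: `LabelSketch` and `names_differ_from_default` are hypothetical stand-ins written for illustration, and the `None` default is an assumption, not taken from the library source.

```python
from dataclasses import dataclass, fields
from typing import Any

import numpy as np


@dataclass
class LabelSketch:
    """Hypothetical stand-in for a feature dataclass; not the real ClassLabel."""

    names: Any = None  # assumed default, chosen only for illustration


def names_differ_from_default(obj) -> bool:
    """Mimic the `value != f.default` check from the traceback for `names`."""
    f = next(f for f in fields(obj) if f.name == "names")
    # list != None is a single bool; ndarray != None is an elementwise array,
    # and bool() of a multi-element array raises ValueError.
    return bool(getattr(obj, f.name) != f.default)


# A plain list compares cleanly.
list_ok = names_differ_from_default(LabelSketch(names=["one", "two", "three"]))

# A NumPy array hits the ambiguous truth value error.
try:
    names_differ_from_default(LabelSketch(names=np.array(["one", "two", "three"])))
    array_error = None
except ValueError as err:
    array_error = str(err)

print(list_ok)
print(array_error)
```

So the error seems to come from the comparison being used in an `if`, not from anything specific to class labels.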
But this works:
features2 = Features({'value': Value('string'), 'label': ClassLabel(names=list(my_classes))})
dataset2 = Dataset.from_list(my_data, features=features2)
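A small note on the workaround, as a sketch using only NumPy: `list(my_classes)` keeps the elements as NumPy string scalars, while `my_classes.tolist()` converts them to built-in `str`. Either result is a real list, so either should be usable as `names` the same way as in the working snippet above; the variable names here are just for illustration.

```python
import numpy as np

my_classes = np.array(["one", "two", "three"])

# list() keeps each element as a NumPy scalar (np.str_, a subclass of str).
names_via_list = list(my_classes)

# tolist() converts each element to a built-in Python str.
names_via_tolist = my_classes.tolist()

print(type(names_via_list[0]))
print(type(names_via_tolist[0]))
```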
Expected behavior
If I provide a numpy array of class names, I would expect either an error that the names list is the wrong type, or for it to be cast internally.
Environment info
- datasets version: 2.7.1
- Platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.10
- Python version: 3.8.15
- PyArrow version: 10.0.1
- Pandas version: 1.5.2
Additionally:
- Numpy version: 1.23.5
Issue Analytics
- State:
- Created 10 months ago
- Comments: 5 (4 by maintainers)
Top GitHub Comments
Hi! No, I don’t think so. The names parameter is annotated as List[str] (NumPy arrays are not lists), and considering that type checking is not a common practice in Python, I think we can leave the code as-is.

What about checking for Sequence instead? I think users can pass a list or a tuple as well.
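The Sequence check suggested above could look roughly like this. `validate_names` is a hypothetical helper, not something that exists in datasets. One thing worth knowing: a NumPy array is not a `collections.abc.Sequence`, so a check like this would reject arrays with a clear error rather than cast them. A plain `str` is itself a Sequence, so the sketch excludes it explicitly.

```python
from collections.abc import Sequence
from typing import List


def validate_names(names) -> List[str]:
    """Hypothetical helper: accept a list or tuple of names, reject anything else."""
    # A bare string is a Sequence of characters, which is almost never intended.
    if isinstance(names, str) or not isinstance(names, Sequence):
        raise TypeError(
            f"names must be a list or tuple of strings, got {type(names).__name__}"
        )
    return list(names)


from_tuple = validate_names(("one", "two", "three"))
from_list = validate_names(["one", "two", "three"])

# Non-Sequence inputs (a set here) fail early with a readable message.
try:
    validate_names({"one", "two", "three"})
    set_error = None
except TypeError as err:
    set_error = str(err)

print(from_tuple)
print(set_error)
```

Casting internally instead (for example by calling `list(...)` on any iterable) would be the other option mentioned under expected behavior; this sketch only shows the stricter variant.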