FYI: pandas can fail to read participants.tsv
/tmp/ds001868 > cat participants.tsv
participant_id age sex
sub-ecog01 38 m%
/tmp/ds001868 > python -c 'from bids import BIDSLayout; b=BIDSLayout(".", derivatives=False); b.get_collections(level="dataset")'
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pandas/core/arrays/categorical.py", line 345, in __init__
codes, categories = factorize(values, sort=True)
File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 630, in factorize
na_value=na_value)
File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 476, in _factorize_array
na_value=na_value)
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/yoh/proj/bids/pybids/bids/layout/layout.py", line 850, in get_collections
sampling_rate=sampling_rate)
File "/home/yoh/proj/bids/pybids/bids/variables/entities.py", line 92, in get_collections
nodes = self.get_nodes(unit, entities)
File "/home/yoh/proj/bids/pybids/bids/variables/entities.py", line 161, in get_nodes
rows = rows.sort_values(sort_cols)
File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 4414, in sort_values
na_position=na_position)
File "/usr/lib/python3/dist-packages/pandas/core/sorting.py", line 207, in lexsort_indexer
c = Categorical(key, ordered=True)
File "/usr/lib/python3/dist-packages/pandas/core/arrays/categorical.py", line 347, in __init__
codes, categories = factorize(values, sort=False)
File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 630, in factorize
na_value=na_value)
File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 476, in _factorize_array
na_value=na_value)
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'
/tmp/ds001868 > apt-cache policy python3-pandas
python3-pandas:
Installed: 0.23.3+dfsg-3
Candidate: 0.23.3+dfsg-3
Version table:
*** 0.23.3+dfsg-3 900
900 http://http.debian.net/debian buster/main amd64 Packages
900 http://http.debian.net/debian buster/main i386 Packages
600 http://http.debian.net/debian sid/main amd64 Packages
600 http://http.debian.net/debian sid/main i386 Packages
100 /var/lib/dpkg/status
edit 1: additional sample ds001810
(git)smaug:/mnt/btrfs/datasets/datalad/crawl/openneuro/ds001810[master]
$> python -c 'from bids import BIDSLayout; b=BIDSLayout(".", derivatives=False); b.get_collections(level="dataset")'
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/pandas/core/arrays/categorical.py", line 345, in __init__
codes, categories = factorize(values, sort=True)
File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 178, in wrapper
return func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 630, in factorize
na_value=na_value)
File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 476, in _factorize_array
na_value=na_value)
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'
...
$> head -n 2 participants.tsv
participant_id tDCS_on_first_day gender age
sub-01 anodal female 20
Issue Analytics
- State:
- Created 4 years ago
- Comments:14
I can confirm that current master (0.9.2-66-g6751eec, AKA 0.9.3-48-g6751eec since the 0.9.3 tag was not annotated) no longer blows up, thanks!
Agreed that we don’t call the validator, but when there are conditions that we can identify as only arising from invalid data, then raising an exception that basically says “Go run the validator for more details” could be useful.
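As a rough illustration of that idea, here is a minimal sketch of a pre-sort check that turns the opaque `TypeError: unhashable type: 'dict'` into a message pointing at the validator. Everything named here (`InvalidDatasetError`, `find_unhashable_columns`, `check_sortable`) is hypothetical and not part of pybids, and the sample values are invented for the example. The thread does not say where the dict values come from, so the check only looks for unhashable cells and assumes nothing about their origin. It accepts anything with an `.items()` method yielding `(column name, values)` pairs, which covers a plain dict of lists as well as a pandas `DataFrame`.

```python
class InvalidDatasetError(ValueError):
    """Hypothetical exception type; pybids does not define this."""


def find_unhashable_columns(table):
    """Return the names of columns holding at least one unhashable value.

    `table` must provide .items() yielding (name, values) pairs,
    e.g. a dict of lists or a pandas DataFrame.
    """
    bad = []
    for name, values in table.items():
        for value in values:
            try:
                hash(value)
            except TypeError:
                # dict, list, set, ... cannot be factorized/sorted by pandas
                bad.append(name)
                break
    return bad


def check_sortable(table, sort_cols):
    """Raise a descriptive error if any sort column has unhashable cells."""
    bad = [c for c in find_unhashable_columns(table) if c in sort_cols]
    if bad:
        raise InvalidDatasetError(
            "Columns {} contain unhashable values (e.g. dict), which usually "
            "points to invalid data. Run the BIDS validator for more "
            "details.".format(bad)
        )


# Invented sample data: one cell accidentally holds a dict.
rows = {"subject": ["01", "02"], "age": [38, {"Description": "age"}]}
try:
    check_sortable(rows, ["subject", "age"])
except InvalidDatasetError as err:
    print(err)
```

A check like this could run just before the `rows.sort_values(sort_cols)` call seen in the traceback, at the cost of one extra pass over the sort columns.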