FYI: pandas can fail to read participants.tsv

/tmp/ds001868 > cat participants.tsv          
participant_id	age	sex
sub-ecog01	38	m%                                                                                      

/tmp/ds001868 > python -c 'from bids import BIDSLayout; b=BIDSLayout(".", derivatives=False); b.get_collections(level="dataset")'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/core/arrays/categorical.py", line 345, in __init__
    codes, categories = factorize(values, sort=True)
  File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 630, in factorize
    na_value=na_value)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 476, in _factorize_array
    na_value=na_value)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/yoh/proj/bids/pybids/bids/layout/layout.py", line 850, in get_collections
    sampling_rate=sampling_rate)
  File "/home/yoh/proj/bids/pybids/bids/variables/entities.py", line 92, in get_collections
    nodes = self.get_nodes(unit, entities)
  File "/home/yoh/proj/bids/pybids/bids/variables/entities.py", line 161, in get_nodes
    rows = rows.sort_values(sort_cols)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 4414, in sort_values
    na_position=na_position)
  File "/usr/lib/python3/dist-packages/pandas/core/sorting.py", line 207, in lexsort_indexer
    c = Categorical(key, ordered=True)
  File "/usr/lib/python3/dist-packages/pandas/core/arrays/categorical.py", line 347, in __init__
    codes, categories = factorize(values, sort=False)
  File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 630, in factorize
    na_value=na_value)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 476, in _factorize_array
    na_value=na_value)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'

/tmp/ds001868 > apt-cache policy python3-pandas
python3-pandas:
  Installed: 0.23.3+dfsg-3
  Candidate: 0.23.3+dfsg-3
  Version table:
 *** 0.23.3+dfsg-3 900
        900 http://http.debian.net/debian buster/main amd64 Packages
        900 http://http.debian.net/debian buster/main i386 Packages
        600 http://http.debian.net/debian sid/main amd64 Packages
        600 http://http.debian.net/debian sid/main i386 Packages
        100 /var/lib/dpkg/status

edit 1: additional sample ds001810

(git)smaug:/mnt/btrfs/datasets/datalad/crawl/openneuro/ds001810[master]
$> python -c 'from bids import BIDSLayout; b=BIDSLayout(".", derivatives=False); b.get_collections(level="dataset")'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/core/arrays/categorical.py", line 345, in __init__
    codes, categories = factorize(values, sort=True)
  File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 630, in factorize
    na_value=na_value)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 476, in _factorize_array
    na_value=na_value)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'

...
$> head -n 2 participants.tsv
participant_id  tDCS_on_first_day       gender  age
sub-01  anodal  female  20
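
Both tracebacks end in pandas' hash-table code with `unhashable type: 'dict'`, which suggests that a dict ended up among the values being sorted. The issue text does not say where that dict came from. As a minimal diagnostic sketch in plain Python (the helper `find_unhashable` and the sample rows are hypothetical, not part of pybids or pandas), one could scan the rows for values that cannot be hashed before handing them to a sort:

```python
def find_unhashable(rows, columns):
    """Return (row_index, column, value) for every value that cannot be hashed."""
    problems = []
    for i, row in enumerate(rows):
        for col in columns:
            value = row.get(col)
            try:
                hash(value)  # dicts and lists raise TypeError here
            except TypeError:
                problems.append((i, col, value))
    return problems


# Made-up rows for illustration only; the dict under "suffix" is an assumption,
# not something taken from ds001868 or ds001810.
rows = [
    {"subject": "ecog01", "suffix": "participants"},
    {"subject": "ecog01", "suffix": {"unexpected": "mapping"}},
]

print(find_unhashable(rows, ["subject", "suffix"]))
# → [(1, 'suffix', {'unexpected': 'mapping'})]
```

Knowing the row and column of the offending value narrows down which file or entity produced it.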

Issue Analytics

Created 4 years ago
Comments: 14

Top GitHub Comments

yarikoptic commented, Sep 13, 2019

> Can you verify that, @yarikoptic?

I can confirm that current master (0.9.2-66-g6751eec AKA 0.9.3-48-g6751eec since 0.9.3 was not annotated) no longer blows, thanks!

effigies commented, Aug 14, 2019

Agreed that we don’t call the validator, but when there are conditions that we can identify as only arising from invalid data, then raising an exception that basically says “Go run the validator for more details” could be useful.
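
The suggestion above can be sketched as follows. This is only an illustration of the idea under stated assumptions, not pybids code: `LikelyInvalidDatasetError` and `sort_rows` are hypothetical names, and the rows are plain dicts sorted with the standard library rather than a pandas DataFrame.

```python
class LikelyInvalidDatasetError(ValueError):
    """Hypothetical exception type; pybids does not define it."""


def sort_rows(rows, sort_cols):
    """Sort row dicts by sort_cols, turning an opaque TypeError into a hint."""
    try:
        return sorted(rows, key=lambda row: tuple(row[c] for c in sort_cols))
    except TypeError as err:
        # Keep the low-level error as the cause, but tell the user what to do next.
        raise LikelyInvalidDatasetError(
            f"Could not sort on {sort_cols}: {err}. "
            "This may indicate an invalid dataset; "
            "run the BIDS validator for more details."
        ) from err


# Made-up rows: the dict under "suffix" stands in for whatever unexpected
# value triggered the failure in the issue.
bad_rows = [
    {"subject": "01", "suffix": {"unexpected": "mapping"}},
    {"subject": "01", "suffix": "bold"},
]

try:
    sort_rows(bad_rows, ["subject", "suffix"])
except LikelyInvalidDatasetError as exc:
    print(exc)
```

Chaining with `from err` keeps the original traceback available for debugging while the top-level message points the user at the validator.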