FYI: pandas can fail to read participants.tsv

/tmp/ds001868 > cat participants.tsv          
participant_id	age	sex
sub-ecog01	38	m%

/tmp/ds001868 > python -c 'from bids import BIDSLayout; b=BIDSLayout(".", derivatives=False); b.get_collections(level="dataset")'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/core/arrays/categorical.py", line 345, in __init__
    codes, categories = factorize(values, sort=True)
  File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 630, in factorize
    na_value=na_value)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 476, in _factorize_array
    na_value=na_value)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/yoh/proj/bids/pybids/bids/layout/layout.py", line 850, in get_collections
    sampling_rate=sampling_rate)
  File "/home/yoh/proj/bids/pybids/bids/variables/entities.py", line 92, in get_collections
    nodes = self.get_nodes(unit, entities)
  File "/home/yoh/proj/bids/pybids/bids/variables/entities.py", line 161, in get_nodes
    rows = rows.sort_values(sort_cols)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 4414, in sort_values
    na_position=na_position)
  File "/usr/lib/python3/dist-packages/pandas/core/sorting.py", line 207, in lexsort_indexer
    c = Categorical(key, ordered=True)
  File "/usr/lib/python3/dist-packages/pandas/core/arrays/categorical.py", line 347, in __init__
    codes, categories = factorize(values, sort=False)
  File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 630, in factorize
    na_value=na_value)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 476, in _factorize_array
    na_value=na_value)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'

/tmp/ds001868 > apt-cache policy python3-pandas
python3-pandas:
  Installed: 0.23.3+dfsg-3
  Candidate: 0.23.3+dfsg-3
  Version table:
 *** 0.23.3+dfsg-3 900
        900 http://http.debian.net/debian buster/main amd64 Packages
        900 http://http.debian.net/debian buster/main i386 Packages
        600 http://http.debian.net/debian sid/main amd64 Packages
        600 http://http.debian.net/debian sid/main i386 Packages
        100 /var/lib/dpkg/status
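The traceback above ends in pandas' `factorize`, reached from `DataFrame.sort_values` through `lexsort_indexer` and `Categorical`. Factorizing assigns each distinct value an integer code via a hash-table lookup, so a sort key whose cells are `dict` objects cannot be factorized. The sketch below is a simplified, pure-Python stand-in for that idea; the `factorize` function here is hypothetical and is not pandas' implementation, but it fails with the same kind of error:

```python
# Hypothetical, simplified stand-in for hash-based factorization (NOT pandas'
# code): each distinct value gets an integer code via a dict lookup.
def factorize(values):
    codes, uniques, seen = [], [], {}
    for value in values:
        if value not in seen:  # this lookup hashes `value`
            seen[value] = len(uniques)
            uniques.append(value)
        codes.append(seen[value])
    return codes, uniques


# Hashable cells such as strings factorize fine.
print(factorize(["sub-01", "sub-02", "sub-01"]))  # ([0, 1, 0], ['sub-01', 'sub-02'])

# dict cells cannot be hashed, so the lookup raises a TypeError mentioning
# "unhashable type: 'dict'" (exact wording varies between Python versions).
try:
    factorize([{"age": 38}, {"age": 20}])
except TypeError as exc:
    print("TypeError:", exc)
```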

edit 1: additional sample ds001810
(git)smaug:/mnt/btrfs/datasets/datalad/crawl/openneuro/ds001810[master]
$> python -c 'from bids import BIDSLayout; b=BIDSLayout(".", derivatives=False); b.get_collections(level="dataset")'
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/core/arrays/categorical.py", line 345, in __init__
    codes, categories = factorize(values, sort=True)
  File "/usr/lib/python3/dist-packages/pandas/util/_decorators.py", line 178, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 630, in factorize
    na_value=na_value)
  File "/usr/lib/python3/dist-packages/pandas/core/algorithms.py", line 476, in _factorize_array
    na_value=na_value)
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_labels
TypeError: unhashable type: 'dict'

...
$> head -n 2 participants.tsv
participant_id  tDCS_on_first_day       gender  age
sub-01  anodal  female  20
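On its own, the ds001868 participants.tsv looks like an ordinary tab-separated table. As a minimal check, assuming the file contains exactly what `cat` printed (tab separators and no trailing newline, which the `%` after `m` usually signals in zsh), Python's standard `csv` module parses it into one row per participant. `read_participants` is a hypothetical helper name, and pandas' own `read_csv(..., sep="\t")` is deliberately not used so the sketch has no third-party dependency:

```python
import csv
import io

# Contents of ds001868/participants.tsv as printed by `cat` in the report,
# with tab separators and (assumed) no trailing newline.
SAMPLE = "participant_id\tage\tsex\nsub-ecog01\t38\tm"


def read_participants(text):
    """Parse participants.tsv text into a list of plain dicts (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # DictReader yields OrderedDict on Python < 3.8; convert for uniform output.
    return [dict(row) for row in reader]


rows = read_participants(SAMPLE)
print(rows)  # [{'participant_id': 'sub-ecog01', 'age': '38', 'sex': 'm'}]

# Adding the missing final newline does not change the parsed result.
assert read_participants(SAMPLE + "\n") == rows
```

Every cell comes back as a plain string. The reported tracebacks, by contrast, fail later, in `sort_values` inside pybids' `get_nodes`.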

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 14

Top GitHub Comments

1 reaction
yarikoptic commented, Sep 13, 2019

> Can you verify that, @yarikoptic?

I can confirm that the current master (0.9.2-66-g6751eec, AKA 0.9.3-48-g6751eec, since the 0.9.3 tag was not annotated) no longer blows up. Thanks!

1 reaction
effigies commented, Aug 14, 2019

Agreed that we don’t call the validator, but when there are conditions that we can identify as only arising from invalid data, then raising an exception that basically says “Go run the validator for more details” could be useful.
