Problems after upgrading to 2.6.1

Describe the bug

Loading a dataset_dict from disk with load_from_disk now raises a KeyError: "length". This did not occur in v2.5.2.

Context:

  • Each individual dataset in the dict is created with Dataset.from_pandas
  • The dataset_dict is created from a dict of Datasets, e.g., `DatasetDict({"train": train_ds, "validation": val_ds})`
  • Besides text columns, the pandas DataFrame has a column that holds a dictionary in each row, potentially with different keys per row. Dataset.from_pandas correctly adds key: None for the missing keys to every row's dictionary so that the schema can be inferred.
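The key padding described in the last bullet can be illustrated with plain Python. This is only a sketch of the observable effect, not the code that Dataset.from_pandas or PyArrow actually runs. The helper name `pad_missing_keys` and the sample keys are hypothetical.

```python
from typing import Any


def pad_missing_keys(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Give every dict the same set of keys, filling absent ones with None."""
    # Collect keys in first-seen order so the result is deterministic.
    all_keys: list[str] = []
    for row in rows:
        for key in row:
            if key not in all_keys:
                all_keys.append(key)
    # Rebuild each row with the full key set; dict.get returns None when missing.
    return [{key: row.get(key) for key in all_keys} for row in rows]


# Two rows whose dictionaries do not share the same keys (made-up sample data).
rows = [{"source": "web"}, {"lang": "en", "source": "book"}]
padded = pad_missing_keys(rows)
print(padded)
```

After padding, every row exposes the same fields, which is what lets a single struct schema describe the whole column.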

Steps to reproduce the bug

  • Upgrade to datasets==2.6.1
  • Create a dataset from pandas dataframe with Dataset.from_pandas
  • Create a dataset_dict from a dict of Datasets, e.g., `DatasetDict({"train": train_ds, "validation": val_ds})`
  • Save it to disk with save_to_disk, then load it back with load_from_disk
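The steps above, together with the load_from_disk call from the bug description, can be sketched as a small script. It assumes `pandas` and `datasets==2.6.1` are installed. The column names and sample values are made up for illustration, and `make_records` and `reproduce` are hypothetical helper names. Since a commenter reports that the error appears for only some saved datasets, this particular data may or may not trigger the KeyError.

```python
def make_records():
    """Build sample columns: one text column and one dict column
    whose rows have different keys (made-up illustration data)."""
    return {
        "text": ["first example", "second example"],
        "meta": [{"source": "web"}, {"lang": "en"}],
    }


def reproduce(path):
    """Save a DatasetDict built from pandas to `path`, then reload it."""
    # Imported here so the helper above stays usable without these packages.
    import pandas as pd
    from datasets import Dataset, DatasetDict, load_from_disk

    train_ds = Dataset.from_pandas(pd.DataFrame(make_records()))
    val_ds = Dataset.from_pandas(pd.DataFrame(make_records()))
    dataset_dict = DatasetDict({"train": train_ds, "validation": val_ds})

    dataset_dict.save_to_disk(path)
    # The issue reports the KeyError at this call on 2.6.1.
    return load_from_disk(path)
```

Calling `reproduce(tempfile.mkdtemp())` under both 2.5.2 and 2.6.1 gives a side-by-side comparison of the two versions.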

Expected behavior

Same as in v2.5.2: the dataset_dict loads from disk without errors.

Environment info

  • datasets version: 2.6.1
  • Platform: Linux-5.4.209-129.367.amzn2int.x86_64-x86_64-with-glibc2.26
  • Python version: 3.9.13
  • PyArrow version: 9.0.0
  • Pandas version: 1.5.1

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

1 reaction
d-v-dlee commented, Dec 8, 2022

I'm getting the same error.

  • using the base AWS HF container that uses a datasets <2.
  • updating the AWS HF container to use dataset 2.4

0 reactions
cgpeltier commented, Dec 16, 2022

Same here, running on our SageMaker pipelines. It’s only happening for some but not all of our saved Datasets.
