Problems after upgrading to 2.6.1
Describe the bug
Loading a dataset_dict from disk with `load_from_disk` now raises a KeyError "length" that did not occur in v2.5.2.
Context:
- Each individual dataset in the dict is created with `Dataset.from_pandas`
- The dataset_dict is created from a dict of `Dataset`s, e.g., `DatasetDict({"train": train_ds, "validation": val_ds})`
- The pandas dataframe, besides text columns, has a column with a dictionary inside and potentially different keys in each row. The `Dataset.from_pandas` function correctly adds `key: None` to all dictionaries in each row so that the schema can be inferred.
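To make the last point concrete, here is a rough pure-Python illustration of the padding described above. It is not the library's implementation; `pad_missing_keys` is a hypothetical helper that only mimics the observable result (every row ends up with the same keys, and absent ones are set to `None`).

```python
def pad_missing_keys(rows):
    """Give every dict the same keys, filling absent ones with None."""
    # Collect the union of keys, preserving first-seen order.
    all_keys = []
    for row in rows:
        for key in row:
            if key not in all_keys:
                all_keys.append(key)
    # Rebuild each row with the full key set; dict.get returns None if missing.
    return [{key: row.get(key) for key in all_keys} for row in rows]


rows = [{"a": 1}, {"b": "x"}]
padded = pad_missing_keys(rows)
# padded == [{"a": 1, "b": None}, {"a": None, "b": "x"}]
```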
Steps to reproduce the bug
Steps to reproduce:
- Upgrade to datasets==2.6.1
- Create a dataset from a pandas dataframe with `Dataset.from_pandas`
- Create a dataset_dict from a dict of `Dataset`s, e.g., `DatasetDict({"train": train_ds, "validation": val_ds})`
- Save to disk with the `save` function
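The steps above could be sketched roughly as follows. This is an assumption-laden sketch, not the reporter's actual script: the column names `text` and `meta` and the sample rows are made up, the `save` function is assumed to mean `DatasetDict.save_to_disk`, and the final `load_from_disk` call is added because that is where the report says the error appears. Whether this particular data triggers the KeyError is not confirmed by the report.

```python
def make_rows():
    """Hypothetical sample data: a text column plus a dict column whose keys differ per row."""
    return [
        {"text": "first example", "meta": {"a": 1}},
        {"text": "second example", "meta": {"b": 2}},
    ]


def round_trip(path):
    """Build a DatasetDict from pandas, save it to `path`, and load it back."""
    # Imported here so the installed datasets version is picked up at call time.
    import pandas as pd
    from datasets import Dataset, DatasetDict, load_from_disk

    df = pd.DataFrame(make_rows())
    train_ds = Dataset.from_pandas(df)
    val_ds = Dataset.from_pandas(df)
    dataset_dict = DatasetDict({"train": train_ds, "validation": val_ds})

    # Assumption: the "save function" in the report is save_to_disk.
    dataset_dict.save_to_disk(path)

    # The report says the KeyError "length" is raised on loading in 2.6.1.
    return load_from_disk(path)
```

Running `round_trip` against an empty directory under both 2.5.2 and 2.6.1 would be one way to compare the two versions.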
Expected behavior
Same as in v2.5.2, that is load from disk without errors
Environment info
- `datasets` version: 2.6.1
- Platform: Linux-5.4.209-129.367.amzn2int.x86_64-x86_64-with-glibc2.26
- Python version: 3.9.13
- PyArrow version: 9.0.0
- Pandas version: 1.5.1
Issue Analytics
- State:
- Created a year ago
- Comments:7 (2 by maintainers)
Top GitHub Comments
I'm getting the same error.
Same here, running on our SageMaker pipelines. It’s only happening for some but not all of our saved Datasets.