Do we need to implement `_prepare_split`?

Describe the bug

I’m not sure whether this is a bug, a gap in the documentation, or something I’m doing incorrectly. I’m subclassing `DatasetBuilder` and getting the following error, because the `_prepare_split` method on the `DatasetBuilder` class is abstract (as are the other methods we are required to implement, hence my question):

Traceback (most recent call last):
  File "/home/jason/source/python/prism_machine_learning/examples/create_hf_datasets.py", line 28, in <module>
    dataset_builder.download_and_prepare()
  File "/home/jason/.virtualenvs/pml/lib/python3.8/site-packages/datasets/builder.py", line 704, in download_and_prepare
    self._download_and_prepare(
  File "/home/jason/.virtualenvs/pml/lib/python3.8/site-packages/datasets/builder.py", line 793, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/jason/.virtualenvs/pml/lib/python3.8/site-packages/datasets/builder.py", line 1124, in _prepare_split
    raise NotImplementedError()
NotImplementedError

Steps to reproduce the bug

I will share my implementation if it turns out that everything should be working (i.e. we only need to implement the 3 methods the docs mention), but I don’t want to distract from the original question.

Expected behavior

I just need to know whether there are additional methods we need to implement when subclassing `DatasetBuilder`, besides the ones the documentation specifies: `_info`, `_split_generators`, and `_generate_examples`.

Environment info

  • datasets version: 2.4.0
  • Platform: Linux-5.4.0-135-generic-x86_64-with-glibc2.2.5
  • Python version: 3.8.12
  • PyArrow version: 7.0.0
  • Pandas version: 1.4.1

Issue Analytics

  • State: closed
  • Created 9 months ago
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

1 reaction
mariosasko commented, Dec 15, 2022

> The requirement of a loading script has always seemed counterintuitive to me.

This is a requirement only for datasets not stored in standard formats such as CSV, JSON, SQL, Parquet, ImageFolder, etc.

> If I have to provide a script with every dataset, what is the point of using datasets when we’re doing all the work of loading it? I could just do that in my code and skip the datasets integration. (This of course discounts other potential benefits around metadata management, etc.; my example is just the simplest use case, for the sake of discussion.)

Our README/documentation lists the main features…

One of the main ones is that our library makes it easy to work with datasets larger than RAM (thanks to Arrow and the caching mechanism), and this is not trivial to implement.

Regarding the step-by-step builder, this is the pattern:

from datasets import load_dataset_builder
builder = load_dataset_builder("path/to/script") # or direct instantiation with MyDatasetBuilder(...)
builder.download_and_prepare()
dset = builder.as_dataset()
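
The traceback in the issue is consistent with how the library divides the work: the base `DatasetBuilder` leaves `_prepare_split` unimplemented, while `GeneratorBasedBuilder` implements it on top of `_generate_examples`. The sketch below is a toy model of that arrangement using only the standard library. It is not the library's code, and `BaseBuilder`, `GeneratorBuilder`, `ToyMethods`, `DirectSubclass`, and `GeneratorSubclass` are hypothetical names made up for illustration.

```python
class BaseBuilder:
    """Toy stand-in for a base builder: it orchestrates, but cannot write splits."""

    def download_and_prepare(self):
        self.prepared = {}
        for split_name, gen_kwargs in self._split_generators():
            # The base class delegates the real work to _prepare_split.
            self.prepared[split_name] = self._prepare_split(split_name, **gen_kwargs)

    def as_dataset(self, split):
        return self.prepared[split]

    def _split_generators(self):
        raise NotImplementedError()

    def _prepare_split(self, split_name, **gen_kwargs):
        # Unimplemented here, which is what surfaces in the traceback.
        raise NotImplementedError()


class GeneratorBuilder(BaseBuilder):
    """Toy stand-in for a generator-based builder: it supplies _prepare_split."""

    def _prepare_split(self, split_name, **gen_kwargs):
        # Collect the examples yielded by the subclass (keys are ignored here).
        return [example for _key, example in self._generate_examples(**gen_kwargs)]

    def _generate_examples(self, **gen_kwargs):
        raise NotImplementedError()


class ToyMethods:
    """The user-written methods, shared by both subclasses (hypothetical data)."""

    def _split_generators(self):
        return [("train", {"rows": ["a", "b"]})]

    def _generate_examples(self, rows):
        for idx, row in enumerate(rows):
            yield idx, {"text": row}


class DirectSubclass(ToyMethods, BaseBuilder):
    """Subclasses the base directly, so _prepare_split is still missing."""


class GeneratorSubclass(ToyMethods, GeneratorBuilder):
    """Inherits _prepare_split from the generator-based intermediate class."""


if __name__ == "__main__":
    try:
        DirectSubclass().download_and_prepare()
    except NotImplementedError:
        print("direct subclass: NotImplementedError from _prepare_split")

    builder = GeneratorSubclass()
    builder.download_and_prepare()
    print(builder.as_dataset("train"))
```

If this model matches your situation, subclassing `datasets.GeneratorBasedBuilder` instead of `DatasetBuilder` is likely the fix, in which case the three documented methods should be sufficient.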

1 reaction
mariosasko commented, Dec 12, 2022

cc @stevhliu who may have some ideas on how to improve this part of the docs.
