Do we need to implement `_prepare_split`?
Describe the bug
I'm not sure whether this is a bug, a gap in the documentation, or a mistake on my part, but I'm subclassing `DatasetBuilder` and getting the following error because the `_prepare_split` method on the `DatasetBuilder` class is abstract (as are the others we are required to implement, hence my question):
Traceback (most recent call last):
  File "/home/jason/source/python/prism_machine_learning/examples/create_hf_datasets.py", line 28, in <module>
    dataset_builder.download_and_prepare()
  File "/home/jason/.virtualenvs/pml/lib/python3.8/site-packages/datasets/builder.py", line 704, in download_and_prepare
    self._download_and_prepare(
  File "/home/jason/.virtualenvs/pml/lib/python3.8/site-packages/datasets/builder.py", line 793, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/home/jason/.virtualenvs/pml/lib/python3.8/site-packages/datasets/builder.py", line 1124, in _prepare_split
    raise NotImplementedError()
NotImplementedError
Steps to reproduce the bug
I will share implementation if it turns out that everything should be working (i.e. we only need to implement those 3 methods the docs mention), but I don’t want to distract from the original question.
Expected behavior
I just need to know whether there are additional methods we need to implement when subclassing `DatasetBuilder` besides the ones the documentation specifies: `_info`, `_split_generators`, and `_generate_examples`.
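The traceback is consistent with a template-method design: `download_and_prepare` calls a `_prepare_split` hook that the base class leaves unimplemented, and an intermediate class is expected to supply it. In `datasets`, `GeneratorBasedBuilder` is the class that implements `_prepare_split` on top of `_generate_examples`, so subclassing it instead of `DatasetBuilder` directly is the likely fix. The sketch below reproduces the mechanism in miniature. All class names in it are hypothetical stand-ins, not the library's actual code.

```python
class BaseBuilder:
    """Hypothetical stand-in for a base builder whose hooks are unimplemented."""

    def download_and_prepare(self):
        # Template method: drives the hooks that subclasses must provide.
        prepared = {}
        for split in self._split_generators():
            prepared[split] = self._prepare_split(split)
        return prepared

    def _split_generators(self):
        raise NotImplementedError()

    def _prepare_split(self, split):
        raise NotImplementedError()


class GeneratorBuilder(BaseBuilder):
    """Hypothetical intermediate class: supplies _prepare_split via _generate_examples."""

    def _prepare_split(self, split):
        # Collect the examples yielded for this split, dropping the keys.
        return [example for _key, example in self._generate_examples(split)]

    def _generate_examples(self, split):
        raise NotImplementedError()


class DirectSubclass(BaseBuilder):
    """Implements only the documented hooks, but on the wrong base class."""

    def _split_generators(self):
        return ["train"]

    def _generate_examples(self, split):
        yield 0, {"text": "hello"}


class GeneratorSubclass(GeneratorBuilder):
    """Implements the same hooks on the intermediate class."""

    def _split_generators(self):
        return ["train"]

    def _generate_examples(self, split):
        yield 0, {"text": "hello"}
        yield 1, {"text": "world"}


def try_prepare(builder):
    """Run the builder and report an unimplemented hook instead of crashing."""
    try:
        return builder.download_and_prepare()
    except NotImplementedError:
        return "NotImplementedError"
```

Here `try_prepare(DirectSubclass())` hits the unimplemented `_prepare_split`, while `try_prepare(GeneratorSubclass())` succeeds with the same two user-written hooks.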
Environment info
- `datasets` version: 2.4.0
- Platform: Linux-5.4.0-135-generic-x86_64-with-glibc2.2.5
- Python version: 3.8.12
- PyArrow version: 7.0.0
- Pandas version: 1.4.1
Issue Analytics
- State:
- Created 9 months ago
- Comments: 11 (4 by maintainers)
Top GitHub Comments
This is a requirement only for datasets not stored in standard formats such as CSV, JSON, SQL, Parquet, ImageFolder, etc.
Our README/documentation lists the main features…
One of the main ones is that our library makes it easy to work with datasets larger than RAM (thanks to Arrow and the caching mechanism), and this is not trivial to implement.
Regarding the step-by-step builder, this is the pattern:
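A minimal sketch of what such a step-by-step pattern can look like, assuming the public `load_dataset_builder`, `download_and_prepare`, and `as_dataset` calls from `datasets`. The wrapper function `build_step_by_step` and its parameters are hypothetical, introduced only for illustration:

```python
from typing import Optional


def build_step_by_step(path: str, split: Optional[str] = None):
    """Hypothetical helper: build a dataset in explicit steps instead of load_dataset."""
    # Imported inside the function so this sketch loads even where the
    # `datasets` package is not installed.
    from datasets import load_dataset_builder

    # Step 1: resolve the builder for the given dataset path or name.
    builder = load_dataset_builder(path)

    # Step 2: download the source files and write the prepared data to the cache.
    builder.download_and_prepare()

    # Step 3: load the prepared data from the cache as a dataset object.
    return builder.as_dataset(split=split)
```

Calling it requires the `datasets` package and, for hosted datasets, network access; `path` is whatever dataset name or loading script would otherwise be passed to `load_dataset`.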
cc @stevhliu who may have some ideas on how to improve this part of the docs.