LocalDatasetModuleFactoryWithoutScript extracts invalid builder name

Describe the bug

Trying to load a local dataset raises an error saying that the builder config must have a name. No error should be reported, since the call is completely valid.

Steps to reproduce the bug

from datasets import load_dataset

load_dataset("./data/some-dataset/", name="some-name")

Expected results

The dataset should be loaded.

Actual results

Traceback (most recent call last):
  File "train_lquad.py", line 19, in <module>
    load(tokenize_target_function, tokenize_target_function, {}, tokenizer)
  File "train_lquad.py", line 14, in load
    dataset = load_dataset("./data/lquad/", name="lquad")
  File "/net/pr2/scratch/people/plgapohl/python-3.8.6/lib/python3.8/site-packages/datasets/load.py", line 1708, in load_dataset                                                                           
    builder_instance = load_dataset_builder(
  File "/net/pr2/scratch/people/plgapohl/python-3.8.6/lib/python3.8/site-packages/datasets/load.py", line 1560, in load_dataset_builder                                                                   
    builder_instance: DatasetBuilder = builder_cls(
  File "/net/pr2/scratch/people/plgapohl/python-3.8.6/lib/python3.8/site-packages/datasets/builder.py", line 269, in __init__                                                                             
    self.config, self.config_id = self._create_builder_config(
  File "/net/pr2/scratch/people/plgapohl/python-3.8.6/lib/python3.8/site-packages/datasets/builder.py", line 403, in _create_builder_config                                                               
    raise ValueError(f"BuilderConfig must have a name, got {builder_config.name}")
ValueError: BuilderConfig must have a name, got

Environment info

  • datasets version: 2.2.2
  • Platform: Linux-4.18.0-348.20.1.el8_5.x86_64-x86_64-with-glibc2.2.5
  • Python version: 3.8.6
  • PyArrow version: 8.0.0
  • Pandas version: 1.4.2

The error is probably in line 795 in load.py:

builder_kwargs = {
    "hash": hash,
    "data_files": data_files,
    "name": os.path.basename(self.path),
    "base_path": self.path,
    **builder_kwargs,
}

os.path.basename returns an empty string for a path that ends with a trailing slash (such as "./data/some-dataset/"), rather than the name of the directory.
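
To illustrate, here is a minimal standard-library check. The paths are the example ones from this report; nothing needs to exist on disk, since os.path.basename only does string manipulation.

```python
import os

# Path with a trailing slash, as passed to load_dataset in the report.
with_slash = os.path.basename("./data/some-dataset/")

# Same path without the trailing slash.
without_slash = os.path.basename("./data/some-dataset")

print(repr(with_slash))     # ''
print(repr(without_slash))  # 'some-dataset'
```

The empty string from the first call is what ends up as the "name" entry and triggers the ValueError.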

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
apohllo commented, Sep 11, 2022

@mariosasko here we go:

https://github.com/huggingface/datasets/pull/4967

TBH I haven’t tested it yet, but it should work, since this is a basic change.

1 reaction
apohllo commented, May 24, 2022

The fix is:

"name": os.path.basename(self.path[:-1] if self.path[-1] == "/" else self.path)
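
For reference, the same idea can be sketched as a standalone function. The name builder_name_from_path is hypothetical and is not part of datasets, and this is not necessarily the code that was merged in the linked PR. It uses str.endswith instead of indexing self.path[-1], so an empty path does not raise IndexError.

```python
import os


def builder_name_from_path(path: str) -> str:
    """Hypothetical helper: derive a builder name from a local directory path."""
    # Drop a single trailing "/" so basename sees the directory name
    # instead of the empty component after the slash.
    trimmed = path[:-1] if path.endswith("/") else path
    return os.path.basename(trimmed)


print(repr(builder_name_from_path("./data/some-dataset/")))  # 'some-dataset'
print(repr(builder_name_from_path("./data/lquad")))          # 'lquad'
```

Like the one-liner above, this only handles a single "/" separator; paths with repeated trailing slashes or Windows-style backslashes would need extra handling.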