Implement ability to define splits in metadata section of dataset card
Feature request
If you go to https://huggingface.co/datasets/inria-soda/tabular-benchmark/tree/main, you will see a bunch of folders containing various CSV files. I’d like the dataset viewer to show all of these files instead of only one dataset, as it currently does, and I’d like people to be able to load them as splits instead of loading them through data_files.
For example, GLUE has various splits in the viewer, but asking people to implement a loading script is overkill, so it would be better to let them define these in the README file instead.
Also pinging @polinaeterna @lhoestq @adrinjalali
Issue Analytics
- State:
- Created 10 months ago
- Reactions:3
- Comments:7 (7 by maintainers)
Top GitHub Comments
@merveenoyan ignore my comment above, I’m switching to this task now 😄
Love it! Some other ideas for naming the “custom_configs_info” field: “configs”, “parameters”, “config_args”, “configurations”
If we update the get_dataset_config_names() function in datasets (in inspect.py), we should be fine - that’s what the viewer is using.
Actually, I feel like the second solution includes the first use case you mentioned. If you implement the second solution, then users would just have to add a few lines of YAML and their directories would be considered configurations, no? Maybe there’s no need to implement two different logics to do the same thing.
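The idea of treating each directory as a configuration can be sketched in plain Python. This is only an illustration of the grouping logic, not how datasets implements it: the helper name infer_configs_from_paths, the "default" fallback name, and the file paths are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def infer_configs_from_paths(paths):
    """Group repository file paths by their top-level directory.

    Hypothetical helper: each top-level directory becomes one
    configuration name, mapped to the sorted list of CSV files under it.
    CSV files at the repository root are collected under "default".
    """
    configs = defaultdict(list)
    for path in paths:
        if not path.endswith(".csv"):
            continue  # skip README.md and other non-data files
        parts = PurePosixPath(path).parts
        config_name = parts[0] if len(parts) > 1 else "default"
        configs[config_name].append(path)
    # Return a plain dict with configuration names in alphabetical order.
    return {name: sorted(files) for name, files in sorted(configs.items())}


# Made-up paths, for illustration only.
example_paths = [
    "README.md",
    "clf_num/a.csv",
    "clf_num/b.csv",
    "reg_cat/c.csv",
    "d.csv",
]
configs = infer_configs_from_paths(example_paths)
```

With a mapping like this, a function such as get_dataset_config_names() could return the dictionary keys, which is one way the viewer might list one entry per directory.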