Implement ability to define splits in metadata section of dataset card

Feature request

If you go to https://huggingface.co/datasets/inria-soda/tabular-benchmark/tree/main you will see a bunch of folders containing various CSV files. I’d like the dataset viewer to show these files instead of only one dataset, as it currently does (and also for people to be able to load them as splits instead of loading them through data_files). For example, GLUE has various splits in the viewer, but it’s overkill to ask people to implement a loading script, so it would be better to let them define these in the README file instead.
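For context, the workaround mentioned in the request is passing `data_files` to `datasets.load_dataset`, which accepts a string, a list, or a dict mapping split names to files. Below is a minimal sketch, assuming a hypothetical layout of one folder per group of CSV files, of how those paths could be grouped into such a split-to-files mapping. The helper name `group_csvs_by_folder` and the example paths are illustrative, not part of `datasets` or the actual repository.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_csvs_by_folder(paths):
    """Group repo-relative CSV paths by their top-level folder.

    Hypothetical helper: each top-level folder becomes a key whose value is
    the sorted list of CSV files under it, i.e. a dict shaped like the
    ``data_files`` argument of ``datasets.load_dataset``.
    """
    groups = defaultdict(list)
    for path in paths:
        p = PurePosixPath(path)
        # Skip non-CSV files and files sitting at the repository root.
        if p.suffix != ".csv" or len(p.parts) < 2:
            continue
        groups[p.parts[0]].append(str(p))
    return {folder: sorted(files) for folder, files in sorted(groups.items())}


# Illustrative paths only; not the real contents of the repository.
files = [
    "README.md",
    "clf_num/electricity.csv",
    "clf_num/covertype.csv",
    "reg_num/houses.csv",
]
data_files = group_csvs_by_folder(files)
print(data_files)
```

A dict like this could then be passed as `data_files=` to `load_dataset`; the request is essentially asking for this mapping to be declared once in the README instead of being rebuilt by every user.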

Also pinging @polinaeterna @lhoestq @adrinjalali

Issue Analytics

  • State: open
  • Created 10 months ago
  • Reactions: 3
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

3 reactions
polinaeterna commented, Nov 30, 2022

@merveenoyan, ignore my comment above; I’m switching to this task now 😄

2 reactions
lhoestq commented, Nov 9, 2022

We can add a new metadata YAML field (say, “custom_configs_info”), so that we can provide something like:

Love it! Some other ideas to name the “custom_configs_info” field: “configs”, “parameters”, “config_args”, “configurations”.
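To make the proposal concrete, here is a minimal sketch of how a loader might read such a field once the README’s YAML header has been parsed into a dict. The field name `configs` (one of the names suggested above) and the per-entry keys `config_name` and `data_files` are assumptions for illustration, not a settled schema; YAML parsing itself is skipped to keep the sketch self-contained.

```python
def configs_from_metadata(metadata):
    """Map each user-defined configuration name to its split -> files dict.

    Assumed (hypothetical) schema for the parsed YAML header:
        configs:
          - config_name: <name>
            data_files: {<split>: <path or list of paths>}
    """
    result = {}
    for entry in metadata.get("configs", []):
        name = entry["config_name"]
        if name in result:
            raise ValueError(f"duplicate config name: {name!r}")
        data_files = entry.get("data_files", {})
        # Normalize single paths to one-element lists.
        result[name] = {
            split: [files] if isinstance(files, str) else list(files)
            for split, files in data_files.items()
        }
    return result


metadata = {
    "configs": [
        {"config_name": "clf_num", "data_files": {"train": "clf_num/*.csv"}},
        {"config_name": "reg_num", "data_files": {"train": ["reg_num/houses.csv"]}},
    ]
}
print(configs_from_metadata(metadata))
```

Rejecting duplicate names up front keeps each configuration unambiguous for whatever later lists the available configurations.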

It might require changes in how we interact with the viewer on the Hub side, to parse these configurations, since they are not default configurations (not in the BUILDER_CONFIGS list).

If we update the get_dataset_config_names() function in inspect.py in datasets, we should be fine; that’s what the viewer uses.
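A rough sketch of the kind of change described here, assuming config names from a loading script’s `BUILDER_CONFIGS` and from the hypothetical metadata field are both available as plain lists. The function below is illustrative only; it is not the actual signature or body of `get_dataset_config_names()` in `datasets`, and the `"default"` fallback is an assumption of this sketch.

```python
def merged_config_names(builder_config_names, metadata_config_names):
    """Return builder-defined names first, then metadata-defined ones.

    Hypothetical stand-in for an updated config-name lookup: keep the order
    of BUILDER_CONFIGS, append names declared in the dataset card, drop
    duplicates, and fall back to "default" when nothing is declared.
    """
    names = []
    for name in [*builder_config_names, *metadata_config_names]:
        if name not in names:
            names.append(name)
    return names or ["default"]


print(merged_config_names([], ["clf_num", "reg_num"]))
print(merged_config_names(["cola", "sst2"], ["sst2", "custom"]))
print(merged_config_names([], []))
```

Because the viewer relies on this single lookup, extending it in one place is what would let card-defined configurations show up without a loading script.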

Overall, I would start by implementing the first solution, since it’s related to what I’m doing now and is super useful for datasets in general. Then, if we agree that more flexibility in providing parameters to the viewer is required, I can implement the second one. Let me know what you think 😃

Actually, I feel like the second solution covers the first use case you mentioned. If you implement the second solution, users would just have to add a few lines of YAML and their directories would be considered configurations, no? Maybe there’s no need to implement two different logics to do the same thing.
