Automatically add filename for image/audio folder
Feature request
When creating a custom audio or image dataset, it would be great to automatically have access to the filename. It should be both:
a) Automatically displayed in the viewer
b) Automatically added as a column to the dataset when doing load_dataset
In diffusers, our tests now rely quite heavily on image and audio files, and at the moment it’s a bit tedious to download specific images from a datasets repo.
E.g. we have a dataset of images for tests in diffusers: https://huggingface.co/datasets/hf-internal-testing/diffusers-images where it would be extremely nice to have direct access to the filename both visually on the datasets page (@severo) and via the load_dataset function. We currently have some awkward functionality to download images by path name: https://github.com/huggingface/diffusers/blob/2fb8fafa4b761f6fc144cf75a6f6f0ea6af3a1c1/src/diffusers/utils/testing_utils.py#L131
It would be much nicer to just go through load_dataset(...).
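To make the request concrete, here is a minimal sketch of what a filename column could look like, using the relative path inside the folder as the filename. It only uses the standard library; `collect_with_filenames`, `MEDIA_EXTENSIONS` and the column names `file_name` and `path` are hypothetical names chosen for illustration, not part of the datasets API.

```python
from pathlib import Path

# Hypothetical extension list for illustration; not taken from the datasets library.
MEDIA_EXTENSIONS = {".png", ".jpg", ".jpeg", ".wav", ".mp3", ".flac"}


def collect_with_filenames(root):
    """Walk `root` and return column-style data for every media file found.

    "file_name" holds the path relative to `root` (POSIX separators),
    "path" holds the full path to the file on disk.
    """
    root = Path(root)
    columns = {"file_name": [], "path": []}
    for path in sorted(root.rglob("*")):
        # Skip directories and files that are not images or audio.
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            columns["file_name"].append(path.relative_to(root).as_posix())
            columns["path"].append(str(path))
    return columns
```

With the datasets library installed, these columns could presumably be passed to `Dataset.from_dict(...)` and the `path` column cast to an `Image()` or `Audio()` feature, so that each example carries its filename next to the decoded file. That is a workaround sketch, not the built-in behaviour this issue asks for.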
Motivation
Intuitively, the filename is something people understand directly. E.g. if you upload a folder of images online, it’s nice to see the filename right next to each image and to be able to use it right away.
The label, on the other hand, is less intuitive to understand as you haven’t added it yourself.
Your contribution
Not sure if I have the time to add it myself anytime soon, but it would help us a lot for diffusers.
Issue Analytics
- Created a year ago
- Comments: 10 (10 by maintainers)
Top GitHub Comments
Yes I think the relative path as you proposed makes a lot of sense 😃
Yea I agree it’s often the wrong default. We can also imagine adding the builder’s parameters as YAML in the repo.