Better structure dataset implementations
The suggestion is to adjust all the datasets (e.g. speechcommands) to follow some changes done in tedlium, see example below.
- Add a `_parse_filesystem` method to extract a list of "data point identifiers" in a pre-determined order, replacing the generic `walk_files`, as here and #791.
- Make `_load_item` a method, as here.
- Replace class attributes with constructor arguments, e.g. here.
- Remove non-standard attributes, e.g. here? Or add attributes, e.g. here?
Relates to #852, GTZAN #791, tedlium #882. cc @mthrok @cpuhrsch
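The list above could be sketched roughly as follows. This is a minimal illustration, not the tedlium or speechcommands implementation: the class name `SketchDataset`, the `<root>/<label>/<name>.wav` layout, and the `ext_audio` argument are assumptions made for the example, and `_load_item` only resolves a path and a label instead of decoding audio.

```python
from pathlib import Path
from typing import List, Tuple, Union


class SketchDataset:
    """Hypothetical dataset laid out as <root>/<label>/<name>.wav."""

    def __init__(self, root: Union[str, Path], ext_audio: str = ".wav") -> None:
        # Constructor arguments replace what used to be class attributes.
        self._path = Path(root)
        self._ext_audio = ext_audio
        self._walker = self._parse_filesystem()

    def _parse_filesystem(self) -> List[str]:
        # Collect identifiers ("<label>/<name>") in sorted order so that
        # indexing does not depend on filesystem traversal order.
        return sorted(
            p.relative_to(self._path).with_suffix("").as_posix()
            for p in self._path.glob(f"*/*{self._ext_audio}")
        )

    def _load_item(self, fileid: str) -> Tuple[Path, str]:
        # A real implementation would decode the audio here; this sketch
        # only resolves the file path and the label.
        label, _name = fileid.split("/")
        return self._path / f"{fileid}{self._ext_audio}", label

    def __getitem__(self, n: int) -> Tuple[Path, str]:
        return self._load_item(self._walker[n])

    def __len__(self) -> int:
        return len(self._walker)
```

Because the identifiers are computed once in the constructor, `__getitem__` stays a thin wrapper around `_load_item`, and the ordering of samples is fixed by the sort rather than by the directory walk.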
Issue Analytics
- Created 3 years ago
- Comments: 22 (20 by maintainers)
Top GitHub Comments
@krishnakalyan3
Thanks for the feedback. Do you have any other thoughts while working on #1127?
Do you mean string “constant”? I think that makes sense, especially for “.wav” formats.
Yes. We had to do this with CommonVoice recently, and the resulting code became much simpler.
For `url`, even though it is rarely used, there are multiple potential scenarios where the source becomes unavailable and the archive is re-hosted somewhere else.
But there are cases where users do not have admin privileges to modify the installed package, so it is better if users can provide their configuration from their client code. It may not need to be a single variable; a custom configuration type specific to each dataset is a possible option.
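One way to picture a dataset-specific configuration type is sketched below. Everything in it is an assumption for illustration: `ArchiveConfig`, `DEFAULT_CONFIG`, `ConfigurableDataset`, and the example.com URL are hypothetical names and placeholders, not part of torchaudio.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Union


@dataclass(frozen=True)
class ArchiveConfig:
    """Hypothetical per-dataset configuration supplied by client code."""

    url: str

    @property
    def archive_name(self) -> str:
        # Last path component of the URL, e.g. "archive_v1.tar.gz".
        return self.url.rstrip("/").rsplit("/", 1)[-1]


# Placeholder default; a real dataset module would hold the actual URL.
DEFAULT_CONFIG = ArchiveConfig(url="https://example.com/data/archive_v1.tar.gz")


class ConfigurableDataset:
    def __init__(
        self,
        root: Union[str, Path],
        config: ArchiveConfig = DEFAULT_CONFIG,
    ) -> None:
        # The configuration is passed in rather than read from a module-level
        # constant, so a re-hosted archive needs no change to the package.
        self._config = config
        self._archive = Path(root) / config.archive_name
```

A user whose mirror hosts the archive elsewhere would then pass `ConfigurableDataset("data", ArchiveConfig(url=...))` from their own code, without touching the installed package.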
Looking at #1127 (thanks @krishnakalyan3), `folder_in_archive` should be deprecated and removed from all the Dataset implementations.
It does not behave consistently with the `download` + extract behavior and, in the first place, a changed directory structure is not something the library should expect or conform to. If the user has changed the directory structure, that is on the user, and the library should not take care of it.
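A deprecation period for `folder_in_archive` might look like the sketch below. It is an assumption about how such a transition could be staged, not the change made in #1127; `DeprecatingDataset` and `_DEFAULT_FOLDER` are hypothetical names.

```python
import warnings
from pathlib import Path
from typing import Optional, Union

# Hypothetical name of the folder produced by extracting the archive.
_DEFAULT_FOLDER = "archive_v1"


class DeprecatingDataset:
    def __init__(
        self,
        root: Union[str, Path],
        folder_in_archive: Optional[str] = None,
    ) -> None:
        if folder_in_archive is not None:
            # Still honored for now, but callers are told it is going away.
            warnings.warn(
                "`folder_in_archive` is deprecated and will be removed; "
                "the directory layout produced by extraction is assumed.",
                DeprecationWarning,
                stacklevel=2,
            )
        self._path = Path(root) / (folder_in_archive or _DEFAULT_FOLDER)
```

Once the argument is removed, the constructor would simply use the extracted layout, which matches the position that the library should not adapt to directory structures the user has changed.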