Provide complete filepath to is_valid_file in make_dataset rather than only the filename
🚀 The feature
(First issue/feature request, tried my best to follow the guidelines, apologies if I missed something).
In torchvision.datasets.folder.make_dataset, we are given the option to use is_valid_file (or extensions).
My feature request is to allow is_valid_file to receive the whole path to the file rather than just the filename.
Motivation, pitch
Currently, if we wish to use is_valid_file, we can only act on the filename without getting the whole path to the file. That makes it particularly tricky to open the file, verify whether it meets certain criteria, and decide whether it is valid.
Perhaps this was not the intended use initially, but there seems to be an opportunity to extend what is_valid_file can do by providing it the whole path rather than just the filename.
My exact implementation idea would be to replace the following snippet:
for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)):
    for fname in sorted(fnames):
        if is_valid_file(fname):
            path = os.path.join(root, fname)
            item = path, class_index
            instances.append(item)
with:
for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)):
    for fname in sorted(fnames):
        path = os.path.join(root, fname)
        if is_valid_file(path):
            item = path, class_index
            instances.append(item)
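To make the difference concrete, here is a minimal standalone sketch of the proposed loop using only the standard library. The names collect_instances and is_non_empty are hypothetical and exist only for illustration; this is not torchvision's actual code, just the loop above wrapped in a function so it can be run on its own.

```python
import os
from typing import Callable, List, Tuple


def collect_instances(
    target_dir: str,
    class_index: int,
    is_valid_file: Callable[[str], bool],
) -> List[Tuple[str, int]]:
    """Hypothetical helper mirroring the proposed loop.

    The predicate receives the full path, not just the filename.
    """
    instances = []
    for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)):
        for fname in sorted(fnames):
            path = os.path.join(root, fname)
            if is_valid_file(path):
                instances.append((path, class_index))
    return instances


def is_non_empty(path: str) -> bool:
    """Example predicate that needs the full path: keep files with content."""
    return os.path.getsize(path) > 0
```

A check like is_non_empty cannot be written against the bare filename, because os.path.getsize needs to locate the file on disk.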
This change might break backward compatibility for users who already use is_valid_file. However, their fix would be simple: they could add the following line to their is_valid_file function:
root, fname = os.path.split(path)
where path is the argument passed to is_valid_file. They could then continue using fname (or however they've named it) in their function as before.
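As a sketch of that migration, an existing filename-based predicate could also be wrapped rather than edited. The names accept_full_path and is_jpeg_name below are hypothetical, introduced only for this example.

```python
import os
from typing import Callable


def accept_full_path(check_filename: Callable[[str], bool]) -> Callable[[str], bool]:
    """Wrap a filename-based predicate so it can be called with a full path."""

    def is_valid_file(path: str) -> bool:
        # Same split as suggested above; only the filename part is checked.
        _, fname = os.path.split(path)
        return check_filename(fname)

    return is_valid_file


def is_jpeg_name(fname: str) -> bool:
    """Hypothetical existing predicate written against bare filenames."""
    return fname.lower().endswith((".jpg", ".jpeg"))


# The wrapped predicate now accepts full paths.
is_valid_file = accept_full_path(is_jpeg_name)
```

With this approach the original predicate stays untouched, and the directory part of the path is simply ignored.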
Alternatives
No response
Additional context
One example of how this could be useful: if one wants to use only pictures that meet certain resolution, channel, or other criteria, in a configurable way, the criteria could be coded in the is_valid_file function, without having to delete or move files.
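A minimal sketch of such a configurable check, assuming the predicate is called with the full path and restricting itself to PNG files so that only the standard library is needed. The factory name min_png_resolution is hypothetical. It reads the width and height from the IHDR chunk at the start of the file; a real implementation would more likely use an image library to cover other formats.

```python
import struct
from typing import Callable

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def min_png_resolution(min_width: int, min_height: int) -> Callable[[str], bool]:
    """Build a predicate that accepts PNG files of at least the given size."""

    def is_valid_file(path: str) -> bool:
        try:
            with open(path, "rb") as f:
                # 8-byte signature, 4-byte chunk length, 4-byte chunk type,
                # then width and height as big-endian 32-bit integers.
                header = f.read(24)
        except OSError:
            return False
        if len(header) < 24:
            return False
        if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
            return False
        width, height = struct.unpack(">II", header[16:24])
        return width >= min_width and height >= min_height

    return is_valid_file
```

The resulting predicate, for example min_png_resolution(320, 240), could then be passed as is_valid_file, and the thresholds can be changed without touching the files on disk.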
Issue Analytics
- Created 2 years ago
- Comments: 10 (7 by maintainers)
Top GitHub Comments
The commit you pushed included this information: https://github.com/pytorch/vision/pull/4885/commits/6f825951964d48b3c247a1f3063da1b9d91ebc27. It seems the email you used for that was not recognized by GitHub. So when we merged the PR, GitHub assumed that the author of the PR is the main author, and all other authors were treated as co-authors. To resolve this in the future, you can link your email in the email settings here on GitHub.
That being said, AFAIK it is not possible to use git without supplying a name and an email. For the name you can simply supply an alias and use some nonsense data for the email. Doing the latter will probably prevent you from linking it to GitHub though.
@kotchin You are good to go. Just merged. We were all waiting for the final test to run. Sometimes the CI is slow 😃