Provide complete filepath to is_valid_file in make_dataset rather than only the filename

🚀 The feature

(This is my first issue/feature request. I tried my best to follow the guidelines; apologies if I missed something.)

In torchvision.datasets.folder.make_dataset, we are given the option to use is_valid_file (or extensions).

My feature request is to allow is_valid_file to get the whole path to the file rather than just the filename.

Motivation, pitch

Currently, is_valid_file only receives the filename, not the whole path to the file. This makes it tricky to open the file, check whether it meets certain criteria, and decide whether it is valid.

Perhaps this was not the intended use initially, but passing the whole path rather than just the filename would make is_valid_file considerably more capable.

My proposed implementation is to replace the following snippet:

for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)):
    for fname in sorted(fnames):
        if is_valid_file(fname):
            path = os.path.join(root, fname)
            item = path, class_index
            instances.append(item)

with:

for root, _, fnames in sorted(os.walk(target_dir, followlinks=True)):
    for fname in sorted(fnames):
        path = os.path.join(root, fname)
        if is_valid_file(path):
            item = path, class_index
            instances.append(item)

This change might break backward compatibility for users who already pass is_valid_file. The fix on their side would be simple, though: add the line root, fname = os.path.split(path) at the top of their is_valid_file function, where path is the argument the function receives, and keep using fname (or whatever they have named it) as before.
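The migration described above can also be done without touching the existing function. The following is a minimal sketch, assuming the proposed behavior where is_valid_file receives the full path; the helper name adapt_filename_check and the example predicate is_jpeg_name are hypothetical and not part of torchvision.

```python
import os
from typing import Callable


def adapt_filename_check(check_name: Callable[[str], bool]) -> Callable[[str], bool]:
    """Wrap a filename-based predicate so it accepts a full path instead.

    Hypothetical helper: it only splits the path and forwards the last
    component, mirroring the one-line fix suggested in the issue.
    """

    def is_valid_file(path: str) -> bool:
        # os.path.split returns (directory, last component of the path).
        _root, fname = os.path.split(path)
        return check_name(fname)

    return is_valid_file


def is_jpeg_name(fname: str) -> bool:
    """Example of an existing predicate that only looks at the filename."""
    return fname.lower().endswith((".jpg", ".jpeg"))


# The wrapped predicate can be called with a full path.
is_valid_file = adapt_filename_check(is_jpeg_name)
print(is_valid_file(os.path.join("data", "cats", "0001.JPG")))
```

Wrapping keeps the old predicate unchanged, which may be convenient if the same function is also used elsewhere with bare filenames.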

Alternatives

No response

Additional context

One example of how this could be useful: selecting only the pictures that meet a certain resolution, number of channels, or other criteria, in a configurable way. The criteria could be coded in the is_valid_file function, without having to delete or move any files.
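As an illustration of such a content-based check, here is a rough sketch of a resolution filter. It assumes is_valid_file receives the full path, handles PNG files only, and reads the width and height straight from the PNG header with the standard library; png_size and make_min_resolution_filter are hypothetical names, and an image library could be used instead for other formats.

```python
import struct
from typing import Callable, Optional, Tuple

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path: str) -> Optional[Tuple[int, int]]:
    """Return (width, height) read from a PNG header, or None if not a PNG."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def make_min_resolution_filter(min_width: int, min_height: int) -> Callable[[str], bool]:
    """Build a path-based predicate that keeps only sufficiently large PNGs."""

    def is_valid_file(path: str) -> bool:
        size = png_size(path)
        return size is not None and size[0] >= min_width and size[1] >= min_height

    return is_valid_file
```

Because the thresholds are arguments of the factory, the same code can serve several dataset configurations, and the files on disk stay where they are.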

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 10 (7 by maintainers)

Top GitHub Comments

1 reaction
pmeier commented, Nov 12, 2021

The commit you pushed included this information: https://github.com/pytorch/vision/pull/4885/commits/6f825951964d48b3c247a1f3063da1b9d91ebc27. It seems the email you used for that commit was not recognized by GitHub. So when we merged the PR, GitHub assumed that the author of the PR is the main author and that all other authors are co-authors. To resolve this in the future, you can link your email in the email settings here on GitHub.

That being said, AFAIK it is not possible to use git without supplying a name and an email. For the name you can simply supply an alias, and for the email you can use some nonsense data. Doing the latter will probably prevent you from linking it to GitHub, though.

1 reaction
datumbox commented, Nov 9, 2021

@kotchin You are good to go. Just merged. We were all waiting for the final test to run. Sometimes the CI is slow 😃
