ImageFolder dataset: "too many open files" error


Hi, I am using the ImageFolder dataset to train on ImageNet. I repeatedly get a “too many open files” OSError after training for several hours.
I suspect the issue comes from [here][1]:

def pil_loader(path):
    # open path as file to avoid ResourceWarning (https://github.com/python-pillow/Pillow/issues/835)
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')  # <--

The return is inside the with ... clause, so (I suspect) the file handle f is not closed properly when the function returns. Shouldn’t the function be:

def pil_loader(path):
    # open path as file to avoid ResourceWarning (https://github.com/python-pillow/Pillow/issues/835)
    with open(path, 'rb') as f:
        img = Image.open(f)
    return img.convert('RGB')  # <--  not indented

Thanks,
-Shai

[1]: https://github.com/pytorch/vision/blob/master/torchvision/datasets/folder.py#L156
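On the question of whether returning from inside the with block leaves the handle open: Python runs the context manager's exit step when a return leaves the block, so the file object is closed either way. The sketch below illustrates this with a plain file and no Pillow. The names read_first_byte and sample.bin are hypothetical, chosen only for the demonstration, and the sketch says nothing about where the descriptors in the reported error actually come from.

```python
import os
import tempfile


def read_first_byte(path):
    """Return from inside the with block; also hand back the file object for inspection."""
    with open(path, "rb") as f:
        # Leaving the block through return still runs the file's __exit__.
        return f.read(1), f


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")  # hypothetical file name
    with open(path, "wb") as out:
        out.write(b"\x00\x01")

    first_byte, handle = read_first_byte(path)
    print(handle.closed)  # True: the handle was closed as the function returned
```

If handle.closed prints True, the placement of the return statement alone does not explain a descriptor leak.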

cc @pmeier

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 11 (1 by maintainers)

Top GitHub Comments

1 reaction
vfdev-5 commented, Jul 3, 2018

@shaibagon I tested the following and found no error:

import os
import tempfile

from PIL import Image


def pil_loader(path):
    # open path as file to avoid ResourceWarning (https://github.com/python-pillow/Pillow/issues/835)
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')


a = None

with tempfile.TemporaryDirectory() as tmp:
    print("Temp dir: ", tmp)
    # Save and reload many small images; a handle leaked on every call would hit the OS limit.
    for i in range(10000000):
        if i % 50000 == 0:
            print("{}".format(i), end=" ")  # progress marker
        img = Image.new('RGB', size=(10, 10))
        p = os.path.join(tmp, "test_{}.png".format(i))
        img.save(p)
        a = pil_loader(p)

1 reaction
shaibagon commented, Jul 2, 2018

@vfdev-5 I am sorry, but because of the remote environment I am working in, it is very difficult to provide additional information. I also suspect I have not located the problem correctly.

Bottom line:

Running for a long time (a day or more) using ImageFolder data results in an OSError: “too many open files”.

I will do my best to shed more light on this issue. Meanwhile, I don’t think there’s much to do.

Other people who stumble upon this error - please try and provide more information.
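For anyone gathering the extra information requested above, one possible starting point is to log the process's descriptor limit and its current number of open descriptors at intervals during training. This is a minimal sketch, not something from the thread: it assumes a Unix-like system for the resource module and Linux for /proc/self/fd, and open_fd_count is a hypothetical helper name.

```python
import os
import resource


def open_fd_count():
    """Count this process's open file descriptors (assumes Linux's /proc)."""
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None  # /proc is not available on this platform
    # The listing itself briefly holds one descriptor, so the count may be one high.
    return len(os.listdir(fd_dir))


# Soft limit is the value the process actually hits; hard limit is the ceiling for raising it.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft_limit, "hard limit:", hard_limit)
print("open descriptors:", open_fd_count())
```

Calling open_fd_count() every few thousand iterations and comparing it against the soft limit would show whether the count grows steadily over a long run, which is the kind of detail that could help narrow the problem down.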

