Cross-platform Memory.cache to tmpfs

Currently it is not possible to use RAM / tmpfs as the cache directory for the Memory class in a cross-platform way.

The corresponding logic is embedded within the MemmapingPool class,

        if temp_folder is None:
            temp_folder = os.environ.get('JOBLIB_TEMP_FOLDER', None)
        if temp_folder is None:
            if os.path.exists(SYSTEM_SHARED_MEM_FS):
                try:
                    temp_folder = SYSTEM_SHARED_MEM_FS
                    pool_folder = os.path.join(temp_folder, pool_folder_name)
                    if not os.path.exists(pool_folder):
                        os.makedirs(pool_folder)
                    use_shared_mem = True
                except IOError:
                    # Missing rights in the /dev/shm partition,
                    # fallback to regular temp folder.
                    temp_folder = None
        if temp_folder is None:
            # Fallback to the default tmp folder, typically /tmp
            temp_folder = tempfile.gettempdir()

and cannot be easily applied to the Memory class.

Maybe it could be worth moving this chunk of code into a separate private function, that could also be called for some pre-defined values of the cachedir Memory parameter (for instance cachedir=True, since cachedir=None is already taken)?
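As a rough illustration of that proposal, the excerpt above could be pulled out into a standalone helper along the following lines. This is only a sketch, not joblib's actual code: the function name `pick_temp_folder` and its return value are hypothetical, `SYSTEM_SHARED_MEM_FS` is assumed to be `/dev/shm` as the comment in the excerpt suggests, and the final step that builds the pool folder path for the non-shared-memory cases is an assumption about what follows the excerpt.

```python
import os
import tempfile

# Assumption: the shared-memory mount point referenced in the excerpt.
SYSTEM_SHARED_MEM_FS = "/dev/shm"


def pick_temp_folder(pool_folder_name, temp_folder=None):
    """Hypothetical helper: return (pool_folder, use_shared_mem).

    Resolution order mirrors the excerpt: explicit argument, then the
    JOBLIB_TEMP_FOLDER environment variable, then /dev/shm if it exists
    and is writable, then the default system temp folder.
    """
    use_shared_mem = False
    if temp_folder is None:
        temp_folder = os.environ.get("JOBLIB_TEMP_FOLDER", None)
    if temp_folder is None and os.path.exists(SYSTEM_SHARED_MEM_FS):
        candidate = os.path.join(SYSTEM_SHARED_MEM_FS, pool_folder_name)
        try:
            # Creating the folder doubles as a write-permission check.
            os.makedirs(candidate, exist_ok=True)
            temp_folder = SYSTEM_SHARED_MEM_FS
            use_shared_mem = True
        except OSError:
            # Missing rights in /dev/shm: fall back to the regular temp folder.
            temp_folder = None
    if temp_folder is None:
        # Default tmp folder, typically /tmp on Linux.
        temp_folder = tempfile.gettempdir()
    pool_folder = os.path.abspath(os.path.join(temp_folder, pool_folder_name))
    return pool_folder, use_shared_mem
```

The returned folder could then be handed to Memory as its cache directory; on platforms without /dev/shm the helper simply ends up in the regular temp folder, and only the shared-memory branch creates the folder eagerly.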

Top GitHub Comments

lesteve commented, Mar 21, 2017

> Well probably not a huge performance difference, more avoiding writing to disk when it’s not necessary (also with each of my stored objects in the 1-20 MB range, after loading these objects from cache several times due to @pytest.mark.parametrize that can become a non negligible amount of I/O).

This may be premature optimization, but it’s hard to tell without numbers. Not sure this applies to your use case, but it could help to use module-level fixtures, which would let you load your objects once per test module rather than once per test function.

> True, but then there is also the case when /dev/shm is not writable for some reason, which the original function handles.

Fair enough, I missed the non-writable part. Feel free to open a PR with a test if you feel like it. Keeping the function private, i.e. starting with an underscore, as mentioned by @GaelVaroquaux, is probably more appropriate.

rth commented, Mar 21, 2017

> Just curious: why not use a cachedir in /tmp that lives for the duration of your tests? Do you see a big performance difference?

Well probably not a huge performance difference, more avoiding writing to disk when it’s not necessary (also with each of my stored objects in the 1-20 MB range, after loading these objects from cache several times due to @pytest.mark.parametrize that can become a non negligible amount of I/O).

> Also I am guessing that you are not that interested in using the JOBLIB_TEMP_FOLDER environment variable for your use case, right? Not sure it is worth adding a function for this in joblib if that is all you want to do.

True, but then there is also the case when /dev/shm is not writable for some reason, which the original function handles.

In any case it’s a non-issue (I can just use that code), more an observation that it was a shame to have that logic embedded within the MemmapingPool. Thanks for your response, I understand better why there is no such option in Memory by default. Closing this issue…
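The simpler alternative raised in the thread, a cache directory in the regular temp folder that only lives for the duration of the tests, can be sketched with the standard library alone. The context manager name `temporary_cache_dir` and the prefix are hypothetical; nothing here is joblib API.

```python
import contextlib
import shutil
import tempfile


@contextlib.contextmanager
def temporary_cache_dir(prefix="joblib_cache_"):
    """Yield a fresh directory under the system temp folder, then delete it."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        # Remove the directory and anything cached inside it.
        shutil.rmtree(path, ignore_errors=True)
```

Inside the `with` block, the yielded path could be passed to Memory as its cache directory; everything written there is removed on exit, whether or not the tests succeed.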