Cross-platform Memory.cache to tempfs

Currently it is not possible to use RAM / tmpfs as the cache directory for the Memory class in a cross-platform way.

The corresponding logic is embedded within the MemmapingPool class,

        if temp_folder is None:
            temp_folder = os.environ.get('JOBLIB_TEMP_FOLDER', None)
        if temp_folder is None:
            if os.path.exists(SYSTEM_SHARED_MEM_FS):
                try:
                    temp_folder = SYSTEM_SHARED_MEM_FS
                    pool_folder = os.path.join(temp_folder, pool_folder_name)
                    if not os.path.exists(pool_folder):
                        os.makedirs(pool_folder)
                    use_shared_mem = True
                except IOError:
                    # Missing rights in the /dev/shm partition,
                    # fallback to regular temp folder.
                    temp_folder = None
        if temp_folder is None:
            # Fallback to the default tmp folder, typically /tmp
            temp_folder = tempfile.gettempdir()

and cannot be easily applied to the Memory class.

Maybe it could be worth moving this chunk of code into a separate private function, that could also be called for some pre-defined values of the cachedir Memory parameter (for instance cachedir=True, since cachedir=None is already taken)?

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

1 reaction
lesteve commented, Mar 21, 2017

> Well, probably not a huge performance difference; it is more about avoiding writing to disk when it’s not necessary (also, with each of my stored objects in the 1-20 MB range, loading these objects from cache several times due to @pytest.mark.parametrize can become a non-negligible amount of I/O).

This may be premature optimization, but it’s hard to tell without numbers. Not sure this applies to your use case, but it could help to use module-level fixtures, which would let you load your objects once per test module rather than once per test function.

> True, but then there is also the case when /dev/shm is not writable for some reason, which the original function handles.

Fair enough, I missed the non-writable part. Feel free to open a PR with a test if you feel like it. Keeping the function private, i.e. starting with an underscore, as mentioned by @GaelVaroquaux, is probably more appropriate.

0 reactions
rth commented, Mar 21, 2017

> Just curious, why not use a cachedir in /tmp that lives for the duration of your tests? Do you see a big performance difference?

Well, probably not a huge performance difference; it is more about avoiding writing to disk when it’s not necessary (also, with each of my stored objects in the 1-20 MB range, loading these objects from cache several times due to @pytest.mark.parametrize can become a non-negligible amount of I/O).

> Also, I am guessing that you are not that interested in using the JOBLIB_TEMP_FOLDER environment variable for your use case, right? Not sure it is worth adding a function for this in joblib if that is all you want to do.

True, but then there is also the case when /dev/shm is not writable for some reason, which the original function handles.

In any case, it’s a non-issue (I can just use that code); it is more an observation that it is a shame to have that logic embedded within the MemmapingPool. Thanks for your response, I understand better why there is no such option in Memory by default. Closing this issue…
