Add caching to load_dag

load_dag is an expensive operation, and it is often called with the same argument, so we should cache the result.

Here’s the implementation:

https://github.com/ploomber/ploomber/blob/2e0d764c8f914ba21480b275c545ea50ff800513/src/ploomber/jupyter/manager.py#L107

Every time the function is called, we should save the starting_dir value and the returned path:

https://github.com/ploomber/ploomber/blob/2e0d764c8f914ba21480b275c545ea50ff800513/src/ploomber/jupyter/manager.py#L120

The next time the function is called, check whether starting_dir is the same and the file at path hasn’t changed. If so, skip the function’s body and reuse the saved result.
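The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in manager.py: `CachedLoader`, `file_digest`, and the `loader(starting_dir) -> (result, path)` signature are hypothetical names, and it assumes the loader always returns a path to an existing file. It compares file contents by hash rather than by modification time.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return a SHA-256 digest of the file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


class CachedLoader:
    """Wrap an expensive loader and reuse its last result when possible.

    `loader` is a hypothetical callable: loader(starting_dir) -> (result, path).
    """

    def __init__(self, loader):
        self._loader = loader
        self._key = None  # (starting_dir, path, digest) from the last call
        self._result = None

    def load(self, starting_dir):
        starting_dir = str(starting_dir)
        if self._key is not None:
            cached_dir, cached_path, cached_digest = self._key
            if (cached_dir == starting_dir
                    and Path(cached_path).is_file()
                    and file_digest(cached_path) == cached_digest):
                # Same directory and unchanged file: skip the expensive call.
                return self._result
        # Cache miss: run the loader and remember what it saw.
        result, path = self._loader(starting_dir)
        self._key = (starting_dir, str(path), file_digest(path))
        self._result = result
        return result
```

Only the most recent call is remembered, which matches the "save the starting_dir value and the returned path" description; keeping one entry per directory would be a straightforward extension.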

Add tests here: https://github.com/ploomber/ploomber/blob/master/tests/test_jupyter.py#L901

Issue Analytics

State:
Created 2 years ago
Comments: 10 (5 by maintainers)

Top GitHub Comments

edublancas commented, Jan 4, 2022

Hey @kugiyasan, sure. It was more an issue on my end, since I didn’t consider the complete picture and my original description wasn’t clear enough.

Users can create Ploomber pipelines in different ways (from directories, Python modules, or YAML files). Each one has its own details, so let’s restrict this feature to YAML, since it’s by far the most common.

Let me add more context.

1) First, we have two parameters: starting_dir and path. If starting_dir changes from one run to the next, we refresh; I see that you implemented that, so that’s ok. However, for “path”, we look at the file’s contents (e.g., if “path” is path/to/pipeline.yaml, we store its content, and in the next run we check whether the content has changed), but in your PR, I see you’re looking at all the files in starting_dir.
2) Second, there are two cases where looking at the content of “path” is not enough: the pipeline.yaml may be using an env.yaml, which we also need to keep track of; and if it uses the import_tasks_from directive, we also need to keep track of that file.

So if you’re up for the challenge, we can work on this in two parts: first, update your PR so we cover the points described in 1); then you can work on covering the edge cases in 2).
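One way to cover the edge cases in 2) is to hash every file the pipeline depends on into a single fingerprint and compare fingerprints between runs. The sketch below is only an illustration: `fingerprint` and its `extra_files` parameter are hypothetical, it assumes env.yaml sits next to pipeline.yaml, and it expects the caller to pass in the file named by import_tasks_from rather than parsing the YAML itself.

```python
import hashlib
from pathlib import Path


def fingerprint(pipeline_path, extra_files=()):
    """Hash pipeline.yaml plus the other files its spec depends on."""
    pipeline_path = Path(pipeline_path)
    tracked = [pipeline_path]

    # Assumption for this sketch: env.yaml lives beside pipeline.yaml.
    env = pipeline_path.with_name("env.yaml")
    if env.is_file():
        tracked.append(env)

    # e.g. the file referenced by import_tasks_from, supplied by the caller.
    tracked.extend(Path(f) for f in extra_files)

    digest = hashlib.sha256()
    for f in tracked:
        # Include the file name so adding or removing a file changes the hash.
        digest.update(str(f).encode())
        digest.update(b"\0")
        digest.update(f.read_bytes() if f.is_file() else b"<missing>")
        digest.update(b"\0")
    return digest.hexdigest()
```

Storing this fingerprint instead of a hash of pipeline.yaml alone means an edit to env.yaml or to the imported tasks file also invalidates the cached result.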

edublancas commented, Jan 1, 2022

There’s a PR open: https://github.com/ploomber/ploomber/pull/396

However, I found a few edge cases, so it’s pending merge.