Add caching to load_dag

load_dag is an expensive operation, and it is often called with the same argument, so we should cache the result.

Here's the implementation:

https://github.com/ploomber/ploomber/blob/2e0d764c8f914ba21480b275c545ea50ff800513/src/ploomber/jupyter/manager.py#L107

So every time the function is called, we should save the starting_dir value and the returned path:

https://github.com/ploomber/ploomber/blob/2e0d764c8f914ba21480b275c545ea50ff800513/src/ploomber/jupyter/manager.py#L120

The next time the function is called, check whether starting_dir is the same and the file at path hasn't changed. If so, skip the function's body.
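As a rough illustration of the idea above, here is a minimal sketch, not the actual Ploomber code. `CachedLoader` and the `loader` callable are hypothetical names; the sketch assumes the wrapped function takes starting_dir and returns a `(result, path)` pair, and it detects file changes by hashing the file's contents.

```python
import hashlib
from pathlib import Path


def content_digest(path):
    """SHA-256 hex digest of the file's bytes, or None if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


class CachedLoader:
    """Remember the last call and reuse it while the inputs are unchanged.

    `loader` is a hypothetical stand-in for the expensive function:
    it receives starting_dir and returns (result, path).
    """

    def __init__(self, loader):
        self._loader = loader
        self._key = None    # (starting_dir, path, digest) of the cached call
        self._value = None  # (result, path) returned by that call

    def __call__(self, starting_dir):
        starting_dir = str(Path(starting_dir).resolve())

        if self._key is not None:
            cached_dir, cached_path, cached_digest = self._key
            same_dir = cached_dir == starting_dir
            # Only hash the file when the directory already matches
            if same_dir and content_digest(cached_path) == cached_digest:
                return self._value  # skip the expensive body

        # Cache miss: run the expensive function and remember its inputs
        result, path = self._loader(starting_dir)
        self._key = (starting_dir, str(path), content_digest(path))
        self._value = (result, path)
        return self._value
```

Only the most recent call is kept, which matches the "same argument as last time" case described above. A modification-time check would be cheaper than hashing, at the cost of missing edits on filesystems with coarse timestamps.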

Add tests here: https://github.com/ploomber/ploomber/blob/master/tests/test_jupyter.py#L901

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 10 (5 by maintainers)

Top GitHub Comments

edublancas commented, Jan 4, 2022

Hey @kugiyasan, sure. It was more an issue on my end since I didn’t consider the complete picture, and my original description wasn’t clear enough.

Users can create Ploomber pipelines in different ways (from directories, Python modules, or YAML files). Each one has its own details, so let's restrict this feature to YAML files, since they're by far the most common.

Let me add more context.

  1. First, we have two parameters: starting_dir and path. If starting_dir changes from one run to the next, we refresh; I see that you implemented this, so that's OK. For "path", however, we look at the file's contents (e.g., if "path" is path/to/pipeline.yaml, we store its contents and, on the next run, check whether they have changed), but in your PR, I see you're looking at all the files in starting_dir.
  2. Second, there are two cases where looking at the contents of "path" is not enough: the pipeline.yaml may be using an env.yaml, which we also need to keep track of; and if it uses the import_tasks_from directive, we also need to keep track of that file.
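To make point 2 concrete, here is a hedged sketch of a fingerprint that covers the extra files. The helper names are hypothetical, and two things are assumed for illustration: that env.yaml sits next to pipeline.yaml, and that the import_tasks_from value has already been read from the spec and is relative to that same directory. Ploomber's actual lookup rules may differ.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of the file's bytes, or None if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def tracked_files(pipeline_yaml, import_tasks_from=None):
    """List the files whose contents should invalidate the cache.

    Assumption: env.yaml lives in the same directory as pipeline.yaml.
    Assumption: import_tasks_from (if given) is relative to that directory.
    """
    pipeline_yaml = Path(pipeline_yaml)
    files = [pipeline_yaml, pipeline_yaml.parent / "env.yaml"]
    if import_tasks_from is not None:
        files.append(pipeline_yaml.parent / import_tasks_from)
    return files


def fingerprint(pipeline_yaml, import_tasks_from=None):
    """Map each tracked file to its digest (None when the file is absent)."""
    return {
        str(path): file_digest(path)
        for path in tracked_files(pipeline_yaml, import_tasks_from)
    }
```

Comparing the fingerprint stored on the previous run with a fresh one tells us whether any tracked file changed. Recording missing files as None means that creating or deleting env.yaml also counts as a change.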

So if you're up for the challenge, we can work on this in two parts: first, update your PR so it covers the points described in 1); then you can work on covering the edge cases in 2).

edublancas commented, Jan 1, 2022

There’s a PR open https://github.com/ploomber/ploomber/pull/396

but I found a few edge cases, so it’s pending merge
