Customizing logic using Python

Currently, there is no way to customize ploomber build behavior, but in some cases, a user may want to override the default rules (i.e. execute a task when the code changes). This is possible by loading the pipeline.yaml into Python and then modifying the DAG object. This issue discusses some use cases as well as what we need to do to officially support this.

Please comment if these examples solve your problem or if you have any other use cases we should consider.

We’ve got a few example use cases:

Skip a branch based on an input parameter

Say the pipeline looks like this:

graph LR;
    A-->B;
    A-->C;
    B-->D;
    C-->E;

In some cases, we may want to skip an entire branch (e.g. B -> D) based on an input parameter.

Working example

# download example
ploomber examples -n cookbook/python-load --branch custom-logic -o python-load
cd python-load

What’s missing?

  • While this is technically possible by deleting tasks one by one (e.g. del dag['B']), there isn’t a simple way to delete an entire branch, so we should add a few handy methods (e.g., dag.delete_branch('B'))
  • Add a cookbook example
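To make the idea of deleting a branch concrete, here is a minimal sketch on a plain dictionary rather than on a Ploomber DAG object. The function names (branch_tasks, delete_branch) and the DOWNSTREAM mapping are hypothetical and exist only for illustration; dag.delete_branch is a proposed method, not an existing one. The sketch collects a task plus everything downstream of it and returns a copy of the graph without those tasks.

```python
from collections import deque

# Edges of the example pipeline above: task -> tasks that depend on it
DOWNSTREAM = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": [],
    "E": [],
}


def branch_tasks(downstream, root):
    """Return root plus every task reachable from it (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def delete_branch(downstream, root):
    """Return a copy of the graph without root and its downstream tasks."""
    to_remove = branch_tasks(downstream, root)
    return {
        task: [child for child in children if child not in to_remove]
        for task, children in downstream.items()
        if task not in to_remove
    }


# Hypothetical input parameter that decides whether the B -> D branch runs
skip_b = True
graph = delete_branch(DOWNSTREAM, "B") if skip_b else DOWNSTREAM
```

With a real DAG object, the same set of names could be removed one by one with del dag[name], as described above. Note that this sketch removes every downstream task, even one that also depends on tasks outside the branch, so a real implementation may need a stricter rule.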

Customize caching logic

With long-running tasks, users may want to skip execution even if the code has changed slightly, or apply custom rules; this usually happens with data ingestion tasks. This is possible today, but only by using private APIs.

Working example

# download example
ploomber examples -n cookbook/python-load --branch custom-logic -o python-load
cd python-load

What’s missing

  • We lack documentation on TaskStatus, and currently the only way to achieve this is via the private dag._params.cache_rendered_status; we should expose this through a public API.
  • Add a cookbook example
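As an illustration of what a custom rule might look like, the sketch below treats two versions of a task's source as equivalent when they differ only in comments or whitespace. It uses only the standard library; the function names (code_fingerprint, should_skip) are hypothetical, and this is not how Ploomber itself compares code. Wiring the result into a DAG is left out, since that currently relies on the private API mentioned above.

```python
import ast


def code_fingerprint(source):
    """Dump the syntax tree so comments and whitespace do not affect the result."""
    return ast.dump(ast.parse(source))


def should_skip(stored_source, current_source):
    """Custom rule: skip execution when the code only changed cosmetically."""
    return code_fingerprint(stored_source) == code_fingerprint(current_source)


stored = "x = 1\ny = x + 1\n"
cosmetic_change = "# ingest data\nx   =  1\n\ny = x + 1\n"
real_change = "x = 2\ny = x + 1\n"

skip_cosmetic = should_skip(stored, cosmetic_change)
skip_real = should_skip(stored, real_change)
```

Here skip_cosmetic is True and skip_real is False. Changes to docstrings still count as real changes under this rule, because docstrings are part of the syntax tree.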

TO DO:

  • Add a tutorial showing how to use Ploomber’s CLI to call a factory entry point where the pipeline loads from a pipeline.yaml and env.yaml (show that CLI args and cell injection work)
  • Add a link from the tutorial in the point above to the cookbook that shows how to load the DAG using Python

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 13 (5 by maintainers)

Top GitHub Comments

edublancas commented, Apr 1, 2022 (1 reaction)

Yes, as the name suggests, --force should run everything regardless of anything. The case where we want to skip something “artificially” (that is, modifying Ploomber’s standard behavior) is highly dependent on the use case. I think it’s best to keep the current behavior (--force executes everything) and have users determine how they want to customize if needed - in 90% of the cases, the default behavior works.

For example, someone may decide to apply some very custom rules, and they can turn them on/off by adding an argument to the custom entry point:

from ploomber import with_env
from ploomber.spec import DAGSpec

@with_env('env.yaml')
def make(env, custom_flag=False):
    # load the pipeline declared in pipeline.yaml, using the values from env.yaml
    dag = DAGSpec('pipeline.yaml', env=dict(env)).to_dag()
    # if custom_flag is set, manually determine task status here (override default behavior)
    return dag

Then custom_flag becomes accessible from the CLI: ploomber build -e pipeline.make --custom-flag

The benefit of using the Python API is that you can define your logic: as you mention @mitch-at-orika, you can even write some logic that takes a JSON file and determines that task’s status based on that.

Thanks for the feedback! I see a lot of value in explaining to users how to customize the DAG execution logic, so I’ll add a tutorial showing all the things we’ve discussed here.
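The JSON-driven idea mentioned in this comment could look roughly like the sketch below. It is not taken from the thread: the file layout (task name mapped to "skip" or "run") and the function names are assumptions made for illustration, and the step that applies the decision to the DAG is omitted.

```python
import json


def overrides_from_json(text):
    """Parse a JSON object that maps task names to 'skip' or 'run'."""
    overrides = json.loads(text)
    if not isinstance(overrides, dict):
        raise ValueError("expected a JSON object mapping task names to a status")
    return overrides


def tasks_to_skip(task_names, overrides):
    """Return the tasks marked as 'skip', keeping the original order."""
    return [name for name in task_names if overrides.get(name) == "skip"]


# Hypothetical contents of a status file kept next to pipeline.yaml
status_text = '{"B": "skip", "D": "skip", "C": "run"}'
skipped = tasks_to_skip(["A", "B", "C", "D", "E"], overrides_from_json(status_text))
```

Tasks that are missing from the file are left to the default behavior, so the file only needs to list the exceptions.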

edublancas commented, Mar 28, 2022 (1 reaction)

Awesome, thanks for the feedback! I’ll work on adding a bit more detail to the examples I provided, merge them to the master branch, and open some issues to tackle the use of private APIs.
