Setting task class default init parameters

(note that this is related to #513)

In some cases, we want to customize how a task is initialized. For example, to hide the code in an output HTML report, we can create a task like this:

tasks:
  - source: fit.py
    product:
        nb: output/nb.html
        model: output/model.pickle
    
    # hide code
    nbconvert_export_kwargs:
      exclude_input: True

However, if we have multiple tasks and want to hide the code in all the outputs, we need to pass the same initialization parameters to each of them, which is too verbose. Alternatively, we can provide a way to pass default initialization parameters:

init_defaults:
    # passes to all tasks
    Task:
        params: {a: 1}
    # passes to all notebook tasks
    NotebookRunner:
        nbconvert_export_kwargs:
            exclude_input: True

Notes

  • This conflicts with clients. Perhaps throw an error if clients appears here and tell the user to pass them in the clients section instead.
  • Do not allow certain arguments. For example, all init methods take a dag argument, but that should not be allowed here.
  • This requires knowledge of the names of the underlying classes. Is there any way to make this simpler?
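
As a rough illustration of the first two notes, here is a minimal validation sketch. Everything in it is hypothetical and not part of ploomber: the function name validate_init_defaults, the FORBIDDEN_ARGS mapping, and the set of known class names are assumptions made for the example. It also assumes the conflict with clients refers to a client init argument.

```python
# Hypothetical validator for the proposed init_defaults section.
# None of these names come from ploomber; they only illustrate the notes.

# Arguments that should not be set through defaults (assumed list)
FORBIDDEN_ARGS = {
    "dag": "'dag' is set internally and cannot be given a default",
    "client": "pass clients in the 'clients' section instead",
}


def validate_init_defaults(init_defaults, known_classes):
    """Raise ValueError if a section name or an argument is not allowed."""
    for class_name, kwargs in init_defaults.items():
        # each sub-section must be named after a task class
        if class_name not in known_classes:
            raise ValueError(
                f"{class_name!r} is not a task class. "
                f"Valid names: {sorted(known_classes)}"
            )
        # reject arguments that must not receive a default
        for arg in kwargs:
            if arg in FORBIDDEN_ARGS:
                raise ValueError(
                    f"{arg!r} is not allowed under {class_name!r}: "
                    f"{FORBIDDEN_ARGS[arg]}"
                )


# assumed subset of class names, for illustration only
known = {"Task", "NotebookRunner", "PythonCallable"}

# a valid section passes silently
validate_init_defaults(
    {"NotebookRunner": {"nbconvert_export_kwargs": {"exclude_input": True}}},
    known,
)
```

Failing early with a message that names the offending key keeps the error close to the pipeline.yaml entry that caused it.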

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

1 reaction
edublancas commented, Aug 19, 2022

@edublancas I am working on the grid case, but I do not fully understand what the class means in this test case, i.e., NotebookRunner. What is it?

We represent each task in pipeline.yaml as an object internally. For example, when you’re running a notebook, we create an instance of NotebookRunner. You can see all the classes here: https://docs.ploomber.io/en/latest/api/python_api.html#tasks

That’s why we have NotebookRunner there. We’re essentially saying: I want these default parameters for all instances of NotebookRunner.

For the CI issues: please open a PR and let it fail, then ping me so I can look at the logs.
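
To illustrate the idea of per-class defaults described in this comment, here is a minimal sketch with stand-in classes. It does not use ploomber itself: Task, NotebookRunner, and PythonCallable below are empty placeholders, and defaults_for is a hypothetical helper. Walking the class hierarchy is one possible way to make a Task section apply to every task while a NotebookRunner section applies only to notebook tasks; the issue does not prescribe this mechanism.

```python
class Task:
    """Stand-in for the base task class."""


class NotebookRunner(Task):
    """Stand-in for the class used when running a notebook."""


class PythonCallable(Task):
    """Stand-in for another task class."""


def defaults_for(class_, init_defaults):
    """Merge defaults from the most general class to the most specific."""
    merged = {}
    # reversed MRO goes object -> Task -> NotebookRunner, so a more
    # specific section overrides values from a more general one
    for klass in reversed(class_.__mro__):
        merged.update(init_defaults.get(klass.__name__, {}))
    return merged


init_defaults = {
    "Task": {"params": {"a": 1}},
    "NotebookRunner": {"nbconvert_export_kwargs": {"exclude_input": True}},
}

print(defaults_for(NotebookRunner, init_defaults))
# {'params': {'a': 1}, 'nbconvert_export_kwargs': {'exclude_input': True}}
print(defaults_for(PythonCallable, init_defaults))
# {'params': {'a': 1}}
```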

1 reaction
edublancas commented, May 31, 2022

Sure, so the changes need to go in dagspec.py and taskspec.py. The former deals with the pipeline.yaml spec, while the latter deals with each entry in the tasks section.

Now that I think about it, task_defaults sounds like a better name (instead of init_defaults). What do you think?

So, yes, we need to add a new top-level section. As a pointer, here’s where we validate the top-level keys in the pipeline.yaml:

https://github.com/ploomber/ploomber/blob/beb625cc977bcd34481608a91daddc5493e0983c/src/ploomber/spec/dagspec.py#L387

Then, this is the section where we process each task entry:

https://github.com/ploomber/ploomber/blob/beb625cc977bcd34481608a91daddc5493e0983c/src/ploomber/spec/dagspec.py#L785

You’ll see that we are calling task_dict.to_task(), where task_dict is an instance of TaskSpec. Here’s the definition of to_task:

https://github.com/ploomber/ploomber/blob/beb625cc977bcd34481608a91daddc5493e0983c/src/ploomber/spec/taskspec.py#L253

This is where you want to take into account the newly added section. Note that to_task won’t have access to the new section, so you’ll need to modify TaskSpec.__init__ and pass it (e.g. TaskSpec(task_defaults=...)).

You’ll see that there is one conditional; let’s focus on the first scenario for now:

https://github.com/ploomber/ploomber/blob/beb625cc977bcd34481608a91daddc5493e0983c/src/ploomber/spec/taskspec.py#L321

Inside _init_task, you’ll see a class_ variable, which contains the class of the task. To make this generic for all tasks, you’ll need to match the class name with the sub-sections in task_defaults, then use the information in task_defaults to modify the call to the constructor here:

https://github.com/ploomber/ploomber/blob/beb625cc977bcd34481608a91daddc5493e0983c/src/ploomber/spec/taskspec.py#L392

I think this should help you get started, but feel free to ask any questions!

I’d recommend going step by step:

  • allow task_defaults as top-level section in pipeline.yaml
  • ensure to_task has access to the task_defaults dictionary
  • take the values in task_defaults and pass them to initialize the Task object
  • test that one can set the task_defaults and the task is initialized with them
  • test that task_defaults also works with grid
  • test that task_defaults validates the keys in the dictionary (they must be task classes)

Once this works, we can work with the second scenario (which happens when the user passes the grid argument to a task entry)
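
The first scenario described in the comment above can be sketched roughly as follows. This is a simplified stand-in, not ploomber's actual TaskSpec or NotebookRunner: the constructor signatures are assumptions, and to_task receives class_ directly as a shortcut, whereas the comment describes the class being resolved inside _init_task. The rule that values in the task entry override the defaults is also an assumption; the issue does not state the precedence.

```python
class NotebookRunner:
    """Stand-in task class; it only records its init arguments."""

    def __init__(self, source, product, dag, **kwargs):
        self.source = source
        self.product = product
        self.dag = dag
        self.kwargs = kwargs


class TaskSpec:
    """Simplified stand-in, not ploomber's actual TaskSpec."""

    def __init__(self, data, task_defaults=None):
        self.data = dict(data)
        # new argument: gives to_task access to the top-level section
        self.task_defaults = task_defaults or {}

    def to_task(self, dag, class_):
        # match the class name with the sub-sections in task_defaults
        defaults = self.task_defaults.get(class_.__name__, {})
        # assumed precedence: the task entry overrides the defaults
        kwargs = {**defaults, **self.data}
        return class_(dag=dag, **kwargs)


task_defaults = {
    "NotebookRunner": {"nbconvert_export_kwargs": {"exclude_input": True}}
}
entry = {"source": "fit.py", "product": "output/nb.html"}

spec = TaskSpec(entry, task_defaults=task_defaults)
task = spec.to_task(dag=None, class_=NotebookRunner)
print(task.kwargs)
# {'nbconvert_export_kwargs': {'exclude_input': True}}
```

Under these assumptions, a test for the fourth step in the list would build a spec with task_defaults set and assert that the resulting task received the default arguments.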
