Setting task class default init parameters
Note: this is related to #513.
In some cases, we want to customize how a task is initialized. For example, if we want to hide the code in an output HTML report, we can create a task like this:
tasks:
  - source: fit.py
    product:
      nb: output/nb.html
      model: output/model.pickle
    # hide code
    nbconvert_export_kwargs:
      exclude_input: True
However, if we have multiple tasks and want to hide the code in all of their outputs, we have to pass the same initialization parameters to every one of them, which is too verbose. Alternatively, we could provide a way to pass default initialization parameters:
init_defaults:
  # passes to all tasks
  Task:
    params: {a: 1}
  # passes to all notebook tasks
  NotebookRunner:
    nbconvert_export_kwargs:
      exclude_input: True
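To make the intended semantics concrete, here is a minimal Python sketch of one way the sections could be resolved for a given task class. It assumes (the issue does not say this) that a section applies to the named class and all of its subclasses, walked from the most generic class to the most specific, so Task reaches every task and NotebookRunner only notebook tasks. The classes and the resolve_init_defaults helper are hypothetical stand-ins, not ploomber's API.

```python
import copy


# Hypothetical stand-ins for ploomber's task classes (not imported from ploomber).
class Task:
    pass


class NotebookRunner(Task):
    pass


class SQLDump(Task):
    pass


def resolve_init_defaults(class_, init_defaults):
    """Return the merged defaults that apply to class_ (hypothetical helper).

    Sections are applied from the most generic class in the MRO to class_
    itself, so a more specific section overrides a more generic one for the
    same top-level key (shallow merge).
    """
    resolved = {}
    for klass in reversed(class_.__mro__):
        section = init_defaults.get(klass.__name__, {})
        # deepcopy so tasks never share (and mutate) the same default objects
        resolved.update(copy.deepcopy(section))
    return resolved


init_defaults = {
    "Task": {"params": {"a": 1}},
    "NotebookRunner": {"nbconvert_export_kwargs": {"exclude_input": True}},
}

print(resolve_init_defaults(NotebookRunner, init_defaults))
# {'params': {'a': 1}, 'nbconvert_export_kwargs': {'exclude_input': True}}
print(resolve_init_defaults(SQLDump, init_defaults))
# {'params': {'a': 1}}
```

With this shallow merge, if both Task and NotebookRunner set params, the NotebookRunner dictionary replaces the Task one instead of being merged key by key; whether a deep merge is preferable is an open design choice.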
Notes
- This conflicts with clients; perhaps throw an error if clients appears here and tell the user to pass them in the clients section.
- Do not allow certain arguments. For example, all init methods take a dag argument, but that should not be allowed here.
- This requires knowledge of the names of the underlying classes. Is there any way to make this simpler?
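The concerns about clients and the dag argument amount to a validation step. A minimal sketch, assuming the offending keys are literally named clients and dag (the real argument names may differ per task class) and using a hypothetical validate_init_defaults helper:

```python
# Arguments that must not be set through init_defaults (assumed names,
# taken from the notes above; the real names may differ per task class).
FORBIDDEN_ARGS = {
    "clients": "declare clients in the top-level 'clients' section instead",
    "dag": "the DAG is passed automatically when each task is created",
}


def validate_init_defaults(init_defaults):
    """Raise ValueError if any class section sets a forbidden argument."""
    for class_name, kwargs in init_defaults.items():
        for key in kwargs:
            if key in FORBIDDEN_ARGS:
                raise ValueError(
                    f"init_defaults.{class_name}.{key} is not allowed: "
                    f"{FORBIDDEN_ARGS[key]}"
                )


validate_init_defaults({"NotebookRunner": {"nbconvert_export_kwargs": {}}})  # ok
try:
    validate_init_defaults({"Task": {"dag": None}})
except ValueError as e:
    print(e)
```

Failing early with a message that points to the right section is friendlier than letting the constructor fail later with a generic error.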
Issue analytics: created 2 years ago; 7 comments (7 by maintainers).
We represent each task in pipeline.yaml as an object internally. For example, when you're running a notebook, we create an instance of NotebookRunner. You can see all the classes here: https://docs.ploomber.io/en/latest/api/python_api.html#tasks
That's why we have NotebookRunner there: we're essentially saying, "I want these default parameters for all instances of NotebookRunner."
For the CI issues: please open a PR and let it fail, then ping me so I can look at the logs.
Sure, so the changes need to go in dagspec.py and taskspec.py. The former deals with the pipeline.yaml spec, while the latter deals with each entry in the tasks section. Now that I think about it, task_defaults sounds like a better name (instead of init_defaults). What do you think?
So, yes, we need to add a new top-level section. As a pointer, here's where we validate the top-level keys in the pipeline.yaml: https://github.com/ploomber/ploomber/blob/beb625cc977bcd34481608a91daddc5493e0983c/src/ploomber/spec/dagspec.py#L387
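As a rough illustration, accepting the new section could be as simple as adding it to the set of allowed top-level keys. The set below is a hypothetical subset chosen for the example, not the actual list in dagspec.py, and validate_top_keys is a made-up name:

```python
# Hypothetical subset of valid top-level keys, plus the new section.
ALLOWED_TOP_KEYS = {"meta", "config", "clients", "tasks", "task_defaults"}


def validate_top_keys(spec):
    """Raise ValueError if pipeline.yaml contains an unknown top-level key."""
    unknown = set(spec) - ALLOWED_TOP_KEYS
    if unknown:
        raise ValueError(
            f"Invalid top-level keys in pipeline.yaml: {sorted(unknown)}. "
            f"Valid keys are: {sorted(ALLOWED_TOP_KEYS)}"
        )


spec = {
    "tasks": [{"source": "fit.py", "product": {"nb": "output/nb.html"}}],
    "task_defaults": {
        "NotebookRunner": {"nbconvert_export_kwargs": {"exclude_input": True}}
    },
}
validate_top_keys(spec)  # passes: task_defaults is now a known key
```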
Then, this is the section where we process each task entry:
https://github.com/ploomber/ploomber/blob/beb625cc977bcd34481608a91daddc5493e0983c/src/ploomber/spec/dagspec.py#L785
You'll see that we are calling task_dict.to_task(); task_dict is an instance of TaskSpec. Here's the definition of to_task: https://github.com/ploomber/ploomber/blob/beb625cc977bcd34481608a91daddc5493e0983c/src/ploomber/spec/taskspec.py#L253
This is where you want to take the newly added section into account. Note that to_task won't have access to the new section, so you'll need to modify TaskSpec.__init__ and pass it in (e.g. TaskSpec(task_defaults=...)).
You'll see that there is one conditional; let's focus on the first scenario for now: https://github.com/ploomber/ploomber/blob/beb625cc977bcd34481608a91daddc5493e0983c/src/ploomber/spec/taskspec.py#L321
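Passing the section through TaskSpec.__init__ could look roughly like this, under the assumption that the DAGSpec side reads the top-level section once and hands the same dictionary to every TaskSpec. The class is a simplified stand-in (the real TaskSpec.__init__ takes other arguments, omitted here), and make_task_specs is a hypothetical helper:

```python
class TaskSpec:
    """Simplified stand-in for ploomber's TaskSpec (other arguments omitted)."""

    def __init__(self, data, task_defaults=None):
        self.data = dict(data)
        # An empty dict means pipeline.yaml has no task_defaults section
        self.task_defaults = task_defaults or {}

    def to_task(self):
        # The real method builds a Task; this placeholder only shows that the
        # defaults are now reachable from here via self.task_defaults.
        return {"data": self.data, "task_defaults": self.task_defaults}


def make_task_specs(spec):
    """Hypothetical DAGSpec-side helper: pass the top-level section along."""
    task_defaults = spec.get("task_defaults", {})
    return [TaskSpec(entry, task_defaults=task_defaults) for entry in spec["tasks"]]


specs = make_task_specs({
    "tasks": [{"source": "fit.py"}, {"source": "clean.py"}],
    "task_defaults": {
        "NotebookRunner": {"nbconvert_export_kwargs": {"exclude_input": True}}
    },
})
print(specs[1].to_task()["task_defaults"])
```

Making task_defaults an optional keyword argument keeps any existing code that builds a TaskSpec without it working unchanged.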
Inside _init_task, you'll see a class_ variable, which contains the class of the task. To make this generic for all tasks, you'll need to match the class name with the sub-sections in task_defaults, then use the information in task_defaults to modify the call to the constructor here: https://github.com/ploomber/ploomber/blob/beb625cc977bcd34481608a91daddc5493e0983c/src/ploomber/spec/taskspec.py#L392
I think this should help you get started, but feel free to ask any questions!
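A minimal sketch of the constructor change, assuming an exact match on class_.__name__ as described above and assuming that values written in the task entry override the defaults. NotebookRunner here is a stand-in with a simplified, assumed signature, and init_task is a hypothetical helper, not ploomber's _init_task:

```python
class NotebookRunner:
    """Stand-in with a simplified constructor (assumed signature)."""

    def __init__(self, source, product, dag, name=None, nbconvert_export_kwargs=None):
        self.source = source
        self.product = product
        self.dag = dag
        self.name = name
        self.nbconvert_export_kwargs = nbconvert_export_kwargs or {}


def init_task(class_, task_kwargs, dag, task_defaults):
    """Build a task, filling in defaults from the section named after class_."""
    defaults = task_defaults.get(class_.__name__, {})
    # Later keys win: the task entry overrides the class-level defaults
    kwargs = {**defaults, **task_kwargs}
    return class_(dag=dag, **kwargs)


task_defaults = {"NotebookRunner": {"nbconvert_export_kwargs": {"exclude_input": True}}}

task = init_task(NotebookRunner, {"source": "fit.py", "product": "output/nb.html"},
                 dag=None, task_defaults=task_defaults)
print(task.nbconvert_export_kwargs)
```

If a dag key ever slipped into task_defaults, class_(dag=dag, **kwargs) would raise a TypeError about multiple values for the same argument, which is one reason to reject such keys up front with a clearer message.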
I'd recommend going step by step:
- task_defaults as a top-level section in pipeline.yaml
- to_task has access to the task_defaults dictionary
- task_defaults and pass them to initialize the Task object
- task_defaults and the task is initialized with them
- grid
- task_defaults validates the keys in the dictionary (they must be task classes)
Once this works, we can move on to the second scenario (which happens when the user passes the grid argument to a task entry).
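For the step about validating the keys in the dictionary, one possible approach is to compare each key against the names of the available task classes. The sketch below collects those names by walking __subclasses__() recursively from a base class; the classes and helper names are hypothetical stand-ins, and the real list of valid names may come from elsewhere in ploomber:

```python
# Hypothetical stand-ins for ploomber's task classes.
class Task:
    pass


class NotebookRunner(Task):
    pass


class SQLDump(Task):
    pass


def task_class_names(base):
    """Names of base and all of its (transitive) subclasses."""
    names = {base.__name__}
    for sub in base.__subclasses__():
        names |= task_class_names(sub)
    return names


def validate_task_defaults(task_defaults, base=Task):
    """Raise ValueError if a key in task_defaults is not a task class name."""
    valid = task_class_names(base)
    invalid = sorted(set(task_defaults) - valid)
    if invalid:
        raise ValueError(
            f"task_defaults keys must be task classes, got {invalid}. "
            f"Valid keys are: {sorted(valid)}"
        )


validate_task_defaults({"Task": {}, "NotebookRunner": {}})  # ok
try:
    validate_task_defaults({"NotebookRuner": {}})  # typo in the class name
except ValueError as e:
    print(e)
```

With this check, a misspelled section fails loudly instead of being silently ignored.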