Max Workers for Parallel Runner
Description
I am working with a very slow database connection to get data migrated to S3. The parallel runner lets me run as many connections in parallel as there are CPU cores, but no more.
Context
I would like to be able to kick it up a notch and run more workers than cores while waiting on slow SQL queries.
Possible Implementation
Parallel Runner Module
As shown in the ProcessPoolExecutor docs, `max_workers` can be passed into the `ProcessPoolExecutor`.
Change Line 34 in the parallel runner module to bring in Optional.
- from typing import Any, Dict
+ from typing import Any, Dict, Optional
Change lines 59-68 in the parallel runner module
- def __init__(self):
+ def __init__(self, max_workers: Optional[int] = None):
"""Instantiates the runner by creating a Manager.
+
+ Args:
+ max_workers: Maximum number of worker processes, passed to ProcessPoolExecutor. Defaults to None.
"""
self._manager = ParallelRunnerManager()
self._manager.start()
+ self.max_workers = max_workers
Change line 170 in the parallel runner module to the following
- with ProcessPoolExecutor() as pool:
+ with ProcessPoolExecutor(max_workers=self.max_workers) as pool:
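Taken together, the three changes above amount to storing `max_workers` on the runner and forwarding it to the executor. A minimal sketch of that pattern follows. `SketchParallelRunner` is a hypothetical, heavily simplified stand-in used only for illustration; it is not Kedro's `ParallelRunner`, which also starts a `ParallelRunnerManager` and works on pipelines rather than plain callables.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, List, Optional


class SketchParallelRunner:
    """Hypothetical stand-in showing how max_workers is stored and forwarded."""

    def __init__(self, max_workers: Optional[int] = None):
        # None keeps the executor's own default (roughly one worker per CPU).
        self.max_workers = max_workers

    def run(self, func: Callable[[int], int], items: Iterable[int]) -> List[int]:
        # The stored value is passed straight through to the process pool.
        with ProcessPoolExecutor(max_workers=self.max_workers) as pool:
            return list(pool.map(func, items))
```

When calling `run`, `func` has to be picklable (for example a module-level function), and on platforms that spawn worker processes the call should sit under an `if __name__ == "__main__":` guard.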
kedro_cli in template
Add `MAX_WORKERS_HELP` to kedro_cli line 91
+
+MAX_WORKERS_HELP = """Maximum number of processes to run in parallel. If max_workers is None or not given, it will default to the number of processors on the machine. If max_workers is lower than or equal to 0, then a ValueError will be raised."""
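The rule stated in the help text can be sketched as a small helper. `effective_workers` is a hypothetical name used only for illustration, and it assumes `os.cpu_count()` as the default, which only approximates what `ProcessPoolExecutor` does when `max_workers` is `None` (the exact default varies by Python version and platform).

```python
import os
from typing import Optional


def effective_workers(max_workers: Optional[int] = None) -> int:
    """Hypothetical helper: resolve the worker count described in the help text."""
    if max_workers is None:
        # Assumption: fall back to the number of processors on the machine.
        return os.cpu_count() or 1
    if max_workers <= 0:
        # Same condition under which ProcessPoolExecutor raises ValueError.
        raise ValueError("max_workers must be greater than 0")
    return max_workers
```

For the slow-database case in the description, a value above the core count (for example `effective_workers(32)` on an 8-core machine) is accepted as-is, since the workers spend most of their time waiting on I/O rather than using the CPU.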
In the default template add a flag to the cli in [kedro_cli lines 102-147](https://github.com/quantumblacklabs/kedro/blob/861baa5ee8dd1c8ffce3ef83ae598fa38ecb55e6/kedro/template/%7B%7B%20cookiecutter.repo_name%20%7D%7D/kedro_cli.py#L102-L147)
``` diff
 @click.group(context_settings=CONTEXT_SETTINGS, name=__file__)
 def cli():
     """Command line tools for manipulating a Kedro project."""


 @cli.command()
 @click.option("--from-nodes", type=str, default="", help=FROM_NODES_HELP)
 @click.option("--to-nodes", type=str, default="", help=TO_NODES_HELP)
 @click.option(
     "--node",
     "-n",
     "node_names",
     type=str,
     default=None,
     multiple=True,
     help=NODE_ARG_HELP,
 )
 @click.option(
     "--runner", "-r", type=str, default=None, multiple=False, help=RUNNER_ARG_HELP
 )
 @click.option("--parallel", "-p", is_flag=True, multiple=False, help=PARALLEL_ARG_HELP)
+@click.option("--max-workers", type=int, default=None, multiple=False, help=MAX_WORKERS_HELP)
 @click.option("--env", "-e", type=str, default=None, multiple=False, help=ENV_ARG_HELP)
 @click.option("--tag", "-t", type=str, default=None, multiple=True, help=TAG_ARG_HELP)
-def run(tag, env, parallel, runner, node_names, to_nodes, from_nodes):
+def run(tag, env, parallel, runner, max_workers, node_names, to_nodes, from_nodes):
     """Run the pipeline."""
     from {{cookiecutter.python_package}}.run import main

     from_nodes = [n for n in from_nodes.split(",") if n]
     to_nodes = [n for n in to_nodes.split(",") if n]
     if parallel and runner:
         raise KedroCliError(
             "Both --parallel and --runner options cannot be used together. "
             "Please use either --parallel or --runner."
         )
+
+    if runner and max_workers is not None:
+        raise KedroCliError(
+            "Both --runner and --max-workers options cannot be used together. "
+            "Please use either --parallel with --max-workers or --runner."
+        )
     if parallel:
         runner = "ParallelRunner"
     runner_class = load_obj(runner, "kedro.runner") if runner else SequentialRunner
+
+    runner_kwargs = {}
+    if parallel and max_workers is not None:
+        runner_kwargs["max_workers"] = max_workers
     main(
         tags=tag,
         env=env,
-        runner=runner_class(),
+        runner=runner_class(**runner_kwargs),  # I am not familiar with load_obj, so I am not completely sure this is correct
         node_names=node_names,
         from_nodes=from_nodes,
         to_nodes=to_nodes,
     )
```
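The option handling proposed for `run` can be checked in isolation, without click or Kedro installed. The sketch below pulls the two conflict checks and the keyword-argument construction into one function. `build_runner_kwargs` and `CliUsageError` are hypothetical names introduced here for illustration; `CliUsageError` stands in for `KedroCliError`.

```python
from typing import Any, Dict, Optional


class CliUsageError(Exception):
    """Hypothetical stand-in for KedroCliError."""


def build_runner_kwargs(
    parallel: bool, runner: Optional[str], max_workers: Optional[int]
) -> Dict[str, Any]:
    """Validate the flag combination and return kwargs for the runner class."""
    if parallel and runner:
        raise CliUsageError(
            "Both --parallel and --runner options cannot be used together. "
            "Please use either --parallel or --runner."
        )
    if runner and max_workers is not None:
        raise CliUsageError(
            "Both --runner and --max-workers options cannot be used together. "
            "Please use either --parallel with --max-workers or --runner."
        )
    kwargs: Dict[str, Any] = {}
    # Only the parallel runner accepts max_workers, so forward it only then.
    if parallel and max_workers is not None:
        kwargs["max_workers"] = max_workers
    return kwargs
```

With these checks, `build_runner_kwargs(True, None, 4)` yields `{"max_workers": 4}`. One consequence of the logic as proposed is that `--max-workers` given without `--parallel` or `--runner` is silently ignored; whether that case should raise an error instead is a design choice the proposal leaves open.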
Docs
04_create_pipelines.md line 727
- * `ParallelRunner` - runs your nodes in parallel; independent nodes are able to run at the same time, allowing you to take advantage of multiple CPU cores.
+ * `ParallelRunner` - runs your nodes in parallel; independent nodes are able to run at the same time, allowing you to take advantage of multiple CPU cores as set by `max_workers`.
05_nodes_and_pipelines.md line 526
+ > *Note:* the parallel runner can be called with `--max-workers` to set the maximum number of worker processes.
04_create_pipelines.md line 737
+ > *Note:* the parallel runner can be called with `--max-workers` to set the maximum number of worker processes.
Issue Analytics
- Created 4 years ago
- Reactions: 2
- Comments: 6 (6 by maintainers)
Top GitHub Comments
@WaylonWalker We have added this in https://github.com/quantumblacklabs/kedro/commit/4dc48e7b17104356934b5a7cb91c760a739fe44e (and we are releasing the new version soon). I am closing this issue but let us know if you have any questions 😃
@Flid Thanks for the heads up. I’ll make sure to revert that part. Our team will be using a custom cookiecutter based on `kedro new` with a few customizations for us. I can make the change there.
I was able to easily make the change, but testing it has proven tricky, as my current pipeline is in `14.3` and several things seemed broken when I tried to use my branch of `kedro` on it. Nothing major, I just need a bit of time to make the upgrade and test it out.