kubernetes executor: python3: argument list too long for complex dependency graphs
Summary
I have a re-execution of a job with a complex dynamic DAG (>1K ops, of which ~600 ran during the first try). I'm using the Kubernetes executor to launch the steps. The Kubernetes executor adds a "known_state" object to the list of Python arguments used to launch the pod.
This "known_state" object is huge (roughly 3K lines), and the pod fails with:
exec /usr/bin/python3: argument list too long
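The error comes from the kernel, not from Python: execve(2) rejects a process image whose combined argument and environment strings exceed ARG_MAX (and on Linux each individual string is further capped at MAX_ARG_STRLEN, 128 KiB). A minimal sketch that reproduces the same failure on a Linux host — the payload size is an arbitrary illustration, not taken from Dagster:

```python
import os
import subprocess
import sys

# The kernel caps the combined size of argv + envp passed to execve(2);
# on Linux, ARG_MAX is typically ~2 MiB.
arg_max = os.sysconf("SC_ARG_MAX")
print(f"ARG_MAX on this host: {arg_max} bytes")

# A single oversized argument exceeds the limit, so exec fails with
# E2BIG -- the same "argument list too long" error the pod reports.
try:
    subprocess.run([sys.executable, "-c", "pass", "x" * (arg_max * 2)])
    exec_failed = False
except OSError as exc:
    print(f"exec failed: {exc}")
    exec_failed = True
```

Serializing a few thousand lines of known_state into argv is enough to cross this budget, which is why only sufficiently large DAGs (or re-executions, which carry more state) trigger the error.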
Reproduction
Any job with a large enough DAG will trigger this (at least on re-execution).
Message from the maintainers:
Impacted by this bug? Give it a 👍. We factor engagement into prioritization.
Issue Analytics
- Created a year ago
- Reactions: 6
- Comments: 7 (3 by maintainers)
Top GitHub Comments
Hi @johannkm - we have upgraded Dagster to v1.0.13 and attempted to run a job which includes ~600 dynamically generated ops. However, we are still able to consistently replicate the error described in this issue: exec /opt/conda/envs/user/bin/dagster: argument list too long, where all ops fail at around the same time, both in a standalone run and via job re-execution. We have confirmed that the known_state field is represented as part of the DAGSTER_EXECUTE_STEP_ARGS env var, as opposed to being included as a CLI arg, in accordance with the merged fix. Could it be that the issue is not actually fully resolved yet, i.e. is there still a limit to how large a single workflow can be?

The above fix will go out in the 1.0.16 release this Thursday.
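The follow-up observation is consistent with how the kernel accounts for memory: execve(2) charges argv and envp against the same ARG_MAX budget, so moving known_state from a CLI argument into an environment variable only postpones the failure rather than removing the limit. A hedged sketch of that behavior, assuming a Linux host — the variable name mirrors the one in the comment, and the 10 MiB payload is an arbitrary illustration:

```python
import os
import subprocess
import sys

# Environment strings are passed to execve(2) alongside argv and count
# against the same ARG_MAX budget (each string also individually capped
# at MAX_ARG_STRLEN on Linux), so a huge env var fails exec the same way
# a huge CLI argument does.
huge_env = dict(os.environ)
huge_env["DAGSTER_EXECUTE_STEP_ARGS"] = "x" * (10 * 1024 * 1024)  # ~10 MiB

try:
    subprocess.run([sys.executable, "-c", "pass"], env=huge_env)
    exec_failed = False
except OSError as exc:
    print(f"exec failed: {exc}")  # still E2BIG, despite empty-ish argv
    exec_failed = True
```

This suggests that for very large workflows the state would need to be delivered out-of-band (e.g. via a file or object store) rather than through exec arguments or the environment.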