Deployments stalling when using "parallel" keyword

Describe the bug

We are running into an issue in Runway version 1.15.0 where deployments stall at the following log messages when trying to use parallel or parallel_regions to deploy to us-east-1 and us-west-2. The deployment always times out unless it is manually aborted.

NOTICE:runway.core.components.deployment:deployment_1:processing deployment (in progress)
VERBOSE:runway.core.components.deployment:deployment_1:attempting to deploy to region(s): us-east-1, us-west-2
INFO:runway.core.components.deployment:deployment_1:processing regions in parallel... (output will be interwoven)

Does anyone know why this might be happening?

Issue Analytics

Created 3 years ago
Comments: 8 (5 by maintainers)

Top GitHub Comments

ITProKyle commented, Feb 22, 2021

I did a bit more digging into the root cause of the output being delayed during parallel execution. There are a few contributing factors, which I will try to explain in detail, with a tl;dr at the end for those who don’t care about the inner workings of Python.

Prior to Python 3.8, creating a child process with multiprocessing on a POSIX system would fork the parent process (using functionality built into the OS). Forking copies everything currently in the memory of the parent, such as the logging configuration, to each child process. Windows has no equivalent functionality, so each child process there spawns a new instance of Python, which does not carry over anything from the parent process. Starting with Python 3.8, the default on macOS is also to spawn a new process, while other POSIX systems should still default to fork.
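To check which behavior applies on a given machine, the start methods can be inspected directly. This is a minimal sketch using only the standard library; the helper name describe_start_methods is made up for illustration and is not part of Runway.

```python
import multiprocessing
import sys


def describe_start_methods():
    """Return the platform plus the default and available start methods."""
    return {
        "platform": sys.platform,
        # Name of the start method used when none is requested explicitly.
        "default": multiprocessing.get_start_method(),
        # Every start method this platform supports ("spawn" is always one).
        "available": multiprocessing.get_all_start_methods(),
    }


if __name__ == "__main__":
    print(describe_start_methods())
```

On Windows the only available method is spawn, which is why forking cannot be used there.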

In Runway 1.10.0 we restructured things to consolidate code that was inherited from another codebase into the Runway codebase. Prior to this, logging was being set up for CloudFormation modules each time a module was run. While inefficient, it hid the above-mentioned fork vs spawn differences.
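The effect of setting up logging on every run can be sketched with the initializer argument of ProcessPoolExecutor (available in Python >= 3.7), which runs once in each worker regardless of whether it was forked or spawned. This is an illustration only, not Runway's actual code; setup_logging, effective_level and levels_in_children are hypothetical names.

```python
import logging
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def setup_logging(level):
    """Configure the root logger; safe to call in the parent or in a worker."""
    root = logging.getLogger()
    root.setLevel(level)
    if not root.handlers:
        root.addHandler(logging.StreamHandler())


def effective_level(_):
    """Report the root logger level as seen inside the current process."""
    return logging.getLogger().getEffectiveLevel()


def levels_in_children(method, configure_in_child):
    """Run one worker with the given start method and return its log level."""
    ctx = multiprocessing.get_context(method)
    kwargs = {}
    if configure_in_child:
        # Re-apply the logging setup inside each worker process.
        kwargs = {"initializer": setup_logging, "initargs": (logging.DEBUG,)}
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx, **kwargs) as pool:
        return list(pool.map(effective_level, [0]))


if __name__ == "__main__":
    setup_logging(logging.DEBUG)
    for method in multiprocessing.get_all_start_methods():
        print(method, levels_in_children(method, configure_in_child=False))
```

Without the initializer, a spawned worker starts from a fresh interpreter and reports the default root level, while a forked worker reports whatever the parent had configured.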

As for solving the issue, there are a few things to consider:

If we reverted the changes made in 1.10.0, we would lose quite a bit of functionality. IMO, this is not an option.
There is an option to specifically tell Python to use fork when creating a new process. However, this still does not support Windows and requires Python >= 3.7. Since Runway 1.x supports Python < 3.7, it can’t be used until the next major release. (mp_context arg [source])
When implementing the solution in the next major release, we could force all child processes to be spawned rather than forked to handle them all uniformly. However, this would break our binary releases. [source]

The most viable workaround (for now) would be to force all child processes to use fork (unless on Windows) in the next major release, since we are already dropping support for Python < 3.7 as part of it. This should solve the issue in most cases, except for Windows users, who will continue to see the same delayed output.
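A rough sketch of that workaround, assuming the parallel work is run through concurrent.futures.ProcessPoolExecutor and its mp_context argument (Python >= 3.7). The functions pick_context, deploy_region and deploy_in_parallel are hypothetical stand-ins, not Runway's implementation.

```python
import multiprocessing
import sys
from concurrent.futures import ProcessPoolExecutor


def pick_context():
    """Prefer fork so children inherit parent state; Windows only has spawn."""
    method = "spawn" if sys.platform == "win32" else "fork"
    return multiprocessing.get_context(method)


def deploy_region(region):
    """Hypothetical placeholder for the per-region deployment work."""
    return "deployed " + region


def deploy_in_parallel(regions):
    """Run deploy_region for each region in its own child process."""
    with ProcessPoolExecutor(mp_context=pick_context()) as pool:
        return list(pool.map(deploy_region, regions))


if __name__ == "__main__":
    print(deploy_in_parallel(["us-east-1", "us-west-2"]))
```

Passing the context explicitly keeps the choice local to this executor instead of changing the process-wide default with multiprocessing.set_start_method.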

With the workaround in place, we can focus on improving our support of multiprocessing/multithreading to devise a long-term solution that works on all platforms (#298). While forking processes on POSIX works, it has its drawbacks, especially when trying to do more complex multiprocess actions.

tl;dr

Runway 1.10.0 moved some code that was hiding this issue with Python multiprocessing and how we are trying to use it
Newer versions of Python change some multiprocessing defaults (macOS defaults to spawn as of Python 3.8), making the above point more prevalent

gogineni99 commented, Jan 18, 2021

Understood. An update to the docs mentioning that parallel_regions works only for CloudFormation would help users.
