`WorkerProcess` leaks environment variables to parent process
Since https://github.com/dask/distributed/pull/6681, `WorkerProcess` leaks the environment specified via the `env` kwarg into the parent process, for example the `CUDA_VISIBLE_DEVICES` variable we use in Dask-CUDA.
Before https://github.com/dask/distributed/pull/6681:

```
In [1]: import os

In [2]: from dask_cuda import LocalCUDACluster

In [3]: os.environ.get("CUDA_VISIBLE_DEVICES")

In [4]: cluster = LocalCUDACluster()
/datasets/pentschev/src/distributed/distributed/node.py:179: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43355 instead
  warnings.warn(
2022-07-20 11:37:39,518 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:39,518 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:39,519 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:39,519 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:39,525 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:39,526 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:39,542 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:39,542 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:39,548 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:39,548 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:39,548 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:39,549 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:39,551 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:39,551 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:39,551 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:39,552 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize

In [5]: os.environ.get("CUDA_VISIBLE_DEVICES")

In [6]:
```
After https://github.com/dask/distributed/pull/6681:

```
In [1]: import os

In [2]: from dask_cuda import LocalCUDACluster

In [3]: os.environ.get("CUDA_VISIBLE_DEVICES")

In [4]: cluster = LocalCUDACluster()
/datasets/pentschev/src/distributed/distributed/node.py:179: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39759 instead
  warnings.warn(
2022-07-20 11:37:00,532 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:00,533 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:00,535 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:00,536 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:00,607 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:00,607 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:00,661 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:00,662 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:00,662 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:00,662 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:00,663 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:00,664 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:00,666 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:00,666 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-07-20 11:37:00,742 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2022-07-20 11:37:00,742 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize

In [5]: os.environ.get("CUDA_VISIBLE_DEVICES")
Out[5]: '7,0,1,2,3,4,5,6'

In [6]:
```
What happens now is that `os.environ.update(self.env)` is called from the parent process and never reverted. One of the issues this causes is leaking environment variables between pytest tests. Furthermore, if multiple workers are created they may overwrite each other’s variables (I’m not sure whether a cluster can create `WorkerProcess`es with different environment variables, so this may be a non-issue).
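
To make the mechanism concrete, here is a minimal, self-contained sketch of that pattern (the `spawn_worker` and `worker_main` names are illustrative, not distributed’s actual API): the parent mutates its own environment so the spawned child inherits it, and nothing restores the parent’s previous state.

```python
import multiprocessing
import os


def worker_main():
    # The child sees the variable, as intended.
    print("child:", os.environ.get("CUDA_VISIBLE_DEVICES"))


def spawn_worker(env):
    """Illustrative stand-in for the current behavior: mutate the parent's
    environment so the child inherits it, then spawn the child."""
    os.environ.update(env)  # mutates the *parent* process...
    proc = multiprocessing.get_context("spawn").Process(target=worker_main)
    proc.start()
    return proc  # ...and nothing ever reverts the update


if __name__ == "__main__":
    print("parent before:", os.environ.get("CUDA_VISIBLE_DEVICES"))  # None
    p = spawn_worker({"CUDA_VISIBLE_DEVICES": "7,0,1,2,3,4,5,6"})
    p.join()
    # The parent is now polluted too -- this is the leak.
    print("parent after:", os.environ.get("CUDA_VISIBLE_DEVICES"))
```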
This problem has been discussed at length in the past in https://github.com/dask/distributed/issues/3682; it is a difficult problem to tackle from Python, given that any newly spawned process must inherit its environment variables from the parent process. One of the suggestions in https://github.com/dask/distributed/issues/3682#issuecomment-612078761 was to create a lock ensuring multiple workers don’t spawn simultaneously, which would likely increase spawn time a bit but seems to be the only safe option in that situation.
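
For concreteness, a rough sketch of what that lock-based approach could look like, using a hypothetical `spawn_with_env` helper rather than anything that exists in distributed today: take a process-wide lock, apply the overrides, spawn the child, and restore the previous values before releasing the lock.

```python
import multiprocessing
import os
import threading

# One lock per parent process, so concurrent worker spawns cannot observe
# (or clobber) each other's temporary environment overrides.
_spawn_lock = threading.Lock()


def spawn_with_env(env, target):
    """Hypothetical helper: spawn `target` in a child that inherits `env`,
    without leaking `env` into the parent beyond the spawn itself."""
    with _spawn_lock:
        saved = {key: os.environ.get(key) for key in env}
        os.environ.update(env)
        try:
            # The child captures the parent's environment here, while the
            # overrides are in place.
            proc = multiprocessing.get_context("spawn").Process(target=target)
            proc.start()
            return proc
        finally:
            # Restore the parent's environment before the next spawn runs.
            for key, old in saved.items():
                if old is None:
                    os.environ.pop(key, None)
                else:
                    os.environ[key] = old
```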
Any thoughts here @crusaderky (original author of #6681)?
cc’ing @quasiben @kkraus14 @mrocklin for visibility as well, as they were active on the https://github.com/dask/distributed/issues/3682 discussion.
Top GitHub Comments
To be clear, I’m definitely not suggesting reverting and leaving it reverted. I was only suggesting reverting for today to make the release, then in the next week or two adding a different solution we’re all happy with (like @crusaderky’s proposal). It feels like a safer path to me, since we know it won’t break things for other users using env vars in similar ways, even though it would delay getting `MALLOC_TRIM_THRESHOLD_` into the hands of users even longer, which I’d be sad about.

I think I was now able to work around that in https://github.com/rapidsai/dask-cuda/pull/955. I’ll just wait for confirmation until tomorrow morning, but unless some other problem emerges regarding that, we should be fine with the release going out as is.
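
For illustration, a downstream workaround could take roughly the following shape; this is a sketch under my own assumptions, not necessarily what the dask-cuda PR does: snapshot the affected variables before creating the cluster and restore them once startup finishes.

```python
import os
from contextlib import contextmanager


@contextmanager
def restore_environ(keys):
    """Snapshot the given environment variables and restore them on exit,
    deleting any that did not exist beforehand."""
    saved = {key: os.environ.get(key) for key in keys}
    try:
        yield
    finally:
        for key, old in saved.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


# Hypothetical usage, shielding the parent from the leak during startup:
#
#     with restore_environ(["CUDA_VISIBLE_DEVICES"]):
#         cluster = LocalCUDACluster()
```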