[feature request] Import MPI only when needed

Describe the bug

Only four (DDPG, GAIL, TRPO and PPO1) of the twelve algorithms implemented in Stable Baselines use MPI. However, importing stable_baselines – even if you do not use any of the algorithms – indirectly executes from mpi4py import MPI.

This has a number of issues:

mpi4py has a binary dependency on OpenMPI, which often needs to be installed separately.
In some installations of OpenMPI, any code that uses MPI must be executed under mpirun, even if you only use a single process.
OpenMPI is unreliable, and interacts particularly poorly with multi-threading (e.g. TensorFlow) and multi-processing (e.g. SubprocVecEnv).
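
One way to confirm the indirect import described above is to import the package in a fresh interpreter and check whether the dependency ended up in sys.modules. The following is a minimal sketch, not part of Stable Baselines: the helper name import_pulls_in and its timeout parameter are hypothetical, and it assumes the package under test is installed in the current environment.

```python
import subprocess
import sys


def import_pulls_in(package, dependency, timeout=60):
    """Return True if importing `package` in a fresh interpreter also loads `dependency`.

    A subprocess is used so that modules already imported in the current
    session do not affect the result. If the import hangs, subprocess.run
    raises subprocess.TimeoutExpired after `timeout` seconds.
    """
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({package!r})\n"
        f"print({dependency!r} in sys.modules)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    # The package may print during import, so only the last line is inspected.
    return result.stdout.strip().splitlines()[-1] == "True"
```

Calling import_pulls_in("stable_baselines", "mpi4py") would then report whether the top-level import reaches mpi4py in a given installation.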

Code example

I’ve run into many issues with OpenMPI over time, and never tracked down the root cause satisfactorily. As a recent example, https://github.com/HumanCompatibleAI/imitation/blob/master/src/imitation/scripts/train.py deadlocked inside OpenMPI on Ubuntu 18.04 with OpenMPI 2. It worked fine on a recent Mac OS X install. Previously I’ve also had issues with OpenMPI 4, and found that OpenMPI 3 seems to be the most reliable version.

@shwang @qxcv may have more details on this failure mode.

Suggested Resolution

The two main options I see are:

Change stable_baselines/__init__.py to not automatically import all the algorithms. This seems the cleanest, but would introduce a breaking API change: users would need to do e.g. from stable_baselines.ppo1 import PPO1 rather than from stable_baselines import PPO1. It would speed up importing stable_baselines though, which right now takes a while.
Change all files that use MPI to import it lazily. This should be fairly easy: most of the time it’s only used in one function in each file. It would make things a bit more fragile: if mpi4py wasn’t installed, you would only get an error when you actually run the algorithm.
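
The second option (lazy import) could look roughly like the sketch below. This is an illustration under assumptions, not the actual Stable Baselines code: the require helper and the PPO1Sketch class are hypothetical names, and the learn body is a placeholder that only reads the MPI rank.

```python
import importlib


def require(module_name, hint=""):
    """Import `module_name` on demand and return the module.

    Re-raises ImportError with an optional installation hint, so a missing
    optional dependency fails with a readable message at first use.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        message = f"{module_name} is required for this feature. {hint}".strip()
        raise ImportError(message) from exc


class PPO1Sketch:
    """Hypothetical stand-in for an MPI-based algorithm class."""

    def __init__(self, policy):
        # No MPI import here: constructing the object stays MPI-free.
        self.policy = policy

    def learn(self, total_timesteps):
        # mpi4py is imported only when training actually starts.
        MPI = require(
            "mpi4py.MPI",
            hint="Install it with: pip install stable-baselines[mpi]",
        )
        # Placeholder body: a real implementation would train here.
        return MPI.COMM_WORLD.Get_rank()
```

With this layout, importing the module and constructing PPO1Sketch never touch mpi4py. The trade-off is the one noted above: a missing mpi4py surfaces only when learn is called.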

I’m happy to open a PR on this if there’s agreement on if/how to resolve this.

Top GitHub Comments

2 reactions

megsano commented, Jun 28, 2020

Hello! I’m encountering a similar issue. I am using PPO1 and GAIL, so I need MPI. I installed OpenMPI and ran pip install stable-baselines[mpi]. However, running import stable_baselines in my Python interpreter hangs. When I run import stable_baselines after uninstalling mpi4py, the import succeeds.

I am using TF 1.15 and Debian stretch. Any ideas for how to fix this would be appreciated. Thanks!

0 reactions

megsano commented, Jul 2, 2020