[feature request] Import MPI only when needed

Describe the bug

Only four of the twelve algorithms implemented in Stable Baselines (DDPG, GAIL, TRPO and PPO1) use MPI. However, importing stable_baselines indirectly executes from mpi4py import MPI, even if you do not use any of those four algorithms.

This has a number of issues:

  • mpi4py has a binary dependency on OpenMPI, which often needs to be installed separately.
  • In some installations of OpenMPI, any code that uses MPI must be executed under mpirun, even if you only use a single process.
  • OpenMPI is unreliable, and interacts particularly poorly with multi-threading (e.g. TensorFlow) and multi-processing (e.g. SubprocVecEnv).

Code example

I’ve run into many issues with OpenMPI over time, and I never tracked down the root cause satisfactorily. As a recent example, https://github.com/HumanCompatibleAI/imitation/blob/master/src/imitation/scripts/train.py deadlocked inside OpenMPI on Ubuntu 18.04 with OpenMPI 2. It worked fine on a recent Mac OS X install. Previously I’ve also had issues with OpenMPI 4, and found that OpenMPI 3 seems to be the most reliable version.

@shwang @qxcv may have more details on this failure mode.

Suggested Resolution

The two main options I see are:

  • Change stable_baselines/__init__.py to not automatically import all the algorithms. This seems the cleanest, but would introduce a breaking API change: users would need to do e.g. from stable_baselines.ppo1 import PPO1 rather than from stable_baselines import PPO1. It would speed up importing stable_baselines though, which right now takes a while.
  • Change all files that use MPI to import it lazily. This should be fairly easy: most of the time it’s only used in one function in each file. It would make things a bit more fragile: if mpi4py wasn’t installed, you would only get an error when you actually run the algorithm.

I’m happy to open a PR on this if there’s agreement on if/how to resolve this.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 6

Top GitHub Comments

2 reactions
megsano commented, Jun 28, 2020

Hello! I’m encountering a similar issue. I am using PPO1 and GAIL, so I need MPI. I installed OpenMPI and ran pip install stable-baselines[mpi]. However, running import stable_baselines in my Python interpreter hangs. When I run import stable_baselines after uninstalling mpi4py, the import succeeds.

I am using TF 1.15 and Debian stretch. Any ideas for how to fix this would be appreciated. Thanks!

0 reactions
megsano commented, Jul 2, 2020

“Does from mpi4py import MPI on its own work?”

This hangs as well. I’ll ask on their forums, thanks.
