
It worked fine when I ran on CPU, but after installing tensorflow-gpu I get the error below. Perhaps the session needs to be shared across MPI processes? When I set num_cpu to 1, it works fine.

2017-07-25 21:11:16.630413: E tensorflow/core/common_runtime/direct_session.cc:138] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 11711807488
Traceback (most recent call last):
  File "run_atari.py", line 54, in <module>
    main()
  File "run_atari.py", line 51, in main
    train('PongNoFrameskip-v4', num_timesteps=40e6, seed=0, num_cpu=8)
  File "run_atari.py", line 23, in train
    sess = U.single_threaded_session()
  File "/home/ben/Documents/baselines/baselines/common/tf_util.py", line 233, in single_threaded_session
    return make_session(1)
  File "/home/ben/Documents/baselines/baselines/common/tf_util.py", line 228, in make_session
    return tf.Session(config=tf_config)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1292, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 562, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/home/ben/miniconda3/envs/gym/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
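The failure mode is consistent with TF1's default allocator: each TensorFlow process tries to reserve nearly all of the GPU's memory when it creates a session, so launching 8 MPI workers means 8 processes competing for one ~11.7 GB device, and allocations after the first hit CUDA_ERROR_OUT_OF_MEMORY. A back-of-the-envelope check, using the total reported in the log above and the 0.1 memory fraction suggested in the comments (the variable names here are illustrative, not from baselines):

```python
# Why 8 workers OOM on one GPU by default, and why capping each
# process at 10% of the device lets them coexist.
total_bytes = 11_711_807_488   # "total memory reported" in the log
workers = 8                    # num_cpu=8 -> 8 MPI worker processes

# Default TF1 behaviour: each process tries to grab ~all of the GPU,
# so eight near-full allocations cannot all succeed on one device.
per_process_default = total_bytes
assert workers * per_process_default > total_bytes

# With per_process_gpu_memory_fraction = 0.1, each process is capped:
fraction = 0.1
per_process_capped = int(total_bytes * fraction)
print(f"per-process cap: {per_process_capped / 2**30:.2f} GiB")
print(f"{workers} workers need {workers * per_process_capped / 2**30:.2f} GiB "
      f"of {total_bytes / 2**30:.2f} GiB")
assert workers * per_process_capped <= total_bytes  # now they all fit
```

With the cap, 8 workers together claim roughly 8.7 GiB of the ~10.9 GiB card, leaving headroom for CUDA context overhead.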

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Reactions: 1
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
andrewliao11 commented, Jul 26, 2017

Add the following snippet to make_session in tf_util.py:

# add gpu growth flags
tf_config.gpu_options.allow_growth = True
tf_config.gpu_options.per_process_gpu_memory_fraction = 0.1

0 reactions
olegklimov commented, Feb 5, 2018

Well no, it's not actually supposed to use the same GPU from several MPI workers. Rather, each MPI worker should use its own GPU, either on a multi-GPU machine or across machines in a multi-machine MPI setup.
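One common way to give each MPI worker its own GPU is to set CUDA_VISIBLE_DEVICES from the worker's rank before TensorFlow initializes CUDA, so every process sees exactly one device. A minimal sketch (the function name `pin_worker_to_gpu` is hypothetical; `OMPI_COMM_WORLD_RANK` and `PMI_RANK` are the rank variables exported by Open MPI and MPICH respectively):

```python
import os

def pin_worker_to_gpu(num_gpus):
    """Map this MPI worker to a single GPU by rank (sketch).

    Must run before TensorFlow creates a session: CUDA_VISIBLE_DEVICES
    is read once, when the CUDA context is initialized.
    """
    # Open MPI exports OMPI_COMM_WORLD_RANK; MPICH/Intel MPI export PMI_RANK.
    rank = int(os.environ.get("OMPI_COMM_WORLD_RANK",
                              os.environ.get("PMI_RANK", "0")))
    gpu = rank % num_gpus  # wrap around if there are more workers than GPUs
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu

# Example: rank 3 on a 2-GPU machine is pinned to GPU 1.
os.environ["OMPI_COMM_WORLD_RANK"] = "3"
print(pin_worker_to_gpu(num_gpus=2))  # -> 1
```

Inside the pinned process, TensorFlow then enumerates a single device as ordinal 0, so each worker's `tf.Session` grabs only its own GPU. (With mpi4py available, `MPI.COMM_WORLD.Get_rank()` works equally well as the rank source.)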
