
HER, out of memory using 3 or more CPUs and one GPU


Running the following HER command on my machine (Ubuntu 16.04, Tensorflow 1.5.0, one Titan X GPU, Python 3.5.2, latest version of baselines as of today, etc.) seems to work:

(py3-tensorflow) daniel@computer-name:~/baselines$ python -m baselines.her.experiment.train --num_cpu 2
2018-03-11 10:42:00.828727: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-03-11 10:42:00.833988: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-03-11 10:42:01.035000: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-11 10:42:01.035688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:01:00.0
totalMemory: 11.91GiB freeMemory: 10.72GiB
2018-03-11 10:42:01.035702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-03-11 10:42:01.036552: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-11 10:42:01.036967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:01:00.0
totalMemory: 11.91GiB freeMemory: 10.60GiB
2018-03-11 10:42:01.036979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
Logging to /tmp/openai-2018-03-11-10-42-01-211699
Logging to /tmp/openai-2018-03-11-10-42-01-238422

After this, statistics and logs are reported; they make sense and indicate improving performance.

I noticed that while this was running, the nvidia-smi command showed two python processes, but one used far more GPU memory than the other:

(py3-tensorflow) daniel@computer-name:~/baselines$ nvidia-smi 
Sun Mar 11 10:43:42 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25                 Driver Version: 390.25                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN X (Pascal)    Off  | 00000000:01:00.0  On |                  N/A |
| 29%   52C    P2    74W / 250W |  12035MiB / 12194MiB |     34%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     15162      G   /usr/lib/xorg/Xorg                           625MiB |
|    0     15613      G   compiz                                       371MiB |
|    0     16006      G   /usr/lib/firefox/firefox                       2MiB |
|    0     16308      C   ...l/seita-venvs/py3-tensorflow/bin/python   547MiB |
|    0     16309      C   ...l/seita-venvs/py3-tensorflow/bin/python 10449MiB |
|    0     18716      G   /usr/lib/firefox/firefox                       2MiB |
+-----------------------------------------------------------------------------+
(py3-tensorflow) daniel@computer-name:~/baselines$ python -m baselines.her.experiment.train --num_cpu 3
2018-03-11 10:43:55.864451: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-03-11 10:43:55.872111: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-03-11 10:43:55.872111: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-03-11 10:43:56.149303: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-11 10:43:56.149754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:01:00.0
totalMemory: 11.91GiB freeMemory: 10.68GiB
2018-03-11 10:43:56.149813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-03-11 10:43:56.153272: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-11 10:43:56.153800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:01:00.0
totalMemory: 11.91GiB freeMemory: 10.44GiB
2018-03-11 10:43:56.153829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-03-11 10:43:56.153863: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-03-11 10:43:56.154212: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties: 
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:01:00.0
totalMemory: 11.91GiB freeMemory: 10.44GiB
2018-03-11 10:43:56.154239: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN X (Pascal), pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-03-11 10:43:56.333249: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 224.44M (235339776 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
Logging to /tmp/openai-2018-03-11-10-43-56-333366
Logging to /tmp/openai-2018-03-11-10-43-56-335464
Logging to /tmp/openai-2018-03-11-10-43-56-358276

and the out-of-memory error causes the program to abort.

I naively assumed I could fix this by adjusting the ddpg.py file in HER:

    def _create_network(self, reuse=False):
        logger.info("Creating a DDPG agent with action space %d x %s..." % (self.dimu, self.max_u))

        #self.sess = tf.get_default_session()
        # Add these instead of the default session
        config = tf.ConfigProto()
        config.gpu_options.per_process_gpu_memory_fraction = 0.2 
        self.sess = tf.Session(config=config)

        if self.sess is None:
            self.sess = tf.InteractiveSession()

Unfortunately, this does not seem to work due to uninitialized variables. (I can post the full error message if it helps.)
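For what it's worth, if the memory-fraction route were pursued further, the fraction would have to be split across all MPI workers and leave headroom for the other processes visible in nvidia-smi above (Xorg, compiz). A minimal sketch of picking a per-worker fraction — `per_worker_gpu_fraction` is a hypothetical helper, not part of baselines:

```python
def per_worker_gpu_fraction(num_workers, headroom=0.8):
    """Split a GPU memory budget evenly across MPI workers.

    headroom: fraction of total GPU memory to hand out at all,
    leaving the remainder for the driver and desktop processes
    (Xorg, compiz in the nvidia-smi output above).
    """
    if num_workers < 1:
        raise ValueError("need at least one worker")
    return headroom / num_workers

# With 3 workers, each process would request at most ~26% of the GPU,
# instead of TensorFlow's default of grabbing nearly all free memory.
```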

The closest existing issue seems to be https://github.com/openai/baselines/issues/70, where @olegklimov suggests that “it’s [PPO] supposed to use the same GPU from several MPI workers. More that each MPI should use its own GPU on multi-GPU machine or multi-machine MPI.” However, I only have one GPU on this machine, and I’m not sure whether there are subtle differences between the PPO and HER implementations.
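In the spirit of that comment, one workaround would be to let only one worker per GPU actually see the device, by setting CUDA_VISIBLE_DEVICES per MPI rank before TensorFlow is imported; the remaining ranks fall back to CPU. A sketch, assuming a `bind_worker_to_gpu` helper that I am inventing here for illustration:

```python
import os

def bind_worker_to_gpu(rank, num_gpus):
    """Hypothetical helper: give each MPI worker at most one visible GPU.

    Must run before `import tensorflow`, since TensorFlow reads
    CUDA_VISIBLE_DEVICES at initialization. With a single GPU, only
    rank 0 sees it; the other ranks run on CPU and therefore cannot
    each try to allocate the full GPU memory.
    """
    if num_gpus == 0 or rank >= num_gpus:
        os.environ["CUDA_VISIBLE_DEVICES"] = ""  # CPU-only worker
    else:
        os.environ["CUDA_VISIBLE_DEVICES"] = str(rank)
    return os.environ["CUDA_VISIBLE_DEVICES"]
```

Whether the HER rollout workers still benefit from the GPU at all in this setup is exactly the kind of detail that may differ between the PPO and HER implementations.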

Any advice would be appreciated. Thanks!

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 8 (2 by maintainers)

Top GitHub Comments

2 reactions
DanielTakeshi commented, Mar 11, 2018

Thanks for the information.

It might be useful to add to the HER README the machine and specs that OpenAI uses to run these commands.

0 reactions
matthiasplappert commented, Mar 26, 2018

I’ve updated the HER readme. We used D15v2 instances on Azure for all experiments.
