"Out of memory" error. Can anything be done?
See the original GitHub issue: `rs train` and `rs predict` fail with `CUDA error: out of memory`.
I have an NVIDIA GeForce GT 710, which is admittedly pretty low end. The specs say it has 2 GB of memory, but `nvidia-smi` only reports 978 MiB (~1 “giga” byte) 🤔.
When I set `batch_size = 1` and `image_size = 256` (and with `rs download` I fetch the 256×256 tiles, i.e. no `@2x` suffix), I still get the same error. It now takes a few seconds before the Python error `RuntimeError: CUDA error: out of memory` appears, rather than ~1 second, so it feels like it gets further before OOMing, but it still fails. This happens on both `rs train` and `rs predict`.
Is there any way to make robosat use less memory so that I can at least run this on my GPU rather than my CPU? Or must I just accept that my hardware isn’t good enough? I know very little about graphics cards, CUDA, torch, or computer-vision stuff.
I can run it on my CPU by installing torch with `pip install --upgrade https://download.pytorch.org/whl/cpu/torch-0.4.0-cp36-cp36m-linux_x86_64.whl` and setting `cuda = false` in model.toml. It works, and for tiny areas I can get results in a few hours, but I’d love it if it could be faster.
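For reference, setting `cuda = false` amounts to the standard PyTorch device-selection pattern; a minimal sketch (an illustration of the idea, not robosat’s actual code):

```python
import torch

# Hypothetical illustration: pick the device the way a `cuda` config flag would.
use_cuda = False  # corresponds to `cuda = false` in model.toml
device = torch.device("cuda" if use_cuda and torch.cuda.is_available() else "cpu")

# The model and input tensors are then moved to that device before training/prediction, e.g.:
# model = model.to(device)
# images = images.to(device)
print(device)  # -> cpu
```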
```
$ nvidia-smi
Wed Oct 10 18:21:07 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 40%   38C    P8    N/A /  N/A |    177MiB /  978MiB  |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
```
and
```
$ nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
```
My OS is Ubuntu 18.04.
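As a sanity check on the memory numbers, PyTorch itself can report what it sees on the card; a quick sketch (assuming a CUDA build of torch and that `torch.cuda.get_device_properties` is available in your version):

```python
import torch

# Print what PyTorch sees on the first GPU (requires a CUDA-enabled build of torch).
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 1024**2:.0f} MiB total")
else:
    print("No CUDA device visible to PyTorch")
```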
I had “success” with splitting the tiles up. With a batch size of 1 I was able to run `rs train` on my CPU, and the speed was about 6 sec per 512×512 tile. By splitting each tile 4 times I got 32×32 images, and was just about able to squeeze training into the 1 GB of GPU memory; it used ~930 MB IIRC (batch size 1, of course). That got me 0.6 s per tile, 10× faster! But there are 64 times more images… so overall it’s ~6½ times slower! 🤦

I can run `rs predict` using the graphics card, though, and that’s faster than the CPU: it only takes ~300 MB of memory on 512×512 images and does ~20 tiles per second. So I can train on CPU, but predict on GPU. I also have a laptop with a 2 GB NVIDIA card; I’ll try that over the Karlsruhe hack weekend.
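Spelling out that arithmetic (a recap of the numbers above, not part of the original comment):

```python
# Rough throughput comparison per original 512×512 tile, using the numbers above.
cpu_seconds_per_tile = 6.0     # rs train on CPU, 512×512 tiles
gpu_seconds_per_subtile = 0.6  # rs train on the 1 GB GPU, after splitting
subtiles_per_tile = 64         # "64 times more images" after splitting

gpu_seconds_per_tile = gpu_seconds_per_subtile * subtiles_per_tile  # 38.4 s
slowdown = gpu_seconds_per_tile / cpu_seconds_per_tile              # ~6.4x slower overall
print(gpu_seconds_per_tile, slowdown)
```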
For those interested, here is the script to split tiles. Call it like `./split_tiles.sh ./tiles/ 18`.
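The script itself isn’t reproduced above. As a stand-in, here is a rough Python sketch of the same idea (a hypothetical `split_tiles.py`, not the original shell script): it cuts each `z/x/y.png` tile into its four child tiles at zoom `z+1`, assuming the usual slippy-map layout and Pillow; running it repeatedly takes 512×512 tiles down to 32×32.

```python
#!/usr/bin/env python3
"""Rough sketch of a tile-splitting step: cut every z/x/y.png tile into its
four slippy-map children at zoom z+1. Hypothetical stand-in for the original
split_tiles.sh; assumes Pillow is installed."""

import sys
from pathlib import Path

from PIL import Image


def split_tiles(tiles_dir: Path, zoom: int) -> None:
    for path in sorted((tiles_dir / str(zoom)).glob("*/*.png")):
        x, y = int(path.parent.name), int(path.stem)
        image = Image.open(path)
        w, h = image.size  # e.g. 512×512 -> four 256×256 children

        # Child (2x, 2y) is the top-left quadrant in the XYZ tiling scheme.
        for dx in (0, 1):
            for dy in (0, 1):
                box = (dx * w // 2, dy * h // 2, (dx + 1) * w // 2, (dy + 1) * h // 2)
                child_dir = tiles_dir / str(zoom + 1) / str(2 * x + dx)
                child_dir.mkdir(parents=True, exist_ok=True)
                image.crop(box).save(child_dir / f"{2 * y + dy}.png")


if __name__ == "__main__":
    # e.g. python split_tiles.py ./tiles/ 18
    # Run repeatedly (18, 19, 20, 21) for further splits, and on the
    # corresponding label tiles as well if you split the training images.
    split_tiles(Path(sys.argv[1]), int(sys.argv[2]))
```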