"Out of memory" error. Can anything be done?
See the original GitHub issue: `rs train` and `rs predict` fail with `CUDA error: out of memory`.
I have an NVIDIA GeForce GT 710, which is admittedly pretty low end. The specs say it has 2 GB of memory, but `nvidia-smi` only reports 978 MiB (~1 “giga” byte) 🤔.
When I set `batch_size = 1` and `image_size = 256` (and with `rs download` I fetch the 256×256 tiles, i.e. no `@2x` suffix), I still get the same error. It now takes a few seconds before the Python error `RuntimeError: CUDA error: out of memory` appears, rather than ~1 second, so it feels like it gets further before OOMing, but it still fails. This happens on both `rs train` and `rs predict`.
Is there any way to make robosat use less memory so that I can at least run this on my GPU rather than my CPU? Or must I just accept that my hardware isn’t good enough? I know very little about graphics cards, CUDA, torch, or computer-vision stuff.
I can run it on my CPU by installing torch with `pip install --upgrade https://download.pytorch.org/whl/cpu/torch-0.4.0-cp36-cp36m-linux_x86_64.whl` and setting `cuda = false` in model.toml. It works, and for tiny areas I can get results in a few hours, but I’d love it if it could be faster.
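For reference, setting `cuda = false` amounts to the standard PyTorch device-selection pattern; a minimal sketch (an illustration of the idea, not robosat’s actual code):

```python
import torch

# Hypothetical illustration: pick the device the way a `cuda` config flag would.
use_cuda = False  # corresponds to `cuda = false` in model.toml
device = torch.device("cuda" if use_cuda and torch.cuda.is_available() else "cpu")

# The model and input tensors are then moved to that device before training/prediction, e.g.:
# model = model.to(device)
# images = images.to(device)
print(device)  # -> cpu
```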
```
$ nvidia-smi
Wed Oct 10 18:21:07 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 40%   38C    P8    N/A /  N/A |    177MiB /  978MiB  |     N/A      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+
```
and
```
$ nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
```
My OS is Ubuntu 18.04.
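As a sanity check on the memory numbers, PyTorch itself can report what it sees on the card; a quick sketch (assuming a CUDA build of torch and that `torch.cuda.get_device_properties` is available in your version):

```python
import torch

# Print what PyTorch sees on the first GPU (requires a CUDA-enabled build of torch).
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(props.name, f"{props.total_memory / 1024**2:.0f} MiB total")
else:
    print("No CUDA device visible to PyTorch")
```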
I had “success” with splitting the tiles up. With a batch size of 1 I was able to run `rs train` on my CPU, and the speed was about 6 sec per 512×512 tile. By splitting each tile 4 times I got 32×32 images, and was just about able to squeeze training into the 1 GB of GPU memory; it used ~930 MB IIRC (batch size 1, of course). That got me 0.6 s per tile, 10× faster! But there are 64 times more images… so overall it’s ~6½ times slower! 🤦

I can run `rs predict` using the graphics card, though, and that’s faster than the CPU: it only takes ~300 MB of memory on 512×512 images and does ~20 tiles per second. So I can train on CPU, but predict on GPU. I also have a laptop with a 2 GB NVIDIA card; I’ll try that over the Karlsruhe hack weekend.
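Spelling out that arithmetic (a recap of the numbers above, not part of the original comment):

```python
# Rough throughput comparison per original 512×512 tile, using the numbers above.
cpu_seconds_per_tile = 6.0     # rs train on CPU, 512×512 tiles
gpu_seconds_per_subtile = 0.6  # rs train on the 1 GB GPU, after splitting
subtiles_per_tile = 64         # "64 times more images" after splitting

gpu_seconds_per_tile = gpu_seconds_per_subtile * subtiles_per_tile  # 38.4 s
slowdown = gpu_seconds_per_tile / cpu_seconds_per_tile              # ~6.4x slower overall
print(gpu_seconds_per_tile, slowdown)
```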
For those interested, here is the script to split tiles. Call it like `./split_tiles.sh ./tiles/ 18`.
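The script itself isn’t reproduced above. As a stand-in, here is a rough Python sketch of the same idea (a hypothetical `split_tiles.py`, not the original shell script): it cuts each `z/x/y.png` tile into its four child tiles at zoom `z+1`, assuming the usual slippy-map layout and Pillow; running it repeatedly takes 512×512 tiles down to 32×32.

```python
#!/usr/bin/env python3
"""Rough sketch of a tile-splitting step: cut every z/x/y.png tile into its
four slippy-map children at zoom z+1. Hypothetical stand-in for the original
split_tiles.sh; assumes Pillow is installed."""

import sys
from pathlib import Path

from PIL import Image


def split_tiles(tiles_dir: Path, zoom: int) -> None:
    for path in sorted((tiles_dir / str(zoom)).glob("*/*.png")):
        x, y = int(path.parent.name), int(path.stem)
        image = Image.open(path)
        w, h = image.size  # e.g. 512×512 -> four 256×256 children

        # Child (2x, 2y) is the top-left quadrant in the XYZ tiling scheme.
        for dx in (0, 1):
            for dy in (0, 1):
                box = (dx * w // 2, dy * h // 2, (dx + 1) * w // 2, (dy + 1) * h // 2)
                child_dir = tiles_dir / str(zoom + 1) / str(2 * x + dx)
                child_dir.mkdir(parents=True, exist_ok=True)
                image.crop(box).save(child_dir / f"{2 * y + dy}.png")


if __name__ == "__main__":
    # e.g. python split_tiles.py ./tiles/ 18
    # Run repeatedly (18, 19, 20, 21) for further splits, and on the
    # corresponding label tiles as well if you split the training images.
    split_tiles(Path(sys.argv[1]), int(sys.argv[2]))
```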