"Out of memory" error. Can anything be done?

rs train and rs predict fail with CUDA error: out of memory.

I have an NVIDIA GeForce GT 710, which is admittedly pretty low end. The specs say it has 2GB of memory, but nvidia-smi only reports 978MiB (~1 “giga” byte) 🤔.

When I set batch_size = 1 and images_size = 256 (and in rs download I download the 256x256 tiles, i.e. no @2x suffix), I still get the same error. It now takes a few seconds, rather than ~1 second, before I get the Python error RuntimeError: CUDA error: out of memory, so it feels like it gets further before OOMing. But it still fails. This happens on both rs train and rs predict.

Is there any way to make robosat use less memory so that I can at least run this on my GPU rather than my CPU? Or must I just accept that my hardware isn’t good enough? I know very little about graphics cards, cuda, or torch, or computer vision stuff.


I run it on my CPU by installing torch with pip install --upgrade https://download.pytorch.org/whl/cpu/torch-0.4.0-cp36-cp36m-linux_x86_64.whl, and setting cuda = false in model.toml. It works, and for tiny areas, I can get results in a few hours. But I’d love it if it could be faster.
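
The config changes mentioned above (cuda = false, batch_size = 1) can also be scripted. This is only a minimal sketch: the helper name set_toml_key is hypothetical, and it assumes each key sits on its own line in model.toml as a simple key = value pair whose value contains no slashes.

```shell
#!/bin/bash
# Hypothetical helper (not part of robosat): set KEY to VALUE in a simple
# "key = value" TOML file, keeping the original indentation.
# A backup of the file is left next to it with a .bak suffix.
set_toml_key() {
    local file=$1 key=$2 value=$3
    sed -i.bak "s/^\([[:space:]]*${key}[[:space:]]*=\).*/\1 ${value}/" "$file"
}
```

Usage would look like set_toml_key model.toml cuda false followed by set_toml_key model.toml batch_size 1.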


$ nvidia-smi
Wed Oct 10 18:21:07 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87                 Driver Version: 390.87                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GT 710      Off  | 00000000:01:00.0 N/A |                  N/A |
| 40%   38C    P8    N/A /  N/A |    177MiB /   978MiB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0                    Not Supported                                       |
+-----------------------------------------------------------------------------+

and

$ nvcc 
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

My OS is Ubuntu 18.04.
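
The nvidia-smi table above shows that 177MiB of the 978MiB is already in use before robosat starts, so only about 801MiB is actually free. A small sketch for reading that number from the output follows. The function name gpu_free_mib is made up for illustration, and it assumes the "<used>MiB / <total>MiB" layout shown above.

```shell
#!/bin/bash
# Print the free GPU memory in MiB (one line per GPU), parsed from the
# plain-text `nvidia-smi` table on stdin.
# Assumes the "<used>MiB / <total>MiB" column layout shown in this issue.
gpu_free_mib() {
    grep -Eo '[0-9]+MiB */ *[0-9]+MiB' | tr -d ' ' | while IFS=/ read -r used total; do
        echo $(( ${total%MiB} - ${used%MiB} ))
    done
}
```

Call it as nvidia-smi | gpu_free_mib.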

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 10

Top GitHub Comments

amandasaurus commented, Oct 19, 2018

I had “success” with splitting the tiles up. With batch=1, I was able to run rs train on my CPU, and the speed was about 6 sec per 512×512 tile. By splitting it 4 times, I got 32×32 images, and was just about able to squeeze that into 1GB of memory. It was ~930MB IIRC (batch=1 ofc). I got 0.6s per tile then. 10 times faster! But there are 64 times more images… So ~6½ times slower! 🤦

I can run rs predict using the graphics card, and that’s faster than the CPU, and only takes ~300MB of memory on 512×512 images, and does ~20 tiles per second. So I can train on CPU, but predict on GPU.

I have noticed that I have a laptop with a 2GB NVIDIA card. I’ll try that over the Karlsruhe hack weekend.
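
The trade-off in the comment above can be checked with a little arithmetic: each split halves the tile side and quadruples the number of tiles. The sketch below uses made-up helper names and only the numbers quoted in the comment. Note that 64 times more images corresponds to three splits (64×64 tiles). Four splits would give 32×32 tiles but 256 times as many, so one of the two figures in the comment is presumably off.

```shell
#!/bin/bash
# Tile side (px) and tile-count multiplier after SPLITS quad splits
# of a SIDE x SIDE tile.
split_stats() {
    local side=$1 splits=$2
    echo "$(( side >> splits )) $(( 1 << (2 * splits) ))"
}

# Overall slowdown factor: MULT times as many tiles at NEW seconds each,
# compared with OLD seconds per original tile.
slowdown() {
    LC_ALL=C awk -v m="$1" -v new="$2" -v old="$3" \
        'BEGIN { printf "%.1f\n", m * new / old }'
}

split_stats 512 3    # prints: 64 64
split_stats 512 4    # prints: 32 256
slowdown 64 0.6 6    # prints: 6.4
```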

amandasaurus commented, Oct 20, 2018

For those interested, here is the script to split tiles. Call it like ./split_tiles.sh ./tiles/ 18.


#! /bin/bash
# Split every tile at zoom ORIG_ZOOM into its four child tiles at ORIG_ZOOM+1.
# Needs ImageMagick (convert) and pv. The original tiles are deleted.

set -o nounset
set -o errexit
set -o pipefail

TILEDIR=$(realpath "$1")
ORIG_ZOOM=$2
NEW_ZOOM=$(( ORIG_ZOOM + 1 ))

if [[ ! -d "$TILEDIR/$ORIG_ZOOM" ]] ; then
    echo "No tiles found at $TILEDIR/$ORIG_ZOOM" >&2
    exit 1
fi

# Count the tiles first (ignoring leftover *sub* pieces) so pv can show progress
NUM_FILES=$(find "$TILEDIR/$ORIG_ZOOM" -mindepth 2 -maxdepth 2 ! -name '*sub*' -type f | wc -l)
find "$TILEDIR/$ORIG_ZOOM" -mindepth 2 -maxdepth 2 ! -name '*sub*' -type f -printf "%P\n" | while read -r TILE ; do
    X=${TILE%%/*}
    Y=${TILE##*/}
    Y=${Y%%.*}
    SRC="$TILEDIR/$ORIG_ZOOM/$X"
    DST="$TILEDIR/$NEW_ZOOM"
    mkdir -p "$DST/$((X*2))/" "$DST/$((X*2+1))/"
    # Imagemagick can do -crop 50%x50% image_%d.png which writes 4 files, but
    # that doesn't work with webp, so the pieces are written as PNG
    convert "$TILEDIR/$ORIG_ZOOM/$TILE" +repage -crop 50%x50% "$SRC/${Y}_sub%d.png" 2>/dev/null
    # ImageMagick numbers the pieces row by row: 0 = top-left, 1 = top-right,
    # 2 = bottom-left, 3 = bottom-right. In z/x/y tiles, x grows to the right
    # and y grows downwards.
    mv "$SRC/${Y}_sub0.png" "$DST/$((X*2))/$((Y*2)).png"     || (ls -lh "$SRC/" ; exit 1)
    mv "$SRC/${Y}_sub1.png" "$DST/$((X*2+1))/$((Y*2)).png"   || (ls -lh "$SRC/" ; exit 1)
    mv "$SRC/${Y}_sub2.png" "$DST/$((X*2))/$((Y*2+1)).png"   || (ls -lh "$SRC/" ; exit 1)
    mv "$SRC/${Y}_sub3.png" "$DST/$((X*2+1))/$((Y*2+1)).png" || (ls -lh "$SRC/" ; exit 1)
    # rm -v prints one line per tile, which is what pv -l counts
    rm -v "$TILEDIR/$ORIG_ZOOM/$TILE"
done | pv -l -s "$NUM_FILES" -N "splitting $ORIG_ZOOM->$NEW_ZOOM" >/dev/null
# Remove the directories that are now empty
find "$TILEDIR/$ORIG_ZOOM" -type d -empty -delete
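
For reference, this is how parent and child tiles relate in the usual z/x/y (slippy map) scheme, where x grows to the right and y grows downwards. The function name child_tiles is made up for illustration. It lists the children in the same row-by-row order in which ImageMagick numbers its crop pieces (top-left, top-right, bottom-left, bottom-right).

```shell
#!/bin/bash
# Print the four child tiles (as x/y at zoom+1) of the tile X/Y, in
# row-major order: top-left, top-right, bottom-left, bottom-right.
child_tiles() {
    local x=$1 y=$2
    echo "$((2*x))/$((2*y)) $((2*x+1))/$((2*y)) $((2*x))/$((2*y+1)) $((2*x+1))/$((2*y+1))"
}

child_tiles 3 5    # prints: 6/10 7/10 6/11 7/11
```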