
Error: Invalid TF_Status: 8, Message: OOM when allocating tensor

See original GitHub issue

My GPU is an NVIDIA GTX 970 and, in case it isn't clear from the issue, I am using tfjs-node-gpu.

TensorFlow.js version

tfjs: "1.5.2", tfjs-converter: "1.5.2", tfjs-core: "1.5.2", tfjs-data: "1.5.2", tfjs-layers: "1.5.2", tfjs-node: "1.5.2"

Browser version

ares: "1.15.0", brotli: "1.0.7", chrome: "78.0.3904.130", electron: "7.1.7", http_parser: "2.8.0", icu: "64.2", llhttp: "1.1.4", modules: "75", napi: "4", nghttp2: "1.39.2", node: "12.8.1", openssl: "1.1.0"

Describe the problem or feature request

I get the following stack trace after a while of using face-api.js's detectAllFaces method, which uses TensorFlow.js under the hood:

(node:18420) UnhandledPromiseRejectionWarning: Error: Invalid TF_Status: 8
Message: OOM when allocating tensor with shape[1,3688,3688,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
    at Object.<anonymous> (<anonymous>)
    at NodeJSKernelBackend.executeSingleOutput (C:\Users\Infinity\Documents\Code\NodeJS\face-rec\node_modules\@tensorflow\tfjs-node-gpu\dist\nodejs_kernel_backend.js:193:43)
    at NodeJSKernelBackend.concat (C:\Users\Infinity\Documents\Code\NodeJS\face-rec\node_modules\@tensorflow\tfjs-node-gpu\dist\nodejs_kernel_backend.js:405:21)
    at C:\Users\Infinity\Documents\Code\NodeJS\face-rec\node_modules\face-api.js\node_modules\@tensorflow\tfjs-core\dist\ops\concat_split.js:184:78
    at C:\Users\Infinity\Documents\Code\NodeJS\face-rec\node_modules\@tensorflow\tfjs-node-gpu\node_modules\@tensorflow\tfjs-core\dist\engine.js:528:55
    at C:\Users\Infinity\Documents\Code\NodeJS\face-rec\node_modules\@tensorflow\tfjs-node-gpu\node_modules\@tensorflow\tfjs-core\dist\engine.js:388:22
    at Engine.scopedRun (C:\Users\Infinity\Documents\Code\NodeJS\face-rec\node_modules\@tensorflow\tfjs-node-gpu\node_modules\@tensorflow\tfjs-core\dist\engine.js:398:23)
    at Engine.tidy (C:\Users\Infinity\Documents\Code\NodeJS\face-rec\node_modules\@tensorflow\tfjs-node-gpu\node_modules\@tensorflow\tfjs-core\dist\engine.js:387:21)
    at kernelFunc (C:\Users\Infinity\Documents\Code\NodeJS\face-rec\node_modules\@tensorflow\tfjs-node-gpu\node_modules\@tensorflow\tfjs-core\dist\engine.js:528:29)
    at C:\Users\Infinity\Documents\Code\NodeJS\face-rec\node_modules\@tensorflow\tfjs-node-gpu\node_modules\@tensorflow\tfjs-core\dist\engine.js:539:27

I've tried looking around for a possible solution, but in all honesty, I'm not sure how to implement the solutions offered. They involve processing the images in batches, but since I'm using an API built on top of tfjs, I don't even know whether I could implement that, let alone how. That said, I read elsewhere that setting the TF_FORCE_GPU_ALLOW_GROWTH environment variable to true could potentially fix the problem; however, the only issue that has solved in the past was "Invalid TF_Status: 2. Message: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize".
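For reference, a minimal sketch of setting that variable from Node.js itself, assuming it takes effect when set before @tensorflow/tfjs-node-gpu is required (so the native runtime sees it at initialization):

// Ask the TensorFlow runtime to grow GPU memory on demand instead of
// pre-allocating most of it up front; set before the binding loads.
process.env.TF_FORCE_GPU_ALLOW_GROWTH = 'true';
const tf = require('@tensorflow/tfjs-node-gpu');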

FWIW: the Invalid TF_Status: 8, Message: OOM when allocating tensor with shape error occurs on the 78th image. Additionally, on one random run I noticed it got roughly 5 images further, to around the 83rd image, before crashing.
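As a rough back-of-the-envelope estimate (not from the issue itself): a float tensor of shape [1, 3688, 3688, 3] takes about 3688 × 3688 × 3 × 4 bytes ≈ 163 MB, so if decoded inputs and intermediates are not released between iterations, a few dozen such images are enough to exhaust the roughly 4 GB of memory on a GTX 970.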

Code to reproduce the bug / link to feature request

const scrapped = [];
const faceapiOptions = new faceapi.SsdMobilenetv1Options({ minConfidence: 0.9 });
// file = json object {name, id} | allImagesInFolder = array of files (type object {name, id})
for (const file of allImagesInFolder) {
	// Get the image as a Buffer from Google Drive, and decode it using tensorflow
	const image = tf.node.decodeImage(await googleUtils.getImageAsBuffer(file.id));
	console.log(`Detecting faces in ${file.id} of size ${image.size}`);
	// Detect faces within the image and the landmarks for each face
	const detections = await faceapi.detectAllFaces(image, faceapiOptions).withFaceLandmarks(); // ERROR HERE
	if (detections.length == 1) {
		scrapped.push(detections[0]);
	}
}

For what it's worth, I created a simple benchmark to measure how long face detection takes with face-api.js. The following snippet runs just fine and never hits Error: Invalid TF_Status: 8. Why does this snippet execute without errors while the one above fails?

const testingImg = tf.node.decodeImage(await googleUtils.getImageAsBuffer('1wBKoDbjr8qGfVrzyhzFkiZ1i-O4FLhU3'));
const detection = await faceapi.detectAllFaces(testingImg).withFaceLandmarks();
console.log(detection);
const withoutPromise = [];
const withPromise = [];

// Queue 1,000 detections against the same decoded tensor
for (let i = 0; i < 1000; i++) {
	withoutPromise.push(faceapi.detectAllFaces(testingImg));
	withPromise.push(faceapi.detectAllFaces(testingImg).withFaceLandmarks());
}

const start = new moment();
Promise.all(withoutPromise).then(() => {
	console.log(`Time to detect all faces w/o landmarks: ${moment.duration(new moment().diff(start)).asMilliseconds()} ms`);
});

Promise.all(withPromise).then(() => {
	console.log(`Time to detect all faces w/ landmarks: ${moment.duration(new moment().diff(start)).asMilliseconds()} ms`);
});

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions
tafsiri commented on Feb 26, 2020

@Infinitay try adding image.dispose() after calling faceapi.detectAllFaces. It might free up intermediate tensors faster.
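A minimal sketch of how that suggestion could be applied to the reproduction loop above (only the image.dispose() call is new; everything else mirrors the code from the issue):

for (const file of allImagesInFolder) {
	const image = tf.node.decodeImage(await googleUtils.getImageAsBuffer(file.id));
	const detections = await faceapi.detectAllFaces(image, faceapiOptions).withFaceLandmarks();
	// Release the decoded input tensor as soon as detection is done,
	// so its GPU memory is not held across iterations.
	image.dispose();
	if (detections.length === 1) {
		scrapped.push(detections[0]);
	}
}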

1 reaction
tafsiri commented on Feb 27, 2020

I would most likely interpret that graph as saying your program doesn't saturate the GPU. If face-api.js accepts images in batches, you might be able to pass in more images at a time and thus use more of your GPU (you would need to investigate the face-api.js API to find out whether this is doable).


