Memory chunks overflow when using tf.nn.seq2seq.embedding_attention_seq2seq
Hi,
I’m running the project from source (master) using Python 3.5, and when I change the model from:
tf.nn.seq2seq.embedding_rnn_seq2seq
to
tf.nn.seq2seq.embedding_attention_seq2seq
on line 160 of model.py, it blows up, and I get this message:

. . .
I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 7681919232 totalling 7.15GiB
My GPU (a GTX 1080) only has roughly 5 GB of RAM. I tried decreasing the batch size, but even with a batchSize of 5, I still get the same error.
It appears that the chunks are simply too large. How do I decrease them? Also, what exactly does each chunk represent? Is it a vector embedding? Or something else?
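For reference, the switch in question looks roughly like this (a minimal sketch, not the exact code from model.py; the input lists, cell, and vocabulary size are illustrative placeholders, and on newer TensorFlow releases the same functions live under tf.contrib.legacy_seq2seq):

```python
import tensorflow as tf

# Illustrative placeholders for the tensors DeepQA builds elsewhere.
max_length = 10
encoder_inputs = [tf.placeholder(tf.int32, [None]) for _ in range(max_length)]
decoder_inputs = [tf.placeholder(tf.int32, [None]) for _ in range(max_length)]
cell = tf.nn.rnn_cell.BasicLSTMCell(256)
vocab_size = 40000   # tens of thousands of tokens with the full corpus
embedding_size = 64

# Original call on line 160 of model.py: plain embedding seq2seq.
outputs, state = tf.nn.seq2seq.embedding_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=vocab_size,
    num_decoder_symbols=vocab_size,
    embedding_size=embedding_size)

# Attention variant: same call signature, but it also keeps the encoder
# output for every timestep as attention memory and, without an output
# projection, materialises full vocab_size-wide logits, which is a likely
# source of the multi-GiB allocations in the bfc_allocator log.
outputs, state = tf.nn.seq2seq.embedding_attention_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=vocab_size,
    num_decoder_symbols=vocab_size,
    embedding_size=embedding_size)
```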
Top GitHub Comments
Try setting the `softmaxSamples` parameter: `--softmaxSamples 512`. That may help. Also, the vocabulary size is probably too big, as described here: https://github.com/Conchylicultor/DeepQA/issues/29#issuecomment-267771058.
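For context, the softmaxSamples option enables a sampled-softmax loss with an output projection, so training never has to materialise the full vocabulary-wide softmax. A rough sketch of the idea follows (variable names and sizes are illustrative, not DeepQA's actual code; note also that the argument order of tf.nn.sampled_softmax_loss changed between TensorFlow releases, so check the signature of your installed version):

```python
import tensorflow as tf

vocab_size = 40000
hidden_size = 256
num_samples = 512          # corresponds to --softmaxSamples 512

# Output projection: the decoder cell emits hidden_size-dim vectors and only
# the loss (or the final decoding step) projects them onto the vocabulary.
w = tf.get_variable('proj_w', [hidden_size, vocab_size])
b = tf.get_variable('proj_b', [vocab_size])
output_projection = (w, b)

def sampled_loss(inputs, labels):
    # Sampled softmax scores only num_samples negative classes per step
    # instead of all vocab_size classes, which keeps the allocations small.
    labels = tf.reshape(labels, [-1, 1])
    # Argument order shown here matches the older 0.x-style API.
    return tf.nn.sampled_softmax_loss(
        tf.transpose(w), b, inputs, labels, num_samples, vocab_size)

# The projection is then handed to the seq2seq builder, e.g.
#   tf.nn.seq2seq.embedding_attention_seq2seq(..., output_projection=output_projection)
# and sampled_loss is passed as the softmax_loss_function of the sequence loss.
```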
Hi, I am trying to train on the Cornell movie dialog corpus with attention as well, executing the code on a GPU. As can be seen in the image, this is the step where the execution gets stuck and does not move ahead. What could be the problem? I checked again by training without attention, and it runs smoothly; as soon as I switch attention on, the execution gets stuck at this point.