Seems to hit a GPU memory leak problem
I wrap BertModel as a persistent object, initialize it once, and then use it repeatedly as a feature extractor to generate features for each data batch, but it seems I have hit a GPU memory leak: after the program starts, GPU memory usage keeps increasing until it runs out of memory. The key code is below. Every time self.bert_model.get_bert_feature() is executed, GPU memory increases. From some simple debugging, the problem may be caused by BertEmbeddings.forward(). My PyTorch version is 0.4.0 with Python 3. Waiting for your reply, thanks very much!
class BertModel(PreTrainedBertModel):
    def __init__(self, config):
        super(BertModel, self).__init__(config)
        self.embeddings = BertEmbeddings(config)
        self.encoder = BertEncoder(config)
        self.pooler = BertPooler(config)
        self.apply(self.init_bert_weights)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, output_all_encoded_layers=False):
        # logger.info('bert forward')
        if attention_mask is None:
            attention_mask = torch.ones_like(input_ids)
        if token_type_ids is None:
            token_type_ids = torch.zeros_like(input_ids)

        # We create a 3D attention mask from a 2D tensor mask.
        # Sizes are [batch_size, 1, 1, to_seq_length]
        # so we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length].
        # This attention mask is simpler than the triangular masking of causal attention
        # used in OpenAI GPT; we just need to prepare the broadcast dimension here.
        extended_attention_mask = attention_mask.unsqueeze(1).unsqueeze(2)

        # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
        # masked positions, this operation will create a tensor which is 0.0 for
        # positions we want to attend and -10000.0 for masked positions.
        # Since we are adding it to the raw scores before the softmax, this is
        # effectively the same as removing these entirely.
        extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype)  # fp16 compatibility
        extended_attention_mask = (1.0 - extended_attention_mask) * -10000.0

        embedding_output = self.embeddings(input_ids, token_type_ids)
        encoded_layers = self.encoder(embedding_output,
                                      extended_attention_mask,
                                      output_all_encoded_layers=output_all_encoded_layers)
        return encoded_layers
class Bert_Instance(object):
    def __init__(self, vocab_file, bert_model_path, device):
        # tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case)
        self.tokenizer = BertTokenizer(vocab_file)
        self.model = BertModel.from_pretrained(bert_model_path)
        self.device = device
        print('bert_device=', self.device)
        self.model.to(self.device)
        self.model.eval()
        for para in self.model.parameters():
            para.requires_grad = False

    def get_feature(self, text_list, max_seq_length=50, layer=-1):
        '''
        Args:
            text_list is a list storing the sentences; its length is the sentence number
        Return:
            (batch_size, seq_len+2, hidden_size)
        '''
        # a list of dicts, each with keys (ex_index, tokens, input_ids, input_mask, input_type_ids)
        all_features = convert_examples_to_features(examples=text_list,
                                                    max_seq_length=max_seq_length,
                                                    tokenizer=self.tokenizer)
        all_input_ids = torch.tensor([f['input_ids'] for f in all_features]).type(torch.cuda.LongTensor).to(self.device)
        all_input_mask = torch.tensor([f['input_mask'] for f in all_features]).type(torch.cuda.LongTensor).to(self.device)
        all_encoder_layers = self.model(all_input_ids,
                                        token_type_ids=None,
                                        attention_mask=all_input_mask)
        return all_encoder_layers, all_input_mask
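For what it's worth, here is a minimal sketch of running the model strictly as a frozen feature extractor under torch.no_grad() (available since PyTorch 0.4.0), so that no autograd graph can be retained between batches even if freezing the parameters is somehow bypassed; extract_features and its arguments are hypothetical names, not part of the snippet above:

import torch

def extract_features(model, input_ids, input_mask, device):
    # Hypothetical helper: run a frozen BERT model purely as a feature extractor.
    # torch.no_grad() guarantees no autograd graph is kept alive across batches,
    # even if some input tensor happens to have requires_grad=True.
    model.eval()
    input_ids = input_ids.to(device)
    input_mask = input_mask.to(device)
    with torch.no_grad():
        encoded_layers = model(input_ids,
                               token_type_ids=None,
                               attention_mask=input_mask)
    return encoded_layers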
class Bert_Model(object):
    def __init__(self, device):
        self.bert_model = Bert_Instance(BERT_VOCAB, BERT_MODEL, device)
        self.device = device
        self.zp_pre_cache = {}
        self.zp_post_cache = {}
        self.candi_np = {}
        self.cache = {'zp_pre': self.zp_pre_cache,
                      'zp_post': self.zp_post_cache,
                      'candi_np': self.candi_np}

    def get_bert_feature(self, text_list, cache_name, batch_id, max_seq_length=30, layer=-1):
        if batch_id in self.cache[cache_name].keys():
            # res = torch.tensor(self.cache[cache_name][batch_id]).type(torch.cuda.FloatTensor).to(self.device)
            res = self.cache[cache_name][batch_id]
            return res
        else:
            res = self.bert_model.get_feature(text_list, max_seq_length, layer)
            self.cache[cache_name][batch_id] = res
            return res
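Independently of anything inside BertEmbeddings, note that get_bert_feature stores the returned GPU tensors in self.cache keyed by batch_id, so GPU memory will grow with every new batch id even if nothing else leaks. Below is a rough sketch, assuming the cache really is meant to persist across batches, of keeping the cached copies on the CPU instead; the _to_cpu/_to_device helpers are made-up names:

def _to_cpu(x):
    # The encoder may return a single tensor or a list of per-layer tensors.
    if isinstance(x, (list, tuple)):
        return [t.detach().cpu() for t in x]
    return x.detach().cpu()

def _to_device(x, device):
    if isinstance(x, (list, tuple)):
        return [t.to(device) for t in x]
    return x.to(device)

def get_bert_feature(self, text_list, cache_name, batch_id, max_seq_length=30, layer=-1):
    if batch_id in self.cache[cache_name]:
        layers, mask = self.cache[cache_name][batch_id]
        return _to_device(layers, self.device), mask.to(self.device)
    layers, mask = self.bert_model.get_feature(text_list, max_seq_length, layer)
    # Cache CPU copies so the GPU memory held by this batch can be released.
    self.cache[cache_name][batch_id] = (_to_cpu(layers), mask.detach().cpu())
    return layers, mask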
class Experiment(object):
    def __init__(self):
        # load training data
        with open(DIR + "data/train_data", "rb") as fin1, \
             open(DIR + "data/emb", "rb") as fin2:
            self.train_generator = cPickle.load(fin1)
            self.embedding_matrix, _, _ = cPickle.load(fin2, encoding='iso-8859-1')
        # load test data
        self.test_generator = DataGenerator("test", 256)
        self.dev_data = self.train_generator.generate_dev_data()
        self.test_data = self.test_generator.generate_data()
        # declare model architecture
        self.model = Network(nnargs["embedding_size"], nnargs["embedding_dimension"], self.embedding_matrix, nnargs["hidden_dimension"], 2).to(NET_DEVICE)
        self.bert_model = Bert_Model(BERT_DEVICE)
        this_lr = 0.003
        self.optimizer = optim.Adagrad(self.model.parameters(), lr=this_lr)
        self.best = {"sum": 0.0, "test_f": 0.0, "best_test_f": 0.0}
        self.dropout = nnargs["dropout"]

    def forward_step(self, data, mode, dropout=0.0):
        zp_relative_index, zp_pre, zp_pre_mask, zp_post, zp_post_mask, candi_np, candi_np_mask, feature, zp_pre_words, zp_post_words, candi_np_words, batch_id = data2tensor(data)
        batch_id = mode + '_' + str(batch_id)
        zp_pre_bert, _ = self.bert_model.get_bert_feature(zp_pre_words, 'zp_pre', batch_id)
        zp_post_bert, _ = self.bert_model.get_bert_feature(zp_post_words, 'zp_post', batch_id)
        candi_np_bert, _ = self.bert_model.get_bert_feature(candi_np_words, 'candi_np', batch_id)
        ...

I have the newest version of pytorch and transformers, yes.
I have been monitoring memory usage over 24 hours, during which I made ~300,000 requests. Memory increases steadily for quite some time but also seems to stabilize at a certain maximum: the application started at ~2.5 GB RAM and now stays at ~4.3 GB.
Maybe it has something to do with the varying lengths of the texts I process? The longest texts might only be processed at a later point in time and require the most RAM; after that, no subsequent text can need more, so usage stabilizes. Though this is just a thought.
Thanks already for your help. I'm off to Christmas vacation for now and will have a look at the issue again in January. I'll see if memory usage has increased by then.
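If you want to check the length hypothesis directly, here is a small monitoring sketch (psutil and the tokenizer call are assumptions about your setup) that logs the process RSS next to the longest tokenized text in each batch:

import os
import psutil

def log_memory(batch_texts, tokenizer):
    # Print the process RSS alongside the longest tokenized text in the batch,
    # so memory jumps can be matched against sequence length.
    rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 2
    max_len = max(len(tokenizer.tokenize(t)) for t in batch_texts)
    print('rss=%.1f MB, longest sequence=%d tokens' % (rss_mb, max_len))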
So I tried it with bert-base-multilingual-uncased as well, and the behavior is the same. I do not understand why memory constantly grows during inference. To my understanding, I only push data through the network and then use the output of the result layer. Before using transformers, I had been using custom word embeddings trained in my own Keras models, and I did not see this behavior. What am I missing here?
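For comparison, here is a minimal inference sketch with a current transformers version that should not accumulate graph memory across calls; the model name and parameters are placeholders for your setup, and torch.no_grad() plus a bounded max_length are the relevant parts:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-multilingual-uncased')
model = AutoModel.from_pretrained('bert-base-multilingual-uncased')
model.eval()

def embed(texts):
    # Pad/truncate so every batch has a bounded shape.
    batch = tokenizer(texts, padding=True, truncation=True, max_length=128,
                      return_tensors='pt')
    # no_grad() prevents an autograd graph from being built during inference.
    with torch.no_grad():
        output = model(**batch)
    return output.last_hidden_state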