[Question] After updating to 0.5.1, multi-GPU support has not been working


Hi! Thank you for the great lib!

After updating from 0.2.4 to 0.5.1, only one GPU out of two is used. The code is almost the same:

hyper_parameters = BLSTMModel.get_default_hyper_parameters()
hyper_parameters['layer_bi_lstm']['units'] = 1024

model = BLSTMModel(embedding, hyper_parameters=hyper_parameters)
model.build_model(X_train, y_train)
model.build_multi_gpu_model(gpus=2, x_train=X_train, y_train=y_train, x_validate=X_valid, y_validate=y_valid)
model.fit(X_train, y_train, epochs=15, batch_size=512, x_validate=X_valid, y_validate=y_valid, callbacks=[tf_board_callback, checkpoint_callback])

NVtop and nvidia-smi both show that only one GPU is working. With the previous version, both of my GPUs were used.
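
For anyone who wants to check this from a script instead of watching nvidia-smi by hand, here is a minimal sketch. It assumes nvidia-smi is on the PATH and that its CSV query output contains plain integers. The helper names (parse_gpu_utilization, busy_gpus, query_gpu_utilization) and the 5% threshold are made up for illustration and are not part of the library discussed here.

```python
import subprocess


def parse_gpu_utilization(csv_text):
    """Parse 'index, utilization' CSV lines into {gpu_index: percent}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, util = (field.strip() for field in line.split(","))
        usage[int(index)] = int(util)
    return usage


def busy_gpus(usage, threshold=5):
    """Return the sorted indices of GPUs whose utilization exceeds threshold (%)."""
    return sorted(index for index, util in usage.items() if util > threshold)


def query_gpu_utilization():
    """Ask nvidia-smi for the current per-GPU utilization (needs an NVIDIA driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_utilization(result.stdout)


# Hypothetical sample of what the query prints when only GPU 0 is doing work.
sample = "0, 97\n1, 0\n"
print(busy_gpus(parse_gpu_utilization(sample)))  # prints [0]
```

Calling busy_gpus(query_gpu_utilization()) while model.fit is running should list both indices if the multi-GPU model is really in use. A single reading can be misleading between batches, so it is worth sampling a few times.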

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

6 reactions
jeshuren commented, Jul 23, 2019

I am also facing the same issue. Would be great if someone could help.

UPDATE:

Try without calling the build_model() method. It works!

model = BLSTMModel(embedding, hyper_parameters=hyper_parameters)

model.build_multi_gpu_model(gpus=2, x_train=X_train, y_train=y_train, x_validate=X_valid, y_validate=y_valid)

model.fit(X_train, y_train, epochs=15, batch_size=512, x_validate=X_valid, y_validate=y_valid, callbacks=[tf_board_callback, checkpoint_callback])

0 reactions
sld commented, Jul 24, 2019
