Training speed is about 2x slower than the JAX trainable version (Uni-Fold)
device: 1 A100 with 40 GB memory
cuda: 11.3
I compared against https://github.com/dptech-corp/Uni-Fold, using the model_2 setting and the same data (a single sample, served through DummyDataLoader in openfold).
Following this issue, https://github.com/aqlaboratory/openfold/issues/19, I disabled clear_cache_between_blocks and DeepSpeed CPU offloading.
The commit I used is https://github.com/aqlaboratory/openfold/commit/c4d9f57f9005f3e9e0325eff97b8232e328b4813
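For anyone reproducing these numbers, here is a minimal sketch of how a per-example time can be measured. It is not the script used for this report. `step` is a hypothetical zero-argument callable that runs one full training iteration, and `sync` is whatever blocks until the device is idle (with PyTorch on CUDA that would be `torch.cuda.synchronize`; with JAX, calling `block_until_ready()` on an output array inside `step` serves the same purpose). Without that synchronization, asynchronous GPU execution can make a step look faster than it is.

```python
import time
from statistics import mean
from typing import Callable, Optional


def time_per_example(
    step: Callable[[], None],
    n_warmup: int = 2,
    n_timed: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Return the mean wall-clock seconds per call of `step`.

    `step` is a hypothetical callable running one training iteration
    (forward, backward, optimizer update) on a single example.
    `sync`, if given, must block until queued device work has finished.
    """
    if n_timed < 1:
        raise ValueError("n_timed must be at least 1")
    wait = sync if sync is not None else (lambda: None)

    # Warm-up iterations absorb one-off costs (compilation, allocator growth).
    for _ in range(n_warmup):
        step()
    wait()

    durations = []
    for _ in range(n_timed):
        start = time.perf_counter()
        step()
        wait()  # include asynchronous device work in the measurement
        durations.append(time.perf_counter() - start)
    return mean(durations)


if __name__ == "__main__":
    # Stand-in workload; replace with a real training step.
    print(f"{time_per_example(lambda: time.sleep(0.01)):.4f} s per example")
```

Running the same harness, with the same warm-up and iteration counts, against both implementations keeps the comparison like for like.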
speed per example:
| | FP32 | FP16 |
|---|---|---|
| openfold | 24.5 s | 17 s |
| Uni-Fold | 13.25 s | 8.9 s |
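To put a number on "about 2x", the ratios can be worked out directly from the four timings above. The snippet assumes nothing beyond those values; the helper names are only for illustration.

```python
# Seconds per example, copied from the table above.
seconds = {
    "openfold": {"FP32": 24.5, "FP16": 17.0},
    "Uni-Fold": {"FP32": 13.25, "FP16": 8.9},
}


def slowdown(precision: str) -> float:
    """How many times slower openfold is than Uni-Fold at a given precision."""
    return seconds["openfold"][precision] / seconds["Uni-Fold"][precision]


def fp16_speedup(implementation: str) -> float:
    """How much faster FP16 is than FP32 within one implementation."""
    return seconds[implementation]["FP32"] / seconds[implementation]["FP16"]


for precision in ("FP32", "FP16"):
    print(f"{precision}: openfold is {slowdown(precision):.2f}x slower")
# FP32: openfold is 1.85x slower
# FP16: openfold is 1.91x slower

for implementation in seconds:
    print(f"{implementation}: FP16 is {fp16_speedup(implementation):.2f}x faster than FP32")
# openfold: FP16 is 1.44x faster than FP32
# Uni-Fold: FP16 is 1.49x faster than FP32
```

So the gap is roughly 1.85x to 1.9x at both precisions, and both implementations gain a similar factor (about 1.4x to 1.5x) from FP16, which suggests the difference is not specific to mixed precision.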
Is that expected? Are there any tricks I can use to get a further speed-up?
Issue Analytics
- State:
- Created 2 years ago
- Comments: 43 (8 by maintainers)
BTW @guolinke the recycling number bug is now fixed. The fix requires a little bit of extra data processing, and so it comes with a performance penalty of about half a second. I’m trying to think of ways to improve it.
@lhatsk would you mind moving this bfloat16 stuff into a new issue?