
Train on datasets larger than memory

See original GitHub issue

I wish to train a Mask R-CNN model, but I can't fit all of the training-set annotations in memory as a list[dict] (as I understand it, this is required when using DatasetCatalog).

How can we train on really large datasets, where the annotations alone are vastly larger than available memory?
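For context, this is roughly what the registration path in question looks like: the function handed to DatasetCatalog.register is called once and is expected to return every annotation dict at the same time, which is where the memory pressure comes from. A minimal sketch (the dataset name, file path, and class list below are placeholders):

```python
# Minimal sketch of the standard Detectron2 registration path.
# "my_dataset", "annotations.json", and the class list are placeholders.
import json

from detectron2.data import DatasetCatalog, MetadataCatalog


def load_all_annotations():
    # Called once by Detectron2; must materialize every record at the
    # same time -- this is the memory bottleneck described above.
    with open("annotations.json") as f:
        return json.load(f)  # list[dict] in Detectron2's dataset format


DatasetCatalog.register("my_dataset", load_all_annotations)
MetadataCatalog.get("my_dataset").thing_classes = ["example_class"]
```

Detectron2 does serialize this list into a compact internal buffer, but the entire annotation set still has to fit in RAM at once, regardless of batch size.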

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

2 reactions
zhanghang1989 commented, Mar 25, 2022

> @zhanghang1989 Hey, does the D2Go cache option require training from the more limited D2Go Model Zoo? If so, is there any way to train on datasets whose annotations don't fit into memory and still be able to choose from any detectron2 model?
>
> (I have a very large dataset to train, but am not interested in a model optimized for mobile deployment.)

D2Go should support training for all Detectron2 model configs.

0 reactions
austinmw commented, Mar 24, 2022

@zhanghang1989 Hey, does the D2Go cache option require training from the more limited D2Go Model Zoo? If so, is there any way to train on datasets whose annotations don't fit into memory and still be able to choose from any detectron2 model?

(I have a very large dataset to train, but am not interested in a model optimized for mobile deployment.)
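If the D2Go route is not an option, one plain-Detectron2 workaround (not mentioned in the thread, and not the D2Go cache feature) is to bypass DatasetCatalog entirely: recent versions of build_detection_train_loader also accept a PyTorch IterableDataset, so annotations can be streamed from disk one record at a time. A minimal sketch, assuming the annotations are stored as one Detectron2-format JSON object per line; the file name, model config, and batch size are placeholders:

```python
# Hedged sketch: stream annotation dicts from a JSON-lines file instead
# of registering them through DatasetCatalog. Assumes a recent Detectron2
# in which build_detection_train_loader accepts an IterableDataset.
import json

import torch
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data import DatasetMapper, build_detection_train_loader


class JsonlAnnotations(torch.utils.data.IterableDataset):
    """Yields raw Detectron2-format dicts one at a time from disk."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path) as f:
            for line in f:
                # Each record must carry the usual dataset-dict keys
                # ("file_name", "height", "width", "annotations", ...).
                yield json.loads(line)


cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))

train_loader = build_detection_train_loader(
    JsonlAnnotations("annotations.jsonl"),      # hypothetical file
    mapper=DatasetMapper(cfg, is_train=True),   # loads images, applies augmentations
    total_batch_size=2,
    num_workers=0,  # >0 would duplicate the stream unless sharded per worker
)
```

Note that an iterable dataset cannot be combined with Detectron2's samplers, so shuffling has to be handled separately, e.g. by pre-shuffling the annotation shards on disk.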

Read more comments on GitHub >

Top Results From Across the Web

Training on Large Datasets That Don't Fit In Memory in Keras
Training your Deep Learning algorithms on a huge dataset that is too large to fit in memory? If yes, this article will be...
Read more >
tensorflow - Training on datasets too big to fit in RAM
I am using TensorFlow to train on a very large dataset, which is too large to fit in RAM. Therefore, I have split...
Read more >
Performance tips | TensorFlow Datasets
Large datasets are sharded (split in multiple files) and typically do not fit in memory, so they should not be cached. Shuffle and...
Read more >
Training models when data doesn't fit in memory
The data is not huge and it actually fits in memory, but it's big enough so we can demonstrate memory usage gains with...
Read more >
Train Models on Large Datasets
Estimators implemented in Dask-ML work well with Dask Arrays and DataFrames. These can be much larger than a single machine's RAM. They can...
Read more >
