Add option to limit number of CPUs for data preprocessing
Subject of the feature
Currently, we are using AUTOTUNE in data preprocessing, e.g. https://github.com/DeepRegNet/DeepReg/blob/main/deepreg/dataset/loader/interface.py#L113
However, it may take too many CPUs, and thus also too much memory, which is not ideal on clusters. Therefore, we need to be able to configure this `num_parallel_calls`.
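For reference, a minimal sketch of the current pattern (the mapped function here is a stand-in, not DeepReg's actual preprocessing):

```python
import tensorflow as tf

# Current behaviour: AUTOTUNE lets TF pick the degree of parallelism itself,
# which can claim as many CPUs (and as much memory) as it sees fit.
dataset = tf.data.Dataset.range(10)
dataset = dataset.map(
    lambda x: x * 2,  # stand-in for the real preprocessing function
    num_parallel_calls=tf.data.experimental.AUTOTUNE,
)
```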
The fix can be:
- When loading the config, we set `num_parallel_calls` to the given value if provided, otherwise to `tf.data.experimental.AUTOTUNE`.
- Then we pass this `num_parallel_calls` to all funcs using it (a sketch follows below).
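A minimal sketch of how that resolution could look, assuming the config is a plain dict exposing a `num_parallel_calls` key (both the key name and the helper are hypothetical, not DeepReg's actual schema):

```python
import tensorflow as tf

def resolve_num_parallel_calls(config: dict) -> int:
    """Return the configured value, falling back to AUTOTUNE when absent."""
    value = config.get("num_parallel_calls")  # hypothetical config key
    if value is None:
        return tf.data.experimental.AUTOTUNE
    return value

# The resolved value would then be threaded through to every call site, e.g.:
# dataset = dataset.map(fn, num_parallel_calls=resolve_num_parallel_calls(config))
```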
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Regarding the concern of @zacbaum: the optimization options exist in 2.3 (https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/data/experimental/OptimizationOptions), but the `autotune_ram_budget` option does not exist in 2.4 (https://www.tensorflow.org/api_docs/python/tf/data/experimental/OptimizationOptions). Anyway, this issue aims to solve the CPU problem, not the memory problem. If the memory problem is not solved, we will open a new issue.
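For completeness, under TF 2.3 that budget could be set roughly as follows (a sketch based on the r2.3 docs linked above; it would not work on 2.4, where the attribute is gone):

```python
import tensorflow as tf

# TF 2.3 only: cap the RAM the tf.data autotuner may use.
options = tf.data.Options()
options.experimental_optimization.autotune_ram_budget = 2 * 1024**3  # 2 GiB

dataset = tf.data.Dataset.range(10)
dataset = dataset.with_options(options)
```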
The following combinations were tested (a sketch of what `num_cpus` could map to follows below):
- `num_cpus=1` and `num_parallel_calls=1`
- `num_cpus=-1` and `num_parallel_calls=1`
- `num_cpus=-1` and `num_parallel_calls=-1`
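As an illustration, assuming `num_cpus` is enforced via TF's threading settings (and that `-1` means "no limit", which is an assumption here):

```python
import tensorflow as tf

num_cpus = 1  # value taken from the hypothetical num_cpus option
if num_cpus != -1:
    # Must run before TF initializes its thread pools; caps both
    # intra-op and inter-op parallelism to the configured count.
    tf.config.threading.set_intra_op_parallelism_threads(num_cpus)
    tf.config.threading.set_inter_op_parallelism_threads(num_cpus)
```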
OK… it only confirms the fix on the number of CPUs/threads used; the memory seems not to be impacted?