Add option to limit number of CPUs for data preprocessing
Subject of the feature
Currently, we are using AUTOTUNE in data preprocessing, e.g. https://github.com/DeepRegNet/DeepReg/blob/main/deepreg/dataset/loader/interface.py#L113
However, it may take too many CPUs, and thus also too much memory, which is not ideal on clusters. Therefore, we need to be able to configure this `num_parallel_calls`.
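For reference, a minimal sketch of the current pattern (the mapped function here is a stand-in, not DeepReg's actual preprocessing):

```python
import tensorflow as tf

# Current behaviour: AUTOTUNE lets TF pick the degree of parallelism itself,
# which can claim as many CPUs (and as much memory) as it sees fit.
dataset = tf.data.Dataset.range(10)
dataset = dataset.map(
    lambda x: x * 2,  # stand-in for the real preprocessing function
    num_parallel_calls=tf.data.experimental.AUTOTUNE,
)
```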
The fix can be:
- When loading the config, we set `num_parallel_calls` to the given value if provided, otherwise to `tf.data.experimental.AUTOTUNE`.
- Then we pass this `num_parallel_calls` to all funcs using it (a sketch follows below).
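A minimal sketch of how that resolution could look, assuming the config is a plain dict exposing a `num_parallel_calls` key (both the key name and the helper are hypothetical, not DeepReg's actual schema):

```python
import tensorflow as tf

def resolve_num_parallel_calls(config: dict) -> int:
    """Return the configured value, falling back to AUTOTUNE when absent."""
    value = config.get("num_parallel_calls")  # hypothetical config key
    if value is None:
        return tf.data.experimental.AUTOTUNE
    return value

# The resolved value would then be threaded through to every call site, e.g.:
# dataset = dataset.map(fn, num_parallel_calls=resolve_num_parallel_calls(config))
```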
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Regarding the concern of @zacbaum: the optimization options exist in 2.3 (https://www.tensorflow.org/versions/r2.3/api_docs/python/tf/data/experimental/OptimizationOptions), but the `autotune_ram_budget` option does not exist in 2.4 (https://www.tensorflow.org/api_docs/python/tf/data/experimental/OptimizationOptions). Anyway, this issue aims to solve the CPU problem, not the memory problem. If the memory problem is not solved, we will open a new issue.
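For completeness, under TF 2.3 that budget could be set roughly as follows (a sketch based on the r2.3 docs linked above; it would not work on 2.4, where the attribute is gone):

```python
import tensorflow as tf

# TF 2.3 only: cap the RAM the tf.data autotuner may use.
options = tf.data.Options()
options.experimental_optimization.autotune_ram_budget = 2 * 1024**3  # 2 GiB

dataset = tf.data.Dataset.range(10)
dataset = dataset.with_options(options)
```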
The following combinations were tested (a sketch of what `num_cpus` could map to follows below):
- `num_cpus=1` and `num_parallel_calls=1`
- `num_cpus=-1` and `num_parallel_calls=1`
- `num_cpus=-1` and `num_parallel_calls=-1`
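As an illustration, assuming `num_cpus` is enforced via TF's threading settings (and that `-1` means "no limit", which is an assumption here):

```python
import tensorflow as tf

num_cpus = 1  # value taken from the hypothetical num_cpus option
if num_cpus != -1:
    # Must run before TF initializes its thread pools; caps both
    # intra-op and inter-op parallelism to the configured count.
    tf.config.threading.set_intra_op_parallelism_threads(num_cpus)
    tf.config.threading.set_inter_op_parallelism_threads(num_cpus)
```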
OK… it only confirms the fix on the number of CPUs/threads used; the memory seems not to be impacted?