
Compute metric per image and handler mutex in DistributedDataParallel


❓ Questions/Help/Support

Hi @vfdev-5 ,

I am writing an ignite handler to write the segmentation metric of every image into one CSV file as a summary, for example:

metrics.csv:
/data/spleen/image_1    0.85
/data/spleen/image_2    0.87
/data/spleen/image_3    0.91
... ...

The problems are:

  1. I tried to add logic to metrics.update() to cache every record and write the CSV in metrics.compute(), but ignite.metrics only accepts an output_transform, so I can't extract the filenames from engine.state.batch.
  2. I then switched to a separate handler for this feature, but ignite metrics only save the final average into engine.state.metrics, so a handler cannot easily get the metric value for every image.
  3. Another problem is DistributedDataParallel: when the handler runs across multiple processes, how do you usually use a multi-processing lock so that every process can safely write into one CSV on both Unix and Windows? (See the sketch after this list.)
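For concreteness, here is a minimal sketch of the kind of handler I have in mind (the "filename" and "dice" keys below are placeholders for however the batch and step output are actually structured, and the lock file is just one portable way to serialize the writes):

import csv
import os
import time

from ignite.engine import Events


def attach_per_image_csv(evaluator, csv_path="metrics.csv"):
    # Cache (filename, dice) pairs during evaluation, then append them to one shared CSV.
    rows = []  # rows collected by this process only

    @evaluator.on(Events.ITERATION_COMPLETED)
    def cache_rows(engine):
        filenames = engine.state.batch["filename"]    # placeholder: filenames travel with the batch
        dice_per_image = engine.state.output["dice"]  # placeholder: per-image dice in the step output
        rows.extend(zip(filenames, dice_per_image.tolist()))

    @evaluator.on(Events.COMPLETED)
    def write_rows(engine):
        lock_path = csv_path + ".lock"
        # os.O_CREAT | os.O_EXCL creates the lock file atomically and raises
        # FileExistsError if another process holds it, on both Unix and Windows.
        while True:
            try:
                fd = os.open(lock_path, os.O_CREAT | os.O_EXCL)
                break
            except FileExistsError:
                time.sleep(0.05)
        try:
            with open(csv_path, "a", newline="") as f:
                csv.writer(f, delimiter="\t").writerows(rows)
        finally:
            os.close(fd)
            os.remove(lock_path)

The lock-file trick avoids platform-specific fcntl / msvcrt calls; a third-party package such as filelock would be another option.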

Thanks.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 16 (9 by maintainers)

Top GitHub Comments

2 reactions
Nic-Ma commented, Jan 25, 2021

Hi @vfdev-5 and @sdesrozis ,

Thanks for your discussion. I added self.engine to the metrics in this draft MONAI PR: https://github.com/Project-MONAI/MONAI/pull/1497. I will remove it once you add engine to the Metric base class.
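Roughly, the idea is the following (a simplified sketch, not the actual PR code; the "filename" key and the naive Dice computation are placeholders):

from ignite.metrics import Metric


class PerImageDice(Metric):
    # Toy per-image Dice metric that keeps one score per filename (sketch only).

    def attach(self, engine, name):
        self.engine = engine  # keep a handle so update() can reach engine.state.batch
        super().attach(engine, name)

    def reset(self):
        self.scores = {}

    def update(self, output):
        y_pred, y = output
        filenames = self.engine.state.batch["filename"]  # placeholder: filenames travel with the batch
        for fname, p, t in zip(filenames, y_pred, y):
            inter = (p * t).sum()
            self.scores[fname] = float(2 * inter / (p.sum() + t.sum() + 1e-8))

    def compute(self):
        return sum(self.scores.values()) / max(len(self.scores), 1)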

Thanks.

1 reaction
vfdev-5 commented, Jan 23, 2021

@sdesrozis the problem here is that we already have an existing API, the detach and is_attached methods, which always requires an engine, so adding Metric.engine would require updating that API as well, I think.
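To make that concrete, the existing methods are all engine-centric (minimal sketch with a dummy process function):

from ignite.engine import Engine
from ignite.metrics import Accuracy

evaluator = Engine(lambda engine, batch: batch)  # dummy process function

metric = Accuracy()
metric.attach(evaluator, "acc")       # engine passed explicitly
assert metric.is_attached(evaluator)  # engine passed explicitly
metric.detach(evaluator)              # engine passed explicitly

# A Metric.engine attribute would have to be kept consistent with these calls,
# e.g. set in attach() and cleared again in detach().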

