ClearMLLogger freezes logging in distributed env
🐛 Bug description
Instantiating a ClearMLLogger instance in distributed training freezes the logging of the subprocesses. The bug can be reproduced by executing the following script: https://github.com/H4dr1en/trains/commit/642c1130ad1f76db10ed9b8e1a4ff0fd7e45b3cc
I will open a PR to fix it.
Environment
- PyTorch Version (e.g., 1.7.1):
- Ignite Version (e.g., 0.5.0):
- OS (e.g., Linux): linux
- How you installed Ignite (conda, pip, source): pip/source
- Python version: 3.8.5
- Any other relevant information:
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (3 by maintainers)
@H4dr1en you are correct 😄 The first “main process” creates the actual Task and sets an OS environment variable specifying the newly created Task. When the child processes call the “same” Task.init, they inherit the environment variable from the main process, so they know to connect to the main Task and log everything back to it instead of creating a new Task. Specifically here, I think the issue is that the second Task.init call should have the same project/experiment name, which it does not have, hence the issue.
I can verify I was able to reproduce the issue, and I can also verify the PR solved it 🎉 🎊
@vfdev-5 To the best of my knowledge, it happens like this: the main process creates the task with the Task.init function, and the child processes, when they call Task.init, will contact the main process to retrieve the task. They don’t recreate a new task, but get a copy of the task created by the main process.
@bmartinn please correct me if I am wrong 👍
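The hand-off described in the comments above can be illustrated with a small standard-library sketch. It does not use ClearML and is not how ClearML is implemented internally; the variable name DEMO_MAIN_TASK_ID and the helpers init_task and run_child are hypothetical, made up for this illustration. It only shows the general pattern: the first caller creates a task and records it in an environment variable, and any child process that inherits that variable attaches to the existing task instead of creating a new one.

```python
import os
import subprocess
import sys

# Hypothetical variable name for this sketch only; not ClearML's real one.
TASK_ENV_VAR = "DEMO_MAIN_TASK_ID"


def init_task(name: str) -> str:
    """Create a task on the first call, or attach to the one already recorded."""
    inherited = os.environ.get(TASK_ENV_VAR)
    if inherited is not None:
        # A task id is already present in the environment: reuse it.
        return f"attached:{inherited}"
    task_id = f"task-{name}"
    # Record the new task so that child processes inherit it.
    os.environ[TASK_ENV_VAR] = task_id
    return f"created:{task_id}"


# Code executed by the child interpreter: it only inspects its environment.
CHILD_CODE = (
    "import os; "
    f"tid = os.environ.get({TASK_ENV_VAR!r}); "
    "print('attached:' + tid if tid else 'created:new')"
)


def run_child() -> str:
    """Spawn a child Python process, which inherits os.environ by default."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Calling run_child() before init_task() reports that the child would create its own task, while calling it afterwards reports that the child attaches to the task created by the main process.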