
ClearMLLogger freezes logging in distributed env

See original GitHub issue

🐛 Bug description

Instantiating a ClearMLLogger instance during distributed training freezes logging in the subprocesses. The bug can be reproduced by executing the following script: https://github.com/H4dr1en/trains/commit/642c1130ad1f76db10ed9b8e1a4ff0fd7e45b3cc

I will open a PR to fix it.

Environment

  • PyTorch Version (e.g., 1.7.1):
  • Ignite Version (e.g., 0.5.0):
  • OS (e.g., Linux): linux
  • How you installed Ignite (conda, pip, source): pip/source
  • Python version: 3.8.5
  • Any other relevant information:

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
bmartinn commented, Dec 11, 2021

@H4dr1en you are correct 😄 The first “main process” creates the actual Task and sets an OS environment variable identifying the newly created Task. When the child processes call the “same” Task.init, they inherit that environment variable from the main process, so they know to connect to the main Task and log everything back to it instead of creating a new Task. Specifically, I think the issue is that the second Task.init call should use the same project/experiment name, which it does not, hence the problem.

I can verify I was able to reproduce the issue, and I can also verify the PR solved it 🎉 🎊
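The create-or-attach logic described in this comment can be sketched with a small, self-contained Python simulation. Everything here is hypothetical (the `DEMO_CLEARML_TASK` variable name and the `task_init` helper are stand-ins, not ClearML's actual internals), and it only illustrates the matching logic; in the real library a project/name mismatch led to frozen logging rather than a clean second task:

```python
import os

# Stand-in for ClearML's internal environment variable (hypothetical name)
TASK_ENV_VAR = "DEMO_CLEARML_TASK"

def task_init(project, name):
    """Create a task, or attach to the existing one if project/name match."""
    existing = os.environ.get(TASK_ENV_VAR)
    if existing == f"{project}/{name}":
        return ("attached", existing)
    os.environ[TASK_ENV_VAR] = f"{project}/{name}"
    return ("created", f"{project}/{name}")

os.environ.pop(TASK_ENV_VAR, None)   # start from a clean state

# The main process creates the task
print(task_init("demo", "run-1"))    # ('created', 'demo/run-1')
# A second call with the SAME names attaches correctly
print(task_init("demo", "run-1"))    # ('attached', 'demo/run-1')
# Mismatched names break the attach and create a new task instead
print(task_init("demo", "run-2"))    # ('created', 'demo/run-2')
```

This is why the second Task.init call must carry the same project/experiment name as the first: the match is what turns "create a new Task" into "attach to the main Task".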

1 reaction
H4dr1en commented, Dec 10, 2021

@vfdev-5 To the best of my knowledge, it works like this:

  • The main process creates the task the first time Task.init is called.
  • The subprocesses, when calling Task.init, contact the main process to retrieve the task. They don’t recreate a new task, but get a copy of the task created by the main process.
  • Any logging done in the subprocesses is tunnelled to the main process, which takes care of logging it to the clearml-server.

@bmartinn please correct me if I am wrong 👍
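The first two steps above rely on the fact that a child process inherits the parent's environment, which is how the subprocesses find the main task instead of creating their own. A minimal sketch of that hand-off, using a hypothetical `DEMO_CLEARML_TASK` variable and `task_init` helper rather than ClearML's real internals:

```python
import os
import subprocess
import sys

# Stand-in for ClearML's internal environment variable (hypothetical name)
TASK_ENV_VAR = "DEMO_CLEARML_TASK"

def task_init(name):
    """First caller creates the task; later callers attach to it."""
    existing = os.environ.get(TASK_ENV_VAR)
    if existing is not None:
        return f"attached:{existing}"
    os.environ[TASK_ENV_VAR] = name
    return f"created:{name}"

os.environ.pop(TASK_ENV_VAR, None)  # start from a clean state
print(task_init("main-task"))       # created:main-task

# A child process inherits the parent's environment, so code running
# there sees the task identifier and attaches instead of creating
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print('attached:' + os.environ['DEMO_CLEARML_TASK'])"],
    capture_output=True, text=True,
)
print(child.stdout.strip())         # attached:main-task
```

The third step (tunnelling log records from the subprocesses back to the main process) happens on top of this attachment, so only the main process ever talks to the clearml-server.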

Read more comments on GitHub >

Top Results From Across the Web

Logger | ClearML
The Logger class is the ClearML console log and metric statistics interface, and contains methods for explicit reporting. Explicit reporting extends ClearML ......
fileio — mmcv 1.7.0 documentation - Read the Docs
logging.Logger. mmcv.utils.has_method(obj: object, method: str) → bool[source] ... in a given interval when performing in distributed environment.
Distributed training with CPU's - ignite - PyTorch Forums
I tried using ignite.distributed with the gloo backend, but when … ... with_clearml (bool): if True, experiment ClearML logger is setup.
clearml Changelog - pyup.io
Fix `clearml` logger default level cannot be changed (741) ... Log environment variables starting with `*` in `environ_bind.py` (459) - Pipeline
elasticsearch: docs/reference/release-notes.asciidoc - Fossies
The {es} audit log could contain sensitive information, such as password hashes ... SQL: passing an input to the CLI "freezes" the CLI...
