feature request: allow setting the current step without calling `logger.report_*`

Currently, automatic reporting (e.g. CPU / GPU / memory utilization) uses the step from the most recent logger.report_* call as its x-axis value. If no such call happens for some time, it falls back to using milliseconds since start for the x-axis: https://github.com/allegroai/clearml/blob/5a9155b2039413280f13dfded1121470c4c4323d/clearml/utilities/resource_monitor.py#L110-L111
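
The fallback described above can be sketched roughly as follows. This is a minimal illustration built only from the description in this issue, not the code in the linked resource_monitor.py; the class name `StepAxisChooser`, its methods, and the `timeout_sec` parameter are all hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


class StepAxisChooser:
    """Hypothetical stand-in for the monitor's x-axis choice (not ClearML code)."""

    def __init__(self, timeout_sec: float, clock: Callable[[], float] = time.monotonic):
        self._timeout_sec = timeout_sec
        self._clock = clock
        self._start = clock()
        self._last_step: Optional[int] = None
        self._last_report_time: Optional[float] = None

    def on_report(self, step: int) -> None:
        # Called whenever the user reports something with an explicit step.
        self._last_step = step
        self._last_report_time = self._clock()

    def x_value(self) -> Tuple[str, int]:
        # Use the last reported step if it is recent enough,
        # otherwise fall back to milliseconds since start.
        now = self._clock()
        if (
            self._last_step is not None
            and self._last_report_time is not None
            and now - self._last_report_time <= self._timeout_sec
        ):
            return ("iteration", self._last_step)
        return ("milliseconds", int((now - self._start) * 1000))
```

With this model, a training loop that reports less often than the time-out would see its machine metrics switch from iterations to milliseconds, which is the behaviour the request wants to avoid.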

This is problematic in a few scenarios:

  1. the user doesn’t want to artificially create dummy reports, and the first desired report (if any) would take longer than the time-out
  2. the worker has died and gets restarted. This is affected by #439.

In both cases, it would be nice to be able to do something like

clearml_task.set_iteration(current_iteration)

If the worker resumes and recovers (externally, maybe from a checkpoint on disk) from iteration 433, we call

clearml_task.set_iteration(433)

Or in the default use-case:

for iteration in range(max_steps):
    clearml_task.set_iteration(iteration)  # proposed method, does not exist yet
    # do some work
    loss = train_step()
    if iteration % 1000 == 0:  # very infrequent reporting
        clearml_task.get_logger().report_scalar(
            title="loss", series="loss", value=loss, iteration=iteration
        )
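
To make the intended semantics concrete, here is a small self-contained sketch that uses a stand-in object instead of a real ClearML task. `FakeTask`, its `set_iteration` method, and the `train` helper are hypothetical and only mirror the proposal; the loss value is a placeholder for real training work.

```python
from typing import List, Tuple


class FakeTask:
    """Hypothetical stand-in; `set_iteration` is the proposed method, not an existing API."""

    def __init__(self) -> None:
        self.current_iteration = 0
        self.scalars: List[Tuple[int, float]] = []

    def set_iteration(self, iteration: int) -> None:
        # Update the step without sending any payload.
        self.current_iteration = iteration

    def report_scalar(self, value: float, iteration: int) -> None:
        # A real report also moves the current step forward.
        self.scalars.append((iteration, value))
        self.current_iteration = iteration


def train(task: FakeTask, start: int, max_steps: int, report_every: int) -> None:
    for iteration in range(start, max_steps):
        task.set_iteration(iteration)
        loss = 1.0 / (iteration + 1)  # placeholder for real training work
        if iteration % report_every == 0:
            task.report_scalar(loss, iteration)


# Resume from iteration 433, as in the restarted-worker scenario.
task = FakeTask()
train(task, start=433, max_steps=3000, report_every=1000)
```

In this sketch the stand-in always knows the current step, even though scalars are only recorded at iterations 1000 and 2000.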

Links:

  • automatic logging is configured through Task.init

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

bmartinn commented, Aug 31, 2021

> Regarding contribution: could you point me to where I should start looking and contribution instructions.

https://github.com/allegroai/clearml/blob/master/docs/contributing.md

> I can only work on client-side code.

Client SDK will be great! (Server-side is definitely more complicated to PR to 😃)

patzm commented, Aug 30, 2021

It is slightly different. If we follow through with your proposal in https://github.com/allegroai/clearml/issues/439#issuecomment-907882223 (either (1) or (2)), then I think we would implement two methods:

  1. set_last_checkpoint
  2. set_iteration: this is like reporting without a reporting payload.

And maybe also getters for the two.
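
A rough sketch of what that pair of setters and getters might look like is given below. Everything here is an assumption for illustration: the class name `ProgressState` is hypothetical, and the thread does not say what `set_last_checkpoint` would accept, so this sketch assumes it takes the iteration at which the checkpoint was saved.

```python
from typing import Optional


class ProgressState:
    """Hypothetical holder for the two proposed values (not ClearML code)."""

    def __init__(self) -> None:
        self._iteration: Optional[int] = None
        self._last_checkpoint: Optional[int] = None

    def set_iteration(self, iteration: int) -> None:
        # Like a report, but without a reporting payload.
        if iteration < 0:
            raise ValueError("iteration must be non-negative")
        self._iteration = iteration

    def get_iteration(self) -> Optional[int]:
        return self._iteration

    def set_last_checkpoint(self, iteration: int) -> None:
        # Assumption: the argument is the iteration of the saved checkpoint.
        if iteration < 0:
            raise ValueError("iteration must be non-negative")
        self._last_checkpoint = iteration

    def get_last_checkpoint(self) -> Optional[int]:
        return self._last_checkpoint
```

Keeping the two values separate lets a restarted worker distinguish between where training currently is and where it could safely resume from.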

Regarding contribution: could you point me to where I should start looking, and to the contribution instructions? Ideally, I would work only on client-side code. I think setting up the server backend as well would be too much. I don’t have much experience with it and also not a lot of time 😉.
