feature request: allow setting the current step without calling `logger.report_*`
Currently, automatic reporting (e.g. CPU / GPU / memory utilization) relies on the last step reported through a `logger.report_*` call. If no such call happens for some time, it falls back to using milliseconds since start for the x-axis: https://github.com/allegroai/clearml/blob/5a9155b2039413280f13dfded1121470c4c4323d/clearml/utilities/resource_monitor.py#L110-L111
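The fallback described above can be pictured roughly as follows. This is a simplified sketch of one reading of that behaviour, not the actual `resource_monitor.py` code: the function `pick_x_value`, its parameters, and the 30-second timeout are hypothetical names and values chosen for illustration.

```python
from typing import Optional


def pick_x_value(
    last_reported_iteration: Optional[int],
    seconds_since_last_report: float,
    ms_since_start: int,
    timeout_sec: float = 30.0,  # hypothetical value, not ClearML's default
) -> int:
    """Return the x-axis value for an automatic (CPU / GPU / memory) report."""
    has_recent_report = (
        last_reported_iteration is not None
        and seconds_since_last_report <= timeout_sec
    )
    if has_recent_report:
        # A step was reported recently enough: reuse it as the x value.
        return last_reported_iteration
    # No usable step: fall back to elapsed time since start.
    return ms_since_start
```

Under this reading, a job that reports rarely (or has just been restarted) ends up with time on the x-axis instead of iterations, which is the situation the request wants to avoid.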
This is problematic in a few scenarios:
- the user doesn’t want to artificially create dummy reports, and the first desired report (if any) would come later than the time-out
- the worker has died and gets restarted. This is affected by #439.
In both cases, it would be nice to be able to do something like:
    clearml_task.set_iteration(current_iteration)
If the worker resumes and recovers (externally, maybe from a checkpoint on disk) from iteration 433, we call:
    clearml_task.set_iteration(433)
Or in the default use case:
    for iteration in range(max_steps):
        clearml_task.set_iteration(iteration)  # proposed method, does not exist yet
        # do some work
        loss = train_step()
        if iteration % 1000 == 0:  # very infrequent reporting
            clearml_task.get_logger().report_scalar(title="loss", series="loss", value=loss, iteration=iteration)
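To make the requested semantics concrete, here is a minimal stand-in that runs without ClearML. `IterationTracker` and its `get_iteration` getter are hypothetical names used only for illustration. They are not part of the ClearML SDK, and a real implementation would presumably live on the task object itself.

```python
from typing import Optional


class IterationTracker:
    """Hypothetical stand-in for the iteration state a task would keep."""

    def __init__(self) -> None:
        self._iteration: Optional[int] = None  # nothing set or reported yet

    def set_iteration(self, iteration: int) -> None:
        # Like reporting a step, but without any reporting payload.
        if iteration < 0:
            raise ValueError("iteration must be non-negative")
        self._iteration = iteration

    def get_iteration(self) -> Optional[int]:
        return self._iteration


tracker = IterationTracker()
tracker.set_iteration(433)  # worker resumed from a checkpoint at iteration 433
for iteration in range(433, 436):
    tracker.set_iteration(iteration)
    # Automatic reporting could read tracker.get_iteration() here
    # instead of waiting for an explicit scalar report.
```

The point of the sketch is only that the current step becomes readable by automatic reporting even when no scalar has been reported yet.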
Links:
- automatic logging is configured through `Task.init`
https://github.com/allegroai/clearml/blob/master/docs/contributing.md
Client SDK will be great! (server-side is definitely more complicated to PR to 😃)
It is slightly different. If we follow through with your proposal in https://github.com/allegroai/clearml/issues/439#issuecomment-907882223 (either (1) or (2)), then I think we would implement two methods:
- set_last_checkpoint
- set_iteration: this is like reporting without a reporting payload.
And maybe also getters for the two.
Regarding contribution: could you point me to where I should start looking, and to the contribution instructions? Ideally I would work only on client-side code. I think setting up the server backend as well would be too much. I don’t have much experience with it and also not a lot of time 😉.