Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

feature request: allow setting the current step without calling `logger.report_*`

See original GitHub issue

Currently, automatic reporting (e.g. CPU / GPU / memory utilization) relies on the last reported step when logger.report_* was called. If that doesn’t happen for some time, it falls back to using milliseconds since start for the x-axis: https://github.com/allegroai/clearml/blob/5a9155b2039413280f13dfded1121470c4c4323d/clearml/utilities/resource_monitor.py#L110-L111

This is problematic in a few scenarios:

the user doesn’t want to artifically create dummy reports and the first desired report (if any) would take longer than the time-out
the worker has died and gets restarted. This is affected by #439.

In both cases, it would be nice to be able to do something like

clearml_task.set_iteration(current_iteration)

If the worker resumes and recovers (externally, maybe from a checkpoint on disk) from iteration 433, we call

clearml_task.set_iteration(433)

Or in the default use-case:

for iteration in range(max_steps):
    clearml_task.set_iteration(iteration)
    # do some work
    loss = train_step()
    if iteration % 1000:  # very infrequent reporting
        clearml_task.report_scalar(loss, "loss", iteration)

Links:

automatic logging is configured through Task.init

Issue Analytics

State:
Created 2 years ago
Comments:5 (3 by maintainers)

Top GitHub Comments

1reaction

bmartinncommented, Aug 31, 2021

Regarding contribution: could you point me to where I should start looking and contribution instructions.

https://github.com/allegroai/clearml/blob/master/docs/contributing.md

I can only work on client-side code.

Client SDK will be great! (server-side is definitely more complicated to PR to 😃

0reactions

patzmcommented, Aug 30, 2021

It is slightly different. If we follow through with your proposal in https://github.com/allegroai/clearml/issues/439#issuecomment-907882223 (either (1) or (2)), then I think we would implement two methods:

set_last_checkpoint
set_iteration: this is like reporting without a reporting payload.

And maybe also getters for the two.

Regarding contribution: could you point me to where I should start looking and contribution instructions. Ideally I can only work on client-side code. I think setting up the server backend as well would be too much. I don’t have much experience with it and also not a lot of time 😉.

Top Results From Across the Web

Enabling debug logging - GitHub Docs

To enable runner diagnostic logging, set the following secret in the repository that contains the workflow: ACTIONS_RUNNER_DEBUG to true . To download runner ......

Apache log4j 1.2 - Short introduction to log4j

Log4j allows logging requests to print to multiple destinations. In log4j speak, an output destination is called an appender. Currently, appenders exist for ......

Error handling in Step Functions - AWS Documentation

A non-empty array of strings that match error names. When a state reports an error, Step Functions scans through the retriers. When the...

Solving Your Logging Problems with Logback - Stackify

If no custom configuration is defined, Logback provides a simple, automatic configuration on its own. By default, this ensures that log ...

Issue tracking system and product feature requests

Discover how to use Google Cloud's issue tracking system to report issues, submit and vote for product feature requests from the issue tracker...