API usage logging within TorchVision

See original GitHub issue

Goal

To understand TorchVision usage within an organization (e.g., Meta).

The events give insight into torchvision usage with regard to individual call sites, workflows, etc. The organization could also learn which APIs are trending, which could help guide component development, deprecation, and so on.

Policy

  • Usage should be recorded only once for the same API within a process.
  • We should record events as broadly as possible; duplicated events (e.g. a module and a function logging the same thing) are OK and can be deduplicated in downstream pipelines.
  • For modules, API usage should be recorded at the beginning of the constructor of the main class, e.g. the __init__ of RegNet, but not those of submodules (e.g. ResBottleneckBlock).
  • For functions, API usage should be recorded at the beginning of the function (see the placement sketch after this list).
  • For torchvision.io, the logging must be added on both the Python and the C++ side (using the csrc submodule as mentioned).
  • For torchvision.ops, the calls should be added both to the main class of the operator (e.g. StochasticDepth) and to its functional equivalent (e.g. stochastic_depth), if available.
  • For torchvision.transforms, the calls should be placed in the constructors of the Transform classes, the Auto-Augment classes, and the functional methods.
  • For torchvision.datasets, the calls are placed once in the constructor of VisionDataset, so we don't need to add them individually to each dataset.
  • For torchvision.utils, the call should be added at the top of each public method.
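
As a placement sketch only (class names, signatures, and the torchvision.utils import path follow the examples in this issue; the bodies are illustrative, not the actual torchvision code):

import torch
from torch import nn
from torchvision.utils import _log_api_usage_once  # helper described under "Usage Log API"

class RegNet(nn.Module):
    # Main class: log once at the start of the constructor; submodules such as
    # ResBottleneckBlock do not log.
    def __init__(self) -> None:
        super().__init__()
        _log_api_usage_once(self)

def stochastic_depth(input: torch.Tensor, p: float) -> torch.Tensor:
    # Functional form: log at the top, guarded so scripting/tracing are unaffected.
    if not torch.jit.is_scripting() and not torch.jit.is_tracing():
        _log_api_usage_once(stochastic_depth)
    return input  # real implementation elided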

Event Format

The fully qualified name of the component is logged as the event, for example: torchvision.models.resnet.ResNet. Note: for events from C++ APIs, ".csrc" should be inserted after "torchvision", for example: torchvision.csrc.ops.nms.nms.

Usage Log API

  • C++: C10_LOG_API_USAGE_ONCE()
  • Python:
from ..utils import _log_api_usage_once
# for class
_log_api_usage_once(self)
# for method
if not torch.jit.is_scripting() and not torch.jit.is_tracing():
    _log_api_usage_once(nms)

The above APIs are lightweight; by default they are a no-op. It is guaranteed that the same event is recorded only once within a process. Please note that 8 GPUs (i.e. 8 processes) will still lead to 8 events, so events should be deduplicated downstream by a unique identifier such as the workflow job_id (a sketch follows).
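
For illustration only (not part of torchvision or PyTorch), a downstream pipeline could deduplicate per-rank copies of the same event by keying on the job identifier plus the event name:

from typing import Iterable, Set, Tuple

def dedup_events(records: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    # records: (job_id, event) pairs gathered from every process/rank of a job
    return set(records)

records = [
    ("job-123", "torchvision.models.resnet.ResNet"),  # rank 0
    ("job-123", "torchvision.models.resnet.ResNet"),  # rank 1
]
assert dedup_events(records) == {("job-123", "torchvision.models.resnet.ResNet")}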

Implementation

from types import FunctionType
from typing import Any

import torch

def _log_api_usage_once(obj: Any) -> None:
    # Only log objects defined inside the torchvision namespace.
    if not obj.__module__.startswith("torchvision"):
        return
    # Classes log their class name; free functions log their function name.
    name = obj.__class__.__name__
    if isinstance(obj, FunctionType):
        name = obj.__name__
    torch._C._log_api_usage_once(f"{obj.__module__}.{name}")

Also considered

  • Log usage in a base class
    • Create a base class for all models, datasets, and transforms, and log usage in the __init__ of the base class.
    • Introducing extra abstraction only for logging seems like overkill. In #4569 we couldn't find any other features to add to a model base class; in addition, we also need a way to log non-class usage.
  • Use a decorator (a sketch follows this list)
    • For example: @log_api_usage in #4976.
    • Doesn't work with TorchScript, since the decorator needs to use kwargs, which TorchScript does not support.
  • Use the function's __module__
    • For example: _log_api_usage(nms.__module__, "nms")
    • Doesn't work with TorchScript: attribute lookup is not defined on functions.
  • Use a global constant for the module
    • For example: _log_api_usage(MODULE, "nms")
    • Doesn't work with TorchScript.
  • Use a flat namespace
    • For example: log events as "torchvision.{models|transforms|datasets}.{class or function name}".
    • There might be name collisions.
  • Pass the object or function as a parameter to the logging API
    • For example: _log_api_usage_once(self)
    • Doesn't work with TorchScript.
  • Log the fully qualified name, using __qualname__ for classes and a string for functions
    • For example: #5096.
    • Too cumbersome for functions.
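
For reference, a minimal sketch of the rejected decorator approach (the decorator name follows #4976; the body here is illustrative, not the actual patch). The generic wrapper has to forward **kwargs, and TorchScript does not support **kwargs, which is why this option was dropped:

import functools
import torch

def log_api_usage(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):  # **kwargs forwarding is the TorchScript blocker
        torch._C._log_api_usage_once(f"{fn.__module__}.{fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

@log_api_usage
def dummy_op(x):  # stand-in for a real op such as nms
    return x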

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 19 (19 by maintainers)

Top GitHub Comments

1 reaction
datumbox commented, Dec 21, 2021

@kazhang Thanks for the changes. Given that in #5007 we didn't add calls for transforms such as RandomSizedCrop that use inheritance, I would be OK with omitting the extra calls on Quantization and Segmentation. If we do add them, we need them on the inheriting transforms as well. No strong opinions, but I'm leaning towards omitting them; thoughts?

1 reaction
datumbox commented, Dec 14, 2021

It would be very nice to have a simple API like the one you proposed below:

_log_api_usage_once(self)  # for class
_log_api_usage_once(nms) # for methods

I slightly modified your code in #5095:

def _log_api_usage_once(obj):
    name = obj.__class__.__name__
    if name == "function":
        name = obj.__name__
    torch._C._log_api_usage_once(f"{obj.__module__}.{name}")



def box_area(boxes: Tensor) -> Tensor:
    if not torch.jit.is_scripting() and not torch.jit.is_tracing():
        _log_api_usage_once(box_area)
    # ...

JIT tests seem to pass.
