API usage logging within TorchVision

See original GitHub issue

Goal

To understand TorchVision usage within an organization (e.g., Meta).

The events give insight into torchvision usage with regard to individual call sites, workflows, etc. The organization could also learn which APIs are trending, which could help guide component development, deprecation, and so on.

Policy

  • Usage should be recorded only once for the same API within a process.
  • We should record events as broadly as possible; duplicated events (e.g. a module and a function logging the same thing) are OK and can be deduplicated in downstream pipelines.
  • For modules, API usage should be recorded at the beginning of the constructor of the main class, e.g. the __init__ of RegNet, but not those of submodules (e.g. ResBottleneckBlock).
  • For functions, API usage should be recorded at the beginning of the function (see the placement sketch after this list).
  • For torchvision.io, the logging must be added on both the Python and the C++ side (using the csrc submodule as mentioned).
  • For torchvision.ops, the calls should be added both to the main class of the operator (e.g. StochasticDepth) and to its functional equivalent (e.g. stochastic_depth), if available.
  • For torchvision.transforms, the calls should be placed in the constructors of the Transform classes, the Auto-Augment classes, and the functional methods.
  • For torchvision.datasets, the calls are placed once in the constructor of VisionDataset, so we don't need to add them individually to each dataset.
  • For torchvision.utils, the call should be added at the top of each public method.
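
As a placement sketch only (class names, signatures, and the torchvision.utils import path follow the examples in this issue; the bodies are illustrative, not the actual torchvision code):

import torch
from torch import nn
from torchvision.utils import _log_api_usage_once  # helper described under "Usage Log API"

class RegNet(nn.Module):
    # Main class: log once at the start of the constructor; submodules such as
    # ResBottleneckBlock do not log.
    def __init__(self) -> None:
        super().__init__()
        _log_api_usage_once(self)

def stochastic_depth(input: torch.Tensor, p: float) -> torch.Tensor:
    # Functional form: log at the top, guarded so scripting/tracing are unaffected.
    if not torch.jit.is_scripting() and not torch.jit.is_tracing():
        _log_api_usage_once(stochastic_depth)
    return input  # real implementation elided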

Event Format

The fully qualified name of the component is logged as the event, for example: torchvision.models.resnet.ResNet. Note: for events from C++ APIs, ".csrc" should be inserted after "torchvision", for example: torchvision.csrc.ops.nms.nms.

Usage Log API

  • C++: C10_LOG_API_USAGE_ONCE()
  • Python:
from ..utils import _log_api_usage_once
# for class
_log_api_usage_once(self)
# for method
if not torch.jit.is_scripting() and not torch.jit.is_tracing():
    _log_api_usage_once(nms)

The above APIs are lightweight; by default they are a no-op. It is guaranteed that the same event is recorded only once within a process. Please note that 8 GPUs (i.e. 8 processes) will still lead to 8 events, so events should be deduplicated downstream by a unique identifier such as the workflow job_id (a sketch follows).
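
For illustration only (not part of torchvision or PyTorch), a downstream pipeline could deduplicate per-rank copies of the same event by keying on the job identifier plus the event name:

from typing import Iterable, Set, Tuple

def dedup_events(records: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    # records: (job_id, event) pairs gathered from every process/rank of a job
    return set(records)

records = [
    ("job-123", "torchvision.models.resnet.ResNet"),  # rank 0
    ("job-123", "torchvision.models.resnet.ResNet"),  # rank 1
]
assert dedup_events(records) == {("job-123", "torchvision.models.resnet.ResNet")}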

Implementation

from types import FunctionType
from typing import Any

import torch

def _log_api_usage_once(obj: Any) -> None:
    # Only log objects defined inside the torchvision namespace.
    if not obj.__module__.startswith("torchvision"):
        return
    # Classes log their class name; free functions log their function name.
    name = obj.__class__.__name__
    if isinstance(obj, FunctionType):
        name = obj.__name__
    torch._C._log_api_usage_once(f"{obj.__module__}.{name}")

Also considered

  • Log usage in a base class
    • Create a base class for all models, datasets, and transforms, and log usage in the __init__ of the base class.
    • Introducing extra abstraction only for logging seems like overkill. In #4569 we couldn't find any other features to add to a model base class; in addition, we also need a way to log non-class usage.
  • Use a decorator (a sketch follows this list)
    • For example: @log_api_usage in #4976.
    • Doesn't work with TorchScript, since the decorator needs to use kwargs, which TorchScript does not support.
  • Use the function's __module__
    • For example: _log_api_usage(nms.__module__, "nms")
    • Doesn't work with TorchScript: attribute lookup is not defined on functions.
  • Use a global constant for the module
    • For example: _log_api_usage(MODULE, "nms")
    • Doesn't work with TorchScript.
  • Use a flat namespace
    • For example: log events as "torchvision.{models|transforms|datasets}.{class or function name}".
    • There might be name collisions.
  • Pass the object or function as a parameter to the logging API
    • For example: _log_api_usage_once(self)
    • Doesn't work with TorchScript.
  • Log the fully qualified name, using __qualname__ for classes and a string for functions
    • For example: #5096.
    • Too cumbersome for functions.
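
For reference, a minimal sketch of the rejected decorator approach (the decorator name follows #4976; the body here is illustrative, not the actual patch). The generic wrapper has to forward **kwargs, and TorchScript does not support **kwargs, which is why this option was dropped:

import functools
import torch

def log_api_usage(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):  # **kwargs forwarding is the TorchScript blocker
        torch._C._log_api_usage_once(f"{fn.__module__}.{fn.__name__}")
        return fn(*args, **kwargs)
    return wrapper

@log_api_usage
def dummy_op(x):  # stand-in for a real op such as nms
    return x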

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 2
  • Comments: 19 (19 by maintainers)

Top GitHub Comments

1 reaction
datumbox commented, Dec 21, 2021

@kazhang Thanks for the changes. Given that in #5007 we didn't add calls for transforms such as RandomSizedCrop that use inheritance, I would be OK with omitting the extra calls on Quantization and Segmentation. If we do add them, we need them on the inheriting transforms as well. No strong opinions, but I'm leaning towards omitting them; thoughts?

1 reaction
datumbox commented, Dec 14, 2021

It would be very nice to have a simple API like the one you proposed below:

_log_api_usage_once(self)  # for class
_log_api_usage_once(nms) # for methods

I slightly modified your code in #5095:

def _log_api_usage_once(obj):
    name = obj.__class__.__name__
    if name == "function":
        name = obj.__name__
    torch._C._log_api_usage_once(f"{obj.__module__}.{name}")



def box_area(boxes: Tensor) -> Tensor:
    if not torch.jit.is_scripting() and not torch.jit.is_tracing():
        _log_api_usage_once(box_area)
    # ...

JIT tests seem to pass.
