
Reduce the import time of pytorch_lightning

See original GitHub issue

Proposed refactor

The current import time for the pytorch_lightning package on my machine is several seconds. There are some opportunities to improve this.

Motivation

High import times have an impact on the development and debugging speed.

Benchmark

I benchmarked the import time in two environments:

  1. A fresh environment with pytorch_lightning installed and no extras.
  2. My current environment, with many extras installed such as loggers, horovod, etc.

To measure the import time, I created a simple file which only imports pytorch_lightning:

import pytorch_lightning as pl

Then I used Python's built-in import-time profiling (the -X importtime interpreter option) to measure the time and write a profile:

python -X importtime simple.py 2> import.log

Finally, I used tuna to visualize the profile:

tuna import.log
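For a quick look without tuna, a short script can rank the slowest imports directly from the log. This is a sketch, not part of the issue: it parses the line format that CPython's -X importtime writes to stderr (self microseconds, cumulative microseconds, module name, separated by pipes), using a small embedded sample log for illustration.

```python
# Sketch: rank the slowest imports from an importtime log without tuna.
# The sample below mimics the format CPython writes with -X importtime.
sample_log = """\
import time: self [us] | cumulative | imported package
import time:       150 |        150 |   _io
import time:      1200 |       1350 | torch
import time:       300 |       1650 | pytorch_lightning
"""

def slowest_imports(log_text, top=3):
    rows = []
    for line in log_text.splitlines():
        # Skip non-matching lines and the header row containing "[us]".
        if not line.startswith("import time:") or "[us]" in line:
            continue
        body = line[len("import time:"):]
        self_us, cumulative_us, name = (part.strip() for part in body.split("|"))
        rows.append((int(cumulative_us), name))
    # Highest cumulative time first.
    return sorted(rows, reverse=True)[:top]

print(slowest_imports(sample_log, top=2))
# → [(1650, 'pytorch_lightning'), (1350, 'torch')]
```

In practice you would read import.log from disk instead of the sample string.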

For the fresh environment, the total import time is under 2 seconds, with the following profile:

[Screenshot: tuna import profile for the fresh environment]
pip freeze | grep torch

pytorch-lightning==1.6.1
torch==1.11.0
torchmetrics==0.7.3

For a full development environment, the total import time is over 4 seconds:

[Screenshot: tuna import profile for the full development environment]

The times vary a bit between runs, but they are consistently higher in an environment with extras installed. Looking at the profiles, a large share of the time originates in our pytorch_lightning.utilities.imports module, where we evaluate several constants at import time:

https://github.com/PyTorchLightning/pytorch-lightning/blob/ae3226ced96e2bc7e62f298d532aaf2290e6ef34/pytorch_lightning/utilities/imports.py#L98-L124

It looks like if a 3rd party package is installed and takes a long time to import, this time gets added to our loading time as well, even if the package never ends up being used. This is because our _module_available and _package_available implementations attempt to import the modules to check their availability. This can be very costly.

Pitch

Evaluate the import checks lazily. Convert

_X_AVAILABLE = _module_available("x")

to

@lru_cache()
def _is_x_available() -> bool:
    # Evaluated on first call and cached, instead of at import time.
    return _module_available("x")

And investigate other opportunities to improve loading time given the above profile.

Additional context


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

cc @justusschock @awaelchli @rohitgr7 @borda @akihironitta

Issue Analytics

  • State: open
  • Created a year ago
  • Reactions: 4
  • Comments: 26 (25 by maintainers)

Top GitHub Comments

3 reactions
Atharva-Phatak commented, Apr 20, 2022

@awaelchli This makes sense. I will start working on this 😃

2 reactions
carmocca commented, Jul 1, 2022

@justusschock that’s basically the same as _RequirementAvailable. I suggested we could rename it to _RequirementCache to make it more explicit

Read more comments on GitHub >
