v1.6 is slower than v1.5
See original GitHub issue
🐛 Bug
1.6.0 is about 5 times slower than 1.5.0.
I ran the same code with the same parameters on version 1.5.0 and version 1.6.0; training under 1.6.0 is about 5 times slower.
To Reproduce
1.6.0
> pip install pytorch-lightning==1.6.0
...
> python run.py
{'optimizers': {'lr': 0.001, 'weight_decay': 0}, 'train': {'milestones': [20, 40, 60, 80, 100, 120, 160, 180, 200, 220], 'gamma': 0.5, 'batch_size': 16}, 'general': {'save_dir': 'logs'}, 'trainer': {'gpus': [0], 'accelerator': 'gpu', 'max_epochs': 1, 'val_check_interval': 1, 'limit_train_batches': 0.01, 'limit_val_batches': 0.01, 'profiler': 'simple'}, 'data': {'cls_type': 'cat'}, 'exp_name': 'test'}
Global seed set to 1234
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1)` was configured so validation will run after every batch.
Missing logger folder: logs/lightning_logs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-------------------------------------------------
0 | model | PvnetModelResnet18 | 13.0 M
1 | vote_crit | SmoothL1Loss | 0
2 | seg_crit | CrossEntropyLoss | 0
-------------------------------------------------
13.0 M Trainable params
0 Non-trainable params
13.0 M Total params
51.831 Total estimated model params size (MB)
loading annotations into memory...
Done (t=1.30s)
creating index...
index created!
Epoch 0: 100%|██████████████████████████████████████████████████████| 12/12 [00:25<00:00, 2.11s/it, loss=0.861, v_num=0, train_ver_loss=0.205, train_seg_loss=0.656]
FIT Profiler Report
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Action | Mean duration (s) | Num calls | Total time (s) | Percentage % |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Total | - | 574 | 29.691 | 100 % |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| run_training_epoch | 25.136 | 1 | 25.136 | 84.66 |
| run_training_batch | 1.7876 | 12 | 21.451 | 72.247 |
| [LightningModule]LitPvnet.optimizer_step | 1.787 | 12 | 21.444 | 72.225 |
| [Strategy]SingleDeviceStrategy.backward | 1.1394 | 12 | 13.673 | 46.05 |
| [Strategy]SingleDeviceStrategy.training_step | 0.6367 | 12 | 7.6404 | 25.733 |
| [Callback]ModelCheckpoint{'monitor': None, 'mode': 'min', 'every_n_train_steps': 0, 'every_n_epochs': 1, 'train_time_interval': None, 'save_on_train_epoch_end': True}.on_train_epoch_end | 0.23877 | 1 | 0.23877 | 0.80419 |
| on_train_batch_end | 0.0010752 | 12 | 0.012903 | 0.043458 |
1.5.0
> pip install pytorch-lightning==1.5.0
...
> python run.py
{'optimizers': {'lr': 0.001, 'weight_decay': 0}, 'train': {'milestones': [20, 40, 60, 80, 100, 120, 160, 180, 200, 220], 'gamma': 0.5, 'batch_size': 16}, 'general': {'save_dir': 'logs'}, 'trainer': {'gpus': [0], 'accelerator': 'gpu', 'max_epochs': 1, 'val_check_interval': 1, 'limit_train_batches': 0.01, 'limit_val_batches': 0.01, 'profiler': 'simple'}, 'data': {'cls_type': 'cat'}, 'exp_name': 'test'}
Global seed set to 1234
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
| Name | Type | Params
-------------------------------------------------
0 | model | PvnetModelResnet18 | 13.0 M
1 | vote_crit | SmoothL1Loss | 0
2 | seg_crit | CrossEntropyLoss | 0
-------------------------------------------------
13.0 M Trainable params
0 Non-trainable params
13.0 M Total params
51.831 Total estimated model params size (MB)
loading annotations into memory...
Done (t=1.37s)
creating index...
index created!
Epoch 0: 100%|██████████████████████████████████████████████████████| 12/12 [00:09<00:00, 1.23it/s, loss=0.861, v_num=0, train_ver_loss=0.205, train_seg_loss=0.656]
FIT Profiler Report
---------------------------------------------------------------------------------------------------------------------
| Action                        | Mean duration (s) | Num calls | Total time (s) | Percentage %  |
---------------------------------------------------------------------------------------------------------------------
| Total                         | -                 | _         | 14.094         | 100 %         |
---------------------------------------------------------------------------------------------------------------------
| run_training_epoch            | 9.7956            | 1         | 9.7956         | 69.5          |
| run_training_batch            | 0.44689           | 12        | 5.3626         | 38.048        |
| get_train_batch               | 0.23755           | 13        | 3.0882         | 21.911        |
| fetch_next_train_batch        | 0.23753           | 13        | 3.0879         | 21.909        |
| optimizer_step_with_closure_0 | 0.2286            | 12        | 2.7432         | 19.463        |
| training_step_and_backward    | 0.2224            | 12        | 2.6688         | 18.935        |
| model_forward                 | 0.18607           | 12        | 2.2329         | 15.842        |
| training_step                 | 0.18589           | 12        | 2.2306         | 15.827        |
| backward                      | 0.035633          | 12        | 0.4276         | 3.0338        |
| on_train_epoch_end            | 0.4268            | 1         | 0.4268         | 3.0282        |
| on_train_batch_end            | 0.0063143         | 12        | 0.075772       | 0.53761       |
Environment
- PyTorch Version: 1.8.1+cu111
- Python version: 3.8.13
- OS : Ubuntu
- CUDA/cuDNN version: 11.4
- How you installed PyTorch (conda, pip, source): conda
Issue Analytics
- State:
- Created: a year ago
- Reactions: 1
- Comments: 10 (4 by maintainers)
Hi @piraka9011! Sorry for the frustration caused.
We did not foresee the problems that could be caused by setting this flag by default. This was implemented in https://github.com/PyTorchLightning/pytorch-lightning/pull/11944 and mentioned in the “Changed” section of the release notes.
I am leaning towards defaulting to what torch defaults to, which is `False`; in other words, reverting this change.
Does anybody from @PyTorchLightning/core-lightning object to this? This “fix” can be done by reviving this PR: https://github.com/PyTorchLightning/pytorch-lightning/pull/12020
Hi @piraka9011, thank you for the detailed information provided in the linked issue.
One possible cause is that we enabled `Trainer(benchmark=True)` by default from v1.6. It can lead to slower training if the input size varies frequently, and it may also lead to higher memory consumption depending on the algorithm. refs:
Could you try disabling it (`Trainer(benchmark=False)`) and checking the performance again?
@flyinghu123 @piraka9011 Do you see the regression when running without GPUs, too?
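For reference, a minimal sketch of the suggested workaround. The `Trainer(benchmark=...)` flag controls PyTorch's cuDNN benchmark mode, so the equivalent plain-PyTorch setting is shown below; the `pl.Trainer(...)` line in the comment is illustrative, not the reporter's actual `run.py`:

```python
import torch

# Lightning's Trainer(benchmark=...) toggles this underlying PyTorch setting.
# Setting it to False restores the plain-PyTorch (and Lightning v1.5) default,
# which avoids cuDNN re-benchmarking when input shapes vary between batches.
torch.backends.cudnn.benchmark = False

# With PyTorch Lightning installed, the equivalent at Trainer construction is:
#   trainer = pl.Trainer(benchmark=False, ...)

print(torch.backends.cudnn.benchmark)  # False
```

Note that `benchmark=True` only helps when input shapes are constant across batches; with varying shapes, cuDNN re-runs its algorithm search repeatedly, which matches the slowdown profile reported above.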