BUG: Error msg during training - Timestamp must be non-decreasing for series attribute
Describe the bug
When running the Neptune logger in PyTorch Lightning with DDP on more than 1 GPU, there are continuous errors reading: Error occurred during asynchronous operation processing. Timestamp must be non-decreasing for series attribute.
If the Neptune logger is set to offline mode, or if the logger is removed, this error is not logged.
There are so many of these errors that even the training progress bar is hard to make out.
Reproduction
I was able to reproduce this when running with 4 GPUs: https://colab.research.google.com/drive/1TOadmpet63eSXz6LMHVvdM-D6Gy0LDxe?usp=sharing
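The Colab above is the actual reproduction; the sketch below only shows the shape of the setup, assuming the PyTorch Lightning 1.4.x APIs from the environment listed further down. The model, data, and Neptune credentials are placeholders, and the NeptuneLogger argument names follow the PL 1.4.x signature, which may differ in other versions:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from pytorch_lightning.loggers import NeptuneLogger


class ToyModule(pl.LightningModule):
    """Minimal model; any LightningModule reproduces the setup."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        self.log("train/loss", loss)  # metrics stream to Neptune
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


train_loader = DataLoader(
    TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,))),
    batch_size=32,
)

# Placeholder credentials; argument names follow the PL 1.4.x NeptuneLogger signature.
neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project_name="my-workspace/my-project",
)

trainer = pl.Trainer(
    gpus=4,                 # more than one GPU is needed to see the error
    accelerator="ddp",      # DDP, as in the report (PL 1.4.x spelling)
    logger=neptune_logger,  # with the logger offline or removed, the messages disappear
    max_epochs=1,
)
trainer.fit(ToyModule(), train_loader)
```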
Expected behavior
If this is a valid error message, there is no hint of what action needs to be taken. If the messages are harmless or not valid, kindly suggest a way to suppress them.
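Pending an official answer, one best-effort way to mute the messages is through Python's standard logging module. This assumes the messages are emitted under a neptune-prefixed logger name, which is not confirmed by the traceback; the filter class below is likewise only an illustration:

```python
import logging

# Assumption: the "Error occurred during asynchronous operation processing"
# messages go through Python logging under a logger whose name starts with
# "neptune". Raising its level hides them (along with other neptune errors).
logging.getLogger("neptune").setLevel(logging.CRITICAL)


# If the exact logger name differs, a message-based filter on the root
# handlers is a more targeted (still best-effort) alternative.
class DropTimestampErrors(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        return "Timestamp must be non-decreasing" not in record.getMessage()


for handler in logging.getLogger().handlers:
    handler.addFilter(DropTimestampErrors())
```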
Traceback
Error occurred during asynchronous operation processing: Timestamp must be non-decreasing for series attribute: monitoring/stdout. Invalid point: 2021-10-15T13:25:02.767Z
Error occurred during asynchronous operation processing: Timestamp must be non-decreasing for series attribute: monitoring/stdout. Invalid point: 2021-10-15T13:25:02.767Z
Environment
The output of pip list:
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.3
Libc version: glibc-2.31
Python version: 3.8.11 (default, Aug 3 2021, 15:09:35) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.11.0-37-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: RTX A6000
GPU 1: RTX A6000
GPU 2: RTX A6000
GPU 3: RTX A6000
GPU 4: RTX A6000
GPU 5: RTX A6000
GPU 6: RTX A6000
GPU 7: RTX A6000
Nvidia driver version: 460.91.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] neptune-pytorch-lightning==0.9.7
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.9
[pip3] torch==1.9.0+cu111
[pip3] torch-poly-lr-decay==0.0.1
[pip3] torchaudio==0.9.0
[pip3] torchmetrics==0.4.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.1.74 h6bb024c_0 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.3.0 h06a4308_520
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.0 py38h42c9631_2
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] mypy 0.910 pypi_0 pypi
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] neptune-client 0.12.0 pypi_0 pypi
[conda] neptune-contrib 0.27.3 pypi_0 pypi
[conda] neptune-pytorch-lightning 0.9.7 pypi_0 pypi
[conda] numpy 1.21.1 pypi_0 pypi
[conda] numpy-base 1.21.2 py38h79a1101_0
[conda] pytorch-lightning 1.4.9 pypi_0 pypi
[conda] torch 1.9.0+cu111 pypi_0 pypi
[conda] torch-poly-lr-decay 0.0.1 pypi_0 pypi
[conda] torchaudio 0.9.0 pypi_0 pypi
[conda] torchmetrics 0.4.1 pypi_0 pypi
Additional context
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (7 by maintainers)
Top GitHub Comments
Hi @kamil-kaczmarek, I am no longer having any issues with the suggested workaround.
Hi @stonelazy,
Prince Canuma here, a Data Scientist at Neptune.ai.
I want to personally inform you of the good news! This issue is now fixed in the latest release of PyTorch Lightning, v1.5.7 🎊 🥳
All you need to do is upgrade the library to the latest release👍
Happy Christmas and a prosperous New Year in advance!