BUG: Error msg during training - Timestamp must be non-decreasing for series attribute
Describe the bug
When running the Neptune logger in PyTorch Lightning with DDP on more than 1 GPU, there are continuous errors reading: Error occurred during asynchronous operation processing. Timestamp must be non-decreasing for series attribute.
If the Neptune logger is set to offline mode, or if the logger is removed, this error is not logged.
There are so many of these errors that even the training progress bar is hard to make out.
Reproduction
I was able to reproduce this when running with 4 GPUs: https://colab.research.google.com/drive/1TOadmpet63eSXz6LMHVvdM-D6Gy0LDxe?usp=sharing
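The Colab above is the actual reproduction; the sketch below only shows the shape of the setup, assuming the PyTorch Lightning 1.4.x APIs from the environment listed further down. The model, data, and Neptune credentials are placeholders, and the NeptuneLogger argument names follow the PL 1.4.x signature, which may differ in other versions:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from pytorch_lightning.loggers import NeptuneLogger


class ToyModule(pl.LightningModule):
    """Minimal model; any LightningModule reproduces the setup."""

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        self.log("train/loss", loss)  # metrics stream to Neptune
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


train_loader = DataLoader(
    TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,))),
    batch_size=32,
)

# Placeholder credentials; argument names follow the PL 1.4.x NeptuneLogger signature.
neptune_logger = NeptuneLogger(
    api_key="ANONYMOUS",
    project_name="my-workspace/my-project",
)

trainer = pl.Trainer(
    gpus=4,                 # more than one GPU is needed to see the error
    accelerator="ddp",      # DDP, as in the report (PL 1.4.x spelling)
    logger=neptune_logger,  # with the logger offline or removed, the messages disappear
    max_epochs=1,
)
trainer.fit(ToyModule(), train_loader)
```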
Expected behavior
If this is a valid error message, there is no hint of what action needs to be taken. If the messages are harmless or not valid, kindly suggest a way to suppress them.
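Pending an official answer, one best-effort way to mute the messages is through Python's standard logging module. This assumes the messages are emitted under a neptune-prefixed logger name, which is not confirmed by the traceback; the filter class below is likewise only an illustration:

```python
import logging

# Assumption: the "Error occurred during asynchronous operation processing"
# messages go through Python logging under a logger whose name starts with
# "neptune". Raising its level hides them (along with other neptune errors).
logging.getLogger("neptune").setLevel(logging.CRITICAL)


# If the exact logger name differs, a message-based filter on the root
# handlers is a more targeted (still best-effort) alternative.
class DropTimestampErrors(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        return "Timestamp must be non-decreasing" not in record.getMessage()


for handler in logging.getLogger().handlers:
    handler.addFilter(DropTimestampErrors())
```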
Traceback
Error occurred during asynchronous operation processing: Timestamp must be non-decreasing for series attribute: monitoring/stdout. Invalid point: 2021-10-15T13:25:02.767Z
Error occurred during asynchronous operation processing: Timestamp must be non-decreasing for series attribute: monitoring/stdout. Invalid point: 2021-10-15T13:25:02.767Z
Environment
The output of pip list:
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.3
Libc version: glibc-2.31
Python version: 3.8.11 (default, Aug 3 2021, 15:09:35) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.11.0-37-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: RTX A6000
GPU 1: RTX A6000
GPU 2: RTX A6000
GPU 3: RTX A6000
GPU 4: RTX A6000
GPU 5: RTX A6000
GPU 6: RTX A6000
GPU 7: RTX A6000
Nvidia driver version: 460.91.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] neptune-pytorch-lightning==0.9.7
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.9
[pip3] torch==1.9.0+cu111
[pip3] torch-poly-lr-decay==0.0.1
[pip3] torchaudio==0.9.0
[pip3] torchmetrics==0.4.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.1.74 h6bb024c_0 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.3.0 h06a4308_520
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.0 py38h42c9631_2
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] mypy 0.910 pypi_0 pypi
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] neptune-client 0.12.0 pypi_0 pypi
[conda] neptune-contrib 0.27.3 pypi_0 pypi
[conda] neptune-pytorch-lightning 0.9.7 pypi_0 pypi
[conda] numpy 1.21.1 pypi_0 pypi
[conda] numpy-base 1.21.2 py38h79a1101_0
[conda] pytorch-lightning 1.4.9 pypi_0 pypi
[conda] torch 1.9.0+cu111 pypi_0 pypi
[conda] torch-poly-lr-decay 0.0.1 pypi_0 pypi
[conda] torchaudio 0.9.0 pypi_0 pypi
[conda] torchmetrics 0.4.1 pypi_0 pypi
Additional context
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (7 by maintainers)
Top GitHub Comments
Hi @kamil-kaczmarek, I am no longer having any issues with the suggested workaround.
Hi @stonelazy,
Prince Canuma here, a Data Scientist at Neptune.ai.
I want to personally inform you of the good news! This issue is now fixed in the latest release of PyTorch Lightning, v1.5.7 🎊 🥳
All you need to do is upgrade the library to the latest release👍
Happy Christmas and a prosperous New Year in advance!