
BUG: Error msg during training - Timestamp must be non-decreasing for series attribute


Describe the bug

When running the Neptune logger in PyTorch Lightning with DDP on more than one GPU, the console continuously prints Error occurred during asynchronous operation processing. Timestamp must be non-decreasing for series attribute. If the Neptune logger runs in offline mode, or if it is removed, the error is not logged. The errors are so frequent that even the training progress bar is difficult to read.
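The error reflects a server-side constraint: every point appended to a series attribute (here monitoring/stdout) must carry a timestamp no earlier than the previous one. Below is a minimal, self-contained sketch (plain Python, no Neptune involved) of that constraint, and of how two DDP workers flushing stdout into the same series can interleave out of order and violate it:

```python
from datetime import datetime, timedelta

def violates_monotonicity(points):
    """Return True if any timestamp in the series decreases.

    Neptune rejects a point whose timestamp is earlier than the last
    accepted one, which is what the error message reports.
    """
    return any(b < a for a, b in zip(points, points[1:]))

t0 = datetime(2021, 10, 15, 13, 25, 2)

# A single process emits non-decreasing timestamps -- accepted.
single = [t0 + timedelta(milliseconds=i) for i in range(4)]

# Two DDP workers writing to the same series can interleave: the second
# point here is "in the past" relative to the first, so it is rejected.
interleaved = [t0 + timedelta(milliseconds=5), t0, t0 + timedelta(milliseconds=7)]

print(violates_monotonicity(single))       # False
print(violates_monotonicity(interleaved))  # True
```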

Reproduction

Running with 4 GPUs reproduces the issue: https://colab.research.google.com/drive/1TOadmpet63eSXz6LMHVvdM-D6Gy0LDxe?usp=sharing

Expected behavior

If this is a valid error, the message gives no hint of what action needs to be taken. If the errors are harmless or spurious, please suggest a way to suppress them.
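If the messages turn out to be harmless, one possible workaround is to stop every rank except rank 0 from streaming stdout/stderr to Neptune, so the monitoring/stdout series is written by a single process. This is a sketch under two assumptions, not a confirmed fix: that NeptuneLogger forwards extra keyword arguments to neptune.init(), and that neptune.init()'s capture_stdout/capture_stderr flags control these monitoring series. The helper name is hypothetical:

```python
import os

def neptune_capture_kwargs():
    """Hypothetical helper: build per-rank NeptuneLogger kwargs.

    Only rank 0 (LOCAL_RANK unset or "0") streams stdout/stderr, so the
    monitoring/stdout series is written by a single process and its
    timestamps stay monotone.
    """
    rank = int(os.environ.get("LOCAL_RANK", "0"))
    capture = rank == 0
    return {"capture_stdout": capture, "capture_stderr": capture}

# Usage (hypothetical project name):
# logger = NeptuneLogger(project="my-workspace/my-project",
#                        **neptune_capture_kwargs())
```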

Traceback

Error occurred during asynchronous operation processing: Timestamp must be non-decreasing for series attribute: monitoring/stdout. Invalid point: 2021-10-15T13:25:02.767Z

(the same message is printed repeatedly)

Environment

Output of the PyTorch environment-collection script:

PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.3
Libc version: glibc-2.31

Python version: 3.8.11 (default, Aug 3 2021, 15:09:35) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.11.0-37-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 8x RTX A6000 (GPU 0-7)

Nvidia driver version: 460.91.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:

[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] neptune-pytorch-lightning==0.9.7
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.9
[pip3] torch==1.9.0+cu111
[pip3] torch-poly-lr-decay==0.0.1
[pip3] torchaudio==0.9.0
[pip3] torchmetrics==0.4.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.1.74 h6bb024c_0 nvidia
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.3.0 h06a4308_520
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.0 py38h42c9631_2
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] mypy 0.910 pypi_0 pypi
[conda] mypy-extensions 0.4.3 pypi_0 pypi
[conda] neptune-client 0.12.0 pypi_0 pypi
[conda] neptune-contrib 0.27.3 pypi_0 pypi
[conda] neptune-pytorch-lightning 0.9.7 pypi_0 pypi
[conda] numpy 1.21.1 pypi_0 pypi
[conda] numpy-base 1.21.2 py38h79a1101_0
[conda] pytorch-lightning 1.4.9 pypi_0 pypi
[conda] torch 1.9.0+cu111 pypi_0 pypi
[conda] torch-poly-lr-decay 0.0.1 pypi_0 pypi
[conda] torchaudio 0.9.0 pypi_0 pypi
[conda] torchmetrics 0.4.1 pypi_0 pypi


Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

1 reaction
stonelazy commented, Nov 8, 2021

Hi @kamil-kaczmarek, I am no longer having any issues with the suggested workaround.

0 reactions
Blaizzy commented, Dec 24, 2021

Hi @stonelazy,

Prince Canuma here, a Data Scientist at Neptune.ai,

I want to personally inform you of the good news! This issue is now fixed in the latest release, PyTorch Lightning v1.5.7 🎊 🥳

All you need to do is upgrade the library to the latest release👍

Happy Christmas and a prosperous New Year in advance!
