BUG: Training gets killed due to Neptune

Describe the bug

Training was running in PyTorch Lightning when it suddenly crashed, with tracebacks pointing to errors raised from Neptune.

Reproduction

Couldn’t reproduce

Expected behavior

Training should continue without crashing the experiment, whatever the underlying issue is.
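
One way to approximate this expectation in user code is to guard calls to the experiment logger so that a failure there is recorded but not re-raised. The sketch below is only an illustration under stated assumptions: `SafeLogger`, `FlakyBackend`, `log_metrics` and `max_failures` are hypothetical names, not part of Neptune's or PyTorch Lightning's API, and it only covers calls made from the training loop itself, not exceptions raised inside a library's own background threads.

```python
import logging
from typing import Any

log = logging.getLogger(__name__)


class SafeLogger:
    """Hypothetical wrapper: forwards calls to an inner logger and swallows its errors."""

    def __init__(self, inner: Any, max_failures: int = 3) -> None:
        self._inner = inner
        self._max_failures = max_failures
        self._failures = 0

    @property
    def disabled(self) -> bool:
        # After too many failures, stop calling the inner logger at all.
        return self._failures >= self._max_failures

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not found on SafeLogger itself.
        if name.startswith("_"):
            raise AttributeError(name)
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        def guarded(*args: Any, **kwargs: Any) -> Any:
            if self.disabled:
                return None
            try:
                return attr(*args, **kwargs)
            except Exception:
                # Record the failure and keep the caller (the training loop) alive.
                self._failures += 1
                log.warning("logger call %r failed; continuing", name, exc_info=True)
                return None

        return guarded


class FlakyBackend:
    """Stand-in for a remote tracking backend that always fails (illustrative only)."""

    def __init__(self) -> None:
        self.calls = 0

    def log_metrics(self, metrics: dict, step: int) -> None:
        self.calls += 1
        raise ConnectionError("404 Not Found")


backend = FlakyBackend()
safe = SafeLogger(backend, max_failures=2)

steps_done = 0
for step in range(5):
    safe.log_metrics({"loss": 0.1}, step=step)  # never raises
    steps_done += 1
```

In this sketch the loop finishes all five steps, and the wrapper stops calling the backend after two failures. Whether silently dropping metrics is acceptable depends on the experiment; a stricter variant could buffer the failed calls for a later retry.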

Traceback

Epoch 21   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26600/-- 1:09:03 • -:--:-- 17.04it/s loss: 0.1 v_num: -200 val_loss: 0.09
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1600/--  0:02:47 • -:--:-- 17.04it/s loss: 0.1 v_num: -200 val_loss: 0.09 Epoch 21, global step 549999: val_track_loss reached 0.08127 (best 0.08127), saving model to "/home/kp/experiment_logs/vad/ruEpoch 22   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26600/-- 1:08:53 • -:--:-- 17.31it/s loss: 0.101 v_num: -200 val_loss: 0.09
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1600/--  0:02:37 • -:--:-- 17.31it/s loss: 0.101 v_num: -200 val_loss: 0.09Epoch 22, global step 574999: val_track_loss reached 0.08107 (best 0.08107), saving model to "/home/kp/experiment_logs/vad/ruEpoch 23   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26600/-- 1:07:59 • -:--:-- 17.34it/s loss: 0.104 v_num: -200 val_loss: 0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1600/--  0:02:27 • -:--:-- 17.34it/s loss: 0.104 v_num: -200 val_loss: 0.089/home/kp/Remote/zspeech/zspeech/utils/training_utils.py:308: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  figure = plt.figure(figsize=(8, 8))
/home/kp/Remote/zspeech/zspeech/utils/training_utils.py:308: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  figure = plt.figure(figsize=(8, 8))
/home/kp/Remote/zspeech/zspeech/utils/training_utils.py:308: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
Epoch 23   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26600/-- 1:07:59 • -:--:-- 17.34it/s loss: 0.104 v_num: -200 val_loss: 0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1600/--  0:02:27 • -:--:-- 17.34it/s loss: 0.104 v_num: -200 val_loss: 0.089Epoch 23, global step 599999: val_track_loss reached 0.08095 (best 0.08095), saving model to "/home/kp/experiment_logs/vad/ruEpoch 24   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26600/-- 1:08:03 • -:--:-- 17.11it/s loss: 0.0989 v_num: -200 val_loss:
                                                                                       0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1600/--  0:02:26 • -:--:-- 17.11it/s loss: 0.0989 v_num: -200 val_loss:
Epoch 25   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26448/-- 1:08:08 • -:--:-- 17.19it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1448/--  0:02:16 • -:--:-- 17.19it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089                                 Unexpected error occurred in Neptune background thread: Killing Neptune asynchronous thread. All data is safe on disk and can be later synced manually using `neptune sync` command.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py", line 473, in _execute_operations
    result = self.leaderboard_client.api.executeOperations(**kwargs).response().result
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 200, in response
    swagger_result = self._get_swagger_result(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 124, in wrapper
    return func(self, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 300, in _get_swagger_result
    unmarshal_response(
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 353, in unmarshal_response
    raise_on_expected(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 420, in raise_on_expected
    raise make_http_exception(
bravado.exception.HTTPNotFound: 404 Not Found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 54, in run
    self.work()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/operation_processors/async_operation_processor.py", line 177, in work
    self.process_batch(batch, version)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 78, in wrapper
    result = func(self_, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/operation_processors/async_operation_processor.py", line 187, in process_batch
    result = self._processor._backend.execute_operations(self._processor._run_id, batch)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py", line 363, in execute_operations
    errors.extend(self._execute_operations(run_id, other_operations))
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/utils.py", line 71, in wrapper
    return func(*args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.pUnexpected error occurred in Neptune background thread: Killing Neptune asynchronous thread. All data is safe on disk and can
be later synced manually using `neptune sync` command.
Exception in thread Thread-1:
Traceback (most recent call last):
  File
"/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py",
line 473, in _execute_operations
    result = self.leaderboard_client.api.executeOperations(**kwargs).response().result
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 200, in response
    swagger_result = self._get_swagger_result(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 124, in wrapper
    return func(self, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 300, in
_get_swagger_result
    unmarshal_response(
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 353, in
unmarshal_response
    raise_on_expected(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 420, in
raise_on_expected
    raise make_http_exception(
bravado.exception.HTTPNotFound: 404 Not Found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 54, in
run
    self.work()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/operation_processors/async_operat
ion_processor.py", line 177, in work
    self.process_batch(batch, version)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 78, in
wrapper
    result = func(self_, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/operation_processors/async_operat
ion_processor.py", line 187, in process_batch
    result = self._processor._backend.execute_operations(self._processor._run_id, batch)
  File
"/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py",
line 363, in execute_operations
    errors.extend(self._execute_operations(run_id, other_operations))
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/utils.py", line 71, in
wrapper
    return func(*args, **kwargs)
  File
"/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py",
line 476, in _execute_operations
    raise RunUUIDNotFound(run_id=run_id) from e
neptune.new.exceptions.RunUUIDNotFound: Run with ID 62136ba6-d853-4d76-a2df-e6321599479c not found. Could be deleted.
Epoch 25   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26510/-- 1:08:11 • -:--:-- 17.24it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1510/--  0:02:19 • -:--:-- 17.24it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089                                 Unexpected error occurred in Neptune background thread: Killing Neptune asynchronous thread. All data is safe on disk and can be later synced manually using `neptune sync` command.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py", line 473, in _execute_operations
    result = self.leaderboard_client.api.executeOperations(**kwargs).response().result
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 200, in response
    swagger_result = self._get_swagger_result(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 124, in wrapper
    return func(self, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 300, in _get_swagger_result
    unmarshal_response(
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 353, in unmarshal_response
    raise_on_expected(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 420, in raise_on_expected
    raise make_http_exception(
bravado.exception.HTTPNotFound: 404 Not Found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 54, in run
    self.work()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/operation_processors/async_operation_processor.py", line 177, in work
    self.process_batch(batch, version)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 78, in wrapper
    result = func(self_, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/operation_processors/async_operation_processor.py", line 187, in process_batch
    result = self._processor._backend.execute_operations(self._processor._run_id, batch)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py", line 363, in execute_operations
    errors.extend(self._execute_operations(run_id, other_operations))
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/utils.py", line 71, in wrapper
    return func(*args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.pEpoch 25   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26514/-- 1:08:12 • -:--:-- 17.26it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1514/--  0:02:19 • -:--:-- 17.26it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089                                 Unexpected error occurred in Neptune background thread: Killing Neptune asynchronous thread. All data is safe on disk and can be later synced manually using `neptune sync` command.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py", line 473, in _execute_operations
    result = self.leaderboard_client.api.executeOperations(**kwargs).response().result
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 200, in response
    swagger_result = self._get_swagger_result(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 124, in wrapper
    return func(self, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 300, in _get_swagger_result
    unmarshal_response(
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 353, in unmarshal_response
    raise_on_expected(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 420, in raise_on_expected
    raise make_http_exception(
bravado.exception.HTTPNotFound: 404 Not Found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 54, in run
    self.work()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/operation_processors/async_operation_processor.py", line 177, in work
    self.process_batch(batch, version)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 78, in wrapper
    result = func(self_, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/operation_processors/async_operation_processor.py", line 187, in process_batch
    result = self._processor._backend.execute_operations(self._processor._run_id, batch)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py", line 363, in execute_operations
    errors.extend(self._execute_operations(run_id, other_operations))
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/utils.py", line 71, in wrapper
    return func(*args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.pEpoch 25   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26522/-- 1:08:12 • -:--:-- 17.26it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1522/--  0:02:20 • -:--:-- 17.26it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089                                 Unexpected error occurred in Neptune background thread: Killing Neptune ping thread. Your run's status will not be updated and the run will be shown as inactive.
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py", line 317, in ping_run
    self.leaderboard_client.api.ping(
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 200, in response
    swagger_result = self._get_swagger_result(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 124, in wrapper
    return func(self, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 300, in _get_swagger_result
    unmarshal_response(
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 353, in unmarshal_response
    raise_on_expected(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 420, in raise_on_expected
    raise make_http_exception(
bravado.exception.HTTPNotFound: 404 Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 54, in run
    self.work()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 78, in wrapper
    result = func(self_, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/utils/ping_background_job.py", line 68, in work
    self._run.ping()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/run.py", line 243, in ping
    self._backend.ping_run(self._id)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/utils.py", line 71, in wrapper
    return func(*args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.pUnexpected error occurred in Neptune background thread: Killing Neptune ping thread. Your run's status will not be updated
and the run will be shown as inactive.
Exception in thread Thread-4:
Traceback (most recent call last):
  File
"/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py",
line 317, in ping_run
    self.leaderboard_client.api.ping(
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 200, in response
    swagger_result = self._get_swagger_result(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 124, in wrapper
    return func(self, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 300, in
_get_swagger_result
    unmarshal_response(
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 353, in
unmarshal_response
    raise_on_expected(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 420, in
raise_on_expected
Epoch 25   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26559/-- 1:08:14 • -:--:-- 17.24it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1559/--  0:02:22 • -:--:-- 17.24it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089                                 Unexpected error occurred in Neptune background thread: Killing Neptune ping thread. Your run's status will not be updated and the run will be shown as inactive.
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.p    raise make_http_exception(
Epoch 25   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26559/-- 1:08:14 • -:--:-- 17.24it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1559/--  0:02:22 • -:--:-- 17.24it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089                                 swagger_result = self._get_swagger_result(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 124, in wrapper
    return func(self, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 300, in _get_swagger_result
    unmarshal_response(
bravado.exception.HTTPNotFound: 404 Not Found
Epoch 25   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26559/-- 1:08:14 • -:--:-- 17.24it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1559/--  0:02:22 • -:--:-- 17.24it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089                                   File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 420, in raise_on_expected
    raise make_http_exception(
bravado.exception.HTTPNotFound: 404 Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/threading.py", line 932, in _bootstrap_inner

During handling of the above exception, another exception occurred:

Epoch 25   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26559/-- 1:08:14 • -:--:-- 17.24it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1559/--  0:02:22 • -:--:-- 17.24it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089                                     self.work()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 78, in wrapper
    result = func(self_, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/utils/ping_background_job.py", line 68, in work
    self._run.ping()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/run.py", line 243, in ping
    self._backend.ping_run(self._id)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/utils.py", line 71, in wrapper
    return func(*args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.pTraceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 54, in
run
    self.work()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 78, in
wrapper
    result = func(self_, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/utils/ping_background_job.py",
line 68, in work
    self._run.ping()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/run.py", line 243, in ping
    self._backend.ping_run(self._id)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/utils.py", line 71, in
wrapper
    return func(*args, **kwargs)
  File
"/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py",
line 322, in ping_run
    raise RunUUIDNotFound(run_id)
neptune.new.exceptions.RunUUIDNotFound: Run with ID 62136ba6-d853-4d76-a2df-e6321599479c not found. Could be deleted.
Epoch 25   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 26561/-- 1:08:14 • -:--:-- 17.22it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089
Validation ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1561/--  0:02:22 • -:--:-- 17.22it/s loss: 0.0979 v_num: -200 val_loss:
                                                                                       0.089                                 Unexpected error occurred in Neptune background thread: Killing Neptune ping thread. Your run's status will not be updated and the run will be shown as inactive.
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py", line 317, in ping_run
    self.leaderboard_client.api.ping(
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 200, in response
    swagger_result = self._get_swagger_result(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 124, in wrapper
    return func(self, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 300, in _get_swagger_result
    unmarshal_response(
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 353, in unmarshal_response
    raise_on_expected(incoming_response)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/bravado/http_future.py", line 420, in raise_on_expected
    raise make_http_exception(
bravado.exception.HTTPNotFound: 404 Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 54, in run
    self.work()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 78, in wrapper
    result = func(self_, *args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/utils/ping_background_job.py", line 68, in work
    self._run.ping()
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/run.py", line 243, in ping
    self._backend.ping_run(self._id)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/utils.py", line 71, in wrapper
    return func(*args, **kwargs)
  File "/home/kp/miniconda3/envs/gamd6-kp4/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.p
Shutting down background jobs, please wait a moment...
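
In the log above, epoch progress lines keep appearing after each "Exception in thread Thread-…" block. That is consistent with how Python threads behave in general: an uncaught exception in a non-main thread ends only that thread, and the main thread keeps running. The sketch below shows this with the standard library alone. It is not Neptune code, the names `sync_worker` and `record_thread_failure` are hypothetical, and it does not establish what finally stopped the training run in this report.

```python
import threading

failures = []


def record_thread_failure(args):
    # `args` is a threading.ExceptHookArgs (Python 3.8+) describing the failure.
    failures.append((args.thread.name, args.exc_type.__name__))


# Replace the default hook, which would print "Exception in thread ..." to stderr.
threading.excepthook = record_thread_failure


def sync_worker():
    # Stand-in for a background job that hits an unrecoverable error.
    raise RuntimeError("Run not found")


worker = threading.Thread(target=sync_worker, name="sync-worker", daemon=True)
worker.start()

# The "training loop" in the main thread is unaffected by the worker's failure.
completed_steps = 0
for _ in range(3):
    completed_steps += 1

worker.join()
```

Installing a `threading.excepthook` like this is one way to notice that a background thread has died, for example to alert the user, without interrupting the main loop.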

Environment

PyTorch version: 1.10.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.4
Libc version: glibc-2.31

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.11.0-38-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: RTX A6000
GPU 1: RTX A6000
GPU 2: RTX A6000
GPU 3: RTX A6000
GPU 4: RTX A6000
GPU 5: RTX A6000
GPU 6: RTX A6000
GPU 7: RTX A6000

Nvidia driver version: 460.91.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.1
[pip3] pytorch-lightning==1.5.1
[pip3] torch==1.10.0+cu111
[pip3] torch-poly-lr-decay==0.0.1
[pip3] torchaudio==0.10.0+cu111
[pip3] torchmetrics==0.6.0
[conda] mypy                      0.910                    pypi_0    pypi
[conda] mypy-extensions           0.4.3                    pypi_0    pypi
[conda] neptune-client            0.12.1                   pypi_0    pypi
[conda] numpy                     1.21.1                   pypi_0    pypi
[conda] pytorch-lightning         1.5.1                    pypi_0    pypi
[conda] torch                     1.10.0+cu111             pypi_0    pypi
[conda] torch-poly-lr-decay       0.0.1                    pypi_0    pypi
[conda] torchaudio                0.10.0+cu111             pypi_0    pypi
[conda] torchmetrics              0.6.0                    pypi_0    pypi

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

1 reaction
Blaizzy commented, Dec 3, 2021

Thanks for sharing this information 👍!

I will get back to you with updates and/or more questions 😄 if you don’t mind.

0 reactions
Blaizzy commented, Dec 13, 2021

Hey @stonelazy

Thanks for your cooperation.

I spoke to the devs, and we can’t really replicate it or pinpoint where the error happened, as it only happened once. I will keep an eye out for such errors during our maintenance breaks and gather more data about them if they recur.

For now, I will close the issue. 😃
