BUG: ClientHttpError during training
Describe the bug
Sometimes, during training, a ClientHttpError is raised.
Reproduction
I am running a minimal working example on MNIST with the new Lightning integration. I can reproduce this on two different machines.
Traceback
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/bravado/http_future.py", line 335, in unmarshal_response
incoming_response.swagger_result = unmarshal_response_inner( # type: ignore
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/bravado/http_future.py", line 370, in unmarshal_response_inner
response_spec = get_response_spec(status_code=response.status_code, op=op)
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/bravado_core/response.py", line 157, in get_response_spec
raise MatchingResponseNotFound(
bravado_core.exception.MatchingResponseNotFound: Response specification matching http status_code 400 not found for operation Operation(executeOperations). Either add a response specification for the status_code or use a `default` response.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/neptune/new/internal/backends/utils.py", line 71, in wrapper
return func(*args, **kwargs)
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py", line 473, in _execute_operations
result = self.leaderboard_client.api.executeOperations(**kwargs).response().result
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/bravado/http_future.py", line 200, in response
swagger_result = self._get_swagger_result(incoming_response)
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/bravado/http_future.py", line 124, in wrapper
return func(self, *args, **kwargs)
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/bravado/http_future.py", line 300, in _get_swagger_result
unmarshal_response(
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/bravado/http_future.py", line 344, in unmarshal_response
six.reraise(
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/six.py", line 718, in reraise
raise value.with_traceback(tb)
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/bravado/http_future.py", line 335, in unmarshal_response
incoming_response.swagger_result = unmarshal_response_inner( # type: ignore
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/bravado/http_future.py", line 370, in unmarshal_response_inner
response_spec = get_response_spec(status_code=response.status_code, op=op)
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/bravado_core/response.py", line 157, in get_response_spec
raise MatchingResponseNotFound(
bravado.exception.HTTPBadRequest: 400 : Response specification matching http status_code 400 not found for operation Operation(executeOperations). Either add a response specification for the status_code or use a `default` response.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/threading.py", line 932, in _bootstrap_inner
self.run()
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 54, in run
self.work()
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/neptune/new/internal/operation_processors/async_operation_processor.py", line 177, in work
self.process_batch(batch, version)
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/neptune/new/internal/threading/daemon.py", line 78, in wrapper
result = func(self_, *args, **kwargs)
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/neptune/new/internal/operation_processors/async_operation_processor.py", line 187, in process_batch
result = self._processor._backend.execute_operations(self._processor._run_id, batch)
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/neptune/new/internal/backends/hosted_neptune_backend.py", line 363, in execute_operations
errors.extend(self._execute_operations(run_id, other_operations))
File "/home/luca/miniconda3/envs/lightning-project-template/lib/python3.8/site-packages/neptune/new/internal/backends/utils.py", line 86, in wrapper
raise ClientHttpError(e.status_code, e.response.text) from e
neptune.new.exceptions.ClientHttpError:
----ClientHttpError-----------------------------------------------------------------------
Neptune server returned status 400.
Server response was:
{"code":400,"errorType":"MALFORMED_JSON_REQUEST","title":"Malformed JSON request: JSON parse error: Numeric value (2470129626) out of range of int (-2147483648 - 2147483647); nested exception is com.fasterxml.jackson.databind.JsonMappingException: Numeric value (2470129626) out of range of int (-2147483648 - 2147483647)\n at [Source: (PushbackInputStream); line: 1, column: 16817] (through reference chain: java.util.ArrayList[54]->ml.neptune.leaderboard.api.model.operation.OperationDTO[\"assignInt\"]->ml.neptune.leaderboard.api.model.operation.AssignIntDTO[\"value\"])"}
Verify the correctness of your call or contact Neptune support.
Need help?-> https://docs.neptune.ai/getting-started/getting-help
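The server response above says that the value 2470129626, sent as an `assignInt` operation, is outside the signed 32-bit range (-2147483648 to 2147483647). As a minimal sketch for narrowing down which logged field triggers this, the helper below scans a nested dictionary of parameters for integers outside that range. The function names, the `params` dictionary, and the `seed` key are hypothetical and only for illustration; nothing here is part of the Neptune API, and the field that actually carried the value is not identified in the traceback.

```python
# Bounds of a signed 32-bit integer, as quoted in the server error message.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_int32(value: int) -> bool:
    """Return True if value fits in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def find_out_of_range_ints(params: dict, prefix: str = "") -> list:
    """Recursively collect (path, value) pairs for ints outside the int32 range."""
    found = []
    for key, value in params.items():
        path = f"{prefix}/{key}" if prefix else str(key)
        if isinstance(value, dict):
            found.extend(find_out_of_range_ints(value, path))
        # bool is a subclass of int in Python, so exclude it explicitly.
        elif isinstance(value, int) and not isinstance(value, bool):
            if not fits_int32(value):
                found.append((path, value))
    return found


if __name__ == "__main__":
    # Hypothetical parameters; "seed" is only a guess at the kind of field involved.
    params = {"seed": 2470129626, "trainer": {"max_epochs": 10}}
    print(find_out_of_range_ints(params))
```

Running such a check on the hyperparameters before they are logged would show whether a large integer (for example a randomly generated seed) is the one rejected by the server. Logging that value as a string or float instead is a possible workaround, assuming the exact integer type is not needed on the server side.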
Environment
The output of pip list:
❯ pip list
Package Version Location
--------------------------------- -------------------- ----------------------------------------------------------------
absl-py 1.0.0
aiohttp 3.8.1
aiosignal 1.2.0
antlr4-python3-runtime 4.8
async-timeout 4.0.1
attrs 21.2.0
azure-core 1.20.1
azure-storage-blob 12.9.0
backports.entry-points-selectable 1.1.1
black 21.10b0
boto3 1.20.5
botocore 1.23.5
bravado 11.0.3
bravado-core 5.17.0
cachetools 4.2.4
certifi 2021.10.8
cffi 1.15.0
cfgv 3.3.1
charset-normalizer 2.0.7
click 8.0.3
cloudpathlib 0.6.2
cloudpickle 2.0.0
coverage 6.1.2
cryptography 35.0.0
dacite 1.6.0
dill 0.3.4
distlib 0.3.3
filelock 3.3.2
flake8 4.0.1
frozenlist 1.2.0
fsspec 2021.11.0
future 0.18.2
ghp-import 2.0.2
gitdb 4.0.9
GitPython 3.1.24
google-api-core 2.2.2
google-auth 2.3.3
google-auth-oauthlib 0.4.6
google-cloud-core 2.2.1
google-cloud-storage 1.42.3
google-crc32c 1.3.0
google-resumable-media 2.1.0
googleapis-common-protos 1.53.0
grpcio 1.41.1
identify 2.3.5
idna 3.3
importlib-metadata 4.8.2
iniconfig 1.1.1
isodate 0.6.0
isort 5.10.1
Jinja2 3.0.3
jmespath 0.10.0
jsonpointer 2.2
jsonref 0.2
jsonschema 3.2.0
lightning-project-template 0.1.dev1+gc63e8fd /home/luca/Projects/CookieTesting/lightning-project-template/src
Markdown 3.3.4
MarkupSafe 2.0.1
mccabe 0.6.1
mergedeep 1.3.4
mkapi 1.0.14
mkdocs 1.2.3
mkdocs-material 7.3.6
mkdocs-material-extensions 1.0.3
monotonic 1.6
msgpack 1.0.2
msrest 0.6.21
multidict 5.2.0
mypy-extensions 0.4.3
natsort 8.0.0
neptune-client 0.13.1
nodeenv 1.6.0
numpy 1.21.4
oauthlib 3.1.1
olefile 0.46
omegaconf 2.1.1
packaging 21.2
pandas 1.3.4
pathspec 0.9.0
Pillow 8.4.0
pip 21.2.4
platformdirs 2.4.0
pluggy 1.0.0
pre-commit 2.15.0
prime-config 0.9.3.dev24+g2885489
prime-pack 0.3.dev61+g3c35037
prime-utils 1.0.0
protobuf 3.19.1
psutil 5.8.0
py 1.11.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycodestyle 2.8.0
pycparser 2.21
pyDeprecate 0.3.1
pyflakes 2.4.0
Pygments 2.10.0
PyJWT 2.3.0
pymdown-extensions 9.1
pyparsing 2.4.7
pyrsistent 0.18.0
pytest 6.2.5
pytest-cov 3.0.0
python-dateutil 2.8.2
python-dotenv 0.19.2
pytorch-lightning 1.5.1
pytz 2021.3
PyYAML 6.0
pyyaml_env_tag 0.1
regex 2021.11.10
requests 2.26.0
requests-oauthlib 1.3.0
rfc3987 1.3.8
rsa 4.7.2
s3transfer 0.5.0
setuptools 58.0.4
simplejson 3.17.5
six 1.16.0
smmap 5.0.0
strict-rfc3339 0.7
swagger-spec-validator 2.7.4
tensorboard 2.7.0
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.0
toml 0.10.2
tomli 1.2.2
torch 1.9.0
torchmetrics 0.6.0
torchvision 0.10.0
tqdm 4.62.3
typing-extensions 3.10.0.2
urllib3 1.26.7
virtualenv 20.10.0
watchdog 2.1.6
webcolors 1.11.1
websocket-client 1.2.1
Werkzeug 2.0.2
wheel 0.37.0
yarl 1.7.2
zipp 3.6.0
The operating system you’re using: Ubuntu
The output of python --version: Python 3.8.12
Issue Analytics
- State:
- Created 2 years ago
- Comments:13 (9 by maintainers)
Top GitHub Comments
Hey @lucmos,
Let me take a closer look with the engineering team. We will get back here with more info.
Let me know if you need anything else regarding this issue.
I will be closing it for now.