Intermittent socket timeouts

This happens rarely, but we should probably catch the timeout.

  File "/Midgard/home/mrabadan/anaconda3/envs/pytorch/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/Midgard/home/mrabadan/anaconda3/envs/pytorch/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "scripts/train_mmnist.py", line 32, in train_run_worker
    dir=run_dir)
  File "/Midgard/home/mrabadan/anaconda3/envs/pytorch/lib/python3.7/site-packages/wandb/__init__.py", line 983, in init
    _init_headless(run)
  File "/Midgard/home/mrabadan/anaconda3/envs/pytorch/lib/python3.7/site-packages/wandb/__init__.py", line 239, in _init_headless
    success, message = server.listen(30)
  File "/Midgard/home/mrabadan/anaconda3/envs/pytorch/lib/python3.7/site-packages/wandb/wandb_socket.py", line 46, in listen
    self.connect()
  File "/Midgard/home/mrabadan/anaconda3/envs/pytorch/lib/python3.7/site-packages/wandb/wandb_socket.py", line 40, in connect
    self.connection, addr = self.socket.accept()
  File "/Midgard/home/mrabadan/anaconda3/envs/pytorch/lib/python3.7/socket.py", line 212, in accept
    fd, addr = self._accept()
socket.timeout: timed out
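
The traceback shows `server.listen(30)` ending in `socket.accept()`, which raises `socket.timeout` when no connection arrives within the timeout, and nothing on the way up catches it. Below is a minimal sketch of catching it. It is not wandb's code: it assumes a hypothetical helper, `accept_with_timeout`, that waits on an already-listening socket and returns a `(connection, message)` pair instead of raising, loosely mirroring the `(success, message)` tuple that `_init_headless` unpacks.

```python
import socket
from typing import Optional, Tuple


def accept_with_timeout(
    server_sock: socket.socket, timeout: float = 30.0
) -> Tuple[Optional[socket.socket], Optional[str]]:
    """Hypothetical helper (not part of wandb): wait for one incoming connection.

    Returns (connection, None) on success, or (None, message) if nothing
    connects within `timeout` seconds, instead of letting socket.timeout
    propagate to the caller.
    """
    # accept() will now raise socket.timeout after `timeout` seconds.
    server_sock.settimeout(timeout)
    try:
        conn, _addr = server_sock.accept()
    except socket.timeout:  # an alias of TimeoutError on Python 3.10+
        return None, f"no connection within {timeout} seconds"
    return conn, None
```

A caller in the position of `_init_headless` could then log `message` and decide whether to retry or fail cleanly, rather than letting the exception crash the worker process.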

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Reactions:8
  • Comments:26 (8 by maintainers)

Top GitHub Comments

3 reactions
stathius commented, Jan 22, 2020

Hi, I just started using wandb and I am also getting this error when running remote jobs (qsub), and it is not rare at all: around 1/3 of my jobs end like that. I'm not sure what is happening, but combined with other bugs (https://github.com/wandb/client/issues/785), it seems like wandb is not a viable option for me.

wandb: Tracking run with wandb version 0.8.21
Traceback (most recent call last):
  File "train_network.py", line 18, in <module>
    wandb.init(project=args.project, name=args.experiment_name, config=vars(args))
  File "/rds/general/user/ef1015/home/anaconda3/envs/cuda10/lib/python3.7/site-packages/wandb/__init__.py", line 1075, in init
    _init_headless(run)
  File "/rds/general/user/ef1015/home/anaconda3/envs/cuda10/lib/python3.7/site-packages/wandb/__init__.py", line 277, in _init_headless
    success, _ = server.listen(30)
  File "/rds/general/user/ef1015/home/anaconda3/envs/cuda10/lib/python3.7/site-packages/wandb/wandb_socket.py", line 46, in listen
    self.connect()
  File "/rds/general/user/ef1015/home/anaconda3/envs/cuda10/lib/python3.7/site-packages/wandb/wandb_socket.py", line 40, in connect
    self.connection, addr = self.socket.accept()
  File "/rds/general/user/ef1015/home/anaconda3/envs/cuda10/lib/python3.7/socket.py", line 212, in accept
    fd, addr = self._accept()
socket.timeout: timed out

2 reactions
issue-label-bot[bot] commented, Nov 1, 2019

Issue-Label Bot is automatically applying the label bug to this issue, with a confidence of 0.99. Please mark this comment with 👍 or 👎 to give our bot feedback!

Top Results From Across the Web

Diagnose intermittent connection timeout? - Stack Overflow
I have a java client that invokes a thread to hit a servlet and retrieves last few lines from logs at the server,...
Intermittent socket timeouts · Issue #660 · wandb ... - GitHub
I find socket.timeout: timed out in ~half of my runs, even when I set WANDB_DISABLE_CODE=true . File "/home/ ...
Troubleshooting intermittent connection timeout
Both cases indicate this connection issue was caused by application server failed to complete the handshaking process within timeout threshold.
Connection Timeout vs. Read Timeout for Java Sockets
From the client side, the “read timed out” error happens if the server is taking longer to respond and send information. This could...
Socket Timeouts - Apache Software Foundation
Socket timeouts can occur when attempting to connect to a remote server, or during communication, especially long-lived ones. They can be caused by...
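
The last two results distinguish a timeout while establishing a connection from a read timeout during communication. The sketch below shows the read case with Python's `socket` module, assuming only a local listener that completes the TCP handshake but never sends anything; the function name `demo_read_timeout` is hypothetical. The wandb tracebacks above are a third variant: there the timeout is raised by `accept()` on the listening side, because no peer connected within 30 seconds.

```python
import socket


def demo_read_timeout(timeout: float = 0.2) -> str:
    """Hypothetical demo: connect to a local listener that never sends data."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        # The handshake completes via the listen backlog even though the
        # server never calls accept(), so this is not a connect timeout.
        with socket.create_connection(server.getsockname(), timeout=timeout) as client:
            try:
                client.recv(1024)  # the server never writes, so this waits
            except socket.timeout:
                return "read timed out"
            return "data received"
```

`socket.create_connection` applies `timeout` to the connect step and leaves it set on the returned socket, so the same value also bounds the later `recv()`.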