[2.5] Failed to pick subchannel
Environment information (required)
Diagnostics
Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version e43767ef2b648d0d5d57c00f38ccbd38390e38da
--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=9, micro=2, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='fs', release='5.4.0-42-generic', version='#46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020', machine='x86_64')
INFO: sys.getwindowsversion(): N/A
--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None
--- check: installed_packages
INFO: installed: tb-nightly==2.5.0a20210407
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
INFO: installed: tf-estimator-nightly==2.5.0.dev2021032601
INFO: installed: tensorboard-data-server==0.6.0
--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.5.0a20210407'
--- check: tensorflow_python_version
Traceback (most recent call last):
File "/nfs/homedirs/gaoni/diagnose_tensorboard.py", line 522, in main
suggestions.extend(check())
File "/nfs/homedirs/gaoni/diagnose_tensorboard.py", line 75, in wrapper
result = fn()
File "/nfs/homedirs/gaoni/diagnose_tensorboard.py", line 278, in tensorflow_python_version
import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'
--- check: tensorboard_data_server_version
INFO: data server binary: '/nfs/homedirs/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.6.0'
--- check: tensorboard_binary_path
INFO: which tensorboard: b'/nfs/homedirs/gaoni/miniconda3/envs/wfnet/bin/tensorboard\n'
--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]
--- check: readable_fqdn
INFO: socket.getfqdn(): 'fs'
--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=7340163, st_dev=64768, st_nlink=2, st_uid=4430, st_gid=20909, st_size=4096, st_atime=1616715083, st_mtime=1617876453, st_ctime=1617876453)
INFO: mode: 0o40777
--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/nfs/homedirs/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages']; bad_roots (0): []
--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.12.0
astunparse==1.6.3
cachetools==4.2.1
certifi==2020.12.5
chardet==4.0.0
flatbuffers==1.12
gast==0.4.0
google-auth==1.28.0
google-auth-oauthlib==0.4.4
google-pasta==0.2.0
grpcio==1.34.1
gviz-api==1.9.0
h5py==3.1.0
idna==2.10
jax==0.2.12
jaxlib==0.1.65+cuda102
keras-nightly==2.6.0.dev2021040800
Keras-Preprocessing==1.1.2
Markdown==3.3.4
numpy==1.19.5
oauthlib==3.1.0
opt-einsum==3.3.0
pip==21.0.1
protobuf==3.15.7
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7.2
scipy==1.6.2
setuptools==52.0.0.post20210125
six==1.15.0
tb-nightly==2.5.0a20210407
tensorboard-data-server==0.6.0
tensorboard-plugin-wit==1.8.0
termcolor==1.1.0
tf-estimator-nightly==2.5.0.dev2021032601
typing-extensions==3.7.4.3
urllib3==1.26.4
Werkzeug==1.0.1
wheel==0.36.2
wrapt==1.12.1
Next steps
No action items identified. Please copy ALL of the above output, including the lines containing only backticks, into your GitHub issue or comment. Be sure to redact any sensitive information.
Issue description
When opening the web interface of the current nightly build (installed via pip install --upgrade tb-nightly), the website says “Data could not be loaded. The TensorBoard server may be down or inaccessible”. The console shows this exception:
E0408 12:07:32.421477 140535866312448 _internal.py:113] Error on request:
Traceback (most recent call last):
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/werkzeug/serving.py", line 323, in run_wsgi
execute(self.server.app)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/werkzeug/serving.py", line 312, in execute
application_iter = app(environ, start_response)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/application.py", line 525, in __call__
return self._app(environ, start_response)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/application.py", line 566, in wrapper
return wsgi_app(environ, start_response)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/security_validator.py", line 77, in __call__
return self._application(environ, start_response_proxy)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/path_prefix.py", line 68, in __call__
return self._application(environ, start_response)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/experiment_id.py", line 73, in __call__
return self._application(environ, start_response)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/empty_path_redirect.py", line 43, in __call__
return self._application(environ, start_response)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/application.py", line 589, in wrapper
return wsgi_app(environ, start_response)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/application.py", line 548, in _route_request
return self.exact_routes[clean_path](environ, start_response)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/werkzeug/wrappers/base_request.py", line 238, in application
resp = f(*args[:-2] + (request,))
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/plugins/core/core_plugin.py", line 178, in _serve_environment
md = self._data_provider.experiment_metadata(
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/data/grpc_provider.py", line 56, in experiment_metadata
res = self._stub.GetExperiment(req)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/grpc/_channel.py", line 923, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1617876452.407628535","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":4142,"referenced_errors":[{"created":"@1617876452.068148842","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}"
Uninstalling 2.5 and installing 2.4 works.
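The debug_error_string in the traceback above is JSON, with the underlying cause nested under referenced_errors. As a reading aid, here is a minimal sketch that unpacks it with the standard library only. The helper name flatten_errors is made up for illustration and is not part of grpc or TensorBoard; the string is copied from the traceback.

```python
import json

# Copied verbatim from the traceback above.
DEBUG_ERROR_STRING = (
    '{"created":"@1617876452.407628535",'
    '"description":"Failed to pick subchannel",'
    '"file":"src/core/ext/filters/client_channel/client_channel.cc",'
    '"file_line":4142,'
    '"referenced_errors":[{"created":"@1617876452.068148842",'
    '"description":"failed to connect to all addresses",'
    '"file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc",'
    '"file_line":397,"grpc_status":14}]}'
)


def flatten_errors(error, depth=0):
    """Yield (depth, description, grpc_status) for an error and its nested causes.

    Hypothetical helper for illustration; not a grpc API.
    """
    yield depth, error.get("description", ""), error.get("grpc_status")
    for child in error.get("referenced_errors", []):
        yield from flatten_errors(child, depth + 1)


if __name__ == "__main__":
    for depth, description, status in flatten_errors(json.loads(DEBUG_ERROR_STRING)):
        suffix = "" if status is None else f" (grpc_status={status})"
        print("  " * depth + description + suffix)
```

The innermost entry carries grpc_status 14, which is UNAVAILABLE, matching the StatusCode.UNAVAILABLE line in the traceback: the client could not open a connection to the data server at all.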
Issue Analytics
- State:
- Created 2 years ago
- Comments:12 (7 by maintainers)
Hi @n-gao, thanks for the report. Hmm… this is unexpected. Accessing from behind an SSH tunnel should be fine.

Could you please try the following:
tensorboard --logdir PATH_TO_LOGDIR --verbosity 0
lsof -i tcp:4567 (replacing 4567 with the actual port number), and
ps -lyq 123 (replacing 123 with the actual PID)?

Also, if you could provide the full --verbosity 0 log of startup messages, that’d be helpful, too.

If you need a workaround for 2.5, you can pass --load_fast false, but ideally we’d like to fix this for everyone (whatever the circumstance ends up being), so we’d greatly appreciate any help that you can provide. Thanks!

I suspect that that is probably related, but it’s still surprising to me that a localhost loopback connection is blocked. (Feels very Windows-y…)
Maybe your best bet is to ask your system administrators about what might be blocking this. If you want to try to reverse-engineer the config issue yourself, perhaps you could try running
to see if any of those works.
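Separately from the commands suggested above (which are not reproduced here), one rough way to check whether loopback TCP connections work at all on the machine is to open a listener on an ephemeral port and connect to it from the same process. This is only a sketch using the Python standard library; the function names loopback_roundtrip and check_loopback are hypothetical, and a passing result does not rule out a rule that blocks only specific ports or processes.

```python
import socket


def loopback_roundtrip(host, timeout=2.0):
    """Return True if a TCP client can connect to a listener bound on `host`."""
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as server:
            server.bind((host, 0))  # port 0: let the OS pick a free port
            server.listen(1)
            port = server.getsockname()[1]
            # The kernel completes the handshake for a listening socket,
            # so no accept() call is needed for the connect to succeed.
            with socket.create_connection((host, port), timeout=timeout):
                return True
    except OSError:
        # Covers refused/filtered connections and unsupported address families.
        return False


def check_loopback(hosts=("127.0.0.1", "::1")):
    """Try each loopback address and report which ones accept connections."""
    return {host: loopback_roundtrip(host) for host in hosts}


if __name__ == "__main__":
    for host, ok in check_loopback().items():
        print(f"{host}: {'ok' if ok else 'FAILED'}")
```

If one address family fails while the other works, that would be worth mentioning to the system administrators along with the lsof and ps output.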