[2.5] Failed to pick subchannel

Environment information (required)

Diagnostics

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version e43767ef2b648d0d5d57c00f38ccbd38390e38da

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=9, micro=2, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='fs', release='5.4.0-42-generic', version='#46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tb-nightly==2.5.0a20210407
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
INFO: installed: tf-estimator-nightly==2.5.0.dev2021032601
INFO: installed: tensorboard-data-server==0.6.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.5.0a20210407'

--- check: tensorflow_python_version
Traceback (most recent call last):
  File "/nfs/homedirs/gaoni/diagnose_tensorboard.py", line 522, in main
    suggestions.extend(check())
  File "/nfs/homedirs/gaoni/diagnose_tensorboard.py", line 75, in wrapper
    result = fn()
  File "/nfs/homedirs/gaoni/diagnose_tensorboard.py", line 278, in tensorflow_python_version
    import tensorflow as tf
ModuleNotFoundError: No module named 'tensorflow'

--- check: tensorboard_data_server_version
INFO: data server binary: '/nfs/homedirs/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.6.0'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/nfs/homedirs/gaoni/miniconda3/envs/wfnet/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'fs'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=7340163, st_dev=64768, st_nlink=2, st_uid=4430, st_gid=20909, st_size=4096, st_atime=1616715083, st_mtime=1617876453, st_ctime=1617876453)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/nfs/homedirs/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.12.0
astunparse==1.6.3
cachetools==4.2.1
certifi==2020.12.5
chardet==4.0.0
flatbuffers==1.12
gast==0.4.0
google-auth==1.28.0
google-auth-oauthlib==0.4.4
google-pasta==0.2.0
grpcio==1.34.1
gviz-api==1.9.0
h5py==3.1.0
idna==2.10
jax==0.2.12
jaxlib==0.1.65+cuda102
keras-nightly==2.6.0.dev2021040800
Keras-Preprocessing==1.1.2
Markdown==3.3.4
numpy==1.19.5
oauthlib==3.1.0
opt-einsum==3.3.0
pip==21.0.1
protobuf==3.15.7
pyasn1==0.4.8
pyasn1-modules==0.2.8
requests==2.25.1
requests-oauthlib==1.3.0
rsa==4.7.2
scipy==1.6.2
setuptools==52.0.0.post20210125
six==1.15.0
tb-nightly==2.5.0a20210407
tensorboard-data-server==0.6.0
tensorboard-plugin-wit==1.8.0
termcolor==1.1.0
tf-estimator-nightly==2.5.0.dev2021032601
typing-extensions==3.7.4.3
urllib3==1.26.4
Werkzeug==1.0.1
wheel==0.36.2
wrapt==1.12.1

Next steps

No action items identified. Please copy ALL of the above output, including the lines containing only backticks, into your GitHub issue or comment. Be sure to redact any sensitive information.

Issue description

When opening the web interface of the current nightly build (installed via pip install --upgrade tb-nightly), the page says “Data could not be loaded. The TensorBoard server may be down or inaccessible”. The console shows the following exception:

E0408 12:07:32.421477 140535866312448 _internal.py:113] Error on request:
Traceback (most recent call last):
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/werkzeug/serving.py", line 323, in run_wsgi
    execute(self.server.app)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/werkzeug/serving.py", line 312, in execute
    application_iter = app(environ, start_response)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/application.py", line 525, in __call__
    return self._app(environ, start_response)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/application.py", line 566, in wrapper
    return wsgi_app(environ, start_response)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/security_validator.py", line 77, in __call__
    return self._application(environ, start_response_proxy)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/path_prefix.py", line 68, in __call__
    return self._application(environ, start_response)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/experiment_id.py", line 73, in __call__
    return self._application(environ, start_response)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/empty_path_redirect.py", line 43, in __call__
    return self._application(environ, start_response)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/application.py", line 589, in wrapper
    return wsgi_app(environ, start_response)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/backend/application.py", line 548, in _route_request
    return self.exact_routes[clean_path](environ, start_response)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/werkzeug/wrappers/base_request.py", line 238, in application
    resp = f(*args[:-2] + (request,))
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/plugins/core/core_plugin.py", line 178, in _serve_environment
    md = self._data_provider.experiment_metadata(
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/tensorboard/data/grpc_provider.py", line 56, in experiment_metadata
    res = self._stub.GetExperiment(req)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/nfs/staff-ssd/gaoni/miniconda3/envs/wfnet/lib/python3.9/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses"
        debug_error_string = "{"created":"@1617876452.407628535","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":4142,"referenced_errors":[{"created":"@1617876452.068148842","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}"

Uninstalling 2.5 and installing 2.4 works.
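
The "failed to connect to all addresses" detail in the traceback means that no address the client tried accepted a TCP connection. As a rough way to reproduce that check outside of gRPC, here is a minimal sketch using only the Python standard library. The function name probe_addresses is made up for illustration and is not part of TensorBoard or gRPC. The port to probe is assumed to be the data server port reported in the startup log.

```python
import socket


def probe_addresses(host, port, timeout=2.0):
    """Try a plain TCP connect to every address that `host` resolves to.

    Returns a list of (sockaddr, error) pairs; error is None on success.
    This is a hypothetical helper, not a TensorBoard or gRPC API.
    """
    results = []
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results.append((sockaddr, None))
        except OSError as exc:
            # Refused, timed out, filtered, or address family unsupported.
            results.append((sockaddr, exc))
    return results
```

If every entry returned for localhost and the data server port carries an error, the problem is likely at the network level (for example a firewall rule on loopback) rather than in TensorBoard itself.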

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 12 (7 by maintainers)

Top GitHub Comments

2 reactions
wchargin commented, Apr 9, 2021

Hi @n-gao—thanks for the report. Hmm… this is unexpected. Accessing from behind an SSH tunnel should be fine.

Could you please try the following:

  • Run tensorboard --logdir PATH_TO_LOGDIR --verbosity 0
  • Near the top, there should be a log line that says: “Established connection to data server at pid 123 via localhost:4567” (for some PID and port)
  • Could you please paste the output of:
    • lsof -i tcp:4567 (replacing with the actual port number), and
    • ps -lyq 123 (replacing with the actual PID)?
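
The steps above can be sketched as a small helper that pulls the PID and port out of the startup log line and builds the two commands. This assumes the log line has exactly the wording quoted in the list; the regular expression and the function name diagnostic_commands are illustrative, not part of TensorBoard.

```python
import re

# Assumed wording, taken from the log line quoted above:
# "Established connection to data server at pid 123 via localhost:4567"
_DATA_SERVER_RE = re.compile(r"data server at pid (\d+) via (\S+):(\d+)")


def diagnostic_commands(log_line):
    """Return the lsof and ps argument lists for a matching log line.

    Returns None if the line does not mention the data server connection.
    """
    match = _DATA_SERVER_RE.search(log_line)
    if match is None:
        return None
    pid, _host, port = match.groups()
    return [
        ["lsof", "-i", f"tcp:{port}"],  # who is listening on the port
        ["ps", "-lyq", pid],            # state of the data server process
    ]
```

Each returned list can be passed to subprocess.run on a system where lsof and ps are installed.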

Also, if you could provide the full --verbosity 0 log of startup messages, that’d be helpful, too.

If you need a workaround for 2.5, you can pass --load_fast false, but ideally we’d like to fix this for everyone (whatever the circumstance ends up being), so we’d greatly appreciate any help that you can provide. Thanks!

1 reaction
wchargin commented, Apr 9, 2021

I suspect that that is probably related, but it’s still surprising to me that a localhost loopback connection is blocked. (Feels very Windows-y…)

Maybe your best bet is to ask your system administrators about what might be blocking this. If you want to try to reverse-engineer the config issue yourself, perhaps you could try running

# explicit loopback addresses for IPv6/IPv4
tensorboard --logdir ... --extra_data_server_flags=--host=::1
tensorboard --logdir ... --extra_data_server_flags=--host=127.0.0.1

# explicit wildcard addresses for IPv6/IPv4
tensorboard --logdir ... --extra_data_server_flags=--host=::0
tensorboard --logdir ... --extra_data_server_flags=--host=0.0.0.0

to see if any of those works.
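
Before trying those flags, it may help to check whether loopback traffic works at all for each address family, independent of TensorBoard. The following is a minimal sketch under that assumption: it binds a throwaway listener on an ephemeral port, connects to it, and passes one byte through. The name loopback_roundtrip is hypothetical, and the sketch covers only the two explicit loopback addresses, not the wildcard ones.

```python
import socket


def loopback_roundtrip(host, timeout=2.0):
    """Bind a listener on `host`, connect to it, and send one byte through.

    Returns True if the byte arrives, False on any socket error.
    Hypothetical helper for illustration; `host` should be "::1" or "127.0.0.1".
    """
    family = socket.AF_INET6 if ":" in host else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_STREAM) as server:
            server.settimeout(timeout)
            server.bind((host, 0))  # port 0 lets the OS pick a free port
            server.listen(1)
            port = server.getsockname()[1]
            with socket.socket(family, socket.SOCK_STREAM) as client:
                client.settimeout(timeout)
                client.connect((host, port))
                conn, _peer = server.accept()
                with conn:
                    conn.settimeout(timeout)
                    client.sendall(b"x")
                    return conn.recv(1) == b"x"
    except OSError:
        return False


if __name__ == "__main__":
    for address in ("::1", "127.0.0.1"):
        print(address, loopback_roundtrip(address))
```

A False result for one family but not the other would suggest which --host value is worth trying first.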
