Tensorboard Projector - cosine distance "Nearest points in the original space" not correct

Environment information (required)

--- check: autoidentify
INFO: diagnose_tensorboard.py version 393931f9685bd7e0f3898d7dcdf28819fef54c43

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=6, micro=8, releaselevel='final', serial=0)
INFO: os.name: nt
INFO: os.uname(): N/A
INFO: sys.getwindowsversion(): sys.getwindowsversion(major=10, minor=0, build=17763, platform=2, service_pack='')

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==1.13.1
INFO: installed: tensorflow-gpu==1.13.1
INFO: installed: tensorflow==1.14.0
WARNING: conflicting installations: ['tensorflow', 'tensorflow-gpu']
INFO: installed: tensorflow-estimator==1.13.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '1.13.1'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '1.13.1'
INFO: tensorflow.__git_version__: "b'v1.13.1-0-g6612da8951'"

--- check: tensorboard_binary_path
INFO: which tensorboard: b'F:\\Desktop\\Thesis\\Python3.6\\Scripts\\tensorboard.exe\r\n'

--- check: readable_fqdn
INFO: socket.getfqdn(): 'DESKTOP-LD8UUFN.home'

--- check: stat_tensorboardinfo
INFO: directory: C:\Users\josch\AppData\Local\Temp\.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=61361544923004089, st_dev=3506408066, st_nlink=1, st_uid=0, st_gid=0, st_size=24576, st_atime=1562950451, st_mtime=1562950451, st_ctime=1560964117)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['F:\\Python3.6\\lib\\site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.7.1
astor==0.8.0
attrs==19.1.0
backcall==0.1.0
bleach==3.1.0
boto==2.49.0
boto3==1.9.171
botocore==1.12.171
certifi==2019.6.16
chardet==3.0.4
colorama==0.4.1
cycler==0.10.0
decorator==4.4.0
defusedxml==0.6.0
docutils==0.14
entrypoints==0.3
gast==0.2.2
gensim==3.7.3
google-pasta==0.1.7
grpcio==1.21.1
h5py==2.9.0
idna==2.8
ipykernel==5.1.1
ipython==7.5.0
ipython-genutils==0.2.0
ipywidgets==7.4.2
jedi==0.13.3
Jinja2==2.10.1
jmespath==0.9.4
joblib==0.13.2
jsonschema==3.0.1
jupyter==1.0.0
jupyter-client==5.2.4
jupyter-console==6.0.0
jupyter-core==4.5.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.1.0
mistune==0.8.4
mock==3.0.5
nbconvert==5.5.0
nbformat==4.4.0
notebook==5.7.8
numpy==1.16.4
pandas==0.24.2
pandocfilters==1.4.2
parso==0.4.0
pickleshare==0.7.5
pip==18.1
prometheus-client==0.7.0
prompt-toolkit==2.0.9
protobuf==3.8.0
Pygments==2.4.2
pyparsing==2.4.0
pyrsistent==0.15.2
python-dateutil==2.8.0
pytz==2019.1
pywinpty==0.5.5
pyzmq==18.0.1
qtconsole==4.5.1
requests==2.22.0
s3transfer==0.2.1
scikit-learn==0.21.2
scipy==1.3.0
Send2Trash==1.5.0
setuptools==41.0.1
six==1.12.0
sklearn==0.0
smart-open==1.8.4
tensorboard==1.13.1
tensorflow==1.14.0
tensorflow-estimator==1.13.0
tensorflow-gpu==1.13.1
termcolor==1.1.0
terminado==0.8.2
testpath==0.4.2
tornado==6.0.2
traitlets==4.3.2
urllib3==1.25.3
wcwidth==0.1.7
webencodings==0.5.1
Werkzeug==0.15.4
wheel==0.33.4
widgetsnbextension==3.4.2
wrapt==1.11.2
xlrd==1.2.0

Issue description

I am currently visualizing word embeddings (shape=60,300) from my TensorFlow model in the TensorBoard Projector, and I am having trouble with the cosine distance.

The displayed distances distort the results and do not match the real cosine distances.

This was a test run with different category embeddings:

  • TensorBoard: [screenshot not preserved]

  • sklearn: [screenshot not preserved]

Both use the same data and the results are not even close.

Is TensorBoard reducing the dimensionality of the vectors, so that the label “Nearest points in the original space” is incorrect?

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 2
  • Comments: 12 (4 by maintainers)

Top GitHub Comments

4 reactions
dsmilkov commented, Aug 19, 2019

Yes, with one small detail: you will have to call np.linalg.norm(N_vectors, axis=-1, keepdims=True) so that the division broadcasts correctly in the last line of code.
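The code this comment refers to is not reproduced on this page, so the following is only a minimal sketch of what such a reference computation might look like with NumPy. The array name N_vectors comes from the comment; the random data, the 60 x 300 shape (taken from the issue, assuming 60 points of 300 dimensions) and the helper nearest are assumptions for illustration.

```python
import numpy as np

# Hypothetical stand-in data: 60 embeddings with 300 dimensions each.
rng = np.random.default_rng(0)
N_vectors = rng.normal(size=(60, 300))

# keepdims=True keeps the norms as shape (60, 1), so each row is
# divided by its own norm when the division broadcasts.
norms = np.linalg.norm(N_vectors, axis=-1, keepdims=True)
unit = N_vectors / norms

# Cosine similarity of unit vectors is their dot product.
cosine_sim = unit @ unit.T
cosine_dist = 1.0 - cosine_sim


def nearest(index, k=5):
    """Return the indices of the k nearest points by cosine distance."""
    order = np.argsort(cosine_dist[index])
    return [int(j) for j in order if j != index][:k]


print(nearest(0))
```

Without keepdims=True the norms have shape (60,), which cannot be broadcast against a (60, 300) array, so the division raises a ValueError.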

3 reactions
dsmilkov commented, Jul 19, 2019

Hi!

Yes, the functionality hasn’t changed. A couple of notes:

  • To make fast projections, the Projector projects high-dimensional data down to 200 dimensions (randomly chosen). Is 60,300 the dimensionality of your data, or the number of points? This random projection could lead to loss of information.
  • Make sure to turn off Sphereize Data (checkbox in the left panel), which shifts the points and makes them unit norm. While this affects the absolute values of the cosine distances, it shouldn’t affect the ranking of the neighbors though.
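One way to check how much these two steps could change the neighbor lists is to compare rankings before and after each transformation outside of TensorBoard. The sketch below is not the Projector's implementation: it assumes a Gaussian random projection to 200 dimensions and models "sphereize" as subtracting the centroid and scaling to unit norm. All names (points, cosine_neighbors, overlap) and the random data are hypothetical.

```python
import numpy as np

# Hypothetical stand-in data: 60 points with 300 dimensions each.
rng = np.random.default_rng(0)
points = rng.normal(size=(60, 300))


def cosine_neighbors(data, k=5):
    """Indices of the k nearest neighbors of every row by cosine distance."""
    unit = data / np.linalg.norm(data, axis=-1, keepdims=True)
    dist = 1.0 - unit @ unit.T
    np.fill_diagonal(dist, np.inf)  # exclude each point from its own list
    return np.argsort(dist, axis=-1)[:, :k]


def overlap(a, b):
    """Mean fraction of neighbors shared between two neighbor tables."""
    return float(np.mean([len(set(x) & set(y)) / len(x) for x, y in zip(a, b)]))


# Assumption: a Gaussian random projection from 300 down to 200 dimensions.
projection = rng.normal(size=(300, 200)) / np.sqrt(200)
projected = points @ projection

# Assumption: sphereizing = shift to the centroid, then scale to unit norm.
centered = points - points.mean(axis=0, keepdims=True)
sphereized = centered / np.linalg.norm(centered, axis=-1, keepdims=True)

original_nn = cosine_neighbors(points)
print("overlap after random projection:", overlap(original_nn, cosine_neighbors(projected)))
print("overlap after sphereizing:", overlap(original_nn, cosine_neighbors(sphereized)))
```

An overlap of 1.0 means the neighbor lists are identical. Running this on the real embeddings instead of random data would show which step, if either, accounts for the mismatch with sklearn.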