
TF 2.0 API for using the embedding projector

See original GitHub issue

Preparing embeddings for the projector with TensorFlow 2.

TensorFlow 1 code would look something like this:

import tensorflow as tf
from tensorboard.plugins import projector

embeddings = tf.compat.v1.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'
# Write summaries for TensorBoard
with tf.compat.v1.Session() as sess:
    # Save the embedding variable to a checkpoint the projector can read
    saver = tf.compat.v1.train.Saver([embeddings])
    sess.run(embeddings.initializer)
    saver.save(sess, CHECKPOINT_FILE)
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = embeddings.name
    embedding.metadata_path = TENSORBOARD_METADATA_FILE

# Writes projector_config.pbtxt into TENSORBOARD_DIR
projector.visualize_embeddings(tf.compat.v1.summary.FileWriter(TENSORBOARD_DIR), config)

When using eager mode in TensorFlow 2, this should (presumably) look something like this:

embeddings = tf.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'
ckpt = tf.train.Checkpoint(embeddings=embeddings)
ckpt.save(CHECKPOINT_FILE)

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = embeddings.name
embedding.metadata_path = TENSORBOARD_METADATA_FILE

writer = tf.summary.create_file_writer(TENSORBOARD_DIR)
projector.visualize_embeddings(writer, config)

However, there are two issues:

  • The writer created with tf.summary.create_file_writer does not have the get_logdir() method required by projector.visualize_embeddings; a simple workaround is to patch visualize_embeddings to take the logdir as a parameter (see the sketch after this list).
  • The checkpoint format has changed: when reading the checkpoint with load_checkpoint (which seems to be the TensorBoard way of loading the file), the variable names change, e.g. embeddings becomes something like embeddings/.ATTRIBUTES/VARIABLE_VALUE (there are also additional variables in the map returned by get_variable_to_shape_map(), but they are empty anyway).
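
A minimal sketch of such a patch, under the assumption that all visualize_embeddings really needs the writer for is the log directory, into which it serializes the config as projector_config.pbtxt (the file the projector plugin reads). The name visualize_embeddings_patched is hypothetical:

import os
from google.protobuf import text_format

def visualize_embeddings_patched(logdir, config):
    # Serialize the ProjectorConfig proto to <logdir>/projector_config.pbtxt,
    # which is where TensorBoard's projector plugin looks for it.
    config_path = os.path.join(logdir, 'projector_config.pbtxt')
    with open(config_path, 'w') as f:
        f.write(text_format.MessageToString(config))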

The second issue was solved with the following quick-and-dirty workaround (and logdir is now a parameter of visualize_embeddings(), as in the patch above):

embeddings = tf.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'
ckpt = tf.train.Checkpoint(embeddings=embeddings)
ckpt.save(CHECKPOINT_FILE)

# Look up the key the object-based checkpoint actually stored the
# variable under (e.g. 'embeddings/.ATTRIBUTES/VARIABLE_VALUE')
reader = tf.train.load_checkpoint(TENSORBOARD_DIR)
shape_map = reader.get_variable_to_shape_map()
key_to_use = ""
for key in shape_map:
    if "embeddings" in key:
        key_to_use = key

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = key_to_use
embedding.metadata_path = TENSORBOARD_METADATA_FILE

writer = tf.summary.create_file_writer(TENSORBOARD_DIR)
projector.visualize_embeddings(writer, config, TENSORBOARD_DIR)
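
With the checkpoint and projector_config.pbtxt now in TENSORBOARD_DIR, the Projector tab should show up after pointing TensorBoard at that directory (replace the placeholder with the actual path):

tensorboard --logdir /path/to/TENSORBOARD_DIR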

I did not find any examples of using TensorFlow 2 to write the embeddings for TensorBoard directly, so I am not sure whether this is the right way; but if it is, those two issues would need to be addressed.

Dump of diagnose_tensorboard.py

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version 393931f9685bd7e0f3898d7dcdf28819fef54c43

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Darwin', nodename='MBPT', release='18.6.0', version='Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/RELEASE_X86_64', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tb-nightly==1.14.0a20190603
INFO: installed: tensorflow==2.0.0b1
INFO: installed: tf-estimator-nightly==1.14.0.dev2019060501

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '1.14.0a20190603'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.0.0-beta1'
INFO: tensorflow.__git_version__: 'v2.0.0-beta0-16-g1d91213fe7'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/USER_DIR/anaconda3/envs/TF20/bin/tensorboard\n'

--- check: readable_fqdn
INFO: socket.getfqdn(): '104.1.168.192.in-addr.arpa'

--- check: stat_tensorboardinfo
INFO: directory: /var/folders/zv/0ywdhk0s55q2770ygg2xbty40000gn/T/.tensorboard-info
INFO: .tensorboard-info directory does not exist

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/USER_DIR/anaconda3/envs/TF20/lib/python3.7/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.7.1
astor==0.8.0
certifi==2019.6.16
gast==0.2.2
google-pasta==0.1.7
grpcio==1.22.0
h5py==2.9.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
Markdown==3.1.1
numpy==1.16.4
pandas==0.25.0
pip==19.2.1
protobuf==3.9.0
python-dateutil==2.8.0
pytz==2019.1
setuptools==41.0.1
six==1.12.0
tb-nightly==1.14.0a20190603
tensorflow==2.0.0b1
termcolor==1.1.0
tf-estimator-nightly==1.14.0.dev2019060501
Werkzeug==0.15.5
wheel==0.33.4
wrapt==1.11.2

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 8
  • Comments: 16 (1 by maintainers)

Top GitHub Comments

42 reactions
paloha commented, Jan 30, 2020

Adding my two cents; hopefully it will save some people from frustration. This is how I made the TensorBoard Projector show my embeddings in both TF2.0 and TF2.1, in both non-eager and eager execution modes.

I have created a Variant A, which runs in non-eager mode, and a Variant B, which runs in eager mode. I will also present two more, Variant C and Variant D, which I hoped would work but do not. Maybe someone can point me to the reason why.

# Some initial code which is the same for all the variants
import os
import numpy as np
import tensorflow as tf
from tensorboard.plugins import projector

def register_embedding(embedding_tensor_name, meta_data_fname, log_dir):
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = embedding_tensor_name
    embedding.metadata_path = meta_data_fname
    projector.visualize_embeddings(log_dir, config)

def get_random_data(shape=(100,100)):
    x = np.random.rand(*shape)
    y = np.random.randint(low=0, high=2, size=shape[0])
    return x, y

def save_labels_tsv(labels, filepath, log_dir):
    with open(os.path.join(log_dir, filepath), 'w') as f:
        for label in labels:
            f.write('{}\n'.format(label))

LOG_DIR = 'tmp'  # Tensorboard log dir
META_DATA_FNAME = 'meta.tsv'  # Labels will be stored here
EMBEDDINGS_TENSOR_NAME = 'embeddings'
EMBEDDINGS_FPATH = os.path.join(LOG_DIR, EMBEDDINGS_TENSOR_NAME + '.ckpt')
STEP = 0

x, y = get_random_data((100,100))
register_embedding(EMBEDDINGS_TENSOR_NAME, META_DATA_FNAME, LOG_DIR)
save_labels_tsv(y, META_DATA_FNAME, LOG_DIR)

VARIANT A (Works in TF2.0 and TF2.1, but not in eager mode)

# Size of files created on disk: 163kB
tf.compat.v1.disable_eager_execution()
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
sess = tf.compat.v1.InteractiveSession()
sess.run(tf.compat.v1.global_variables_initializer())
saver = tf.compat.v1.train.Saver()
saver.save(sess, EMBEDDINGS_FPATH, STEP)
sess.close()

VARIANT B (Works in both TF2.0 and TF2.1 in Eager mode)

# Size of files created on disk: 80.5kB
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
saver = tf.compat.v1.train.Saver([tensor_embeddings])  # Must pass list or dict
saver.save(sess=None, global_step=STEP, save_path=EMBEDDINGS_FPATH)

VARIANT C (Does not work in TF2.0 or TF2.1, Projector tab is active but no data is displayed)

# Size of files created on disk: 80.8kB
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
checkpoint = tf.train.Checkpoint(embeddings=tensor_embeddings)
checkpoint.save(EMBEDDINGS_FPATH)
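
A possible explanation, tying this back to the checkpoint-key observation in the original post: tf.train.Checkpoint writes an object-based checkpoint whose keys look like embeddings/.ATTRIBUTES/VARIABLE_VALUE, while the v1 Saver used in Variants A and B writes a name-based checkpoint keyed by the plain variable name, which is what matches the tensor_name registered with the projector. A sketch (untested assumption) that inspects the keys and re-registers the one the checkpoint actually uses:

# List the keys the object-based checkpoint actually contains.
reader = tf.train.load_checkpoint(LOG_DIR)
print(reader.get_variable_to_shape_map())
# If a key like 'embeddings/.ATTRIBUTES/VARIABLE_VALUE' shows up, the
# mismatch with the registered name 'embeddings' would explain the
# empty Projector tab; re-registering that key might fix it:
register_embedding('embeddings/.ATTRIBUTES/VARIABLE_VALUE',
                   META_DATA_FNAME, LOG_DIR)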

VARIANT D (Does not work in either TF2.0 or TF2.1; Projector tab is inactive, no checkpoint was found)

# Size of files created on disk: 80.4kB
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
writer = tf.summary.create_file_writer(LOG_DIR)
with writer.as_default():
    tf.summary.write(tag='projector', tensor=tensor_embeddings,
                     step=STEP, name=EMBEDDINGS_TENSOR_NAME)

It would be great if this were simplified in new versions of TF. Something like this would be cool:

# WARNING this is purely fictional code :)
writer = tf.summary.create_file_writer(LOG_DIR)
with writer.as_default():
    tf.summary.projector(tensor=tensor_embeddings, labels=tensor_labels,
                     step=STEP, name='desired name')

I understand the TensorFlow & TensorBoard teams probably have to deal with more important issues, but I must say the lack of documentation on this matter is disturbing. Even the current documentation is sometimes misleading, e.g. pointing out that tf.train.Saver() is deprecated and tf.train.Checkpoint() should be used (with no example), but I just could not make it work. @asitplus-pteufl in the answer at the top shows an example where it works with Checkpoint(), but it works only partially and requires workarounds. Thanks for that anyway; you pointed me in a good direction.

34 reactions
omoindrot commented, Aug 14, 2019

It would be great to have a clean tutorial on how to use the TensorBoard projector with TensorFlow 2.0!

