
TF 2.0 API for using the embedding projector

See original GitHub issue

Preparing embeddings for the projector with TensorFlow 2.

TensorFlow 1 code would look something like this:

import tensorflow as tf
from tensorboard.plugins import projector

embeddings = tf.compat.v1.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'
# Write summaries for TensorBoard
with tf.compat.v1.Session() as sess:
    # Save the embedding variable to a checkpoint the projector can read
    saver = tf.compat.v1.train.Saver([embeddings])
    sess.run(embeddings.initializer)
    saver.save(sess, CHECKPOINT_FILE)
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = embeddings.name
    embedding.metadata_path = TENSORBOARD_METADATA_FILE

# Writes projector_config.pbtxt into TENSORBOARD_DIR
projector.visualize_embeddings(tf.compat.v1.summary.FileWriter(TENSORBOARD_DIR), config)

When using eager mode in TensorFlow 2, this should (presumably) look something like this:

embeddings = tf.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'
ckpt = tf.train.Checkpoint(embeddings=embeddings)
ckpt.save(CHECKPOINT_FILE)

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = embeddings.name
embedding.metadata_path = TENSORBOARD_METADATA_FILE

writer = tf.summary.create_file_writer(TENSORBOARD_DIR)
projector.visualize_embeddings(writer, config)

However, there are two issues:

  • The writer created with tf.summary.create_file_writer does not have the get_logdir() method required by projector.visualize_embeddings; a simple workaround is to patch visualize_embeddings to take the logdir as a parameter (see the sketch after this list).
  • The checkpoint format has changed: when reading the checkpoint with load_checkpoint (which seems to be the TensorBoard way of loading the file), the variable names change, e.g. embeddings becomes something like embeddings/.ATTRIBUTES/VARIABLE_VALUE (there are also additional variables in the map returned by get_variable_to_shape_map(), but they are empty anyway).
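
A minimal sketch of such a patch, under the assumption that all visualize_embeddings really needs the writer for is the log directory, into which it serializes the config as projector_config.pbtxt (the file the projector plugin reads). The name visualize_embeddings_patched is hypothetical:

import os
from google.protobuf import text_format

def visualize_embeddings_patched(logdir, config):
    # Serialize the ProjectorConfig proto to <logdir>/projector_config.pbtxt,
    # which is where TensorBoard's projector plugin looks for it.
    config_path = os.path.join(logdir, 'projector_config.pbtxt')
    with open(config_path, 'w') as f:
        f.write(text_format.MessageToString(config))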

The second issue was solved with the following quick-and-dirty workaround (and logdir is now a parameter of visualize_embeddings(), as in the patch above):

embeddings = tf.Variable(latent_data, name='embeddings')
CHECKPOINT_FILE = TENSORBOARD_DIR + '/model.ckpt'
ckpt = tf.train.Checkpoint(embeddings=embeddings)
ckpt.save(CHECKPOINT_FILE)

# Look up the key the object-based checkpoint actually stored the
# variable under (e.g. 'embeddings/.ATTRIBUTES/VARIABLE_VALUE')
reader = tf.train.load_checkpoint(TENSORBOARD_DIR)
shape_map = reader.get_variable_to_shape_map()
key_to_use = ""
for key in shape_map:
    if "embeddings" in key:
        key_to_use = key

config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = key_to_use
embedding.metadata_path = TENSORBOARD_METADATA_FILE

writer = tf.summary.create_file_writer(TENSORBOARD_DIR)
projector.visualize_embeddings(writer, config, TENSORBOARD_DIR)
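
With the checkpoint and projector_config.pbtxt now in TENSORBOARD_DIR, the Projector tab should show up after pointing TensorBoard at that directory (replace the placeholder with the actual path):

tensorboard --logdir /path/to/TENSORBOARD_DIR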

I did not find any examples of using TensorFlow 2 to write the embeddings for TensorBoard directly, so I am not sure whether this is the right way; but if it is, those two issues would need to be addressed.

Dump of diagnose_tensorboard.py

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version 393931f9685bd7e0f3898d7dcdf28819fef54c43

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Darwin', nodename='MBPT', release='18.6.0', version='Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/RELEASE_X86_64', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: True
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tb-nightly==1.14.0a20190603
INFO: installed: tensorflow==2.0.0b1
INFO: installed: tf-estimator-nightly==1.14.0.dev2019060501

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '1.14.0a20190603'

--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.0.0-beta1'
INFO: tensorflow.__git_version__: 'v2.0.0-beta0-16-g1d91213fe7'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/USER_DIR/anaconda3/envs/TF20/bin/tensorboard\n'

--- check: readable_fqdn
INFO: socket.getfqdn(): '104.1.168.192.in-addr.arpa'

--- check: stat_tensorboardinfo
INFO: directory: /var/folders/zv/0ywdhk0s55q2770ygg2xbty40000gn/T/.tensorboard-info
INFO: .tensorboard-info directory does not exist

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/USER_DIR/anaconda3/envs/TF20/lib/python3.7/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.7.1
astor==0.8.0
certifi==2019.6.16
gast==0.2.2
google-pasta==0.1.7
grpcio==1.22.0
h5py==2.9.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
Markdown==3.1.1
numpy==1.16.4
pandas==0.25.0
pip==19.2.1
protobuf==3.9.0
python-dateutil==2.8.0
pytz==2019.1
setuptools==41.0.1
six==1.12.0
tb-nightly==1.14.0a20190603
tensorflow==2.0.0b1
termcolor==1.1.0
tf-estimator-nightly==1.14.0.dev2019060501
Werkzeug==0.15.5
wheel==0.33.4
wrapt==1.11.2

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Reactions: 8
  • Comments: 16 (1 by maintainers)

Top GitHub Comments

42 reactions
paloha commented, Jan 30, 2020

Adding my two cents; hopefully it will save some people from frustration. This is how I made the TensorBoard Projector show my embeddings in both TF2.0 and TF2.1, in both non-eager and eager execution modes.

I have created a Variant A, which runs in non-eager mode, and a Variant B, which runs in eager mode. I will also present two more, Variant C and Variant D, which I hoped would work but do not. Maybe someone can point me to the reason why.

# Some initial code which is the same for all the variants
import os
import numpy as np
import tensorflow as tf
from tensorboard.plugins import projector

def register_embedding(embedding_tensor_name, meta_data_fname, log_dir):
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = embedding_tensor_name
    embedding.metadata_path = meta_data_fname
    projector.visualize_embeddings(log_dir, config)

def get_random_data(shape=(100,100)):
    x = np.random.rand(*shape)
    y = np.random.randint(low=0, high=2, size=shape[0])
    return x, y

def save_labels_tsv(labels, filepath, log_dir):
    with open(os.path.join(log_dir, filepath), 'w') as f:
        for label in labels:
            f.write('{}\n'.format(label))

LOG_DIR = 'tmp'  # Tensorboard log dir
META_DATA_FNAME = 'meta.tsv'  # Labels will be stored here
EMBEDDINGS_TENSOR_NAME = 'embeddings'
EMBEDDINGS_FPATH = os.path.join(LOG_DIR, EMBEDDINGS_TENSOR_NAME + '.ckpt')
STEP = 0

x, y = get_random_data((100,100))
register_embedding(EMBEDDINGS_TENSOR_NAME, META_DATA_FNAME, LOG_DIR)
save_labels_tsv(y, META_DATA_FNAME, LOG_DIR)

VARIANT A (Works in TF2.0 and TF2.1, but not in eager mode)

# Size of files created on disk: 163kB
tf.compat.v1.disable_eager_execution()
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
sess = tf.compat.v1.InteractiveSession()
sess.run(tf.compat.v1.global_variables_initializer())
saver = tf.compat.v1.train.Saver()
saver.save(sess, EMBEDDINGS_FPATH, STEP)
sess.close()

VARIANT B (Works in both TF2.0 and TF2.1 in Eager mode)

# Size of files created on disk: 80.5kB
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
saver = tf.compat.v1.train.Saver([tensor_embeddings])  # Must pass list or dict
saver.save(sess=None, global_step=STEP, save_path=EMBEDDINGS_FPATH)

VARIANT C (Does not work in TF2.0 or TF2.1, Projector tab is active but no data is displayed)

# Size of files created on disk: 80.8kB
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
checkpoint = tf.train.Checkpoint(embeddings=tensor_embeddings)
checkpoint.save(EMBEDDINGS_FPATH)
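
A possible explanation, tying this back to the checkpoint-key observation in the original post: tf.train.Checkpoint writes an object-based checkpoint whose keys look like embeddings/.ATTRIBUTES/VARIABLE_VALUE, while the v1 Saver used in Variants A and B writes a name-based checkpoint keyed by the plain variable name, which is what matches the tensor_name registered with the projector. A sketch (untested assumption) that inspects the keys and re-registers the one the checkpoint actually uses:

# List the keys the object-based checkpoint actually contains.
reader = tf.train.load_checkpoint(LOG_DIR)
print(reader.get_variable_to_shape_map())
# If a key like 'embeddings/.ATTRIBUTES/VARIABLE_VALUE' shows up, the
# mismatch with the registered name 'embeddings' would explain the
# empty Projector tab; re-registering that key might fix it:
register_embedding('embeddings/.ATTRIBUTES/VARIABLE_VALUE',
                   META_DATA_FNAME, LOG_DIR)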

VARIANT D (Does not work in either TF2.0 or TF2.1; Projector tab is inactive, no checkpoint was found)

# Size of files created on disk: 80.4kB
tensor_embeddings = tf.Variable(x, name=EMBEDDINGS_TENSOR_NAME)
writer = tf.summary.create_file_writer(LOG_DIR)
with writer.as_default():
    tf.summary.write(tag='projector', tensor=tensor_embeddings,
                     step=STEP, name=EMBEDDINGS_TENSOR_NAME)

It would be great if this were simplified in new versions of TF. Something like this would be cool:

# WARNING this is purely fictional code :)
writer = tf.summary.create_file_writer(LOG_DIR)
with writer.as_default():
    tf.summary.projector(tensor=tensor_embeddings, labels=tensor_labels,
                     step=STEP, name='desired name')

I understand the TensorFlow & TensorBoard teams probably have to deal with more important issues, but I must say the lack of documentation on this matter is disturbing. Even the current documentation is sometimes misleading, e.g. pointing out that tf.train.Saver() is deprecated and tf.train.Checkpoint() should be used (with no example), but I just could not make it work. @asitplus-pteufl in the answer at the top shows an example where it works with Checkpoint(), but it works only partially and requires workarounds. Thanks for that anyway; you pointed me in a good direction.

34 reactions
omoindrot commented, Aug 14, 2019

It would be great to have a clean tutorial on how to use the TensorBoard projector with TensorFlow 2.0!

