Anyone able to load TFRecords generated with the spark-to-tf-records connector into TFRS?
Has anyone seen this kind of error when trying to load TFRecords generated from Spark by the spark-tensorflow-connector or LinkedIn's spark-tfrecord library?

Error when deserializing the TFRecords in TF 2.x: Only integers, slices (:), ellipsis (...), tf.newaxis (None) and scalar tf.int32/tf.int64 tensors are valid indices

I've filed tickets with details in both projects:
- https://github.com/linkedin/spark-tfrecord/issues/19
- https://github.com/tensorflow/ecosystem/issues/178
Really just doing a simple thing, using the small MovieLens dataset:
# Writing with the spark-tensorflow-connector (tensorflow/ecosystem)
movies_df.write.format("tfrecords").mode("overwrite").save(tf_movies_dir)
ratings_df.write.format("tfrecords").mode("overwrite").save(tf_ratings_dir)

# Alternatively, writing with LinkedIn's spark-tfrecord
movies_df.write.format("tfrecord").mode("overwrite").option("recordType", "Example").save(tf_movies_dir)
ratings_df.write.format("tfrecord").mode("overwrite").option("recordType", "Example").save(tf_ratings_dir)
import os

import boto3
import tensorflow as tf

# List the part files written by Spark and load them as raw TFRecord datasets.
s3 = boto3.resource("s3", verify=False)
bucket = s3.Bucket("mybucket")

filenames = []
for object_summary in bucket.objects.filter(
    Prefix="emr/spark_apps/myapp/movielens-100k-conversion/movies-0001/part"
):
    filenames.append(os.path.join("s3://audiomack-master-airflow/", object_summary.key))
movies_dataset = tf.data.TFRecordDataset(filenames)

filenames = []
for object_summary in bucket.objects.filter(
    Prefix="emr/spark_apps/myapp/movielens-100k-conversion/ratings-0001/part"
):
    filenames.append(os.path.join("s3://audiomack-master-airflow/", object_summary.key))
ratings_dataset = tf.data.TFRecordDataset(filenames)
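To see what Spark actually wrote, one record can be decoded by hand. A quick sketch, continuing from the snippet above (nothing connector-specific here):

# Decode one raw record as a tf.train.Example proto to inspect the
# feature names and dtypes Spark wrote.
for raw_record in ratings_dataset.take(1):
    example = tf.train.Example.FromString(raw_record.numpy())
    print(example)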

Have you looked at the docs for reading TFRecord files containing tf.train.Examples? It looks like you're skipping the deserialization step (converting the serialized tf.train.Example protos to dictionaries of tensors).

@dgoldenberg-audiomack Yeah, I will do. My first guess was it was how it was being written.
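For reference, the deserialization step mentioned above looks roughly like this. The feature names and dtypes below are only a guess at the MovieLens ratings columns and need to be adjusted to whatever schema Spark actually wrote (printing a decoded tf.train.Example, as above, shows the real one):

# Hypothetical feature spec: adjust names/dtypes to the columns Spark wrote.
feature_spec = {
    "user_id": tf.io.FixedLenFeature([], tf.int64),
    "movie_id": tf.io.FixedLenFeature([], tf.int64),
    "rating": tf.io.FixedLenFeature([], tf.float32),
}

def parse_example(serialized):
    # Convert one serialized tf.train.Example into a dict of tensors.
    return tf.io.parse_single_example(serialized, feature_spec)

# Map the parser over the raw dataset so downstream code (e.g. TFRS) sees
# dictionaries of tensors instead of serialized protos.
ratings_dataset = ratings_dataset.map(parse_example)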