
spark 3.1.1 Caused by: java.lang.ClassNotFoundException: tfrecords.DefaultSource


Spark 3.1.1

Using Scala version 2.12.10 (Eclipse OpenJ9 VM, Java 11.0.10)

spark-tfrecord 0.3.4

libraryDependencies += "com.linkedin.sparktfrecord" %% "spark-tfrecord" % "0.3.4"

Launch command:

spark-shell --jars /data/spark/jars/spark-tfrecord_2.12-0.3.4.jar

import org.apache.spark.sql.SaveMode
val caseFinalModelFeaturePath ="hdfs:///auth/data/model/salecase_warehouse/case_model_feature_snappy.parquet"
val finalInputDf = spark.read.parquet(caseFinalModelFeaturePath)
val caseFinalTFRecordPath ="file:///data/model/salecase_warehouse/case_model_tfrecord"
finalInputDf.coalesce(10).write.format("tfrecords").option("recordType", "Example")
      .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
      .mode(SaveMode.Overwrite)
      .save(caseFinalTFRecordPath)

The write fails with the following error:

java.lang.ClassNotFoundException: Failed to find data source: tfrecords. Please find packages at http://spark.apache.org/third-party-projects.html
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:689)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:743)
  at org.apache.spark.sql.DataFrameWriter.lookupV2Provider(DataFrameWriter.scala:993)
  at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:311)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
  ... 58 elided
Caused by: java.lang.ClassNotFoundException: tfrecords.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:72)
  at java.base/java.lang.ClassLoader.loadClassHelper(ClassLoader.java:1185)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:1100)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:1083)
  at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:663)
  at org.apache.spark.sql.execution.datasources.DataSource$$$Lambda$7483/0x0000000000000000.apply(Unknown Source)
  at scala.util.Try$.apply(Try.scala:213)
  at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:663)
  at org.apache.spark.sql.execution.datasources.DataSource$$$Lambda$4336/0x0000000000000000.apply(Unknown Source)
  at scala.util.Failure.orElse(Try.scala:224)
  at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:663)
  ... 62 more
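
The `tfrecords.DefaultSource` in the `Caused by` line reflects how Spark resolves a format string: it first looks for a data source registered under that short name through `DataSourceRegister`, and only if none matches does it try to load the string as a class name, with `.DefaultSource` appended as a fallback. A minimal sketch for listing the short names visible from spark-shell follows. It assumes the context class loader is the one that sees the `--jars` entries, which may not hold in every deployment.

```scala
import java.util.ServiceLoader
import org.apache.spark.sql.sources.DataSourceRegister
import scala.collection.JavaConverters._

// Assumption: the REPL thread's context class loader can see the jars passed via --jars.
val loader = Thread.currentThread().getContextClassLoader

// Collect the short name of every data source registered through META-INF/services.
val shortNames = ServiceLoader
  .load(classOf[DataSourceRegister], loader)
  .asScala
  .map(_.shortName())
  .toList
  .sorted

shortNames.foreach(println)
```

If the connector jar is loaded, the name it registers should appear in this list. If the name passed to `.format(...)` is not in the list, Spark falls through to the class-name lookup seen in the trace.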

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
mullerhai commented, Jun 28, 2022

It looks like the jar was not set up properly. Make sure this file is valid: /data/spark/jars/spark-tfrecord_2.12-0.3.4.jar

Or you can try pulling from Maven Central:

spark-shell --packages com.linkedin.sparktfrecord:spark-tfrecord_2.12:0.4.0

You need access to the Maven Central repository for this to work.

I found that the format name for writing and reading TFRecords has changed: the old version used .write.format("tfrecords"), and the new version uses .write.format("tfrecord").
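
Applied to the code from the report, the change described above could look like the sketch below. It reuses the paths and options from the original snippet and only swaps the format name. It assumes the spark-tfrecord jar is already on the classpath and that the `recordType` option is also accepted when reading back, so treat the read step as unverified.

```scala
// In spark-shell, enter this block with :paste so the chained calls are parsed together.
import org.apache.spark.sql.SaveMode

// Paths copied from the original report.
val caseFinalModelFeaturePath = "hdfs:///auth/data/model/salecase_warehouse/case_model_feature_snappy.parquet"
val caseFinalTFRecordPath = "file:///data/model/salecase_warehouse/case_model_tfrecord"

val finalInputDf = spark.read.parquet(caseFinalModelFeaturePath)

finalInputDf.coalesce(10)
  .write
  .format("tfrecord") // singular, per the comment above; "tfrecords" triggered the error
  .option("recordType", "Example")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec")
  .mode(SaveMode.Overwrite)
  .save(caseFinalTFRecordPath)

// Assumed round trip: read the files back with the same format name to confirm the write.
val roundTripDf = spark.read
  .format("tfrecord")
  .option("recordType", "Example")
  .load(caseFinalTFRecordPath)

roundTripDf.printSchema()
```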

0 reactions
mullerhai commented, Jun 28, 2022

Glad you figured it out.

It is my pleasure.


