Spark cannot skip Glue table name validation when creating an Iceberg table
Apache Iceberg version
0.13.0
Query engine
Spark
Please describe the bug 🐞
iceberg-spark3-runtime version: 0.13.0
I attempted to skip the Glue table name validation by setting glue.skip-name-validation to true. However, none of the following Spark SQL statements succeeded.
Iceberg catalog properties:
spark-shell --packages $DEPENDENCIES \
--conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.glue.skip-name-validation=true \
--conf spark.sql.catalog.my_catalog.warehouse=<s3-placeholder> \
--conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
--conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
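(For reference, a minimal sketch of the equivalent programmatic setup; the app name is a hypothetical placeholder and <s3-placeholder> stands in for the real warehouse path, as above:)
import org.apache.spark.sql.SparkSession

// Same catalog settings as the spark-shell flags above, set when building the session.
val spark = SparkSession.builder()
  .appName("iceberg-glue-test") // hypothetical app name
  .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.my_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
  .config("spark.sql.catalog.my_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
  .config("spark.sql.catalog.my_catalog.warehouse", "<s3-placeholder>")
  .config("spark.sql.catalog.my_catalog.glue.skip-name-validation", "true")
  .getOrCreate()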
Spark SQL queries tried so far:
spark.sql("""CREATE TABLE IF NOT EXISTS my_catalog.db.`iceberg-table` ( id string,
creation_date string,
last_update_time string)
LOCATION '<my-s3-bucket>'
TBLPROPERTIES ('table_type'='ICEBERG', 'format'='parquet', 'glue.skip-name-validation'=true) """)
spark.sql("""CREATE TABLE IF NOT EXISTS my_catalog.db.`iceberg-table` (id string,
creation_date string,
last_update_time string)
USING iceberg
OPTIONS ( 'glue.skip-name-validation'=true )
LOCATION '<my-s3-bucket>' """)
Error stack trace:
java.lang.IllegalArgumentException: Invalid table identifier: db.iceberg-table
at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217)
at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.<init>(BaseMetastoreCatalog.java:115)
at org.apache.iceberg.BaseMetastoreCatalog.buildTable(BaseMetastoreCatalog.java:68)
at org.apache.iceberg.spark.SparkCatalog.newBuilder(SparkCatalog.java:578)
at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:148)
at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:92)
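(The trace suggests the identifier check fires inside BaseMetastoreCatalogTableBuilder before any Glue-specific skip logic can apply, which would explain why the catalog property has no effect on this code path in 0.13.0. The sketch below is illustrative only, not the actual Iceberg source, with an assumed Glue-style naming rule; it shows the kind of guard a skip-name-validation flag is meant to bypass:)
import java.util.regex.Pattern

object NameValidationSketch {
  // Assumed Glue-style rule: lowercase letters, digits, and underscores only.
  private val validName = Pattern.compile("^[a-z0-9_]+$")

  def checkIdentifier(table: String, skipValidation: Boolean): Unit =
    require(skipValidation || validName.matcher(table).matches(),
      s"Invalid table identifier: $table") // require throws IllegalArgumentException, like Preconditions.checkArgument

  def main(args: Array[String]): Unit = {
    checkIdentifier("iceberg-table", skipValidation = true)  // passes when the flag is honored
    checkIdentifier("iceberg-table", skipValidation = false) // throws, matching the reported error
  }
}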
Besides that, I also tried to write data to the Glue table, which failed as well. The Iceberg table cannot be created via Spark, although I can create such a table using an Athena query.
df.writeTo("my_catalog.db.`iceberg-table`").append()
Got a table-not-found error:
org.apache.spark.sql.AnalysisException: Table or view not found: my_catalog.db.`iceberg-table`;
'AppendData 'UnresolvedRelation [my_catalog, db, iceberg-table], [], false, true
+- Project [_1#3 AS id#10, _2#4 AS creation_date#11, _3#5 AS last_update_time#12]
+- LocalRelation [_1#3, _2#4, _3#5]
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:134)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:302)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:172)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:195)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:192)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:90)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:90)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:88)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:95)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:93)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:136)
at org.apache.spark.sql.DataFrameWriterV2.runCommand(DataFrameWriterV2.scala:194)
at org.apache.spark.sql.DataFrameWriterV2.append(DataFrameWriterV2.scala:148)
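(The append failure looks consistent with the earlier CREATE having failed: since the table was never registered in Glue, the relation cannot be resolved. A quick sanity check, assuming the db namespace exists, is:)
spark.sql("SHOW TABLES IN my_catalog.db").show() // `iceberg-table` should be absent, so the append cannot resolve it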
I know I am not following the Glue/Athena best practices here: https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html. However, for backwards compatibility with my existing Glue table naming format, I am still trying to figure out whether it is viable to use dashes in an Iceberg table name.
Issue Analytics
- State: closed
- Created a year ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
Thanks a lot! I just tried 0.14.1 and it works. The Glue table can be created successfully, and the data insert also succeeded! I am closing this issue : )
Gotcha, you are right. I just noticed that for >= 0.14.0 the jars are released for specific versions of Spark. Does this mean I have to use a particular version of the iceberg-runtime jar depending on my Scala and Spark versions? Just wondering whether there would be any incompatibility issues.
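(For reference, a hedged example: with the split runtime artifacts, the Maven coordinate encodes both the Spark and Scala versions, e.g. for Spark 3.2 with Scala 2.12 one would use something like the line below; adjust the versions to match your cluster.)
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.1 ...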
Let me try 0.14.0 first and see if I can make it work : ) Thanks for your help; I will update this open issue if I have any progress.