
Spark cannot skip Glue table name validation when creating Iceberg table

See original GitHub issue

Apache Iceberg version

0.13.0

Query engine

Spark

Please describe the bug 🐞

iceberg-spark3-runtime version: 0.13.0

I attempted to skip the Glue table name validation by setting glue.skip-name-validation to true. However, none of the following Spark SQL statements was successful.

Iceberg catalog properties:

spark-shell --packages $DEPENDENCIES \
    --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.my_catalog.glue.skip-name-validation=true \
    --conf spark.sql.catalog.my_catalog.warehouse=<s3-placeholder>\
    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO

Spark SQL queries tried so far:

spark.sql("""CREATE TABLE IF NOT EXISTS my_catalog.db.`iceberg-table` ( id string, 
creation_date string, 
last_update_time string) 
LOCATION '<my-s3-bucket>' 
TBLPROPERTIES ('table_type'='ICEBERG', 'format'='parquet', 'glue.skip-name-validation'=true) """)

spark.sql("""CREATE TABLE IF NOT EXISTS my_catalog.db.`iceberg-table` (id string,
creation_date string,
last_update_time string)
USING iceberg
OPTIONS ( 'glue.skip-name-validation'=true )
LOCATION '<my-s3-bucket>' """)

Error stack trace:

java.lang.IllegalArgumentException: Invalid table identifier: db.iceberg-table
  at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217)
  at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.<init>(BaseMetastoreCatalog.java:115)
  at org.apache.iceberg.BaseMetastoreCatalog.buildTable(BaseMetastoreCatalog.java:68)
  at org.apache.iceberg.spark.SparkCatalog.newBuilder(SparkCatalog.java:578)
  at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:148)
  at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:92)
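For context, the IllegalArgumentException is thrown by a precondition check on the table identifier inside the catalog (BaseMetastoreCatalog in the trace), before any Glue API call is made — which is why the Glue-level glue.skip-name-validation property has no effect here. The following is a rough, hypothetical Python sketch of that kind of identifier check, not the actual Iceberg source, just to illustrate why a hyphenated name trips it:

```python
import re

# Hypothetical stand-in for the kind of identifier validation the
# catalog performs; the real rule lives in Iceberg's Java code and
# this regex is only illustrative.
VALID_NAME = re.compile(r"^[a-zA-Z0-9_]+$")

def is_valid_identifier(namespace: str, table: str) -> bool:
    """Return True only if every part of the identifier matches the pattern."""
    return all(VALID_NAME.match(part) is not None for part in (namespace, table))

print(is_valid_identifier("db", "iceberg_table"))  # True: underscore is allowed
print(is_valid_identifier("db", "iceberg-table"))  # False: hyphen is rejected
```

Under such a rule, backtick-quoting the name in Spark SQL does not help, because the quotes only affect SQL parsing; the unquoted string `iceberg-table` still reaches the identifier check.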

Besides this, I also tried to write data to the Glue table, which failed as well. The Iceberg table cannot be created via Spark; however, I can create such a table using an Athena query.

df.writeTo("my_catalog.db.`iceberg-table`").append()

Got a table-not-found error:

org.apache.spark.sql.AnalysisException: Table or view not found: my_catalog.db.`iceberg-table`;
'AppendData 'UnresolvedRelation [my_catalog, db, iceberg-table], [], false, true
+- Project [_1#3 AS id#10, _2#4 AS creation_date#11, _3#5 AS last_update_time#12]
   +- LocalRelation [_1#3, _2#4, _3#5]

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:134)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:302)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:172)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:195)
  at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:192)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:90)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
  at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
  at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:90)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:88)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:95)
  at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:93)
  at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:136)
  at org.apache.spark.sql.DataFrameWriterV2.runCommand(DataFrameWriterV2.scala:194)
  at org.apache.spark.sql.DataFrameWriterV2.append(DataFrameWriterV2.scala:148)

I know I am not following the Glue/Athena best practices here (https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html); however, for backwards compatibility with my existing Glue table naming format, I am still trying to figure out whether it is viable to use dashes in an Iceberg table name.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
can-sun commented, Sep 15, 2022

Thanks a lot! I just tried 0.14.1 and it works. The Glue table can be created successfully, and the data insert also succeeded! I am closing this issue : )

1 reaction
can-sun commented, Sep 15, 2022

Gotcha, you are right. I just noticed that for >= 0.14.0 the jar is released per version of Spark. Does this mean I have to use a specific version of the iceberg-runtime jar according to the versions of Scala and Spark? I just wondered whether there would be any incompatibility issues.

Let me try 0.14.0 first and see if I can make it : ) Thanks for your help; I will update this issue if I make any progress.
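On the version question above: starting with 0.14.0, the Spark runtime jar is published per Spark and Scala version, so the Maven coordinates encode both. A sketch of launching against Spark 3.2 / Scala 2.12 follows; substitute the coordinates matching your cluster, and note that the catalog name and warehouse path are placeholders carried over from the original report:

```shell
# Since 0.14.0 the runtime artifact is versioned by Spark and Scala:
#   iceberg-spark-runtime-<spark-version>_<scala-version>:<iceberg-version>
spark-shell \
    --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.1 \
    --conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
    --conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
    --conf spark.sql.catalog.my_catalog.glue.skip-name-validation=true \
    --conf spark.sql.catalog.my_catalog.warehouse=<s3-placeholder>
```

Picking the runtime jar whose Spark/Scala suffix matches the cluster avoids the binary-incompatibility issues the comment asks about.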
