Spark cannot skip Glue table name validation when creating an Iceberg table
Apache Iceberg version
0.13.0
Query engine
Spark
Please describe the bug 🐞
iceberg-spark3-runtime version: 0.13.0
I attempted to skip the Glue table name validation by setting glue.skip-name-validation to true. However, none of the following Spark SQL statements succeeded.
Iceberg catalog properties:
spark-shell --packages $DEPENDENCIES \
--conf spark.sql.catalog.my_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.my_catalog.glue.skip-name-validation=true \
--conf spark.sql.catalog.my_catalog.warehouse=<s3-placeholder> \
--conf spark.sql.catalog.my_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
--conf spark.sql.catalog.my_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
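(For reference, a minimal sketch of the equivalent programmatic setup; the app name is a hypothetical placeholder and <s3-placeholder> stands in for the real warehouse path, as above:)
import org.apache.spark.sql.SparkSession

// Same catalog settings as the spark-shell flags above, set when building the session.
val spark = SparkSession.builder()
  .appName("iceberg-glue-test") // hypothetical app name
  .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
  .config("spark.sql.catalog.my_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
  .config("spark.sql.catalog.my_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
  .config("spark.sql.catalog.my_catalog.warehouse", "<s3-placeholder>")
  .config("spark.sql.catalog.my_catalog.glue.skip-name-validation", "true")
  .getOrCreate()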
Spark SQL queries tried so far:
spark.sql("""CREATE TABLE IF NOT EXISTS my_catalog.db.`iceberg-table` ( id string,
creation_date string,
last_update_time string)
LOCATION '<my-s3-bucket>'
TBLPROPERTIES ('table_type'='ICEBERG', 'format'='parquet', 'glue.skip-name-validation'=true) """)
spark.sql("""CREATE TABLE IF NOT EXISTS my_catalog.db.`iceberg-table` (id string,
creation_date string,
last_update_time string)
USING iceberg
OPTIONS ( 'glue.skip-name-validation'=true )
LOCATION '<my-s3-bucket>' """)
Error stack trace:
java.lang.IllegalArgumentException: Invalid table identifier: db.iceberg-table
at org.apache.iceberg.relocated.com.google.common.base.Preconditions.checkArgument(Preconditions.java:217)
at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.<init>(BaseMetastoreCatalog.java:115)
at org.apache.iceberg.BaseMetastoreCatalog.buildTable(BaseMetastoreCatalog.java:68)
at org.apache.iceberg.spark.SparkCatalog.newBuilder(SparkCatalog.java:578)
at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:148)
at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:92)
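(The trace suggests the identifier check fires inside BaseMetastoreCatalogTableBuilder before any Glue-specific skip logic can apply, which would explain why the catalog property has no effect on this code path in 0.13.0. The sketch below is illustrative only, not the actual Iceberg source, with an assumed Glue-style naming rule; it shows the kind of guard a skip-name-validation flag is meant to bypass:)
import java.util.regex.Pattern

object NameValidationSketch {
  // Assumed Glue-style rule: lowercase letters, digits, and underscores only.
  private val validName = Pattern.compile("^[a-z0-9_]+$")

  def checkIdentifier(table: String, skipValidation: Boolean): Unit =
    require(skipValidation || validName.matcher(table).matches(),
      s"Invalid table identifier: $table") // require throws IllegalArgumentException, like Preconditions.checkArgument

  def main(args: Array[String]): Unit = {
    checkIdentifier("iceberg-table", skipValidation = true)  // passes when the flag is honored
    checkIdentifier("iceberg-table", skipValidation = false) // throws, matching the reported error
  }
}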
Besides that, I also tried to write data to the Glue table, which failed as well. The Iceberg table cannot be created via Spark, although I can create such a table using an Athena query.
df.writeTo("my_catalog.db.`iceberg-table`").append()
Got a table-not-found error:
org.apache.spark.sql.AnalysisException: Table or view not found: my_catalog.db.`iceberg-table`;
'AppendData 'UnresolvedRelation [my_catalog, db, iceberg-table], [], false, true
+- Project [_1#3 AS id#10, _2#4 AS creation_date#11, _3#5 AS last_update_time#12]
+- LocalRelation [_1#3, _2#4, _3#5]
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:134)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:302)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:172)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:195)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:192)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:90)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:192)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:224)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:224)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:90)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:88)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:95)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:93)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:136)
at org.apache.spark.sql.DataFrameWriterV2.runCommand(DataFrameWriterV2.scala:194)
at org.apache.spark.sql.DataFrameWriterV2.append(DataFrameWriterV2.scala:148)
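(The append failure looks consistent with the earlier CREATE having failed: since the table was never registered in Glue, the relation cannot be resolved. A quick sanity check, assuming the db namespace exists, is:)
spark.sql("SHOW TABLES IN my_catalog.db").show() // `iceberg-table` should be absent, so the append cannot resolve it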
I know I am not following the Glue/Athena best practices here: https://docs.aws.amazon.com/athena/latest/ug/glue-best-practices.html. However, for backwards compatibility with my existing Glue table naming format, I am still trying to figure out whether it is viable to use dashes in an Iceberg table name.
Issue Analytics
- State: closed
- Created a year ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
Thanks a lot! I just tried 0.14.1 and it works. The Glue table can be created successfully, and the data insert also succeeded! I am closing this issue : )
Gotcha, you are right. I just noticed that for >= 0.14.0 the jars are released for specific versions of Spark. Does this mean I have to use a particular version of the iceberg-runtime jar depending on my Scala and Spark versions? Just wondering whether there would be any incompatibility issues.
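(For reference, a hedged example: with the split runtime artifacts, the Maven coordinate encodes both the Spark and Scala versions, e.g. for Spark 3.2 with Scala 2.12 one would use something like the line below; adjust the versions to match your cluster.)
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.1 ...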
Let me try 0.14.0 first and see if I can make it work : ) Thanks for your help; I will update this open issue if I have any progress.