Cannot write nullable values to non-null column
In my Spark jobs I read from JSON and merge into Iceberg. I would like my Iceberg tables to have NOT NULL constraints. However, when loading data from JSON, Spark doesn’t enforce the schema's nullability constraints. To work around this I have found two alternatives:
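from pyspark.sql.functions import col
input_dataset = spark.read.schema(my_schema).json("s3://my_bucket/my_folder/").filter(col("my_key").isNotNull())
# Rebuild from the RDD so my_schema's nullability is applied; createOrReplaceTempView returns None, so nothing is assigned.
spark.createDataFrame(input_dataset.rdd, schema=my_schema).createOrReplaceTempView("source")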
spark.sql("MERGE INTO my_table ...
or
--conf spark.sql.storeAssignmentPolicy=LEGACY
spark.read.schema(my_schema).json("s3://my_bucket/my_folder/").filter(col("my_key").isNotNull()).createOrReplaceTempView("source")
spark.sql("MERGE INTO my_table ...
The first option is very slow, adding 40 to 60 minutes to the processing time of my Spark application. The second option seems too permissive. I noticed a configuration option named spark.sql.iceberg.check-nullability in the code. I’d like to propose that this option be honored in the AssignmentAlignmentTrait, so that writers can bypass NULL constraints while preserving the other compatibility checks.
Thanks for your consideration.
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:16 (7 by maintainers)
coalesce
A nullable field can be changed to not null with coalesce, for example coalesce(field, not_null).
Any ideas, @rdblue?
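The coalesce suggestion above could be sketched roughly as follows. Spark treats a coalesce expression as non-nullable when at least one of its arguments is non-nullable, so a literal fallback is enough to change the column's reported nullability. The helper names (coalesce_exprs, with_non_null_columns) and the defaults mapping are hypothetical and not part of Spark or Iceberg; the only Spark call assumed is DataFrame.selectExpr. Whether this satisfies the MERGE nullability check for a given Iceberg version is untested here.

```python
def coalesce_exprs(columns, defaults):
    """Build Spark SQL select expressions that wrap chosen columns in coalesce().

    columns  -- column names, in output order
    defaults -- hypothetical mapping: column name -> SQL literal used as fallback
    """
    exprs = []
    for name in columns:
        quoted = f"`{name}`"
        if name in defaults:
            # A literal fallback makes the resulting column non-nullable.
            exprs.append(f"coalesce({quoted}, {defaults[name]}) AS {quoted}")
        else:
            # Columns without a default pass through unchanged.
            exprs.append(quoted)
    return exprs


def with_non_null_columns(df, defaults):
    """Apply the expressions to a DataFrame (expects a pyspark.sql.DataFrame)."""
    return df.selectExpr(*coalesce_exprs(df.columns, defaults))


# Pure-Python part of the sketch; no Spark session is needed to inspect it.
exprs = coalesce_exprs(["my_key", "payload"], {"my_key": "''"})
print(exprs)
```

In the job from the issue this might be used as with_non_null_columns(input_dataset, {"my_key": "''"}).createOrReplaceTempView("source"). Because rows with a null my_key are already filtered out, the fallback literal should never actually be written; it only changes the nullability Spark reports, without the round trip through the RDD.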