Cannot write nullable values to non-null column

In my Spark jobs I read from JSON and merge into Iceberg tables, and I would like those tables to have NOT NULL constraints. However, when loading data from JSON, Spark doesn't enforce the schema's nullability constraints, so the write is rejected even after null rows are filtered out. To work around this I have found two alternatives:

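# Option 1: rebuild the DataFrame from its RDD so that my_schema's nullability is applied
from pyspark.sql.functions import col
input_dataset = spark.read.schema(my_schema).json("s3://my_bucket/my_folder/").filter(col("my_key").isNotNull())
# createOrReplaceTempView returns None, so its result is not assigned
spark.createDataFrame(input_dataset.rdd, schema=my_schema).createOrReplaceTempView("source")
spark.sql("MERGE INTO my_table ...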

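# Option 2: set the store assignment policy to LEGACY when launching the job
--conf spark.sql.storeAssignmentPolicy=LEGACY
# createOrReplaceTempView returns None, so its result is not assigned
spark.read.schema(my_schema).json("s3://my_bucket/my_folder/").filter(col("my_key").isNotNull()).createOrReplaceTempView("source")
spark.sql("MERGE INTO my_table ...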

The first option is very slow, adding 40-60 minutes to the processing time of my Spark application. The second option seems too permissive. I have noticed a configuration option named spark.sql.iceberg.check-nullability in the code. I'd like to propose that this option be included in the AssignmentAlignmentTrait so that writers can bypass NULL constraints while preserving the other types of compatibility checks.

https://github.com/apache/iceberg/blob/apache-iceberg-0.11.1/spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentAlignmentSupport.scala#L152

Thanks for your consideration.

Issue Analytics

Created 2 years ago
Reactions: 1
Comments: 16 (7 by maintainers)

Top GitHub Comments

1 reaction

redsnow1992 commented, Nov 26, 2021

Wrapping the nullable field in coalesce changes it to not null, for example coalesce(field, not_null), where not_null is a non-null default value.
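A minimal sketch of that suggestion in PySpark terms, assuming the source DataFrame is selected through SQL expressions before the MERGE. The helper names (coalesce_exprs, with_non_null_columns) and the default value 'unknown' are hypothetical and not from the issue; only my_key is. Spark treats coalesce as non-nullable when at least one of its arguments is non-nullable, such as a literal. Note that this substitutes a default rather than rejecting nulls, so it is probably best kept alongside the isNotNull() filter.

```python
def coalesce_exprs(columns, defaults):
    """Build Spark SQL select expressions for the given column names.

    Columns listed in `defaults` are wrapped in coalesce(...) with a non-null
    SQL literal, which lets Spark infer them as non-nullable. All other
    columns are passed through unchanged.
    """
    exprs = []
    for name in columns:
        if name in defaults:
            # defaults[name] must already be a SQL literal, e.g. "'unknown'" or "0"
            exprs.append(f"coalesce(`{name}`, {defaults[name]}) AS `{name}`")
        else:
            exprs.append(f"`{name}`")
    return exprs


def with_non_null_columns(df, defaults):
    """Apply the expressions to an existing pyspark.sql.DataFrame (assumed)."""
    return df.selectExpr(*coalesce_exprs(df.columns, defaults))


# Hypothetical usage: print the expressions that would be passed to selectExpr.
print(coalesce_exprs(["my_key", "value"], {"my_key": "'unknown'"}))
```

With a real session, something like with_non_null_columns(input_dataset, {"my_key": "'unknown'"}).createOrReplaceTempView("source") would replace the RDD round trip from the first option. Whether it removes the slowdown in a given job is not something the thread confirms.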

0 reactions

vikrambohra commented, Mar 2, 2022

Any ideas, @rdblue?
