Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Cannot write nullable values to non-null column


In my Spark jobs I am reading from JSON and merging into Iceberg. In my Iceberg tables I would like to have NOT NULL constraints. However, when loading data from JSON, Spark doesn't enforce the schema's nullability constraints: fields declared NOT NULL come back as nullable. To work around this, I have discovered two alternatives:

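```python
from pyspark.sql.functions import col
input_dataset = spark.read.schema(my_schema).json("s3://my_bucket/my_folder/").filter(col("my_key").isNotNull())
# Rebuild the DataFrame from its RDD so that my_schema's nullability flags are applied as declared
spark.createDataFrame(input_dataset.rdd, schema=my_schema).createOrReplaceTempView("source")
spark.sql("MERGE INTO my_table ...")
```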

or

```python
# Job submitted with: --conf spark.sql.storeAssignmentPolicy=LEGACY
from pyspark.sql.functions import col
spark.read.schema(my_schema).json("s3://my_bucket/my_folder/").filter(col("my_key").isNotNull()).createOrReplaceTempView("source")
spark.sql("MERGE INTO my_table ...")
```

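The first option is very slow. It adds 40-60 minutes to the processing time of my Spark application, because converting the DataFrame to an RDD and back moves every row through Python and re-verifies it against the schema. The second option seems too permissive, because the LEGACY store assignment policy accepts any valid cast (for example, string to int), not just relaxed nullability. I have noticed that the code already has a configuration option named spark.sql.iceberg.check-nullability. I'd like to propose that this option also be honored in the AssignmentAlignmentSupport trait. That would let writers bypass NOT NULL constraints while preserving the other compatibility checks.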

https://github.com/apache/iceberg/blob/apache-iceberg-0.11.1/spark3-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentAlignmentSupport.scala#L152

Thanks for your consideration.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 16 (7 by maintainers)

Top GitHub Comments

1 reaction
redsnow1992 commented, Nov 26, 2021

Wrapping a nullable field in coalesce with a non-null fallback changes it to not null, for example coalesce(field, not_null).

0 reactions
vikrambohra commented, Mar 2, 2022

Any ideas, @rdblue?


Top Results From Across the Web

Problems with adding NOT NULL columns or making nullable ...
In this example, the problems involve the use of NULL values, and happen when you try to add a new column that can't...

[GitHub] [iceberg] rdblue commented on issue #2456
[GitHub] [iceberg] rdblue commented on issue #2456: Cannot write nullable values to non-null column · GitBox Wed, 28 Jul 2021 14:24:11 -0700.

Altering a Column from Null to Not Null in SQL Server - Chartio
Alter a column from NULL to not NULL in SQL Server by updating existing column data and altering the column data structure to...

FIX: "Attempting to set a non-NULL-able column's value to ...
Cause. Query Optimizer determines that the column that is referenced in the ISNULL() function is non-nullable because the join operator rejects NULL values....

How to make a column non-nullable in Spark Structured ...
For Spark in Batch mode, one way to change column nullability is by creating a new dataframe with a new schema that has...
