Writes Fail due to Column Mismatch

When writing new ConceptMaps out to an existing table, the write can fail with an AnalysisException about a column data-type mismatch:

An error occurred while calling o214.writeToDatabase.
: org.apache.spark.sql.AnalysisException: cannot resolve '`useContext`' due to data type mismatch: cannot cast array<struct<id:string,code:struct<id:string,system:string,version:string,code:string,display:string,userSelected:boolean>,valueQuantity:struct<id:string,value:decimal(12,4),comparator:string,unit:string,system:string,code:string>,valueRange:struct<id:string,low:struct<id:string,value:decimal(12,4),comparator:string,unit:string,system:string,code:string>,high:struct<id:string,value:decimal(12,4),comparator:string,unit:string,system:string,code:string>>,valueCodeableConcept:struct<id:string,coding:array<struct<id:string,system:string,version:string,code:string,display:string,userSelected:boolean>>,text:string>>> to array<struct<id:string,code:struct<id:string,system:string,version:string,code:string,display:string,userSelected:boolean>,valueCodeableConcept:struct<id:string,coding:array<struct<id:string,system:string,version:string,code:string,display:string,userSelected:boolean>>,text:string>,valueQuantity:struct<id:string,value:decimal(12,4),comparator:string,unit:string,system:string,code:string>,valueRange:struct<id:string,low:struct<id:string,value:decimal(12,4),comparator:string,unit:string,system:string,code:string>,high:struct<id:string,value:decimal(12,4),comparator:string,unit:string,system:string,code:string>>>>;;
'InsertIntoHadoopFsRelationCommand location/warehouse/ontologies.db/conceptmaps, false, [timestamp#5475], Parquet, Map(serialization.format -> 1, path -> location/warehouse/ontologies.db/conceptmaps), Append, CatalogTable(
Database: ontologies
Table: conceptmaps
Owner: hadoop
Created Time: Mon Aug 06 20:33:27 UTC 2018
Last Access: Thu Jan 01 00:00:00 UTC 1970
Created By: Spark 2.3.0
Type: MANAGED
Provider: parquet
Table Properties: [transient_lastDdlTime=1533587608]
Location: location/warehouse/ontologies.db/conceptmaps
Serde Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Storage Properties: [serialization.format=1]
Partition Provider: Catalog
Partition Columns: [`timestamp`]
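
For reference, the failing write path looks roughly like the sketch below. get_concept_maps appears later in this issue, and write_to_database is the Python wrapper around the writeToDatabase call seen in the stack trace; the bunsen module path, the with_new_map signature, and the mapping tuple shape are assumptions about the Bunsen 0.4.x Python API, shown only to illustrate where the exception surfaces.

from pyspark.sql import SparkSession
# The module path for Bunsen's Python bindings is an assumption; it varies by
# Bunsen version.
from bunsen.stu3.codes import get_concept_maps

spark = SparkSession.builder \
    .appName("conceptmap-write") \
    .enableHiveSupport() \
    .getOrCreate()

# Load the existing concept maps from the ontologies database ...
concept_maps = get_concept_maps(spark, "ontologies")

# ... add a new map encoded with the current Bunsen/HAPI encoders (illustrative
# values only; the exact with_new_map signature is an assumption) ...
new_maps = concept_maps.with_new_map(
    url="urn:example:conceptmap",
    version="0.1",
    source="urn:example:source-valueset",
    target="urn:example:target-valueset",
    experimental=True,
    mappings=[("urn:example:source-system", "a",
               "urn:example:target-system", "b", "equivalent")])

# ... and append the result to the pre-existing ontologies.conceptmaps table.
# This is the writeToDatabase call in the error above; it fails when the
# encoder's column ordering no longer matches the Hive table schema.
new_maps.write_to_database("ontologies")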

On analysis it appears the Hive table schema can drift from the dynamically created Spark schema; columns can be in different orders. We can compare gists of the schemas:

Schema for a Dataset<ConceptMap> built with Bunsen 0.4.5 on HAPI 3.3.0.

Schema for an existing ConceptMap table, built with a previous version of Bunsen. It differs from the first in the column order of the SourceUri/SourceReference, TargetUri/TargetReference, and useContext.valueQuantity fields (valueQuantity sitting in a different position is what the error message at the top is reporting).

Schema for a new ConceptMap table, built from the Dataset. This schema matches the first.

Even if we load the original table using Bunsen APIs

ontologies_maps = get_concept_maps(spark, "ontologies")

ontologies_maps.get_maps().printSchema()

as opposed to Spark APIs

spark.table("ontologies.conceptmaps").printSchema()

the result still does not match the Dataset<ConceptMap> we'd intend to write.
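
One way to make the drift concrete is to flatten each schema into an ordered list of field paths and compare them position by position. This is a minimal sketch, not part of the original report; create_concept_maps is an assumption about the Bunsen Python API, used here only to obtain a Dataset that carries the current encoder schema.

from pyspark.sql.types import ArrayType, StructType
# As above, the bunsen module path is an assumption.
from bunsen.stu3.codes import create_concept_maps

def field_order(schema, prefix=""):
    """Flatten a Spark StructType into dotted field paths, preserving order."""
    paths = []
    for field in schema.fields:
        path = prefix + field.name
        paths.append(path)
        dtype = field.dataType
        # Descend into arrays of structs (e.g. useContext) as well as plain structs.
        if isinstance(dtype, ArrayType):
            dtype = dtype.elementType
        if isinstance(dtype, StructType):
            paths.extend(field_order(dtype, path + "."))
    return paths

# Field order produced by the current Bunsen/HAPI encoders (an empty ConceptMaps
# created in this session carries that schema).
encoder_fields = field_order(create_concept_maps(spark).get_maps().schema)

# Field order of the existing Hive table (the Bunsen load above reports the same).
table_fields = field_order(spark.table("ontologies.conceptmaps").schema)

# Reordered fields such as useContext.valueQuantity show up as mismatched pairs.
for encoder_field, table_field in zip(encoder_fields, table_fields):
    if encoder_field != table_field:
        print("encoder: %-60s table: %s" % (encoder_field, table_field))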

I don’t think this is related to issues we’ve seen with Spark in the past, where we had to explicitly SELECT columns in a particular order to avoid data being written under the wrong column (sketched below for contrast).
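
For contrast, that older workaround looked roughly like the sketch below; it cannot help here, because the drift reported above is in the field order inside the nested useContext struct, not in the top-level column order.

# Older workaround (for contrast only): select the columns in exactly the order
# of the target table before appending, so values are not written under the
# wrong top-level column.
table_columns = spark.table("ontologies.conceptmaps").columns
reordered = ontologies_maps.get_maps().select(*table_columns)

# A top-level select cannot reorder fields within useContext, so the
# AnalysisException above would still be raised on write.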

I think this is instead related to the order in which a RuntimeElement returns information about its children in the EncoderBuilder. Digging into ConceptMap.useContext.value and comparing the Encoder schema across Bunsen versions, we see the same differences that appear at the table/dataset schema level. Digging further into the EncoderBuilder runtime, we find that, depending on the HAPI version, the ChoiceType children for ConceptMap.useContext.value come back in different orders, and those orders match the differences we see in the Dataset and table schemas.

This amounts to tables for a given STU release being subject to non-passive changes (even though updates within HAPI for an STU release should be purely passive with regard to the resources).

The simplest thing to do is to drop or archive the tables and rebuild them with the latest Bunsen version (sketched below), but that requirement may be unexpected for users who are only moving to a Bunsen build on a newer HAPI version within the same FHIR STU release.
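
A sketch of that drop/archive-and-rebuild route follows; with_maps_from_directory is an assumption about the Bunsen Python API, and the resource directory path is hypothetical.

# Keep the old table around under another name rather than dropping it outright.
spark.sql("ALTER TABLE ontologies.conceptmaps RENAME TO ontologies.conceptmaps_archive")

# Rebuild from the source ConceptMap resources with the current Bunsen/HAPI
# version, so the new Hive table is created with the current encoder ordering
# (create_concept_maps as imported above).
rebuilt = create_concept_maps(spark).with_maps_from_directory("/path/to/conceptmaps")
rebuilt.write_to_database("ontologies")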

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments:5 (1 by maintainers)

Top GitHub Comments

2 reactions
rbrush commented, Feb 22, 2019

The existing code assumes the HAPI API preserved order when it does not, but I don’t think this will be a problem once we bring forward the 0.5.0-dev branch (and adopt its ordering). That branch uses the FHIR resource definition, which unambiguously specifies field ordering, and the same resource definition will produce the same field ordering independently of the version of HAPI.

0 reactions
dmartino88 commented, Oct 11, 2019

The existing code assumes the HAPI API preserved order when it does not, but I don’t think this will be a problem once we bring forward the 0.5.0-dev branch (and adopt its ordering). That branch uses the FHIR resource definition, which unambiguously specifies field ordering, and the same resource definition will produce the same field ordering independently of the version of HAPI.

Any update about this issue?
