Table cells are all NULL if the Delta Lake table was earlier saved with option "delta.columnMapping.mode" set to "name"
Reproducibility
If the Delta Lake table is saved using the following Python code (notice the setting delta.columnMapping.mode):
final_df.write.format("delta")\
.option("path", f"hdfs://some_ip:9000/data-warehouse/ABC")\
.option("delta.columnMapping.mode", "name")\
.mode("overwrite")\
.saveAsTable("ABC")
Later, in Power BI, the table is fetched using the Connector as follows:
let
Source = fn_ReadDeltaTable(Hdfs.Files("some_ip:50070/data-warehouse/ABC"), [UseFileBuffer=true])
in
Source
Although the columns are recognized and displayed correctly, every table cell is null.
Comment
As far as I know, the reason to use this setting is to decouple the Delta Lake table's column names from the physical column names stored in the Parquet files. This makes it possible to rename or drop columns and to use names containing special characters. See the documentation: Column mapping on Databricks.
Unfortunately, it looks like the Connector does not support this newer Delta Lake option.
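For illustration, here is a minimal Python sketch of why a reader that ignores column mapping could end up with nulls everywhere. It assumes, following the Delta protocol, that in "name" mode each field in the table's schemaString carries a delta.columnMapping.physicalName metadata entry and that the Parquet files store the data under that physical name. The sample schema, the physical names, and the helper functions are hypothetical, and this is not the Connector's actual code.

```python
import json

# Hypothetical schemaString, as it might appear in a metaData action of the
# Delta log when delta.columnMapping.mode = "name" (physical names invented).
SCHEMA_STRING = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "customer id", "type": "long", "nullable": True,
         "metadata": {"delta.columnMapping.id": 1,
                      "delta.columnMapping.physicalName": "col-aaa111"}},
        {"name": "amount", "type": "double", "nullable": True,
         "metadata": {"delta.columnMapping.id": 2,
                      "delta.columnMapping.physicalName": "col-bbb222"}},
    ],
})


def logical_to_physical(schema_string):
    """Map each top-level logical column name to its Parquet column name.

    Nested fields are ignored to keep the sketch short.
    """
    fields = json.loads(schema_string)["fields"]
    return {
        f["name"]: f.get("metadata", {}).get(
            "delta.columnMapping.physicalName", f["name"])
        for f in fields
    }


def read_row(parquet_row, schema_string, use_mapping):
    """Project a Parquet row (keyed by physical name) onto the logical schema."""
    mapping = logical_to_physical(schema_string)
    return {
        logical: parquet_row.get(physical if use_mapping else logical)
        for logical, physical in mapping.items()
    }


# A row as stored in Parquet: keyed by physical names, not logical ones.
parquet_row = {"col-aaa111": 42, "col-bbb222": 9.5}

naive = read_row(parquet_row, SCHEMA_STRING, use_mapping=False)
mapped = read_row(parquet_row, SCHEMA_STRING, use_mapping=True)
print(naive)   # lookups by logical name find nothing
print(mapped)  # lookups by physical name recover the values
```

In this sketch the column headers still come from the logical schema, which would match the observed symptom: the columns are displayed, but the lookups by logical name find no data.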
Environment
- Apache Spark v3.2.2
- Hadoop/HDFS v3.3.4
- openjdk 11.0.16.1 2022-08-12 (Temurin)
- Python 3.10.4
- Ubuntu 22 LTS
- Power BI Desktop 2.109.642.0 64-bit (September 2022) for Windows 10
Issue Analytics
- State:
- Created a year ago
- Comments: 7 (4 by maintainers)
I will have a look, but I am quite sure this is currently not supported. The options I see are either implementing column mapping properly or throwing a better error, as suggested. I will keep you updated here!
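As a rough illustration of the "better error" option, the following Python sketch checks the table configuration before reading. It assumes the mode is stored under the key delta.columnMapping.mode in the configuration map of the Delta log's metaData action, and that a missing key means no column mapping. The function names, the sample log line, and the error text are hypothetical and are not taken from the Connector.

```python
import json


def column_mapping_mode(metadata_action):
    """Return the column mapping mode of a metaData action ("none" if unset)."""
    config = metadata_action.get("configuration") or {}
    return config.get("delta.columnMapping.mode", "none")


def ensure_supported(metadata_action, supported=("none",)):
    """Fail loudly instead of silently returning a table full of nulls."""
    mode = column_mapping_mode(metadata_action)
    if mode not in supported:
        raise NotImplementedError(
            f"delta.columnMapping.mode '{mode}' is not supported by this reader"
        )
    return mode


# Hypothetical, heavily trimmed metaData line from a Delta log JSON file.
log_line = (
    '{"metaData": {"id": "hypothetical-id", '
    '"configuration": {"delta.columnMapping.mode": "name"}}}'
)
action = json.loads(log_line)["metaData"]

try:
    ensure_supported(action)
    error_message = None
except NotImplementedError as exc:
    error_message = str(exc)

print(error_message)
```

A check like this only turns the silent failure into an explicit one; reading the data correctly would still require resolving the physical column names.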
Thanks for the testing and the positive feedback, @ThachNgocTran and @dominikpeter.
I just created the PR with the fix: https://github.com/delta-io/connectors/pull/448