Table cells are all NULL if the Delta Lake table was earlier saved with option "delta.columnMapping.mode" set to "name"

Reproducibility

If the Delta Lake table is saved with the following Python code (note the delta.columnMapping.mode setting):

final_df.write.format("delta")\
    .option("path", f"hdfs://some_ip:9000/data-warehouse/ABC")\
    .option("delta.columnMapping.mode", "name")\
    .mode("overwrite")\
    .saveAsTable("ABC")

Later, in Power BI, the table is fetched using the Connector as follows:

let
    Source = fn_ReadDeltaTable(Hdfs.Files("some_ip:50070/data-warehouse/ABC"), [UseFileBuffer=true])
in
    Source

Although the columns are recognized and displayed correctly, every table cell is null.

Comment

As far as I know, this setting decouples the Delta Lake table's logical column names from the physical column names stored in the Parquet files. That makes it possible to rename or drop columns and to use column names containing special characters. See the documentation "Column mapping" on Databricks.
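To illustrate what that decoupling means for a reader, here is a minimal Python sketch. It assumes the layout described in the Delta protocol, where each field in the table schema carries a `delta.columnMapping.physicalName` entry in its metadata. The schema excerpt, the column names, and the `col-...` physical names are made up for the example, the helper `logical_to_physical` is hypothetical, and only top-level columns are handled. Whether the Connector performed exactly this kind of direct lookup is an assumption.

```python
import json

# Hypothetical schemaString, hand-written to resemble what a Delta log
# `metaData` action holds when delta.columnMapping.mode is "name".
# The column names and "col-..." physical names are made up.
SCHEMA_STRING = json.dumps({
    "type": "struct",
    "fields": [
        {"name": "customer id", "type": "long", "nullable": True,
         "metadata": {"delta.columnMapping.id": 1,
                      "delta.columnMapping.physicalName": "col-1a2b"}},
        {"name": "amount", "type": "double", "nullable": True,
         "metadata": {"delta.columnMapping.id": 2,
                      "delta.columnMapping.physicalName": "col-3c4d"}},
    ],
})


def logical_to_physical(schema_string):
    """Map each top-level logical column name to its physical Parquet name."""
    mapping = {}
    for field in json.loads(schema_string)["fields"]:
        metadata = field.get("metadata", {})
        # Fall back to the logical name when no mapping is recorded.
        mapping[field["name"]] = metadata.get(
            "delta.columnMapping.physicalName", field["name"])
    return mapping


# A row as stored in the Parquet file: keyed by physical names.
parquet_row = {"col-1a2b": 42, "col-3c4d": 9.5}
mapping = logical_to_physical(SCHEMA_STRING)

# Looking up logical names directly finds nothing, which shows up as nulls.
naive = {name: parquet_row.get(name) for name in mapping}

# Translating through the mapping recovers the stored values.
mapped = {name: parquet_row.get(physical) for name, physical in mapping.items()}
```

A reader that skips the translation step would still list the right columns, since those come from the schema, but would find no matching data in the files, which is consistent with the symptom reported here.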

Unfortunately, it looks like the Connector does not support this newer Delta Lake option.

Environment

  • Apache Spark v3.2.2
  • Hadoop/HDFS v3.3.4
  • openjdk 11.0.16.1 2022-08-12 (Temurin)
  • Python 3.10.4
  • Ubuntu 22 LTS
  • Power BI Desktop 2.109.642.0 64-bit (September 2022) for Windows 10

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
gbrueckl commented, Sep 15, 2022

I will have a look, but I am quite sure this is currently not supported. The options I see are either implementing column mapping properly or throwing a better error, as suggested. Will keep you updated here!
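The second option in this comment, failing with a clear error instead of returning nulls, can be sketched in Python as follows. This is only an illustration under assumptions: the Connector itself is written in Power Query M, the function name `require_no_column_mapping` is hypothetical, and the table configuration is assumed to be available as a plain dictionary taken from the Delta log's `metaData` action.

```python
def require_no_column_mapping(table_configuration):
    """Raise a descriptive error if the table uses Delta column mapping.

    `table_configuration` is assumed to be the `configuration` dictionary
    from the table's metaData action (hypothetical input for this sketch).
    """
    # Per the Delta protocol, the mode is "none" when the key is absent.
    mode = table_configuration.get("delta.columnMapping.mode", "none")
    if mode != "none":
        raise NotImplementedError(
            f"Delta column mapping mode '{mode}' is not supported by this reader"
        )
    return mode


# Usage: a table written as in the report above would be rejected up front.
try:
    require_no_column_mapping({"delta.columnMapping.mode": "name"})
    rejected = False
except NotImplementedError:
    rejected = True
```

Checking the table configuration before reading any Parquet file keeps an unsupported table from silently producing empty cells.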

0 reactions
gbrueckl commented, Oct 3, 2022

Thanks for the testing and the positive feedback, @ThachNgocTran and @dominikpeter.
I just created the PR with the fix: https://github.com/delta-io/connectors/pull/448
