
Spark can't read Iceberg table created from Presto

Spark can't read a table that was created in Presto:

create table iceberg.examples.test_table 
with (format = 'parquet')
as select timestamp '2021-01-19 23:59:59.999999' as ts;

In Spark:

spark.read
      .format("iceberg")
      .load("examples.test_table")
      .show(10, false)

Fails with the exception java.lang.UnsupportedOperationException: Spark does not support timestamp without time zone fields.
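As a possible read-side workaround on newer runtimes: later Iceberg releases (well after this issue was filed) added a session property that tells the Spark runtime to surface timestamp-without-zone columns as Spark's TimestampType instead of throwing. A minimal sketch, assuming an Iceberg Spark runtime recent enough to recognize the flag:

import org.apache.spark.sql.SparkSession

// Assumption: the Iceberg runtime on the classpath supports
// spark.sql.iceberg.handle-timestamp-without-timezone (it did not exist
// when this issue was opened). With the flag set, Iceberg maps its
// timestamp (without zone) type to Spark's TimestampType on read.
val spark = SparkSession.builder()
  .appName("iceberg-ntz-read")
  .config("spark.sql.iceberg.handle-timestamp-without-timezone", "true")
  .getOrCreate()

spark.read
  .format("iceberg")
  .load("examples.test_table")
  .show(10, false)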

Please add support for the timestamp without time zone type to the Iceberg Spark runtime, since Iceberg itself supports this type.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 4
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

3 reactions
davseitsev commented, Jan 26, 2021

According to the Spark docs, TimestampType represents values comprising the fields year, month, day, hour, minute, and second, with the session-local time zone; the timestamp value represents an absolute point in time. So it's more like LocalDateTime. Why can't Iceberg expose timestamp without time zone as a Spark timestamp?
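One quick way to observe the session-time-zone behaviour described above from a spark-shell (the stored instant stays fixed; only the rendered string follows the session zone). This is a sketch, assuming a Spark 3.x shell where the timestamp literal is resolved when the query is parsed:

// The literal is fixed at parse time; show() converts it to a string
// using whatever spark.sql.session.timeZone is at display time.
spark.conf.set("spark.sql.session.timeZone", "UTC")
val df = spark.sql("select timestamp '2021-01-19 23:59:59.999999' as ts")
df.show(false)  // 2021-01-19 23:59:59.999999

spark.conf.set("spark.sql.session.timeZone", "America/New_York")
df.show(false)  // the same instant, rendered as 2021-01-19 18:59:59.999999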

As far as I understand, the problem is that Spark has only one timestamp type while Iceberg has two, which causes some inconsistency in the type mapping between Spark and Iceberg.
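For reference, the two Iceberg types in question are distinct values in the Iceberg API, while Spark (at the time of this issue) has only a single org.apache.spark.sql.types.TimestampType to map them to:

import org.apache.iceberg.types.Types

// Iceberg's two timestamp types; Spark has just one TimestampType.
val tsWithZone    = Types.TimestampType.withZone()     // timestamptz
val tsWithoutZone = Types.TimestampType.withoutZone()  // timestamp, the type Presto used above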

In my opinion it could be solved in the following way:

  1. For read access there is no problem: both Iceberg types (timestamp, timestamptz) could be exposed as Spark timestamp.
  2. For inserts there is no problem either, because the target type is defined in the table metadata, so the correct one can be chosen.
  3. For table creation there is an ambiguity, which could be resolved by a configuration property. Spark already has such a property for choosing how timestamp fields are stored in Parquet (spark.sql.parquet.outputTimestampType = INT96 | TIMESTAMP_MICROS | TIMESTAMP_MILLIS). Why can't we have the same kind of logic for Iceberg? (A hypothetical sketch of this follows the list.)
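A hypothetical sketch of point 3 above. The property name spark.sql.iceberg.outputTimestampType is invented here for illustration, mirroring Spark's existing spark.sql.parquet.outputTimestampType; only the Iceberg type factories (Types.TimestampType.withZone / withoutZone) are real API:

import org.apache.iceberg.types.Types

// Hypothetical: pick the Iceberg timestamp type for columns of new tables
// from a configuration value, analogous to spark.sql.parquet.outputTimestampType.
def icebergTimestampFor(outputType: String): Types.TimestampType =
  outputType match {
    case "TIMESTAMP"   => Types.TimestampType.withoutZone() // Iceberg timestamp
    case "TIMESTAMPTZ" => Types.TimestampType.withZone()    // Iceberg timestamptz
    case other =>
      throw new IllegalArgumentException(s"Unsupported output timestamp type: $other")
  }

// Consulted during table creation, falling back to timestamptz, e.g.:
// icebergTimestampFor(spark.conf.get("spark.sql.iceberg.outputTimestampType", "TIMESTAMPTZ"))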