
DataFrameWriterV2 doesn't create partitions when using partitionedBy

When I try to create an Iceberg table with DataFrameWriterV2, the partitioning columns appear to be ignored even though they are specified, and the resulting table is not partitioned.

Spark version: 3.1.2
Iceberg version: 0.13.0

Note: the application is written in Python.

# Partition transforms passed as strings; the table is created, but unpartitioned.
orders_df.writeTo("iceberg_catalog.p2.orders").using("iceberg").option(
    "fanout-enabled", "true"
).partitionedBy("date(pickupts)", "appname").create()

Describing the table with Spark SQL shows:

# Partitioning
Not partitioned
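
For completeness, the inspection is just a plain DESCRIBE along these lines (for Iceberg tables, Spark prints the partition spec at the end of the output):

spark.sql("DESCRIBE TABLE iceberg_catalog.p2.orders").show(truncate=False)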

The workaround for creating the table with the appropriate partitions is Spark SQL, i.e. the following works as expected (output shown below the snippet):

orders_df.createOrReplaceTempView("p2_orders_view")
spark.sql(
    """
    CREATE TABLE iceberg_catalog.p2.orders USING iceberg
    PARTITIONED BY (date(pickupts), appname)
    AS SELECT * FROM p2_orders_view
    """
)

Describing the table created this way shows the expected spec (note that Iceberg reports the date transform as days):

# Partitioning
Part 0	days(pickupts)
Part 1	appname
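
As a side note, Iceberg's metadata tables give another way to verify the layout. Assuming the catalog is set up as above, something like this should list the live partitions (illustrative; not from the original issue):

spark.sql(
    "SELECT partition, record_count FROM iceberg_catalog.p2.orders.partitions"
).show(truncate=False)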

My question is: am I using DataFrameWriterV2 correctly when specifying the partitionedBy columns?
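
For what it's worth, one variant I have been looking at is passing Column expressions instead of strings, built with the partition-transform helpers that were added to pyspark.sql.functions in Spark 3.1 (days, months, years, hours, bucket). A minimal sketch of that usage, assuming those helpers behave as documented:

from pyspark.sql.functions import col, days

orders_df.writeTo("iceberg_catalog.p2.orders").using("iceberg").option(
    "fanout-enabled", "true"
).partitionedBy(
    days(col("pickupts")),  # daily transform, intended to match date(pickupts)
    col("appname"),         # identity partition
).create()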

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

liuchunhua commented, Apr 3, 2022 (2 reactions)

github-actions[bot] commented, Nov 17, 2022 (0 reactions):

This issue has been closed because it has not received any activity in the last 14 days since being marked as ‘stale’.
