
Overwrite using staging table fails when table has dependencies

See original GitHub issue

When saving data to a table in overwrite mode, with usestagingtable left at its default value of true, the operation fails with the following error whenever the target table already has dependencies (e.g. a view depends on the table):
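For context, a write that can hit this path looks roughly like the following PySpark sketch. The JDBC URL, table name, and S3 tempdir are placeholders, the helper function is hypothetical, and usestagingtable is spelled out even though true is the default:

```python
def redshift_write_options(url, table, tempdir, usestagingtable=True):
    """Collect spark-redshift write options. `usestagingtable=True` (the
    default) makes the writer load into a temp table and then swap it in,
    which is the code path that fails when the target has dependents."""
    return {
        "url": url,
        "dbtable": table,
        "tempdir": tempdir,
        "usestagingtable": str(usestagingtable).lower(),
    }

# With a live SparkSession, the failing write would look like (sketch only):
#
# (df.write
#    .format("com.databricks.spark.redshift")
#    .options(**redshift_write_options(jdbc_url, "myschema.mytable", s3_dir))
#    .mode("overwrite")
#    .save())
```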

java.sql.SQLException: [Amazon](500310) Invalid operation: current transaction is aborted, commands ignored until end of transaction block;
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(ErrorResponse.java:1830)
    at com.amazon.redshift.client.PGMessagingContext.handleErrorResponse(PGMessagingContext.java:804)
    at com.amazon.redshift.client.PGMessagingContext.handleMessage(PGMessagingContext.java:642)
    at com.amazon.jdbc.communications.InboundMessagesPipeline.getNextMessageOfClass(InboundMessagesPipeline.java:312)
    at com.amazon.redshift.client.PGMessagingContext.doMoveToNextClass(PGMessagingContext.java:1062)
    at com.amazon.redshift.client.PGMessagingContext.getParameterDescription(PGMessagingContext.java:978)
    at com.amazon.redshift.client.PGClient.prepareStatement(PGClient.java:1844)
    at com.amazon.redshift.dataengine.PGQueryExecutor.<init>(PGQueryExecutor.java:106)
    at com.amazon.redshift.dataengine.PGDataEngine.prepare(PGDataEngine.java:211)
    at com.amazon.jdbc.common.SPreparedStatement.<init>(Unknown Source)
    at com.amazon.jdbc.jdbc41.S41PreparedStatement.<init>(Unknown Source)
    at com.amazon.redshift.core.jdbc41.PGJDBC41PreparedStatement.<init>(PGJDBC41PreparedStatement.java:49)
    at com.amazon.redshift.core.jdbc41.PGJDBC41ObjectFactory.createPreparedStatement(PGJDBC41ObjectFactory.java:119)
    at com.amazon.jdbc.common.SConnection.prepareStatement(Unknown Source)
    at com.databricks.spark.redshift.RedshiftWriter.withStagingTable(RedshiftWriter.scala:137)
    at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:369)
    at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:106)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
    at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
Caused by: com.amazon.support.exceptions.ErrorException: [Amazon](500310) Invalid operation: current transaction is aborted, commands ignored until end of transaction block;
    ... 29 more

I tracked this error down to the following code in RedshiftWriter.scala:

    try {
      action(tempTable.toString)

      if (jdbcWrapper.tableExists(conn, table.toString)) {
        jdbcWrapper.executeInterruptibly(conn.prepareStatement(
          s"""
             | BEGIN;
             | ALTER TABLE $table RENAME TO ${backupTable.escapedTableName};
             | ALTER TABLE $tempTable RENAME TO ${table.escapedTableName};
             | DROP TABLE $backupTable;
             | END;
           """.stripMargin.trim))
      } else {
        jdbcWrapper.executeInterruptibly(conn.prepareStatement(
          s"ALTER TABLE $tempTable RENAME TO ${table.escapedTableName}"))
      }
    } finally {
      jdbcWrapper.executeInterruptibly(conn.prepareStatement(s"DROP TABLE IF EXISTS $tempTable"))
    }

When trying this transaction manually in SQL Workbench, I get the following error:

[Amazon](500310) Invalid operation: cannot drop table myschema.mytable because other objects depend on it;

I was hoping that spark-redshift would let this error (the actual culprit) bubble up, but instead I get the error mentioned at the beginning. The original exception is masked by a second exception raised by the DROP TABLE IF EXISTS in the finally block: by that point the transaction is already in a failed state, so the cleanup itself fails with Invalid operation: current transaction is aborted, commands ignored until end of transaction block.
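The masking described here is the classic "exception raised in a finally block replaces the in-flight exception" behavior, and it applies to the Scala code above just as it does in most languages. A minimal language-level sketch (in Python purely for illustration; the exception classes are stand-ins for the two Redshift errors) shows both the masking and one possible fix, swallowing cleanup failures so the root cause propagates:

```python
class DependencyError(Exception):
    """Stands in for: cannot drop table because other objects depend on it."""

class AbortedTxnError(Exception):
    """Stands in for: current transaction is aborted."""

def overwrite_with_staging():
    try:
        # The table-swap transaction fails because a view depends on the table.
        raise DependencyError("cannot drop table: other objects depend on it")
    finally:
        # Cleanup (DROP TABLE IF EXISTS on the temp table) runs against the
        # now-aborted transaction and raises again, replacing the original,
        # more informative error.
        raise AbortedTxnError("current transaction is aborted")

def overwrite_preserving_cause():
    try:
        raise DependencyError("cannot drop table: other objects depend on it")
    finally:
        try:
            raise AbortedTxnError("current transaction is aborted")
        except AbortedTxnError:
            pass  # swallow the cleanup failure so the root cause propagates
```

Calling overwrite_with_staging() surfaces only AbortedTxnError (in Python the original survives as __context__; in the JVM it is simply lost unless attached as a suppressed exception), while overwrite_preserving_cause() lets DependencyError reach the caller.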

I’m not sure what the best solution is in this case. I’m open to suggestions.

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Comments: 12 (3 by maintainers)

Top GitHub Comments

1 reaction
dongjoon-hyun commented, Jul 13, 2016

SPARK-16410 is trying to add SaveMode.Truncate.

SPARK-16463 is trying to add a truncate option to SaveMode.Overwrite.

https://issues.apache.org/jira/browse/SPARK-16463 (https://github.com/apache/spark/pull/14086)

I think SPARK-16463 is the fastest way to support the TRUNCATE feature with minimal change. Also, the PR is ready.
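The reason a truncate option is relevant to this issue: TRUNCATE empties the table in place, so dependent views keep a valid target, whereas the default overwrite path drops and recreates the table, which is what trips over the dependencies. A simplified sketch of that difference (the SQL strings are illustrative, not the statements Spark actually emits):

```python
def overwrite_statements(table, truncate):
    """Illustrate the two overwrite strategies: truncate-in-place keeps the
    table object (and its dependents) alive; drop-and-recreate does not."""
    if truncate:
        return [f"TRUNCATE TABLE {table}"]
    return [f"DROP TABLE {table}", f"CREATE TABLE {table} (/* schema */)"]
```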

0 reactions
moohebat commented, Jul 11, 2022

Let’s say we have this scenario: a materialized view depends on the table. If I try to overwrite the table, then, since under the hood the table is dropped and recreated, I get an error due to the dependencies. If in the preactions I drop the table with CASCADE, the view is also dropped, which is not acceptable: BI tools are connected to the view, and dropping it causes a bad user experience. I need to refresh the data every 15 minutes. Is there any way to write into the tables with Spark without affecting the views?
