
WARN Utils: An error occurred while trying to read the S3 bucket lifecycle configuration java.lang.NullPointerException

See original GitHub issue

Hello guys, I am getting this warning:

WARN Utils$: An error occurred while trying to read the S3 bucket lifecycle configuration
java.lang.NullPointerException
        at java.lang.String.startsWith(String.java:1385)
        at java.lang.String.startsWith(String.java:1414)
        at com.databricks.spark.redshift.Utils$$anonfun$3.apply(Utils.scala:102)
        at com.databricks.spark.redshift.Utils$$anonfun$3.apply(Utils.scala:98)
        at scala.collection.Iterator$class.exists(Iterator.scala:753)
        at scala.collection.AbstractIterator.exists(Iterator.scala:1157)
        at scala.collection.IterableLike$class.exists(IterableLike.scala:77)
        at scala.collection.AbstractIterable.exists(Iterable.scala:54)
        at com.databricks.spark.redshift.Utils$.checkThatBucketHasObjectLifecycleConfiguration(Utils.scala:98)
        at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:361)
        at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:106)
        at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)

I have seen this issue here before, but it still occurs for me.

I do have a lifecycle configuration for my bucket. I’ve traced this warning to this piece of code:

def checkThatBucketHasObjectLifecycleConfiguration(
      tempDir: String,
      s3Client: AmazonS3Client): Unit = {
    try {
      val s3URI = createS3URI(Utils.fixS3Url(tempDir))
      val bucket = s3URI.getBucket
      assert(bucket != null, "Could not get bucket from S3 URI")
      val key = Option(s3URI.getKey).getOrElse("")
      val hasMatchingBucketLifecycleRule: Boolean = {
        val rules = Option(s3Client.getBucketLifecycleConfiguration(bucket))
          .map(_.getRules.asScala)
          .getOrElse(Seq.empty)
        rules.exists { rule =>
          // Note: this only checks that there is an active rule which matches the temp directory;
          // it does not actually check that the rule will delete the files. This check is still
          // better than nothing, though, and we can always improve it later.
          rule.getStatus == BucketLifecycleConfiguration.ENABLED && key.startsWith(rule.getPrefix)
        }
      }
      if (!hasMatchingBucketLifecycleRule) {
        log.warn(s"The S3 bucket $bucket does not have an object lifecycle configuration to " +
          "ensure cleanup of temporary files. Consider configuring `tempdir` to point to a " +
          "bucket with an object lifecycle policy that automatically deletes files after an " +
          "expiration period. For more information, see " +
          "https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html")
      }
    } catch {
      case NonFatal(e) =>
        log.warn("An error occurred while trying to read the S3 bucket lifecycle configuration", e)
    }
  }

I believe the exception is thrown by this call: key.startsWith(rule.getPrefix)

I checked the AWS SDK documentation: getPrefix returns null if the prefix was never set with setPrefix, so in my case, where the rules have no prefix, it always returns null and startsWith throws the NullPointerException.

I have very limited knowledge of the AWS SDK and Scala, so I’m not entirely sure about this.
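
If the null prefix really is the cause, a null-safe version of the check might look something like this (just a sketch on my part, not a tested patch; I’m treating a rule with no prefix as applying to the whole bucket):

// Sketch only: same check as in Utils.scala, but tolerant of rules whose prefix was never set.
// A rule with a null prefix applies to the whole bucket, so it counts as a match here.
rules.exists { rule =>
  rule.getStatus == BucketLifecycleConfiguration.ENABLED &&
    Option(rule.getPrefix).forall(key.startsWith)
}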

Issue Analytics

  • State: open
  • Created: 6 years ago
  • Reactions: 8
  • Comments: 17

Top GitHub Comments

6 reactions
RyanZotti commented, Mar 9, 2018

I agree that this is a super annoying error, since the stack trace is so long. This solution worked for me:

spark.sparkContext.setLogLevel("ERROR")

I got the suggestion from here.
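
If you’d rather not silence every WARN globally, a narrower option (assuming the default log4j setup that Spark ships with) is to raise the level for just the spark-redshift logger:

import org.apache.log4j.{Level, Logger}

// Raise the threshold for the spark-redshift classes only, so other WARNs still show up.
// The logger name is an assumption based on the package in the stack trace.
Logger.getLogger("com.databricks.spark.redshift").setLevel(Level.ERROR)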

6 reactions
dmnava commented, May 12, 2017

The same here:

17/05/12 13:57:56 WARN redshift.Utils$: An error occurred while trying to read the S3 bucket lifecycle configuration
java.lang.NullPointerException
	at java.lang.String.startsWith(String.java:1405)
	at java.lang.String.startsWith(String.java:1434)
	at com.databricks.spark.redshift.Utils$$anonfun$5.apply(Utils.scala:140)
	at com.databricks.spark.redshift.Utils$$anonfun$5.apply(Utils.scala:136)
	at scala.collection.Iterator$class.exists(Iterator.scala:919)
	at scala.collection.AbstractIterator.exists(Iterator.scala:1336)
	at scala.collection.IterableLike$class.exists(IterableLike.scala:77)
	at scala.collection.AbstractIterable.exists(Iterable.scala:54)
	at com.databricks.spark.redshift.Utils$.checkThatBucketHasObjectLifecycleConfiguration(Utils.scala:136)
	at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:389)
	at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:108)
	at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
...
...

I thought it had to do with not setting a bucket prefix when configuring the lifecycle policy, but even after setting one the warning keeps showing (although the operation succeeds).
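
For anyone else trying the same thing, a rule with an explicit prefix can be created roughly like this with the AWS SDK for Java 1.x (the bucket name, prefix and expiration below are placeholders; as noted above, it did not make the warning go away for me):

import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.BucketLifecycleConfiguration

// Placeholders: use the real bucket and the tempdir prefix that spark-redshift writes to.
val bucketName = "my-temp-bucket"
val rule = new BucketLifecycleConfiguration.Rule()
  .withId("expire-spark-redshift-temp")
  .withPrefix("temp/")
  .withStatus(BucketLifecycleConfiguration.ENABLED)
  .withExpirationInDays(1)

val s3 = AmazonS3ClientBuilder.defaultClient()
s3.setBucketLifecycleConfiguration(bucketName,
  new BucketLifecycleConfiguration().withRules(rule))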

Read more comments on GitHub >

Top Results From Across the Web

S3 Lifecycle error on AWS EMR reading AWS Redshift using ...
I am trying to read redshift table into EMR cluster using pyspark. ... WARN Utils$: An error occurred while trying to read the...
Read more >
WARN Utils: An error occurred while trying to read the S3 ...
Hello guys, I am getting this warn. WARN Utils$: An error occurred while trying to read the S3 bucket lifecycle configuration java.lang.
Read more >
Issue with Spark-redshift - Apache Mail Archives
I am trying to read data from redshift table using spark-redshift ... WARN Utils$: An error occurred while trying to read the S3...
Read more >
Accessing Redshift fails with NullPointerException - Databricks
Problem Sometimes when you read a Redshift table: %scala val original_df = spark.read. format(
Read more >
AWS Lambda function errors in Java
This page describes how to view Lambda function invocation errors for the Java runtime using the Lambda console and the AWS CLI.
Read more >
