WARN Utils: An error occurred while trying to read the S3 bucket lifecycle configuration java.lang.NullPointerException
Hello guys, I am getting this warning:
WARN Utils$: An error occurred while trying to read the S3 bucket lifecycle configuration
java.lang.NullPointerException
at java.lang.String.startsWith(String.java:1385)
at java.lang.String.startsWith(String.java:1414)
at com.databricks.spark.redshift.Utils$$anonfun$3.apply(Utils.scala:102)
at com.databricks.spark.redshift.Utils$$anonfun$3.apply(Utils.scala:98)
at scala.collection.Iterator$class.exists(Iterator.scala:753)
at scala.collection.AbstractIterator.exists(Iterator.scala:1157)
at scala.collection.IterableLike$class.exists(IterableLike.scala:77)
at scala.collection.AbstractIterable.exists(Iterable.scala:54)
at com.databricks.spark.redshift.Utils$.checkThatBucketHasObjectLifecycleConfiguration(Utils.scala:98)
at com.databricks.spark.redshift.RedshiftWriter.saveToRedshift(RedshiftWriter.scala:361)
at com.databricks.spark.redshift.DefaultSource.createRelation(DefaultSource.scala:106)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
I have seen this issue reported here before, but it still occurs for me.
I do have a lifecycle configuration on my bucket. I've traced the warning to this piece of code:
def checkThatBucketHasObjectLifecycleConfiguration(
    tempDir: String,
    s3Client: AmazonS3Client): Unit = {
  try {
    val s3URI = createS3URI(Utils.fixS3Url(tempDir))
    val bucket = s3URI.getBucket
    assert(bucket != null, "Could not get bucket from S3 URI")
    val key = Option(s3URI.getKey).getOrElse("")
    val hasMatchingBucketLifecycleRule: Boolean = {
      val rules = Option(s3Client.getBucketLifecycleConfiguration(bucket))
        .map(_.getRules.asScala)
        .getOrElse(Seq.empty)
      rules.exists { rule =>
        // Note: this only checks that there is an active rule which matches the temp directory;
        // it does not actually check that the rule will delete the files. This check is still
        // better than nothing, though, and we can always improve it later.
        rule.getStatus == BucketLifecycleConfiguration.ENABLED && key.startsWith(rule.getPrefix)
      }
    }
    if (!hasMatchingBucketLifecycleRule) {
      log.warn(s"The S3 bucket $bucket does not have an object lifecycle configuration to " +
        "ensure cleanup of temporary files. Consider configuring `tempdir` to point to a " +
        "bucket with an object lifecycle policy that automatically deletes files after an " +
        "expiration period. For more information, see " +
        "https://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html")
    }
  } catch {
    case NonFatal(e) =>
      log.warn("An error occurred while trying to read the S3 bucket lifecycle configuration", e)
  }
}
I believe the exception is thrown by this expression:
key.startsWith(rule.getPrefix)
According to the AWS SDK documentation, getPrefix returns null if the prefix was never set with setPrefix. In that case startsWith is called with a null argument, which throws a NullPointerException.
I have very limited knowledge of the AWS SDK and Scala, so I'm not entirely sure about this.
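If that diagnosis is right, a null-safe prefix check would avoid the NPE. The following is only a sketch of such a fix, not the project's actual patch; `ruleMatchesKey` is a hypothetical helper, and treating a null prefix as matching every key reflects the assumption that a lifecycle rule with no prefix applies to the whole bucket:

```scala
object PrefixCheck {
  // Hypothetical null-safe replacement for key.startsWith(rule.getPrefix).
  // Rule.getPrefix can return null when no prefix was set on the rule;
  // a rule without a prefix covers the whole bucket, so a null prefix is
  // treated as matching any key instead of throwing a NullPointerException.
  def ruleMatchesKey(key: String, rulePrefix: String): Boolean =
    Option(rulePrefix).forall(prefix => key.startsWith(prefix))
}
```

With a helper like this, the check inside `rules.exists` could call `PrefixCheck.ruleMatchesKey(key, rule.getPrefix)`, so a lifecycle rule with no prefix would count as covering `tempdir` rather than crashing the check.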
Issue Analytics
- State:
- Created 6 years ago
- Reactions: 8
- Comments: 17
Top GitHub Comments
I agree that this is a super annoying error, since the stack trace is so long. This solution worked for me:
I got the suggestion from here.
The same here:
I thought it had to do with not setting a bucket prefix when configuring the lifecycle policy, but even after setting one, the warning keeps appearing (although the operation succeeds).