
IAM Role not taken into account


Hi,

I launch a job with spark-shell or spark-submit:

spark-shell --name "my test app" --packages com.databricks:spark-redshift_2.11:2.0.1

then

sparkSession.read.schema(schema).
      format("com.databricks.spark.redshift").
      option("url", redshiftUrl).
      option("user",redshiftUser).
      option("password",redshiftPassword).
      option("aws_iam_role", redshiftRoleArn).
      option("query", query).
      option("tempdir", s3TmpDir).
      load()

and I get the exception:

java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified
as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId
or fs.s3n.awsSecretAccessKey properties
    at com.databricks.spark.redshift.S3Credentials.initialize(S3Credentials.java:67)
    at com.databricks.spark.redshift.AWSCredentialsUtils$.com$databricks$spark$redshift$AWSCredentialsUtils$$loadFromURI(AWSCredentialsUtils.scala:60)
    at com.databricks.spark.redshift.AWSCredentialsUtils$$anonfun$load$1.apply(AWSCredentialsUtils.scala:48)
    at com.databricks.spark.redshift.AWSCredentialsUtils$$anonfun$load$1.apply(AWSCredentialsUtils.scala:48)
    at scala.Option.getOrElse(Option.scala:121)
    at com.databricks.spark.redshift.AWSCredentialsUtils$.load(AWSCredentialsUtils.scala:48)
    at com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:89)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$8.apply(DataSourceStrategy.scala:260)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$8.apply(DataSourceStrategy.scala:260)
    at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:303)
...

The role works using the psql client, and UNLOAD writes its results to s3TmpDir. I checked the code, and I am not really sure how the IAM role is handled… It looks like a bug to me.

PS: when using spark-submit, my app has the following dependencies in build.sbt:

   libraryDependencies ++= Seq(
      "com.databricks" %% "spark-redshift" % "2.0.1" % Compile,
      "joda-time" % "joda-time" % "2.9.4" % Compile,
      "com.amazonaws" % "aws-java-sdk-core" % "1.11.37" % Compile,
      "com.amazonaws" % "aws-java-sdk-s3" % "1.11.37" % Compile,
      "org.scalaj" %% "scalaj-http" % "2.3.0" % Compile,
      "net.liftweb" %% "lift-json" % "2.6.3" % Compile,
      "com.amazon.redshift" % "jdbc42.Driver" % "1.1.17.1017" from "https://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC42-1.1.17.1017.jar",
      "org.apache.spark" %% "spark-core" % "2.0.0" % Compile,
      "org.apache.spark" %% "spark-sql" % "2.0.0" % Compile,
      "org.apache.spark" %% "spark-hive" % "2.0.0" % Compile,
      "org.scalatest" %% "scalatest" % "3.0.0" % Test
    )

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

JoshRosen commented, Sep 27, 2016 (5 reactions)

The problem here relates to Redshift <-> S3 communication, not Spark <-> S3.

I think that you’ll have to either use the AWS security token service instructions (mentioned at https://github.com/databricks/spark-redshift/issues/252#issuecomment-237910447) or use the aws_iam_role parameter to tell Redshift which role to assume when communicating with S3.
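A sketch of the temporary-credentials route (untested; the temporary_aws_* option names come from the spark-redshift README, which also notes they are mutually exclusive with aws_iam_role). Here the session credentials are read straight from the EC2 instance profile rather than via an explicit STS call; both produce the session token that spark-redshift needs:

```scala
import com.amazonaws.auth.{AWSSessionCredentials, InstanceProfileCredentialsProvider}

// On EC2, the instance profile already hands out temporary session
// credentials; fetch them and forward them to spark-redshift explicitly.
val creds = new InstanceProfileCredentialsProvider()
  .getCredentials
  .asInstanceOf[AWSSessionCredentials]

sparkSession.read.schema(schema).
  format("com.databricks.spark.redshift").
  option("url", redshiftUrl).
  option("query", query).
  option("tempdir", s3TmpDir).
  // These options cover both sides of the transfer: Redshift's UNLOAD
  // into tempdir and Spark's subsequent read of it.
  option("temporary_aws_access_key_id", creds.getAWSAccessKeyId).
  option("temporary_aws_secret_access_key", creds.getAWSSecretKey).
  option("temporary_aws_session_token", creds.getSessionToken).
  load()
```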

I’ve been meaning to write a tutorial on using this library with only IAM authentication, because I think the current feature set makes it possible but not as easy as it could be.

alinden commented, Oct 5, 2016 (3 reactions)

I encountered this issue with code identical to the original post. The problem seems to be that instance-based authentication is not supported for s3://... or s3n://... URLs. Changing to s3a://... resolved the issue.
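A sketch of the s3a variant (untested; fs.s3a.aws.credentials.provider is standard Hadoop S3A configuration, available in Hadoop 2.8+, and requires hadoop-aws on the classpath). The aws_iam_role option is kept from the original snippet for the Redshift side of the transfer:

```scala
// Tell the S3A connector to pull credentials from the EC2 instance
// profile instead of expecting keys in the URL or Hadoop conf.
sparkSession.sparkContext.hadoopConfiguration.set(
  "fs.s3a.aws.credentials.provider",
  "com.amazonaws.auth.InstanceProfileCredentialsProvider")

sparkSession.read.schema(schema).
  format("com.databricks.spark.redshift").
  option("url", redshiftUrl).
  option("aws_iam_role", redshiftRoleArn).
  option("query", query).
  option("tempdir", s3TmpDir).   // s3TmpDir must now be an s3a:// URL
  load()
```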


