IAM Role not taken into account
Hi,
I launch a job with spark-shell or spark-submit:
spark-shell --name "my test app" --packages com.databricks:spark-redshift_2.11:2.0.1
then
sparkSession.read.schema(schema).
  format("com.databricks.spark.redshift").
  option("url", redshiftUrl).
  option("user", redshiftUser).
  option("password", redshiftPassword).
  option("aws_iam_role", redshiftRoleArn).
  option("query", query).
  option("tempdir", s3TmpDir).
  load()
and I get the exception:
java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified
as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId
or fs.s3n.awsSecretAccessKey properties
at com.databricks.spark.redshift.S3Credentials.initialize(S3Credentials.java:67)
at com.databricks.spark.redshift.AWSCredentialsUtils$.com$databricks$spark$redshift$AWSCredentialsUtils$$loadFromURI(AWSCredentialsUtils.scala:60)
at com.databricks.spark.redshift.AWSCredentialsUtils$$anonfun$load$1.apply(AWSCredentialsUtils.scala:48)
at com.databricks.spark.redshift.AWSCredentialsUtils$$anonfun$load$1.apply(AWSCredentialsUtils.scala:48)
at scala.Option.getOrElse(Option.scala:121)
at com.databricks.spark.redshift.AWSCredentialsUtils$.load(AWSCredentialsUtils.scala:48)
at com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:89)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$8.apply(DataSourceStrategy.scala:260)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$8.apply(DataSourceStrategy.scala:260)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:303)
...
The role works when I run the same UNLOAD through the psql client, and the result is written to s3TmpDir. I checked the code, and I am not really sure how an IAM role can be handled at all; it looks like a bug to me.
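For reference, the exception disappears if explicit keys are supplied the way the error message asks for them, i.e. through the Hadoop configuration. This is only a workaround sketch that sidesteps the IAM role entirely; the property names come straight from the exception text, and the key values below are placeholders, not real credentials:

```scala
// Workaround sketch: satisfy the s3n credential lookup that the connector
// performs for tempdir, independently of the aws_iam_role option.
// "AWS_KEY" / "AWS_SECRET" are placeholders for real key material.
val hadoopConf = sparkSession.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3n.awsAccessKeyId", "AWS_KEY")
hadoopConf.set("fs.s3n.awsSecretAccessKey", "AWS_SECRET")
```

This defeats the point of using an instance role, but it confirms that the failure is in the credential lookup for the S3 tempdir, not in the Redshift connection itself.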
PS: when using spark-submit, my app has the following dependencies in build.sbt:
libraryDependencies ++= Seq(
  "com.databricks" %% "spark-redshift" % "2.0.1" % Compile,
  "joda-time" % "joda-time" % "2.9.4" % Compile,
  "com.amazonaws" % "aws-java-sdk-core" % "1.11.37" % Compile,
  "com.amazonaws" % "aws-java-sdk-s3" % "1.11.37" % Compile,
  "org.scalaj" %% "scalaj-http" % "2.3.0" % Compile,
  "net.liftweb" %% "lift-json" % "2.6.3" % Compile,
  "com.amazon.redshift" % "jdbc42.Driver" % "1.1.17.1017" from "https://s3.amazonaws.com/redshift-downloads/drivers/RedshiftJDBC42-1.1.17.1017.jar",
  "org.apache.spark" %% "spark-core" % "2.0.0" % Compile,
  "org.apache.spark" %% "spark-sql" % "2.0.0" % Compile,
  "org.apache.spark" %% "spark-hive" % "2.0.0" % Compile,
  "org.scalatest" %% "scalatest" % "3.0.0" % Test
)
Issue Analytics
- State:
- Created 7 years ago
- Comments: 10 (2 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
The problem here relates to Redshift <-> S3 communication, not Spark <-> S3.
I think that you’ll either have to use the AWS Security Token Service instructions (mentioned at https://github.com/databricks/spark-redshift/issues/252#issuecomment-237910447) or use the aws_iam_role parameter to tell Redshift which role to assume when communicating with S3. I’ve been meaning to write a tutorial on using this library with only IAM authentication, because I think the current feature set makes it possible, but not as easy as it could be.
I encountered this issue with code identical to the original post. The problem seems to be that instance-based authentication is not supported for s3://... or s3n://... URLs. Changing to s3a://... resolved the issue.
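Concretely, that fix amounts to pointing tempdir at an s3a:// URL, since the Hadoop S3A filesystem can resolve instance-profile credentials where s3n cannot. A minimal sketch based on the original post's read; the bucket and path are placeholders:

```scala
// Same read as in the original post, but with an s3a:// tempdir so the
// Hadoop S3A filesystem can pick up instance-profile (IAM) credentials.
sparkSession.read.schema(schema).
  format("com.databricks.spark.redshift").
  option("url", redshiftUrl).
  option("aws_iam_role", redshiftRoleArn).
  option("query", query).
  option("tempdir", "s3a://my-bucket/tmp/"). // was s3n:// or s3://
  load()
```

Note that aws_iam_role still governs the Redshift-to-S3 side of the UNLOAD; the URL scheme only changes how Spark itself authenticates to the tempdir.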