Dependency issues when using the --packages option with Spark
I encounter an issue when using the --packages option with spark-shell. Any idea why this is happening? I'm using Spark 1.6.1 on Amazon EMR (emr-4.7.1).
**spark-shell --packages com.databricks:spark-redshift_2.10:0.6.0**
Ivy Default Cache set to: /home/hadoop/.ivy2/cache
The jars for the packages stored in: /home/hadoop/.ivy2/jars
:: loading settings :: url = jar:file:/usr/lib/spark/lib/spark-assembly-1.6.1-hadoop2.7.2-amzn-2.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-redshift_2.10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found com.databricks#spark-redshift_2.10;0.6.0 in central
found org.slf4j#slf4j-api;1.7.5 in local-m2-cache
found com.databricks#spark-avro_2.10;2.0.1 in central
found org.apache.avro#avro;1.7.6 in central
found org.codehaus.jackson#jackson-core-asl;1.9.13 in local-m2-cache
found org.codehaus.jackson#jackson-mapper-asl;1.9.13 in local-m2-cache
found com.thoughtworks.paranamer#paranamer;2.3 in local-m2-cache
found org.xerial.snappy#snappy-java;1.0.5 in local-m2-cache
found org.apache.commons#commons-compress;1.4.1 in local-m2-cache
found org.tukaani#xz;1.0 in local-m2-cache
downloading https://repo1.maven.org/maven2/com/databricks/spark-redshift_2.10/0.6.0/spark-redshift_2.10-0.6.0.jar ...
[SUCCESSFUL ] com.databricks#spark-redshift_2.10;0.6.0!spark-redshift_2.10.jar (27ms)
downloading https://repo1.maven.org/maven2/com/databricks/spark-avro_2.10/2.0.1/spark-avro_2.10-2.0.1.jar ...
[SUCCESSFUL ] com.databricks#spark-avro_2.10;2.0.1!spark-avro_2.10.jar (13ms)
downloading https://repo1.maven.org/maven2/org/apache/avro/avro/1.7.6/avro-1.7.6.jar ...
[SUCCESSFUL ] org.apache.avro#avro;1.7.6!avro.jar(bundle) (22ms)
downloading file:/home/hadoop/.m2/repository/org/apache/commons/commons-compress/1.4.1/commons-compress-1.4.1.jar ...
[SUCCESSFUL ] org.apache.commons#commons-compress;1.4.1!commons-compress.jar (2ms)
downloading file:/home/hadoop/.m2/repository/org/tukaani/xz/1.0/xz-1.0.jar ...
[SUCCESSFUL ] org.tukaani#xz;1.0!xz.jar (2ms)
:: resolution report :: resolve 2496ms :: artifacts dl 88ms
:: modules in use:
com.databricks#spark-avro_2.10;2.0.1 from central in [default]
com.databricks#spark-redshift_2.10;0.6.0 from central in [default]
com.thoughtworks.paranamer#paranamer;2.3 from local-m2-cache in [default]
org.apache.avro#avro;1.7.6 from central in [default]
org.apache.commons#commons-compress;1.4.1 from local-m2-cache in [default]
org.codehaus.jackson#jackson-core-asl;1.9.13 from local-m2-cache in [default]
org.codehaus.jackson#jackson-mapper-asl;1.9.13 from local-m2-cache in [default]
org.slf4j#slf4j-api;1.7.5 from local-m2-cache in [default]
org.tukaani#xz;1.0 from local-m2-cache in [default]
org.xerial.snappy#snappy-java;1.0.5 from local-m2-cache in [default]
:: evicted modules:
org.slf4j#slf4j-api;1.6.4 by [org.slf4j#slf4j-api;1.7.5] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 11 | 10 | 10 | 1 || 10 | 5 |
---------------------------------------------------------------------
:: problems summary ::
:::: WARNINGS
[NOT FOUND ] org.slf4j#slf4j-api;1.7.5!slf4j-api.jar (0ms)
==== local-m2-cache: tried
file:/home/hadoop/.m2/repository/org/slf4j/slf4j-api/1.7.5/slf4j-api-1.7.5.jar
[NOT FOUND ] org.codehaus.jackson#jackson-core-asl;1.9.13!jackson-core-asl.jar (0ms)
==== local-m2-cache: tried
file:/home/hadoop/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.9.13/jackson-core-asl-1.9.13.jar
[NOT FOUND ] org.codehaus.jackson#jackson-mapper-asl;1.9.13!jackson-mapper-asl.jar (0ms)
==== local-m2-cache: tried
file:/home/hadoop/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.9.13/jackson-mapper-asl-1.9.13.jar
[NOT FOUND ] com.thoughtworks.paranamer#paranamer;2.3!paranamer.jar (0ms)
==== local-m2-cache: tried
file:/home/hadoop/.m2/repository/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar
[NOT FOUND ] org.xerial.snappy#snappy-java;1.0.5!snappy-java.jar(bundle) (0ms)
==== local-m2-cache: tried
file:/home/hadoop/.m2/repository/org/xerial/snappy/snappy-java/1.0.5/snappy-java-1.0.5.jar
::::::::::::::::::::::::::::::::::::::::::::::
:: FAILED DOWNLOADS ::
:: ^ see resolution messages for details ^ ::
::::::::::::::::::::::::::::::::::::::::::::::
:: org.slf4j#slf4j-api;1.7.5!slf4j-api.jar
:: org.codehaus.jackson#jackson-core-asl;1.9.13!jackson-core-asl.jar
:: org.codehaus.jackson#jackson-mapper-asl;1.9.13!jackson-mapper-asl.jar
:: com.thoughtworks.paranamer#paranamer;2.3!paranamer.jar
:: org.xerial.snappy#snappy-java;1.0.5!snappy-java.jar(bundle)
::::::::::::::::::::::::::::::::::::::::::::::
:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [download failed: org.slf4j#slf4j-api;1.7.5!slf4j-api.jar, download failed: org.codehaus.jackson#jackson-core-asl;1.9.13!jackson-core-asl.jar, download failed: org.codehaus.jackson#jackson-mapper-asl;1.9.13!jackson-mapper-asl.jar, download failed: com.thoughtworks.paranamer#paranamer;2.3!paranamer.jar, download failed: org.xerial.snappy#snappy-java;1.0.5!snappy-java.jar(bundle)]
at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1068)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:287)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:154)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Issue Analytics
- Created: 7 years ago
- Comments: 10 (2 by maintainers)
Top Results From Across the Web
- Resolving dependency problems in Apache Spark: Apache Spark's classpath is built dynamically (to accommodate per-application user code), which makes it vulnerable to such issues.
- How to Manage Python Dependencies in PySpark (Databricks): Apache Spark™ provides several standard ways to manage dependencies across the nodes in a cluster via script options such as --jars, --packages...
- Managing dependencies and artifacts in PySpark: In this blog entry, we'll examine how to solve these problems by following a good practice of using 'setup.py' as your dependency management...
- Best Practices for Dependency Problem in Spark (Gankrin): Resolve Dependency Problem in Spark. While building any Spark Application, this is one of the main concerns that any Engineer should...
- Exception when using spark.jars.packages (Apache): When more than one process is using the packages option it's possible to create ... INFO - datastax#spark-cassandra-connector added as a dependency INFO ...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
The problem is not related to Spark or Ivy itself; it's essentially a local Maven repository issue. When you specify a third-party library in --packages, Ivy first checks the local Ivy repo and the local Maven repo for the library and all of its dependencies. If they are found there, it won't try to download them from the central repo. However, when searching the local Maven repo, Ivy only checks whether the artifact's directory exists, without checking whether there is actually a jar file in that directory.
found com.thoughtworks.paranamer#paranamer;2.3 in local-m2-cache
This message indicates that the directory for paranamer-2.3.jar was found in the local Maven repo. But if you go to that directory, you will find no jar file there. I think this is because Maven tried to download the artifact from central before but failed to fetch the jar for some reason.
A solution is to remove the related directories under .ivy2/cache, .ivy2/jars, and .m2/repository/.
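Before deleting anything, you can locate the offending directories first. The sketch below (my own helper, not part of Spark or Ivy; `find_incomplete_artifacts` is a hypothetical name, and the default path assumption is the standard `~/.m2/repository` layout) lists artifact directories that contain a `.pom` but no `.jar` — exactly the "directory exists but jar is missing" state that confuses Ivy's local-m2-cache check:

```shell
# List local Maven artifact dirs that have a .pom but no .jar.
# Such dirs make Ivy report "found ... in local-m2-cache" and then
# fail with [NOT FOUND] when it tries to fetch the jar.
find_incomplete_artifacts() {
  local repo="$1"
  find "$repo" -mindepth 1 -type d 2>/dev/null | while read -r dir; do
    # Leaf version dirs hold the pom/jar files; flag those missing the jar.
    if ls "$dir"/*.pom >/dev/null 2>&1 && ! ls "$dir"/*.jar >/dev/null 2>&1; then
      echo "$dir"
    fi
  done
}

# Read-only usage; delete the reported dirs by hand if they look wrong:
# find_incomplete_artifacts "$HOME/.m2/repository"
```

This only prints candidates rather than deleting them, so you can confirm each one matches a [NOT FOUND] entry in the Ivy log before removing it.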
Ran into the same issue. In my case, I deleted my
$HOME/.ivy2
directory and ran
./bin/spark-shell --packages com.databricks:spark-redshift_2.10:2.0.0
again to get rid of the issue.
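The cleanup step from both answers can be sketched as a small helper. This is my own wrapper (the `clean_dep_caches` name is hypothetical; the cache paths are the defaults Ivy and Maven use, adjust if yours differ) — it removes each cache directory only if it exists and prints what it deletes, so a re-run of spark-shell --packages fetches fresh jars from central:

```shell
# Remove dependency cache dirs so Ivy re-downloads everything on the
# next spark-shell --packages run. Prints each dir it removes.
clean_dep_caches() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      echo "removing $dir"
      rm -rf "$dir"
    fi
  done
}

# Typical invocation before retrying (the .m2 path is only needed if the
# incomplete-artifact dirs live there):
# clean_dep_caches "$HOME/.ivy2/cache" "$HOME/.ivy2/jars"
```

Deleting the whole $HOME/.ivy2, as the commenter did, also works; scoping the removal to the cache subdirectories just avoids touching any other Ivy settings kept there.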