How to use almond with Hadoop 2.8

I’m trying to use almond with Hadoop 2.8.5 (the Hadoop version used by recent EMR releases), and I ran into an error caused by incompatible versions of the Hadoop jars on the classpath:

19/07/23 16:09:47 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, ip-10-20-101-239.eu-west-1.compute.internal, executor 2): java.lang.IllegalAccessError: tried to access method org.apache.hadoop.metrics2.lib.MutableCounterLong.<init>(Lorg/apache/hadoop/metrics2/MetricsInfo;J)V from class org.apache.hadoop.fs.s3a.S3AInstrumentation
	at org.apache.hadoop.fs.s3a.S3AInstrumentation.streamCounter(S3AInstrumentation.java:195)
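
An IllegalAccessError like this usually means the two classes named in the trace were compiled against different library versions, so a quick check is to see which jar each one is loaded from. The following is a minimal sketch that uses only standard JVM reflection. The helper name jarOf is hypothetical, and it reports only on the JVM where it runs (the driver, in a notebook), so checking the executors would require running the same function inside a Spark task.

```scala
import scala.util.Try

// Hypothetical helper: returns the jar (or directory) a class is loaded from,
// or a short explanation when the class is missing or has no code source.
def jarOf(className: String): String = {
  val loader = Thread.currentThread().getContextClassLoader
  // initialize = false: locate the class without running its static initializers
  Try(Class.forName(className, false, loader))
    .map { cls =>
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("<no code source>")
    }
    .getOrElse(s"<$className not found on classpath>")
}

// Both class names are taken from the stack trace above.
Seq(
  "org.apache.hadoop.metrics2.lib.MutableCounterLong",
  "org.apache.hadoop.fs.s3a.S3AInstrumentation"
).foreach(name => println(s"$name -> ${jarOf(name)}"))
```

If the two locations point at jars with different Hadoop version numbers in their file names, that would be consistent with the version clash described in this issue.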

It seems that spark-yarn has transitive dependencies on Hadoop 2.6.5.

The first idea is to use a profile, but unfortunately there is no Hadoop 2.8 profile for Spark.

The second idea is to exclude the Hadoop jars. This works for the driver, but the jars are still downloaded to the executors:

interp.load.ivy(
  coursier.Dependency(
    module = coursier.Module(coursier.Organization("org.apache.spark"), coursier.ModuleName("spark-yarn_2.11")),
    version = "2.4.3",
    exclusions = Set((coursier.Organization("org.apache.hadoop"), coursier.ModuleName("*")))
  )
)
import $ivy.`sh.almond::almond-spark:0.6.0`
import $ivy.`org.apache.hadoop:hadoop-aws:2.8.5`
import $ivy.`org.apache.hadoop:hadoop-hdfs-client:2.8.5`
import $ivy.`org.apache.hadoop:hadoop-hdfs:2.8.5`
import $ivy.`org.apache.hadoop:hadoop-yarn-api:2.8.5`
import $ivy.`org.apache.hadoop:hadoop-yarn-client:2.8.5`
import $ivy.`org.apache.hadoop:hadoop-mapreduce-client-core:2.8.5`
import $ivy.`org.apache.hadoop:hadoop-yarn-server-web-proxy:2.8.5`
import $ivy.`org.apache.hadoop:hadoop-yarn-common:2.8.5`

The third idea is to have a way to exclude jars from the classpath built by ammonite-spark, but that doesn’t seem to be possible yet.

Issue Analytics

  • State: open
  • Created: 4 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

3 reactions
YannMoisan commented, Jul 28, 2019

Good news, it works! I’ve built a custom version of almond against ammonite-spark with this PR: https://github.com/alexarchambault/ammonite-spark/pull/58. In the notebook, I just forced the version of hadoop-client before importing spark-sql:

import coursier.core._
interp.resolutionHooks += { fetch =>
  fetch.withResolutionParams(
    fetch.resolutionParams.addForceVersion(
      (Module(Organization("org.apache.hadoop"), ModuleName("hadoop-client"), Map.empty), "2.8.5")
    )
  )
}

I’m now able to read a file from S3 with the following configuration: EMR 5.24.1 (Hadoop 2.8.5), Spark 2.4.3, Scala 2.12.

0 reactions
brayellison commented, Oct 6, 2020

I missed this before and came back to it searching for a solution again. Thank you @YannMoisan, I’ll give it a shot!
