
Write bundle with spark-submit fails.


I’m just trying to serialize a vanilla Spark pipeline to a BundleFile using spark-submit. The main method of the application looks like:

    // Imports below are an assumption (taken from the MLeap docs), not shown in the original:
    import ml.combust.bundle.BundleFile
    import ml.combust.bundle.serializer.SerializationFormat
    import ml.combust.mleap.spark.SparkSupport._ // adds writeBundle to PipelineModel
    import org.apache.spark.ml.bundle.SparkBundleContext
    import resource._ // scala-arm's managed
    val training: DataFrame = ACMEData.readData()
    val pipeline: PipelineModel = ACMEModel.buildModel() // train a spark pipeline
    val sbc = SparkBundleContext().withDataset(pipeline.transform(training))
    for (bf <- managed(BundleFile("jar:file:/tmp/acme-detection.zip"))) {
      pipeline.writeBundle.format(SerializationFormat.Json).save(bf)(sbc).get
    }

I get the following error:

Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'ml.combust.mleap.spark'
        at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:152)
        at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:145)
        at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:172)
        at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:176)
        at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:176)
        at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:176)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:184)
        at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:189)
        at com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:258)
        at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:264)
        at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:37)
        at ml.combust.bundle.BundleRegistry$.apply(BundleRegistry.scala:34)
        at ml.combust.bundle.BundleRegistry$.apply(BundleRegistry.scala:27)
        at org.apache.spark.ml.bundle.SparkBundleContext$.apply(SparkBundleContext.scala:17)
        at org.apache.spark.ml.bundle.SparkBundleContext$.defaultContext$lzycompute(SparkBundleContext.scala:11)
        at org.apache.spark.ml.bundle.SparkBundleContext$.defaultContext(SparkBundleContext.scala:11)

This happens when it tries to create the SparkBundleContext(). Any ideas what I’m doing wrong here?
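
The key named in the exception is defined in the reference.conf files that ship inside the MLeap jars, and Typesafe Config can only find it if every reference.conf on the classpath survives packaging. As a minimal diagnostic sketch (assuming nothing beyond the Typesafe Config API; the key name is copied from the stack trace above), you can check whether the key is visible at runtime:

    import com.typesafe.config.ConfigFactory

    // BundleRegistry (see the stack trace) resolves its registry through
    // Typesafe Config, which merges every reference.conf on the classpath.
    // If a fat jar kept only one of them, the MLeap key disappears and
    // getConfig throws the ConfigException$Missing shown above.
    val config = ConfigFactory.load()
    println(config.hasPath("ml.combust.mleap.spark")) // false => reference.conf was clobbered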

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

7 reactions
sethah commented, Sep 5, 2017

Thanks for your help, @hollinwilkins. Adding the following transformer to the Maven Shade plugin configuration fixed it:

<!-- Concatenate every reference.conf from the dependency jars instead of keeping only the first one found -->
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
    <resource>reference.conf</resource>
</transformer>

This seems to be the same problem discussed here: https://stackoverflow.com/questions/28555174/running-akka-with-runnable-jar
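
For sbt builds, the equivalent fix is an assembly merge strategy that concatenates reference.conf files instead of dropping duplicates. sbt-assembly's default strategy already does this, so the explicit rule below (a sketch, assuming sbt-assembly 1.x syntax) only matters if you have overridden the defaults:

    // build.sbt: concatenate every reference.conf from the dependency jars so
    // the config keys registered by MLeap survive the fat-jar merge.
    assembly / assemblyMergeStrategy := {
      case "reference.conf" => MergeStrategy.concat
      case other =>
        val oldStrategy = (assembly / assemblyMergeStrategy).value
        oldStrategy(other)
    }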

0 reactions
yairdata commented, Mar 18, 2019

I am getting the error below:

java.lang.NoClassDefFoundError: resource/package$

I wonder why the shading is not part of the release. For companies whose security policies only allow access to PyPI and Maven, it would be great if all dependencies could be fetched from there, without having to change Maven/sbt files and build fat JARs. An alternative could be a Spark+MLeap Docker image for the train & export part, not just mleap-serving.
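
The resource/package$ in that error is the package object of the scala-arm library, which provides the managed(...) wrapper used in the original snippet, so it suggests the scala-arm jar never made it onto the classpath. A hedged workaround (a sketch, assuming Scala 2.13+, with pipeline and sbc as in the question) is to drop scala-arm and close the BundleFile with the standard library's scala.util.Using:

    import scala.util.Using
    import ml.combust.bundle.BundleFile
    import ml.combust.bundle.serializer.SerializationFormat

    // BundleFile exposes close(); an explicit Releasable avoids assuming that
    // it implements AutoCloseable.
    implicit val bundleFileIsReleasable: Using.Releasable[BundleFile] =
      (bf: BundleFile) => bf.close()

    // Using closes the bundle file whether or not the save succeeds.
    Using(BundleFile("jar:file:/tmp/acme-detection.zip")) { bf =>
      pipeline.writeBundle.format(SerializationFormat.Json).save(bf)(sbc).get
    }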


Top Results From Across the Web

  • Spark Submit python failed while trying to access HDFS in ...
    Your problem is not the file on HDFS. The exception is thrown because the program python is not found by one of your...
  • Submitting Applications - Spark 3.3.1 Documentation
    Once a user application is bundled, it can be launched using the bin/spark-submit script. This script takes care of setting up the classpath...
  • Spark Submit Command Explained with Examples
    The spark-submit command is a utility to run or submit a Spark or PySpark application program (or job) to the cluster by specifying...
  • Error - Spark-Submit - java.io.FileNotFoundExcepti... - 240121
    In this HDFS path, Spark will try to write its event logs - not to be confused with YARN application logs, or your...
  • Deploy .NET for Apache Spark worker and user-defined ...
    When deploying workers and writing UDFs, there are a few commonly used ... application is bundled, you can launch it using spark-submit ....
