Write bundle with spark-submit fails.
Issue Description
I’m just trying to serialize a vanilla Spark pipeline to a BundleFile using spark-submit. The main method of the application looks like:
import ml.combust.bundle.BundleFile
import ml.combust.bundle.serializer.SerializationFormat
import ml.combust.mleap.spark.SparkSupport._ // adds writeBundle to Spark transformers
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._ // scala-arm's managed

val training: DataFrame = ACMEData.readData()
val pipeline: PipelineModel = ACMEModel.buildModel() // train a Spark pipeline
val sbc = SparkBundleContext().withDataset(pipeline.transform(training))
for (bf <- managed(BundleFile("jar:file:/tmp/acme-detection.zip"))) {
  pipeline.writeBundle.format(SerializationFormat.Json).save(bf)(sbc).get
}
I get the following error:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'ml.combust.mleap.spark'
at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:152)
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:172)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:176)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:176)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:176)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:184)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:189)
at com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:258)
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:264)
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:37)
at ml.combust.bundle.BundleRegistry$.apply(BundleRegistry.scala:34)
at ml.combust.bundle.BundleRegistry$.apply(BundleRegistry.scala:27)
at org.apache.spark.ml.bundle.SparkBundleContext$.apply(SparkBundleContext.scala:17)
at org.apache.spark.ml.bundle.SparkBundleContext$.defaultContext$lzycompute(SparkBundleContext.scala:11)
at org.apache.spark.ml.bundle.SparkBundleContext$.defaultContext(SparkBundleContext.scala:11)
This happens when it tries to create the SparkBundleContext(). Any ideas what I’m doing wrong here?
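For what it’s worth, a quick way to check whether MLeap’s reference.conf made it onto the runtime classpath is to query Typesafe Config for the key the stack trace complains about (a hypothetical debugging snippet, not from the original report):

import com.typesafe.config.ConfigFactory

// false here means the jar you submitted is missing MLeap's reference.conf
println(ConfigFactory.load().hasPath("ml.combust.mleap.spark"))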
Issue Analytics
- Created: 5 years ago
- Comments: 9 (4 by maintainers)
Thanks for your help, @hollinwilkins. Adding the following to the maven shade plugin fixed it:
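The snippet itself was not captured here, but given the StackOverflow thread linked below it was presumably the standard Typesafe Config fix: telling the Shade plugin to concatenate reference.conf files instead of keeping only one copy. A sketch:

<transformers>
  <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
    <resource>reference.conf</resource>
  </transformer>
</transformers>

Without something like this, the uber-jar keeps a single reference.conf and MLeap’s ml.combust.mleap.spark settings are silently dropped, which produces exactly the ConfigException$Missing above.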
This seems to be the same problem discussed here: https://stackoverflow.com/questions/28555174/running-akka-with-runnable-jar
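For sbt users, the equivalent fix with sbt-assembly would be a merge strategy that concatenates reference.conf (a sketch following sbt-assembly’s documented pattern; not from the original thread):

assembly / assemblyMergeStrategy := {
  case "reference.conf" => MergeStrategy.concat // merge all reference.conf files
  case other =>
    val defaultStrategy = (assembly / assemblyMergeStrategy).value
    defaultStrategy(other)
}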
I am getting the below error:
I wonder why the shading is not part of the release. For companies whose security policies allow access only to PyPI and Maven, it would be great if all dependencies could be fetched from there, without changes to Maven/sbt files or building fat JARs. An alternative could be a Spark + MLeap Docker image for the train & export part, and not just for mleap-serving.