Write bundle with spark-submit fails.
Issue Description
I’m just trying to serialize a vanilla Spark pipeline to a BundleFile using spark-submit. The main method of the application looks like:
import ml.combust.bundle.BundleFile
import ml.combust.bundle.serializer.SerializationFormat
import ml.combust.mleap.spark.SparkSupport._ // adds writeBundle to Spark transformers
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._ // scala-arm's managed

val training: DataFrame = ACMEData.readData()
val pipeline: PipelineModel = ACMEModel.buildModel() // train a Spark pipeline
val sbc = SparkBundleContext().withDataset(pipeline.transform(training))
for (bf <- managed(BundleFile("jar:file:/tmp/acme-detection.zip"))) {
  pipeline.writeBundle.format(SerializationFormat.Json).save(bf)(sbc).get
}
I get the following error:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'ml.combust.mleap.spark'
at com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:152)
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:172)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:176)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:176)
at com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:176)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:184)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:189)
at com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:258)
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:264)
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:37)
at ml.combust.bundle.BundleRegistry$.apply(BundleRegistry.scala:34)
at ml.combust.bundle.BundleRegistry$.apply(BundleRegistry.scala:27)
at org.apache.spark.ml.bundle.SparkBundleContext$.apply(SparkBundleContext.scala:17)
at org.apache.spark.ml.bundle.SparkBundleContext$.defaultContext$lzycompute(SparkBundleContext.scala:11)
at org.apache.spark.ml.bundle.SparkBundleContext$.defaultContext(SparkBundleContext.scala:11)
This happens when it tries to create the SparkBundleContext(). Any ideas what I’m doing wrong here?
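For what it’s worth, a quick way to check whether MLeap’s reference.conf made it onto the runtime classpath is to query Typesafe Config for the key the stack trace complains about (a hypothetical debugging snippet, not from the original report):

import com.typesafe.config.ConfigFactory

// false here means the jar you submitted is missing MLeap's reference.conf
println(ConfigFactory.load().hasPath("ml.combust.mleap.spark"))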
Issue Analytics
- Created: 5 years ago
- Comments: 9 (4 by maintainers)
Thanks for your help, @hollinwilkins. Adding the following to the maven shade plugin fixed it:
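The snippet itself was not captured here, but given the StackOverflow thread linked below it was presumably the standard Typesafe Config fix: telling the Shade plugin to concatenate reference.conf files instead of keeping only one copy. A sketch:

<transformers>
  <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
    <resource>reference.conf</resource>
  </transformer>
</transformers>

Without something like this, the uber-jar keeps a single reference.conf and MLeap’s ml.combust.mleap.spark settings are silently dropped, which produces exactly the ConfigException$Missing above.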
This seems to be the same problem discussed here: https://stackoverflow.com/questions/28555174/running-akka-with-runnable-jar
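For sbt users, the equivalent fix with sbt-assembly would be a merge strategy that concatenates reference.conf (a sketch following sbt-assembly’s documented pattern; not from the original thread):

assembly / assemblyMergeStrategy := {
  case "reference.conf" => MergeStrategy.concat // merge all reference.conf files
  case other =>
    val defaultStrategy = (assembly / assemblyMergeStrategy).value
    defaultStrategy(other)
}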
I am getting the below error:
I wonder why the shading is not part of the release. For companies whose security policies allow access only to PyPI and Maven, it would be great if all dependencies could be fetched from there, without changes to Maven/sbt files or building fat JARs. An alternative could be a Spark + MLeap Docker image for the train & export part, and not just for mleap-serving.