End-to-end SBT project in Scala
Description
Feature Request: Not a bug
Given that this library requires out-of-date versions of Scala and Spark, it would significantly help newcomers if there were a complete example project that included sample pom.xml and/or build.sbt files and the proper configuration needed.
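For illustration, a minimal build.sbt along the lines requested might look like the sketch below. The library coordinates, Scala version, and Spark version here are placeholders and assumptions, not taken from this project's documentation; substitute whatever versions the library actually requires.

```scala
// build.sbt — a minimal sketch; substitute this library's actual
// groupId/artifactId and the Scala/Spark versions its docs require.
name := "spark-starter"
version := "0.1.0"
scalaVersion := "2.11.12" // Spark 2.4.x builds against Scala 2.11 or 2.12

libraryDependencies ++= Seq(
  // "provided" keeps Spark out of any fat JAR; the cluster supplies it.
  "org.apache.spark" %% "spark-sql"   % "2.4.7" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.4.7" % "provided",
  // Placeholder coordinates for the library discussed in this issue:
  "com.example" %% "the-library" % "x.y.z"
)
```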
Of course, this is not just an issue with this library but a core Spark issue. Still, it would save newcomers to Spark and this library a lot of time if there were a ready-to-go build.sbt file somewhere.
I’m currently sorting through this mess myself (downgrading from Scala 2.13 to 2.11 in order to use Spark), and would be happy to submit my build.sbt and associated docker-compose.yml files once I figure it out, if we can find a good place for them on this repo (or another one).
Please let me know what you guys think, and thanks for this awesome toolset!
Steps to Reproduce
Not applicable
Your Environment
Not applicable
Issue Analytics
- State:
- Created: 3 years ago
- Reactions: 2
- Comments: 12 (5 by maintainers)
I’ve temporarily given up on using the MinIO Java client directly in favor of Spark’s S3 capabilities (MinIO is S3-compatible). I’ve gotten it to work on my local dev cluster (local[*]) but not on my Docker cluster, even when assembling a fat JAR and submitting it with the job. It might be a few days or a week before I have something. It seems I have even more to learn about dependency management on Spark clusters than I had realized…
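For what it’s worth, one common culprit when a fat JAR works on local[*] but fails on a real cluster is duplicate META-INF entries or a bundled copy of Spark itself. A hedged sbt-assembly sketch (the plugin version and merge rules below are assumptions, not something from this thread):

```scala
// project/plugins.sbt (illustrative plugin version):
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.15.0")

// build.sbt: keep Spark out of the assembly and resolve META-INF clashes.
libraryDependencies +=
  "org.apache.spark" %% "spark-sql" % "2.4.7" % "provided"

assemblyMergeStrategy in assembly := {
  // Discard duplicate manifests/signatures that break the merged JAR.
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
// Then run `sbt assembly` and hand the resulting JAR to spark-submit.
```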
You are very welcome. We do have lots of users submitting their jobs as fat JARs on EMR, GCP, etc. It’s true we don’t have many examples for them apart from the starter project I gave you. Most of our examples show how to use the library itself, especially in PySpark/Jupyter, since it’s easier to just run it immediately.
Also, you can easily change the Scala version to 2.12.13 and the Apache Spark version to 3.0.1, as we are doing ourselves to start supporting Apache Spark 3.0.1. There are a few small issues in some models, but apart from that everything else is compatible. (Apache Spark 2.3.x and 2.4.x are heavily used by our users in production, so we had to make sure we support all of 2.3, 2.4, and 3.0 on Scala 2.11 and Scala 2.12, since those older versions won’t go anywhere anytime soon.)
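As a sketch, the version switch described above amounts to only a couple of lines in build.sbt (the library’s own coordinates are left as a placeholder):

```scala
// build.sbt — switching to Scala 2.12 / Spark 3.0.1 as described above
scalaVersion := "2.12.13"

val sparkVersion = "3.0.1"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
  // plus the library itself, at a release that supports Spark 3.0.x
)
```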
While you are building a PoC against Apache Spark alone to sort out your dependencies, see whether you can shade some of them, and check whether the issue persists, I’ll start adding more examples to that starter project, with instructions on how to package it and use it with spark-submit.