
Spark 3.1.1, Scala 2.12 & xgboost 1.3.1 migration


Hello mleap developers,

I’ve been busy migrating transmogrif.ai to Spark 3.1.1, Scala 2.12 and xgboost 1.3.1. Because we rely heavily on mleap, I’ve had to experiment with the mleap master branch to make it work. This seemed like a good time to share my findings, because we’ll ultimately need a new mleap release on Maven to complete the migration.

What I’ve done so far is:

  • took the existing spark-3.0.0 branch and rebased it on top of master
  • updated the Spark (3.1.1), Scala (2.12.10), Akka (2.6.14, plus 10.2.4 for akka-http) and xgboost (1.3.1) versions
  • made a few code modifications to adapt to Spark and xgboost API changes
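As a rough illustration of the version bumps listed above, an sbt dependency fragment pinning those versions might look like the following. This is a hypothetical sketch. The artifact selection (spark-mllib, akka-actor, akka-stream, akka-http, xgboost4j-spark) is an assumption made for illustration. It is not mleap's actual dependency list, and mleap organizes its build differently.

```scala
// Hypothetical build.sbt fragment: the versions come from the list above,
// but the choice of artifacts is an illustrative assumption, not mleap's build.
ThisBuild / scalaVersion := "2.12.10"

val sparkVersion    = "3.1.1"
val akkaVersion     = "2.6.14"
val akkaHttpVersion = "10.2.4"
val xgboostVersion  = "1.3.1"

libraryDependencies ++= Seq(
  // Spark is typically supplied by the cluster at runtime, hence Provided
  "org.apache.spark"  %% "spark-mllib"     % sparkVersion % Provided,
  "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
  // akka-http 10.2.x expects akka-stream to be declared explicitly
  "com.typesafe.akka" %% "akka-stream"     % akkaVersion,
  "com.typesafe.akka" %% "akka-http"       % akkaHttpVersion,
  // %% appends the Scala binary suffix, resolving e.g. xgboost4j-spark_2.12
  "ml.dmlc"           %% "xgboost4j-spark" % xgboostVersion
)
```

Using `%%` throughout makes every Scala artifact follow the single `scalaVersion` setting. Marking Spark as `Provided` is a common choice for libraries that run inside a Spark cluster, but it depends on how the project is packaged.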

This was enough to fix all of our JSON serialization/deserialization issues. We still have a number of unit tests to fix, however, and may discover more mleap-related issues. What I have so far is certainly not sufficient to release a new build, but I’m wondering whether, and when, you could help me make it happen. I’m happy to submit a new PR, push a new branch, or update the spark-3.0.0 branch with my changes. Let me know which you’d prefer.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 6
  • Comments: 11 (4 by maintainers)

Top GitHub Comments

2 reactions
jsleight commented, Jul 14, 2021

@emitc2h @jsleight I was wondering: is there a timeline for when Spark 3 support will be provided?

The other PR seemed to have stalled, so I put together #765.

1 reaction
jsleight commented, Jul 15, 2021

@jsleight, would this mean that, after the fix, mleap would remain backward compatible with xgboost 1.1.0 and Spark 2.4.x?

I didn’t touch xgboost in #765, so it is still on xgboost 1.0.0. xgboost 1.1.0 may or may not work; I’m leaving the xgboost upgrade for its own PR.

Spark 2.4.x will not be compatible, because Spark 2.4 requires Scala 2.11 while Spark 3 requires Scala 2.12. However, #765 does maintain backward compatibility for deserializing a model trained on Spark 2.4 into the mleap runtime.


Top Results From Across the Web

Migration Guide: SQL, Datasets and DataFrame - Spark 3.1.1 ...
This behavior change is introduced because Spark 3.0 is built with Scala 2.12 by default. In Spark 3.0, a higher-order function exists follows...
Databricks Runtime 8.2 for ML (Unsupported)
Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. It also supports distributed deep ...
Databricks Runtime 7.6 for Machine Learning (Unsupported)
Prior to this version, XGBoost was not integrated with PySpark. Users had to either use xgboost4j-spark in Scala or break the PySpark ML ......
XGBoost4J-Spark Tutorial (version 0.9+)
XGBoost4J-Spark is a project aiming to seamlessly integrate XGBoost and Apache Spark by fitting XGBoost to Apache Spark's MLLIB framework.
xgboost - Scaladex
dmlc / xgboost 1.7.1. Apache License 2.0 Website GitHub ... Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow ... JVM: 2.12....
