
Joblib-spark as a possible alternative for Distributed Optimization

See original GitHub issue

Since we are able to use Optuna with joblib, it seems possible to generalize the method using joblib-spark to leverage a Spark backend, similar to Hyperopt's SparkTrials(). Of course, the trade-offs between parallelism and run time should be considered here.
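As a rough sketch of the joblib-spark side of this idea (assuming joblibspark is installed and a SparkSession is available; the toy function below is purely illustrative), joblib tasks can be shipped to Spark executors by registering and selecting the "spark" backend:

    # Register joblib-spark's "spark" backend and run joblib tasks on Spark
    # executors instead of local processes/threads.
    from joblib import Parallel, delayed, parallel_backend
    from joblibspark import register_spark

    register_spark()  # makes the backend name "spark" available to joblib

    def square(x):
        return x * x

    with parallel_backend("spark", n_jobs=4):
        results = Parallel()(delayed(square)(i) for i in range(16))

    print(results)  # [0, 1, 4, 9, ...]

Whether this pays off depends on how expensive a single task is relative to the overhead of shipping it to an executor, which is the parallelism/run-time trade-off mentioned above.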

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 14 (5 by maintainers)

Top GitHub Comments

1 reaction
WaterKnight1998 commented, Jun 21, 2022

FYI: Optuna does not use joblib internally as of #2269, but Optuna can still be used with joblib and joblib-spark as it is now.

Could you share an example, please? Thanks in advance @HideakiImamura
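For reference, a sketch of what such an example could look like, given that Optuna no longer parallelizes trials through joblib itself: each Spark task loads the same study from a shared storage and runs a slice of the trials. The storage URL, study name, and trial counts below are placeholders, not a recommended setup:

    import joblib
    import optuna
    from joblibspark import register_spark

    register_spark()  # expose the "spark" backend to joblib

    STORAGE = "postgresql://user:pass@db-host/optuna"  # hypothetical shared RDB
    STUDY_NAME = "joblib-spark-demo"                   # hypothetical study name

    def objective(trial):
        x = trial.suggest_float("x", -10, 10)
        return (x - 2) ** 2

    def run_trials(n_trials):
        # Runs on a Spark executor; every worker writes trials to the shared study.
        study = optuna.load_study(study_name=STUDY_NAME, storage=STORAGE)
        study.optimize(objective, n_trials=n_trials)

    optuna.create_study(study_name=STUDY_NAME, storage=STORAGE,
                        direction="minimize", load_if_exists=True)

    # 8 Spark tasks x 25 trials each = 200 trials in total.
    with joblib.parallel_backend("spark", n_jobs=8):
        joblib.Parallel()(joblib.delayed(run_trials)(25) for _ in range(8))

    print(optuna.load_study(study_name=STUDY_NAME, storage=STORAGE).best_params)

The key assumption is that the storage backend is an RDB reachable from every Spark executor, so that the workers coordinate through Optuna's storage rather than through joblib.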

1 reaction
felipeeeantunes commented, Oct 17, 2020

@toshihikoyanase started this PR https://github.com/optuna/optuna/pull/1942.

I have some doubts about it: the current version of joblib-spark has a bug, fixed in this merged PR (https://github.com/joblib/joblib-spark/pull/21). I should figure out how to add the fixed version to the Dockerfile instead of installing the release from pip.

Also, I should figure out how to provide a Dockerfile and Kubernetes YAML with Spark to reproduce the example in minikube. Can you help with that?

