
[BUG] Cannot use databricks connection with R


Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

mlflow, version 1.27.0

System information

  • Windows
  • Python version: Python 3.10.0
  • R version: R version 4.1.2 (2021-11-01)
  • mlflow R package version: 2.0.1

Describe the problem

I am attempting to track and log parameters within Databricks using mlflow. I can see the runs that I am creating quite clearly (screenshot not reproduced here).

However, I cannot actually log anything to these runs. This is because of the way the package sets the active run ID, or rather does not set it, when a client is provided (see the source linked in the original issue). As a result, every attempt to log something fails, because a check on the active run ID takes place first (also linked in the original issue). Ultimately, mlflow:::mlflow_get_active_run_id() returns NULL because mlflow::mlflow_start_run() never sets it when a client is passed.
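To make the mechanism concrete, here is a toy model in plain R of the behaviour described above. It is a simplified sketch, not the mlflow source: every name in it (toy_state, toy_start_run, toy_get_active_run_id, toy_with_run) is hypothetical. It assumes, as the report describes, that the client branch of mlflow_start_run() creates the run without registering it as active, and that the with() method rejects any run that is not the active one.

```r
# Toy model only: these names are hypothetical and do not exist in mlflow.
toy_state <- new.env()
toy_state$active_run_id <- NULL

# Models the two branches of mlflow_start_run() described in the report:
# with a client, the run is created but never recorded as active.
toy_start_run <- function(client = NULL) {
  run <- list(run_uuid = paste0("run-", sample.int(1e6, 1)))
  if (is.null(client)) {
    toy_state$active_run_id <- run$run_uuid
  }
  run
}

toy_get_active_run_id <- function() toy_state$active_run_id

# Models the guard in the with() method: only the active run is accepted.
toy_with_run <- function(run, expr) {
  if (!identical(run$run_uuid, toy_get_active_run_id())) {
    stop("`with()` should only be used with `mlflow_start_run()`.", call. = FALSE)
  }
  # Clear the active run when the block finishes, like ending the run.
  on.exit(toy_state$active_run_id <- NULL)
  expr
}
```

Calling toy_start_run(client = "databricks") returns a run while toy_get_active_run_id() stays NULL, so passing that run to toy_with_run() raises the same error seen in the stack trace. Calling toy_start_run() with no client registers the run, and the guard passes.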

Tracking information

No response

Code to reproduce issue

library(mlflow)
client <- mlflow::mlflow_client(tracking_uri = "databricks")
experiment <- "1709256526326232"
run <- mlflow::mlflow_start_run(experiment_id = "1709256526326232", client = client)

mlflow:::mlflow_get_active_run_id()
# NULL

# Try to log a parameter
with(run, {
  mlflow::mlflow_log_param(
    key = "test",
    value = 1,
    client = client
  )
})
# Error: `with()` should only be used with `mlflow_start_run()`.

# Try calling `mlflow::mlflow_start_run()` directly inside `with()`:
with(
  mlflow::mlflow_start_run(
    experiment_id = "1709256526326232",
    client = mlflow::mlflow_client(tracking_uri = "databricks")
  ), {
  mlflow::mlflow_log_param(
    key = "test",
    value = 1,
    client = client
  )
})
# Error: `with()` should only be used with `mlflow_start_run()`.

Stack trace

3: stop("`with()` should only be used with `mlflow_start_run()`.",
       call. = FALSE)
2: with.mlflow_run(mlflow::mlflow_start_run(experiment_id = "1709256526326232", 
       client = mlflow::mlflow_client(tracking_uri = "databricks")),
       {
           mlflow::mlflow_log_param(key = "test", value = 1, client = client)   
       })
1: with(mlflow::mlflow_start_run(experiment_id = "1709256526326232",
       client = mlflow::mlflow_client(tracking_uri = "databricks")),
       {
           mlflow::mlflow_log_param(key = "test", value = 1, client = client)
       })
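If an explicit client is really needed, one workaround is to skip with() and pass the run ID and client to each call yourself. This sketch assumes a Databricks workspace is already configured for tracking_uri = "databricks", and assumes the run object returned by mlflow_start_run() exposes its ID as run_uuid; the wrapper name log_param_with_client is hypothetical.

```r
# Hypothetical wrapper: log one parameter to a run created through a client.
# Assumes Databricks credentials are configured for tracking_uri = "databricks".
log_param_with_client <- function(experiment_id, key, value) {
  client <- mlflow::mlflow_client(tracking_uri = "databricks")
  # With a client, mlflow_start_run() returns the run but does not make it active.
  run <- mlflow::mlflow_start_run(experiment_id = experiment_id, client = client)
  run_id <- run$run_uuid
  # Pass run_id and client explicitly, since there is no active run to fall back on.
  mlflow::mlflow_log_param(key = key, value = value, run_id = run_id, client = client)
  # End the run explicitly, because with() cannot manage a client-created run.
  mlflow::mlflow_end_run(run_id = run_id, client = client)
  invisible(run_id)
}
```

Usage would look like log_param_with_client("1709256526326232", "test", 1). The trade-off is that nothing ends the run automatically if logging fails part-way, which with() would otherwise handle.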

Other info / logs

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
nathaneastwood commented, Nov 29, 2022

Good question. It seems I set this up many months ago when I was first working with this code locally. Now that we have Databricks set up, I came back to it and kept using it, which is where I found the issue. I think we can close this now. Thanks for your (very) quick help!

0 reactions
harupy commented, Nov 29, 2022

Where did you get this code?

client <- mlflow::mlflow_client(tracking_uri = "databricks")
experiment <- "1709256526326232"
run <- mlflow::mlflow_start_run(experiment_id = "1709256526326232", client = client)
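The thread suggests the client argument was not needed here. A common alternative is to set the tracking URI once and use the fluent API without a client, so that mlflow_start_run() registers the active run and with() works. A minimal sketch, assuming Databricks credentials are already configured for the R session; the wrapper name log_param_fluent is hypothetical.

```r
# Hypothetical wrapper: log one parameter using the fluent (client-free) API.
# Assumes Databricks credentials are configured for the "databricks" tracking URI.
log_param_fluent <- function(experiment_id, key, value) {
  mlflow::mlflow_set_tracking_uri("databricks")
  # Without a client, mlflow_start_run() registers the run as active,
  # so the with() guard passes and mlflow_log_param() needs no run_id.
  with(mlflow::mlflow_start_run(experiment_id = experiment_id), {
    mlflow::mlflow_log_param(key = key, value = value)
  })
}
```

Here with() also ends the run when the block finishes, so no explicit mlflow_end_run() call is needed.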