[BUG] Loading mlflow model using spark_udf fails on Databricks with cannot import name '_SparkDirectoryDistributor'
Willingness to contribute
No. I cannot contribute a bug fix at this time.
MLflow version
1.20.2
System information
- Python version: 3.8.2
- Databricks ML Runtime version: 9.1
Describe the problem
I’m working on Databricks with ML Runtime 9.1 and MLflow to build a simple training and inference pipeline: I log trained models to MLflow’s Model Registry and retrieve them during inference as a spark_udf.
I successfully logged the model and artifacts to the Model Registry. Once there, I manually registered the model in the UI and set its stage to Production. In the inference notebook, I simply use load_model() to load the model, and then mlflow.pyfunc.spark_udf to convert the model to a Spark user-defined function.
This last line raises a surprising error that I don’t understand:
ImportError: cannot import name '_SparkDirectoryDistributor' from
'mlflow.utils._spark_utils'
(/databricks/python/lib/python3.8/site-packages/mlflow/utils/_spark_utils.py)
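For reference, a quick check of which MLflow build a notebook actually imports, and whether it matches what the package installer has on disk, might look like this (a minimal diagnostic sketch):
import mlflow
from importlib import metadata

# Version and file path of the module the running interpreter imported.
print("imported:", mlflow.__version__, "from", mlflow.__file__)
# Version the installer currently records on disk; a mismatch with the
# line above suggests mlflow was reinstalled after it was imported.
print("installed:", metadata.version("mlflow"))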
Detailed traceback:
ImportError Traceback (most recent call last)
<command-1233344365339> in <module>
25 model = mlflow.sklearn.load_model(model_path)
26
---> 27 predict = mlflow.pyfunc.spark_udf(spark, model_uri=model_path, result_type=ArrayType(StringType()))
/databricks/python/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py in spark_udf(spark, model_uri, result_type)
795
796 def get_model_dependencies(model_uri, format="pip"): # pylint: disable=redefined-builtin
--> 797 """
798 :param model_uri: The uri of the model to get dependencies from.
799 :param format: The format of the returned dependency file. If the ``"pip"`` format is
/databricks/python_shell/dbruntime/PythonPackageImportsInstrumentation/__init__.py in import_patch(name, globals, locals, fromlist, level)
160 # Import the desired module. If you’re seeing this while debugging a failed import,
161 # look at preceding stack frames for relevant error information.
--> 162 original_result = python_builtin_import(name, globals, locals, fromlist, level)
163
164 is_root_import = thread_local._nest_level == 1
/databricks/python/lib/python3.8/site-packages/mlflow/pyfunc/spark_model_cache.py in <module>
----> 1 from mlflow.utils._spark_utils import _SparkDirectoryDistributor
2
3
4 class SparkModelCache:
5 """Caches models in memory on Spark Executors, to avoid continually reloading from disk.
ImportError: cannot import name '_SparkDirectoryDistributor' from 'mlflow.utils._spark_utils' (/databricks/python/lib/python3.8/site-packages/mlflow/utils/_spark_utils.py)
Tracking information
No response
Code to reproduce issue
training notebook:
import mlflow
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
MODEL_NAME = "model001"
mlflow.start_run()
digits = load_digits()
x_train, x_test, y_train, y_test = train_test_split(digits.data, digits.target)
logisticRegr = LogisticRegression()
logisticRegr.fit(x_train, y_train)
score = logisticRegr.score(x_test, y_test)
conda_env = mlflow.pyfunc.get_default_conda_env()
class MyModel(mlflow.pyfunc.PythonModel):
    def save_context(self, model, path, conda_env):
        # Save the fitted sklearn model under `path` so it can be
        # referenced as an artifact.
        artifacts = {'model': path}
        mlflow.sklearn.save_model(sk_model=model, path=artifacts['model'], conda_env=conda_env)

    def load_context(self, context: mlflow.pyfunc.PythonModelContext):
        # Keep the loaded model on self so predict() can reach it.
        self.model = mlflow.sklearn.load_model(context.artifacts['model'])

    def predict(self, context: mlflow.pyfunc.PythonModelContext, input_data):
        return self.model.predict(input_data.values)
pymodel = MyModel()
pymodel.save_context(logisticRegr, MODEL_NAME, conda_env)
mlflow.sklearn.log_model(artifact_path=MODEL_NAME, sk_model=pymodel, conda_env=conda_env)
mlflow.end_run()
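For comparison, the documented pattern for wrapping a model with artifacts logs through the pyfunc flavor rather than the sklearn flavor; a minimal sketch of that pattern (paths and names here are illustrative, not from the report):
import mlflow

# Save the fitted sklearn model once, then reference it as an artifact.
mlflow.sklearn.save_model(sk_model=logisticRegr, path="sk_model_dir")

class WrappedModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        import mlflow.sklearn
        self.model = mlflow.sklearn.load_model(context.artifacts["model"])

    def predict(self, context, input_data):
        return self.model.predict(input_data.values)

with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="model001",
        python_model=WrappedModel(),
        artifacts={"model": "sk_model_dir"},
    )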
inference notebook:
import mlflow
from mlflow.store.artifact.models_artifact_repo import ModelsArtifactRepository
MODEL_NAME = "model001"
model_path = f"models:/{MODEL_NAME}/production"
model = mlflow.sklearn.load_model(model_path)
predict = mlflow.pyfunc.spark_udf(spark, model_uri=model_path)
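The object returned by spark_udf is a regular PySpark UDF; applying it to a DataFrame of feature columns might look like this (the table and column names are assumptions):
from pyspark.sql.functions import struct

df = spark.table("my_features")  # hypothetical feature table
# Pass the feature columns to the model UDF and add the result as a column.
predicted_df = df.withColumn("prediction", predict(struct(*df.columns)))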
Other info / logs
No response
What component(s) does this bug affect?
- area/artifacts: Artifact stores and artifact logging
- area/build: Build and test infrastructure for MLflow
- area/docs: MLflow documentation pages
- area/examples: Example code
- area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- area/models: MLmodel format, model serialization/deserialization, flavors
- area/projects: MLproject format, project running backends
- area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- area/server-infra: MLflow Tracking server backend
- area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
- area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- area/windows: Windows support
What language(s) does this bug affect?
- language/r: R APIs and clients
- language/java: Java APIs and clients
- language/new: Proposals for new client languages
What integration(s) does this bug affect?
- integrations/azure: Azure and Azure ML integrations
- integrations/sagemaker: SageMaker integrations
- integrations/databricks: Databricks integrations
Strange, please let us know if you find a way that can consistently reproduce the issue. We’ll also investigate.
We’ve encountered a similar issue before where pip install (not %pip install) overwrites Python source files and causes an ImportError. Here’s a quick breakdown of the traceback.
The frame in mlflow/pyfunc/__init__.py looks strange: the arrow (-->) points to line 797, which is part of a docstring and can never raise an ImportError.
Line 797 of mlflow/pyfunc/__init__.py in mlflow 1.20.2 looks like this:
https://github.com/mlflow/mlflow/blob/36869c02a07e30d3d7c18fcf5a31cb7febd5dc9e/mlflow/pyfunc/__init__.py#L797
Line 797 of mlflow/pyfunc/__init__.py in mlflow 1.26.1 looks like this:
https://github.com/mlflow/mlflow/blob/d42864b0168ef328ae9aec6bbe39e05a0c0f76fe/mlflow/pyfunc/__init__.py#L797
This indicates mlflow.pyfunc was loaded from mlflow/pyfunc/__init__.py in mlflow 1.20.2, and then mlflow 1.26.1 was installed, which updated mlflow/pyfunc/__init__.py on disk.
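One way to see this kind of staleness directly is to compare the imported module’s version with the source text that tracebacks re-read from disk (a diagnostic sketch):
import linecache
import mlflow.pyfunc

# Tracebacks fetch source lines from the file currently on disk, so if the
# file was replaced after import, the printed line no longer matches the
# bytecode that is actually running.
print("imported mlflow:", mlflow.__version__)
print("line 797 on disk:", linecache.getline(mlflow.pyfunc.__file__, 797))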
The frame in mlflow/pyfunc/spark_model_cache.py also looks strange: from mlflow.utils._spark_utils import _SparkDirectoryDistributor indicates mlflow >= 1.25.0 is installed, but cannot import name '_SparkDirectoryDistributor' from 'mlflow.utils._spark_utils' indicates mlflow < 1.25.0 is installed. Strange.
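A minimal check for this second contradiction (assuming the same notebook environment):
import importlib
from importlib import metadata

spark_utils = importlib.import_module("mlflow.utils._spark_utils")
# mlflow >= 1.25.0 defines _SparkDirectoryDistributor in this module; if the
# installed distribution reports >= 1.25.0 but the attribute is missing, two
# versions are mixed on disk.
print("installed mlflow:", metadata.version("mlflow"))
print("has _SparkDirectoryDistributor:",
      hasattr(spark_utils, "_SparkDirectoryDistributor"))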