
[BUG] Failed to save the model to the HDFS


Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

mlflow, version 1.27.0

System information

Centos 7.9

Describe the problem

I started the tracking server with:

mlflow server \
  --backend-store-uri mysql://xxxx:.xxxx@hd01:3306/mlflow_test \
  --host 0.0.0.0 -p 5001 \
  --default-artifact-root hdfs://mycluster/mlprojects/models

Saving a model then produces:

2022/08/17 17:02:54 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/anaconda3/envs/mlflow-1.27.0/lib/python3.7/site-packages/mlflow/store/artifact/hdfs_artifact_repo.py:186: FutureWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead."
2022/08/17 17:02:54 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during autologging: Prior attempt to load libhdfs failed

I think the problem lies in hdfs_artifact_repo.py and the PyArrow version.

I tried lowering the PyArrow version to 1.0, but the same error is reported.
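
The message "Prior attempt to load libhdfs failed" indicates that PyArrow could not load the native libhdfs library, which usually depends on the process environment as well as on the PyArrow version. The following is a minimal sketch, not part of MLflow or PyArrow, that checks the environment variables PyArrow's documentation lists for HDFS access (JAVA_HOME, HADOOP_HOME, CLASSPATH, and the optional ARROW_LIBHDFS_DIR). The function name check_libhdfs_env is hypothetical, and the check assumes a Linux host where the library file is named libhdfs.so. It may not explain the failure in this particular issue.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def check_libhdfs_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of likely problems with the HDFS-related environment.

    Hypothetical helper: it only inspects environment variables and the
    file system; it does not import pyarrow or contact HDFS.
    """
    env = os.environ if env is None else env
    problems: List[str] = []

    # JAVA_HOME and HADOOP_HOME should both point to existing directories.
    for name in ("JAVA_HOME", "HADOOP_HOME"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing directory: {value}")

    # CLASSPATH must contain the Hadoop jars
    # (commonly filled from `hadoop classpath --glob`).
    if not env.get("CLASSPATH"):
        problems.append("CLASSPATH is not set")

    # libhdfs.so is looked up in ARROW_LIBHDFS_DIR if set,
    # otherwise under $HADOOP_HOME/lib/native.
    candidates = []
    if env.get("ARROW_LIBHDFS_DIR"):
        candidates.append(Path(env["ARROW_LIBHDFS_DIR"]))
    if env.get("HADOOP_HOME"):
        candidates.append(Path(env["HADOOP_HOME"]) / "lib" / "native")
    if not any((d / "libhdfs.so").is_file() for d in candidates):
        problems.append(
            "libhdfs.so not found in ARROW_LIBHDFS_DIR or $HADOOP_HOME/lib/native"
        )

    return problems


if __name__ == "__main__":
    for problem in check_libhdfs_env():
        print("WARNING:", problem)
```

Running this in the same environment and as the same user that runs the training script shows whether any of these variables are missing before PyArrow is involved at all.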

Tracking information

No response

Code to reproduce issue

mlflow server \
  --backend-store-uri mysql://xxxx:.xxxx@hd01:3306/mlflow_test \
  --host 0.0.0.0 -p 5001 \
  --default-artifact-root hdfs://mycluster/mlprojects/models

Stack trace

2022/08/17 17:02:54 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/anaconda3/envs/mlflow-1.27.0/lib/python3.7/site-packages/mlflow/store/artifact/hdfs_artifact_repo.py:186: FutureWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead."
2022/08/17 17:02:54 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during autologging: Prior attempt to load libhdfs failed
258/258 [==============================] - 2s 5ms/step - loss: 1.1702 - accuracy: 0.5270 - val_loss: 0.9022 - val_accuracy: 0.6350
WARNING:absl:Found untraced functions such as _jit_compiled_convolution_op, _jit_compiled_convolution_op, _jit_compiled_convolution_op while saving (showing 3 of 3). These functions will not be directly callable after loading.
INFO:tensorflow:Assets written to: /tmp/tmpkjs57s12/model/data/model/assets
INFO:tensorflow:Assets written to: /tmp/tmpkjs57s12/model/data/model/assets
2022/08/17 17:03:00 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/home/anaconda3/envs/mlflow-1.27.0/lib/python3.7/site-packages/mlflow/store/artifact/hdfs_artifact_repo.py:186: FutureWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead."
2022/08/17 17:03:00 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during tensorflow autologging: Prior attempt to load libhdfs failed
79/79 - 0s - loss: 0.8998 - accuracy: 0.6334 - 202ms/epoch - 3ms/step
/ntest loss: 0.8997762203216553
test accuracy: 0.6333597302436829

Other info / logs

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
gaogao110 commented, Aug 18, 2022

Thanks for your reply. I tried Python 3.8 and it succeeded.

