[FR] Add argument (e.g. `save_format`) to specify model save format to `mlflow.xgboost.log_model`


Willingness to contribute

No. I cannot contribute this feature at this time.

Proposal Summary

The XGBoost model file name is currently hardcoded as model.xgb:

https://github.com/mlflow/mlflow/blob/cf96d39d81932423d70d267f8a4ab6640e14ec7e/mlflow/xgboost/__init__.py#L154

This makes it impossible to save an XGBoost model in JSON format. A new argument to specify saving format should be added and the line above should be fixed as follows:

- model_data_subpath = "model.xgb" 
+ model_data_subpath = f"model.{save_format}" 
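As a rough sketch of how the proposed argument could feed into that line: the helper below builds the file name from a `save_format` value and rejects unknown formats. The function name `model_data_subpath` and the `SUPPORTED_FORMATS` tuple are hypothetical and only for illustration; they are not part of MLflow's API.

```python
# Hypothetical helper, not part of MLflow: derive the artifact file name
# from a user-supplied save format instead of hardcoding "model.xgb".
SUPPORTED_FORMATS = ("xgb", "json", "ubj")  # assumed set of extensions


def model_data_subpath(save_format: str = "xgb") -> str:
    """Return the model file name for the given save format."""
    # Normalize input such as ".JSON" to "json" before validating.
    fmt = save_format.lower().lstrip(".")
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported save_format {save_format!r}; "
            f"expected one of {SUPPORTED_FORMATS}"
        )
    return f"model.{fmt}"
```

Keeping `"xgb"` as the default would leave existing callers unaffected, while `save_format="json"` would yield `model.json`.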

References:

Motivation

What is the use case for this feature?

To provide a way to save an XGBoost model in JSON format.

Why is this use case valuable to support for MLflow users in general?

Why is this use case valuable to support for your project(s) or organization?

Why is it currently difficult to achieve this use case?

Details

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/pipelines: Pipelines, Pipeline APIs, Pipeline configs, Pipeline Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow’s components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

2 reactions
olbapjose commented, Sep 8, 2022

Just a comment: this might also fix the error thrown when calling `mlflow.xgboost.log_model` on an XGBoost model trained with categorical variables via the (experimental) support for the DataFrame `category` data type. The error says:

xgboost.core.XGBoostError: [16:56:47] ../src/tree/tree_model.cc:871: Check failed: !HasCategoricalSplit(): 
Please use JSON/UBJSON for saving models with categorical splits.
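
The error asks for JSON or UBJSON output. XGBoost's `Booster.save_model` chooses the on-disk format from the file extension, so the file name is what decides whether the save succeeds. A minimal sketch of that idea, assuming only an object with a `save_model(path)` method; the `save_booster` helper and its `save_format` parameter are hypothetical, not MLflow or XGBoost API:

```python
import os
from typing import Protocol


class SupportsSaveModel(Protocol):
    """Anything with a save_model(path) method, e.g. an xgboost.Booster."""

    def save_model(self, fname: str) -> None: ...


def save_booster(
    booster: SupportsSaveModel, directory: str, save_format: str = "json"
) -> str:
    """Save the model as model.<save_format> in directory; return the path."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"model.{save_format}")
    # With a real Booster, the extension in `path` selects the format.
    booster.save_model(path)
    return path
```

With a real booster, `save_booster(booster, "artifacts", "json")` would write `artifacts/model.json` rather than a file in the legacy binary format.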

1 reaction
harupy commented, Oct 13, 2022

@AvikantSrivastava Yes, go ahead!


