[FR] Add argument (e.g. `save_format`) to specify model save format to `mlflow.xgboost.log_model`

Willingness to contribute

No. I cannot contribute this feature at this time.

Proposal Summary

The XGBoost model file name is currently hardcoded as model.xgb:

https://github.com/mlflow/mlflow/blob/cf96d39d81932423d70d267f8a4ab6640e14ec7e/mlflow/xgboost/__init__.py#L154

This makes it impossible to save an XGBoost model in JSON format. A new argument specifying the save format should be added, and the line above should be changed as follows:

- model_data_subpath = "model.xgb" 
+ model_data_subpath = f"model.{save_format}"
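
As a rough illustration of the change above, here is a minimal sketch of how a `save_format` argument could drive the file name. It assumes that the model object exposes a `save_model(fname)` method (as `xgboost.Booster` does) and that XGBoost picks the serialization format from the file extension. The helper names `model_data_subpath` and `save_model_data` and the list of accepted formats are hypothetical and are not part of MLflow.

```python
import os
from typing import Protocol


class SupportsSaveModel(Protocol):
    """Anything with a save_model(fname) method, e.g. an xgboost.Booster."""

    def save_model(self, fname: str) -> None: ...


# Hypothetical whitelist: "xgb" is the current hardcoded extension,
# "json" and "ubj" are the extensions XGBoost maps to JSON / UBJSON.
SUPPORTED_FORMATS = ("xgb", "json", "ubj")


def model_data_subpath(save_format: str = "xgb") -> str:
    """Build the model file name from the requested format (hypothetical helper)."""
    if save_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported save_format {save_format!r}; expected one of {SUPPORTED_FORMATS}"
        )
    return f"model.{save_format}"


def save_model_data(model: SupportsSaveModel, path: str, save_format: str = "xgb") -> str:
    """Save the model under `path` and return the file name that was used."""
    os.makedirs(path, exist_ok=True)
    subpath = model_data_subpath(save_format)
    # The extension of the target file is what selects the on-disk format.
    model.save_model(os.path.join(path, subpath))
    return subpath
```

Keeping `"xgb"` as the default would leave existing behavior unchanged, and the returned file name would also need to be recorded wherever the model is loaded back, so that the loader looks for the right file.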

References:

Motivation

What is the use case for this feature?

To provide a way to save an XGBoost model in JSON format.

Why is this use case valuable to support for MLflow users in general?

Why is this use case valuable to support for your project(s) or organization?

Why is it currently difficult to achieve this use case?

Details

No response

Issue Analytics

State:
Created a year ago
Comments: 11 (7 by maintainers)

Top GitHub Comments

2 reactions

olbapjose commented, Sep 8, 2022

Just a comment: this might also fix the error thrown when calling xgboost.log_model on an XGBoost model that has categorical variables and uses the (experimental) support for the DataFrame category data type. The error says:

xgboost.core.XGBoostError: [16:56:47] ../src/tree/tree_model.cc:871: Check failed: !HasCategoricalSplit(): 
Please use JSON/UBJSON for saving models with categorical splits.

1 reaction

harupy commented, Oct 13, 2022

@AvikantSrivastava Yes, go ahead!
