[BUG] MLFlow does not support a predictions DataFrame with a MultiIndex

I am using MLFlow to deploy a model that returns a pd.DataFrame with a pd.MultiIndex. Whenever I run the MLFlow wrapper to predict, I see this error that comes from calling json.dump on the MultiIndex DataFrame:

mlflow models predict -m example_model -i data.json -t json --env-manager local 2>&1

[{Traceback (most recent call last):
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/bin/mlflow", line 11, in <module>
    sys.exit(cli())
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/models/cli.py", line 125, in predict
    return _get_flavor_backend(
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/pyfunc/backend.py", line 137, in predict
    scoring_server._predict(local_uri, input_path, output_path, content_type, json_format)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/pyfunc/scoring_server/__init__.py", line 345, in _predict
    predictions_to_json(pyfunc_model.predict(df), sys.stdout)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/site-packages/mlflow/pyfunc/scoring_server/__init__.py", line 193, in predictions_to_json
    json.dump(predictions, output, cls=NumpyEncoder)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/encoder.py", line 429, in _iterencode
    yield from _iterencode_list(o, _current_indent_level)
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Users/vsridhar/yes/envs/jupyter_env/envs/FAI-dev/lib/python3.8/json/encoder.py", line 376, in _iterencode_dict
    raise TypeError(f'keys must be str, int, float, bool or None, '
TypeError: keys must be str, int, float, bool or None, not tuple
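
The final TypeError is raised by Python's standard json encoder, so it can probably be reproduced without MLflow at all. The snippet below is a minimal sketch of that: the `record` dictionary and the `try_dump` helper are made up for illustration and are not part of MLflow.

```python
import json

# A dictionary keyed by tuples, similar in shape to what a DataFrame with
# MultiIndex columns produces when turned into records (illustrative data).
record = {("top_index_1", "a"): 10.0, ("top_index_1", "b"): 5.0}


def try_dump(obj):
    """Hypothetical helper: return the JSON string, or the error text on failure."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"


# The stdlib encoder only accepts str, int, float, bool or None as dict keys,
# so tuple keys fail the same way as in the traceback.
result = try_dump([record])
print(result)
```

The custom encoder class passed via `cls` in the traceback does not help here, because an encoder's `default` hook is only consulted for unsupported values, not for unsupported dictionary keys.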

I believe this comes from the predictions_to_json function, which converts a MultiIndex DataFrame into a list of dictionaries like this:

[{('top_index_1', 'a'): 10.0, ('top_index_1', 'b'): 5.0, ('top_index_2', 'c'): 15.0, ('top_index_3', 'd'): 20.0}]

The keys here are tuples, which results in the error above.
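
One possible workaround on the model side, until MLflow handles this itself, is to flatten the column MultiIndex into plain strings before returning the predictions. The sketch below assumes the predictions are converted with something like `DataFrame.to_dict(orient="records")`, which matches the shape of the output quoted above but is not confirmed by the issue. The `flatten_columns` helper and the `"."` separator are hypothetical choices, not an MLflow API.

```python
import json

import pandas as pd

# Rebuild a one-row predictions frame with the column labels from the issue.
columns = pd.MultiIndex.from_tuples(
    [("top_index_1", "a"), ("top_index_1", "b"),
     ("top_index_2", "c"), ("top_index_3", "d")]
)
predictions = pd.DataFrame([[10.0, 5.0, 15.0, 20.0]], columns=columns)

# With MultiIndex columns, each record is keyed by tuples.
records = predictions.to_dict(orient="records")


def flatten_columns(df, sep="."):
    """Hypothetical helper: join each column tuple into a single string label."""
    flat = df.copy()
    flat.columns = [sep.join(str(level) for level in col) for col in df.columns]
    return flat


# After flattening, the keys are strings and the records serialize cleanly.
flat_records = flatten_columns(predictions).to_dict(orient="records")
encoded = json.dumps(flat_records)
print(encoded)
```

Flattening loses the hierarchical structure, so a consumer that needs the levels back would have to split the labels on the same separator.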

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5

Top GitHub Comments

1 reaction
dbczumar commented, Jun 29, 2022

@arjundc-db Would you be able to take a look here? cc also @WeichenXu123

0 reactions
dbczumar commented, Jul 7, 2022

@arjundc-db @sueann @yunpark93 Pinging here. Can you take a look?
