Float64-only pandas Dataframes not supported as flow arguments

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn’t find it.
  • I searched the Prefect documentation for this issue.
  • I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

Flows don’t support pandas DataFrames that contain only float64 columns. Once any non-float64 column (float32, int, etc.) is added to the DataFrame, serialization of the parameters works fine. I’m not sure why this makes a difference.

Adding the orjson.OPT_SERIALIZE_NUMPY flag to the orjson.dumps call in prefect.orion.utilities.schemas.orjson_dumps_non_str_keys solves this issue.

Reproduction

import pandas as pd
from prefect import flow


@flow(validate_parameters=False)
def transformer(df: pd.DataFrame):
    return df


x = pd.DataFrame({
    "a": [0.],
    # "b": [0]  # uncomment to make it work
})
print(x.dtypes)
transformer(x)

Error

Traceback (most recent call last):
  File "/Users/michal.augoff/Documents/GitHub/mls-mlops/local_run.py", line 15, in <module>
    transformer(x)
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/flows.py", line 447, in __call__
    return enter_flow_run_engine_from_flow_call(
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/engine.py", line 162, in enter_flow_run_engine_from_flow_call
    return anyio.run(begin_run)
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
    return asynclib.run(func, *args, **backend_options)
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
    return native_run(wrapper(), debug=debug)
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
    return await func(*args)
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/client/utilities.py", line 47, in with_injected_client
    return await fn(*args, **kwargs)
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/engine.py", line 219, in create_then_begin_flow_run
    flow_run = await client.create_flow_run(
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/client/orion.py", line 449, in create_flow_run
    flow_run_create_json = flow_run_create.dict(json_compatible=True)
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/orion/utilities/schemas.py", line 268, in dict
    return json.loads(self.json(*args, **kwargs))
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/orion/utilities/schemas.py", line 238, in json
    return super().json(*args, **kwargs)
  File "pydantic/main.py", line 505, in pydantic.main.BaseModel.json
  File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/orion/utilities/schemas.py", line 135, in orjson_dumps_non_str_keys
    return orjson.dumps(v, default=default, option=orjson.OPT_NON_STR_KEYS).decode()
TypeError: Type is not JSON serializable: numpy.float64

Versions

2.7.1

Additional context

No response

Issue Analytics

  • State: open
  • Created 9 months ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
peytonrunyan commented, Dec 16, 2022

I did some work on parameter validation error handling in https://github.com/PrefectHQ/prefect/pull/6091. I’d need to check a bit more thoroughly, but I don’t think that it handles serialization errors.

We did handle a related pandas serialization error in https://github.com/PrefectHQ/prefect/pull/7385. We solved that by moving to orjson with the option OPT_NON_STR_KEYS.

0 reactions
madkinsz commented, Dec 15, 2022

For what it’s worth, we probably shouldn’t be sending these kinds of parameters to the database anyway unless they’re really small.
