Float64-only pandas DataFrames not supported as flow arguments
First check
- I added a descriptive title to this issue.
- I used the GitHub search to find a similar issue and didn’t find it.
- I searched the Prefect documentation for this issue.
- I checked that this issue is related to Prefect and not one of its dependencies.
Bug summary
Flows don’t support pandas DataFrames as parameters if they contain only float64 columns. Once any non-float64 column (float32, int, etc.) is added to the DataFrame, serialization of the parameters works fine. I’m not sure why this makes any difference.
Adding the orjson.OPT_SERIALIZE_NUMPY flag to prefect.orion.utilities.schemas.orjson_dumps_non_str_keys solves this issue.
Reproduction
import pandas as pd
from prefect import flow

@flow(validate_parameters=False)
def transformer(df: pd.DataFrame):
    return df

x = pd.DataFrame({
    "a": [0.],
    # "b": [0]  # uncomment to make it work
})
print(x.dtypes)
transformer(x)
Error
Traceback (most recent call last):
File "/Users/michal.augoff/Documents/GitHub/mls-mlops/local_run.py", line 15, in <module>
transformer(x)
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/flows.py", line 447, in __call__
return enter_flow_run_engine_from_flow_call(
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/engine.py", line 162, in enter_flow_run_engine_from_flow_call
return anyio.run(begin_run)
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 70, in run
return asynclib.run(func, *args, **backend_options)
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 292, in run
return native_run(wrapper(), debug=debug)
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 287, in wrapper
return await func(*args)
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/client/utilities.py", line 47, in with_injected_client
return await fn(*args, **kwargs)
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/engine.py", line 219, in create_then_begin_flow_run
flow_run = await client.create_flow_run(
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/client/orion.py", line 449, in create_flow_run
flow_run_create_json = flow_run_create.dict(json_compatible=True)
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/orion/utilities/schemas.py", line 268, in dict
return json.loads(self.json(*args, **kwargs))
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/orion/utilities/schemas.py", line 238, in json
return super().json(*args, **kwargs)
File "pydantic/main.py", line 505, in pydantic.main.BaseModel.json
File "/Users/michal.augoff/anaconda3/envs/mlops2/lib/python3.8/site-packages/prefect/orion/utilities/schemas.py", line 135, in orjson_dumps_non_str_keys
return orjson.dumps(v, default=default, option=orjson.OPT_NON_STR_KEYS).decode()
TypeError: Type is not JSON serializable: numpy.float64
Versions
2.7.1
Additional context
No response
Issue Analytics
- State:
- Created 9 months ago
- Comments:5 (1 by maintainers)
I did some work on parameter validation error handling in https://github.com/PrefectHQ/prefect/pull/6091. I’d need to check a bit more thoroughly, but I don’t think that it handles serialization errors.
We did handle a related pandas serialization error in https://github.com/PrefectHQ/prefect/pull/7385. We solved that by moving to orjson with the option OPT_NON_STR_KEYS.

For what it’s worth, we probably shouldn’t be sending these kinds of parameters to the database anyway unless they’re really small.