Add support for pandas 1.4.0
Looks like the root cause may be that converting a nullable type to Categorical
preserves the nullable types in the pandas Categorical categories. (In the original screenshot, the latest pandas is on top and pandas 1.3.4 is on the bottom.)
This causes `np.asarray` calls in our imputers to introduce the new pandas null-value placeholder, `<NA>`,
which throws off sklearn.
_Originally posted by @freddyaboulton in https://github.com/alteryx/evalml/issues/3272#issuecomment-1020261986_
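The behavior described above can be checked with a small script. This is a minimal sketch, not code from evalml: it builds a nullable `Int64` series, converts it to categorical, and inspects what `np.asarray` returns. The helper `replace_pd_na` is a hypothetical name introduced here only to illustrate one way to normalize `pd.NA` to `np.nan` before handing data to a NumPy-based library; it is not the fix adopted in the project.

```python
import numpy as np
import pandas as pd

# A nullable integer column converted to categorical, as in the issue.
s = pd.Series([1, 2, None], dtype="Int64")
cat = s.astype("category")

# Per the issue, the categories dtype differs between pandas 1.3.4 and 1.4.0.
print("pandas version:", pd.__version__)
print("categories dtype:", cat.cat.categories.dtype)

# This is the kind of call the imputers make; inspect the resulting dtype
# and whether the missing entry is pd.NA or np.nan.
arr = np.asarray(cat)
print("array dtype:", arr.dtype)
print("contains pd.NA:", any(v is pd.NA for v in arr))


def replace_pd_na(values):
    """Hypothetical helper: return an object array with missing values as np.nan."""
    out = np.asarray(values, dtype=object).copy()
    out[pd.isna(out)] = np.nan
    return out


cleaned = replace_pd_na(arr)
print("contains pd.NA after cleaning:", any(v is pd.NA for v in cleaned))
```

Running this under both pandas versions should make the difference in the categories dtype visible without relying on the screenshot.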
This change in behavior breaks tests in three places:
- In our simple imputer, we convert NaturalLanguage to Categorical in order to support imputing natural language via the `most_frequent` strategy (`test_simple_imputer_supports_natural_language_constant`).
- In our imputer tests, we test running the imputer on a dataframe that had a nullable int column converted to categorical (`test_imputer_woodwork_custom_overrides_returned_by_components`).
- In our tests for EmailFeaturizer and URLFeaturizer, this causes the categorical features created from Email and URL logical types to have nullable types in the categories, because the physical type for Email and URL is text. See `test_ft_transform_primitive_components.py::test_component_fit_transform[component1-make_data_email_fit_transform_missing_values-make_answer_email_fit_transform_missing_values-make_expected_logical_types_email_fit_transform_missing_values]`.
Additionally:
- There is a warning being emitted by xgboost using `pd.Int64Index` in `data.py` (https://github.com/dmlc/xgboost/blob/v1.5.0/python-package/xgboost/data.py#L250). This causes `test_xgboost_catch_warnings_label_encoder` to fail.
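For context on the xgboost item, a type check against `pd.Int64Index` can be expressed through the index dtype instead, which avoids touching the name that triggers the warning. The sketch below is only an illustration of that idea; `is_int64_index` is a hypothetical helper and is not taken from xgboost or from its fix.

```python
import pandas as pd


def is_int64_index(index):
    """Hypothetical helper: dtype-based check that does not reference pd.Int64Index."""
    return isinstance(index, pd.Index) and index.dtype == "int64"


print(is_int64_index(pd.Index([1, 2, 3])))
print(is_int64_index(pd.RangeIndex(3)))
print(is_int64_index(pd.Index(["a", "b"])))
```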
Top GitHub Comments
Looks like the XGBoost warnings are remedied, per https://github.com/dmlc/xgboost/pull/7595! Just need to wait for a release from XGBoost.
@chukarsten Just a heads up, Featuretools doesn’t yet fully support `pandas==1.4.0` either. It might work for you though, but no guarantees at this point. https://github.com/alteryx/featuretools/issues/1865