Throwing a warning for Pandas SparseArrays
Description
Echoing OP’s sentiments from this reddit thread because it’s something I’ve had to learn the hard way as well.
Right now, sklearn secretly inflates Pandas SparseArrays without warning the user. IMO there should be a warning thrown at the very least.
Steps/Code to Reproduce
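import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

# Mostly-zero frame: only the last two rows keep non-zero values
df = pd.DataFrame(np.random.randn(10000, 4))
df.iloc[:9998] = 0
# Store every column as a SparseArray (spelled pd.arrays.SparseArray in newer pandas)
for col in df.columns:
    df[col] = pd.SparseArray(df[col], fill_value=0)
l = LinearRegression()
# The behaviour reported here: fit() densifies the sparse columns without a warning
l.fit(df[df.columns[0:2]], df[df.columns[3]])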
Using guppy to analyze memory usage, it’s clear that sklearn is inflating this matrix behind the scenes.
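For readers without guppy, the size gap can be approximated with pandas and NumPy alone. The sketch below is not the measurement made in the report: it assumes that converting the frame with np.asarray stands in for the dense conversion sklearn performs during validation, and it uses pd.arrays.SparseArray, the namespaced spelling of pd.SparseArray. The names dense, sparse, sparse_bytes and dense_bytes are illustrative.

```python
import numpy as np
import pandas as pd

# Same shape as the reproduction: 10000 x 4, all zeros except the last two rows.
rng = np.random.default_rng(0)
dense = pd.DataFrame(rng.standard_normal((10000, 4)))
dense.iloc[:9998] = 0

# Rebuild the frame with one SparseArray per column.
sparse = pd.DataFrame(
    {col: pd.arrays.SparseArray(dense[col].to_numpy(), fill_value=0)
     for col in dense.columns}
)

# Bytes held by the sparse columns (index excluded).
sparse_bytes = int(sparse.memory_usage(index=False).sum())

# Assumption: np.asarray approximates the dense conversion done during input validation.
dense_array = np.asarray(sparse)
dense_bytes = dense_array.nbytes

print(f"sparse columns: {sparse_bytes} bytes, dense array: {dense_bytes} bytes")
```

The dense array stores every one of the 40000 float64 cells, while the sparse columns store only the few non-zero values and their positions.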
Expected Results
sklearn warns the user when inflating a sparse array
Actual Results
sklearn does not warn the user when inflating a sparse array
Versions
System:
python: 3.7.4 (tags/v3.7.4:e09359112e, Jul 8 2019, 20:34:20) [MSC v.1916 64 bit (AMD64)]
executable: c:\python37\python.exe
machine: Windows-10-10.0.18362-SP0
Python deps:
pip: 19.2.1
setuptools: 40.8.0
sklearn: 0.21.3
numpy: 1.17.0
scipy: 1.3.1
Cython: None
pandas: 0.25.0
Issue Analytics
- State:
- Created 4 years ago
- Comments:9 (8 by maintainers)
Pandas has deprecated SparseDataFrame and suggests creating a normal DataFrame with SparseArray columns instead. I was thinking of detecting sparse matrices by using hasattr to look for attributes unique to SparseDataFrame. But since the inputs are now DataFrames with sparse columns, they have the same attributes as any other DataFrame. Is there another way to tell the two apart?

See https://github.com/pandas-dev/pandas/issues/26706 for a discussion on how you can check for sparse columns in a dataframe, i.e. something like
df.dtypes.apply(pd.api.types.is_sparse).any/all()
(any/all depending on whether you want to check for at least one column or all columns being sparse)
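A minimal sketch of how such a check could feed the warning requested in this issue. It is not sklearn's implementation: the helper names sparse_columns and to_dense_with_warning are hypothetical, and the dtype test uses isinstance(dtype, pd.SparseDtype) rather than pd.api.types.is_sparse, on the assumption that the isinstance form is preferable on recent pandas releases where is_sparse is deprecated.

```python
import warnings

import numpy as np
import pandas as pd


def sparse_columns(df):
    """Return the labels of columns whose dtype is a pandas SparseDtype."""
    return [col for col, dtype in df.dtypes.items()
            if isinstance(dtype, pd.SparseDtype)]


def to_dense_with_warning(df):
    """Hypothetical helper: warn if any column is sparse, then densify."""
    cols = sparse_columns(df)
    if cols:
        warnings.warn(
            f"Converting {len(cols)} sparse column(s) to a dense array: {cols}",
            UserWarning,
        )
    return np.asarray(df)


example = pd.DataFrame({
    "a": pd.arrays.SparseArray([0.0, 0.0, 1.5], fill_value=0),
    "b": [1.0, 2.0, 3.0],
})

# Record warnings so the demonstration can inspect them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dense_example = to_dense_with_warning(example)
```

Using any(...) over the detected columns corresponds to the "at least one sparse column" case; requiring len(cols) == df.shape[1] would correspond to the "all columns" case.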