ENH: allow rolling with non-numerical (eg string) data
Hi Pandas dream team,
I think it would be nice if rolling could accept strings as well (see https://stackoverflow.com/questions/52657429/rolling-with-string-variables).
With the abundance of textual data nowadays, we want Pandas to stay ahead of the curve!
import pandas as pd
import numpy as np

# Five irregularly spaced timestamps, one numeric column and one string column
df = pd.DataFrame({'mytime': [pd.to_datetime('2018-01-01 14:34:12.340'),
                              pd.to_datetime('2018-01-01 14:34:13.0'),
                              pd.to_datetime('2018-01-01 14:34:15.342'),
                              pd.to_datetime('2018-01-01 14:34:16.42'),
                              pd.to_datetime('2018-01-01 14:34:28.742')],
                   'myvalue': [1, 2, np.nan, 3, 1],  # np.nan: the np.NaN alias was removed in NumPy 2.0
                   'mychart': ['a', 'b', 'c', 'd', 'e']})
df.set_index('mytime', inplace=True)
df
Out[15]:
                        mychart  myvalue
mytime
2018-01-01 14:34:12.340       a      1.0
2018-01-01 14:34:13.000       b      2.0
2018-01-01 14:34:15.342       c      NaN
2018-01-01 14:34:16.420       d      3.0
2018-01-01 14:34:28.742       e      1.0
Here I want to concatenate the strings in mychart using the values in the last 2 seconds (not the last two observations).
Unfortunately, both attempts below fail miserably:
df.mychart.rolling(window = '2s', closed = 'right').apply(lambda x: ' '.join(x), raw = False)
df.mychart.rolling(window = '2s', closed = 'right').apply(lambda x: (x + ' ').cumsum(), raw = False)
TypeError: cannot handle this type -> object
What do you think? Thanks!
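For anyone landing here with the same need, one possible workaround is to skip rolling entirely and compute the window bounds by hand. This is only a sketch: `rolling_join` is a hypothetical helper, not a pandas API, and it assumes a sorted DatetimeIndex and no missing strings. It mimics `closed='right'`, so each row sees the strings with timestamps in (t - 2s, t].

```python
import numpy as np
import pandas as pd

def rolling_join(series, window, sep=" "):
    """Join strings whose timestamps fall in (t - window, t] for each row t.

    Hypothetical helper, not part of pandas. Assumes a sorted DatetimeIndex
    and that every value is a string (no NaN).
    """
    ts = series.index.to_numpy()
    delta = pd.Timedelta(window).to_timedelta64()
    # For each t, the first position whose timestamp is strictly after t - window
    starts = np.searchsorted(ts, ts - delta, side="right")
    values = series.to_numpy()
    joined = [sep.join(values[start:end + 1]) for end, start in enumerate(starts)]
    return pd.Series(joined, index=series.index, name=series.name)

# Same timestamps and strings as the example above
mychart = pd.Series(
    ["a", "b", "c", "d", "e"],
    index=pd.to_datetime([
        "2018-01-01 14:34:12.340",
        "2018-01-01 14:34:13.000",
        "2018-01-01 14:34:15.342",
        "2018-01-01 14:34:16.420",
        "2018-01-01 14:34:28.742",
    ]),
    name="mychart",
)
result = rolling_join(mychart, "2s")
print(result.tolist())  # ['a', 'a b', 'c', 'c d', 'e']
```

The list comprehension runs in Python, so this will be slower than a native rolling op on large frames, but it places no restriction on the dtype.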
Issue Analytics
- State:
- Created 5 years ago
- Reactions:27
- Comments:16 (7 by maintainers)
Top GitHub Comments
+1 to this!
Even if most ops require a cast to float, apply should work on strings. I have a table with timestamps and strings, and I was hoping to group records with time windows and process the strings using apply and a custom function. While there may be a workaround, this seems to me the most natural way of doing it.
Hopefully this can be fixed at some point 😃
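In the meantime, if the custom function only needs to return a number per window, one common trick is to map the strings to integer codes first and roll over the codes. The sketch below assumes that is acceptable for the use case; the `labels` data is made up for illustration, and it counts the distinct strings seen in each 2-second window.

```python
import numpy as np
import pandas as pd

index = pd.to_datetime([
    "2018-01-01 14:34:12.340",
    "2018-01-01 14:34:13.000",
    "2018-01-01 14:34:15.342",
    "2018-01-01 14:34:16.420",
    "2018-01-01 14:34:28.742",
])
# Hypothetical string column with repeated labels
labels = pd.Series(["a", "b", "b", "a", "b"], index=index)

# Map each distinct string to an integer code; uniques maps codes back to strings
codes, uniques = pd.factorize(labels)
code_series = pd.Series(codes, index=index, dtype="float64")

# rolling only sees floats, so apply works; each window is a NumPy array of codes
distinct = code_series.rolling("2s").apply(lambda w: np.unique(w).size, raw=True)
print(distinct.tolist())  # [1.0, 2.0, 1.0, 2.0, 1.0]
```

This does not help when the function has to return a string, since rolling apply still expects a numeric result.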
@alexanderfrey Thanks! For what it is worth, we ended up finding that support is a little better than we realized. For example, this works:
df.groupby("group")['vector'].apply(lambda x: np.mean(x, axis=0))
I know it’s not quite the same, but just pointing it out in case it helps.
For our “work-around” we just ended up not using rolling. We wanted to use rolling to apply a simple noise-reduction filter, but ultimately decided the benefit of doing so was going to be marginal at best because of additional post-processing that was also happening.