ENH: Add option to resample data by a non-timeseries column (e.g. Price)
Is your feature request related to a problem?
Renko Chart Wiki: https://en.wikipedia.org/wiki/Renko_chart
I’m trying to generate a Renko chart from trade tick data. The data contains Timestamp, Price, and Volume columns. The Timestamp is in Unix milliseconds format, e.g. 1649289600174.
Pandas already supports time-based OHLC resampling via df.resample('10Min').agg({'Price': 'ohlc'}). However, I would like to resample trade data based on price, not time.
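For reference, a minimal sketch of the time-based case that already works. The tick values are made up for illustration (only the first timestamp comes from the text above), and it uses the Series-level ohlc() shortcut rather than the agg spelling quoted above.

```python
import pandas as pd

# Made-up ticks; only the first timestamp is taken from the issue text.
ticks = pd.DataFrame({
    "Timestamp": [1649289600174, 1649289660000, 1649290200000, 1649290260000],
    "Price": [100.0, 104.0, 98.0, 101.0],
    "Volume": [1, 2, 3, 4],
})

# Unix milliseconds -> DatetimeIndex, which resample() requires.
ticks.index = pd.to_datetime(ticks["Timestamp"], unit="ms")

# One open/high/low/close row per 10-minute bucket of the Price column.
ohlc = ticks["Price"].resample("10min").ohlc()
print(ohlc)
```

The buckets here depend only on the clock, so a large price move inside one bucket does not start a new row. That is the limitation this request is about.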
Describe the solution you’d like
I’m looking for a solution that would look something like df.resample('10Num').agg({'Price': 'ohlc', 'Timestamp': 'last'}).
Here 10 is the brick size, and it is based on the close price. The keyword Num says to treat this as numeric-value resampling instead of time-series resampling, i.e. whenever the close price moves by +10 or -10, I would like to aggregate the data up to that point.
We should also have a flag to ignore downward movement: if ignore_down is set to True, then the agg function should ignore downside movement, e.g. 100 to 90.
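To make the request concrete, here is a rough sketch of the behaviour described above, written as a plain loop. The function name price_bricks, the sample ticks, and the exact semantics are assumptions for illustration, not an existing pandas API: a brick closes when the price has moved at least brick_size away from the previous brick's close, the first tick seeds the reference price, and trailing ticks that never complete a brick are dropped.

```python
import pandas as pd

def price_bricks(df, brick_size, ignore_down=False):
    """Hypothetical helper: aggregate ticks into bricks closed by price movement."""
    rows = []
    anchor = None   # close of the previous brick (first tick to start with)
    start = 0       # position of the first tick in the brick being built
    for i, price in enumerate(df["Price"].to_numpy()):
        if anchor is None:
            anchor = price
        move = price - anchor
        up = move >= brick_size
        # With ignore_down=True a downward move never closes a brick.
        down = move <= -brick_size and not ignore_down
        if up or down:
            chunk = df.iloc[start:i + 1]
            rows.append({
                "open": chunk["Price"].iloc[0],
                "high": chunk["Price"].max(),
                "low": chunk["Price"].min(),
                "close": chunk["Price"].iloc[-1],
                "Timestamp": chunk["Timestamp"].iloc[-1],
            })
            anchor = price
            start = i + 1
    return pd.DataFrame(rows, columns=["open", "high", "low", "close", "Timestamp"])

# Made-up ticks for illustration.
ticks = pd.DataFrame({
    "Timestamp": [1, 2, 3, 4, 5, 6],
    "Price": [100.0, 104.0, 111.0, 105.0, 100.0, 112.0],
})

all_moves = price_bricks(ticks, 10)
up_only = price_bricks(ticks, 10, ignore_down=True)
print(all_moves)
print(up_only)
```

With these ticks, the default call closes a brick at 111, 100 and 112, while ignore_down=True only closes the first one, because the drop to 100 is skipped and 112 is then just 1 above the last close. Whether open should be the first tick of the brick or the previous close is a design choice this sketch does not settle.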
API breaking implications
N/A
Describe alternatives you’ve considered
At the moment, I’m creating the Renko chart manually using a Python loop.
Additional context
N/A
Issue Analytics
- State:
- Created a year ago
- Comments:15 (8 by maintainers)
Top GitHub Comments
Thanks Joris, some good points there
With regard to closing issues - people’s time is very limited, and there are a lot of open issues, and if there’s one without a clear example with expected output then arguably it’s not worth spending too long on it. But I acknowledge that I locked this one prematurely, apologies!
@jreback @MarcoGorelli The line between genuine usage questions and feature requests is always a bit fuzzy (in the end, many feature requests are backed by a use case, which is often already somehow possible to do in pandas, but the feature request is about making this easier to do). For example, I think there is some feature request hidden here. So as long as we don’t have a better place or discussion forum for such questions/requests (StackOverflow also doesn’t allow any discussion), I personally think we need to be more tolerant in accepting such questions here. Or at least try to first ask for clarification and allow some discussion, before closing the issue. It’s not very welcoming to be directly shut down.
@dsstex Thank you for thinking about how pandas can be improved. I have to say that your question was not very clear to me either. It might be a bit late now, but I still wanted to give you some tips:
For your actual feature request, I think in the meantime it has been answered on StackOverflow by @MarcoGorelli. I also think it is not really a “resample” operation (using pandas’ terminology), because a resample will group all data that fall into a certain (time) interval together, regardless of the order of the rows in your DataFrame. After doing the cut step to create the actual group key values, what you then want (as far as I understand) is a logic of “group by this key, but only group contiguous values of a given key value”. That can be solved somewhat with the shift+cumsum trick (as shown in the SO answer), but personally I think this is something that we should actually try to make easier to do in pandas. This was reported a long time ago in https://github.com/pandas-dev/pandas/issues/5494 as well (which is closed now, but I suppose in favor of this issue: https://github.com/pandas-dev/pandas/issues/4059).
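For readers who have not seen it, here is a small sketch of the cut step followed by the shift+cumsum trick. It is not copied from the SO answer; the bin edges, sample prices, and variable names are made up for illustration.

```python
import pandas as pd

# Made-up ticks: the price leaves the first band and later returns to it.
df = pd.DataFrame({
    "Timestamp": [1, 2, 3, 4, 5, 6],
    "Price": [101.0, 104.0, 112.0, 115.0, 103.0, 108.0],
})

# Step 1: cut prices into fixed-width bands; labels=False gives integer band codes.
key = pd.cut(df["Price"], bins=[100, 110, 120], labels=False)

# Step 2: start a new run id every time the band differs from the previous row.
run_id = (key != key.shift()).cumsum().rename("run")

# Step 3: aggregate per contiguous run instead of per band.
out = df.groupby(run_id).agg(
    open=("Price", "first"),
    high=("Price", "max"),
    low=("Price", "min"),
    close=("Price", "last"),
    Timestamp=("Timestamp", "last"),
)
print(out)
```

Grouping by key alone would merge rows 1-2 with rows 5-6 because they fall into the same band, giving two groups. Grouping by run_id keeps them apart and gives three, which is the "only group contiguous values" behaviour described above. Note that fixed bands are still not the same as bricks measured from the previous close.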