ENH: Add option to resample data by a non-timeseries column (e.g. Price)

Is your feature request related to a problem?

Renko Chart Wiki: https://en.wikipedia.org/wiki/Renko_chart

I’m trying to generate a Renko chart from trade tick data. The data contains Timestamp, Price, and Volume columns. The Timestamp is in Unix milliseconds, e.g. 1649289600174.

Pandas already supports OHLC resampling via df.resample('10Min').agg({'Price': 'ohlc'}). However, I would like to resample trade data by price, not by time.
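For reference, the time-based version only works after the Unix-millisecond Timestamp column has been converted to a DatetimeIndex, because resample() needs a datetime-like index. A minimal sketch, assuming the column names from the issue and made-up tick values:

```python
import pandas as pd

# Illustrative tick data in the shape described above (values are made up).
ticks = pd.DataFrame(
    {
        "Timestamp": [1649289600174, 1649289660000, 1649290200500, 1649290260000],
        "Price": [100.0, 104.0, 98.0, 111.0],
        "Volume": [5, 2, 7, 1],
    }
)

# resample() needs a DatetimeIndex, so convert the Unix-millisecond column first.
ticks["Timestamp"] = pd.to_datetime(ticks["Timestamp"], unit="ms")
ticks = ticks.set_index("Timestamp")

# Time-based OHLC bars: one row per 10-minute bin.
bars = ticks["Price"].resample("10min").ohlc()
print(bars)
```

Each bar here is defined purely by the clock, which is exactly what the request wants to replace with a price-based rule.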

Describe the solution you’d like

I’m looking for a solution that would look something like:

df.resample('10Num').agg({'Price': 'ohlc', 'Timestamp': 'last'}).

Here 10 is the brick size, measured on the close price. The Num suffix says: treat this as numeric resampling instead of time-series resampling. That is, if the close price moves by +10 or -10, I would like to aggregate that data.

We should also have a flag to ignore downward movement.

If ignore_down is set to True, the aggregation should ignore downward movements, e.g. a move from 100 to 90.
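The price-based behaviour requested here can be approximated today with a plain Python loop that assigns a group label to each tick, followed by an ordinary groupby. This is a minimal sketch under stated assumptions, not a pandas API. renko_labels and renko_bars are hypothetical helpers. A bar closes once the price has moved at least brick_size away from an anchor price, and the anchor then resets to the triggering price (a simplification, since classic Renko charts move in exact multiples of the brick size). ignore_down=True is read as "only upward moves close a bar". The tick values are made up.

```python
import pandas as pd


def renko_labels(prices, brick_size, ignore_down=False):
    """Hypothetical helper: give each tick a group label. A group ends on the
    tick whose price is at least brick_size away from the anchor price."""
    labels = []
    group = 0
    anchor = None
    for price in prices:
        if anchor is None:
            anchor = price  # the first tick sets the initial anchor
        labels.append(group)
        move = price - anchor
        if move >= brick_size or (not ignore_down and move <= -brick_size):
            # This tick completes the current bar; later ticks start a new one.
            group += 1
            anchor = price  # simplification: reset the anchor to this price
    return labels


def renko_bars(df, brick_size, ignore_down=False):
    """Hypothetical helper: OHLC of Price plus the last Timestamp per bar."""
    labels = renko_labels(df["Price"].tolist(), brick_size, ignore_down)
    grouped = df.groupby(labels)
    return pd.DataFrame(
        {
            "open": grouped["Price"].first(),
            "high": grouped["Price"].max(),
            "low": grouped["Price"].min(),
            "close": grouped["Price"].last(),
            "Timestamp": grouped["Timestamp"].last(),
        }
    )


# Made-up ticks: Unix-millisecond timestamps one second apart.
ticks = pd.DataFrame(
    {
        "Timestamp": [1649289600174 + i * 1000 for i in range(7)],
        "Price": [100, 104, 111, 105, 99, 95, 108],
    }
)
bars = renko_bars(ticks, brick_size=10)
up_only = renko_bars(ticks, brick_size=10, ignore_down=True)
print(bars)
print(up_only)
```

With ignore_down=False the example yields three bars (the last one is still incomplete). With ignore_down=True the drop from 111 to 95 never closes a bar, so only two remain.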

API breaking implications

N/A

Describe alternatives you’ve considered

At the moment, I’m building the Renko chart manually with a Python loop.

Additional context

N/A

Issue Analytics

Created a year ago
Comments: 15 (8 by maintainers)

Top GitHub Comments

1 reaction

MarcoGorelli commented, May 2, 2022

Thanks Joris, some good points there.

With regard to closing issues: people’s time is very limited and there are a lot of open issues, so if one lacks a clear example with expected output, it’s arguably not worth spending too long on it. But I acknowledge that I locked this one prematurely, apologies!

1 reaction

jorisvandenbossche commented, May 2, 2022

@jreback @MarcoGorelli The line between genuine usage questions and feature requests is always a bit fuzzy (in the end, many feature requests are backed by a use case that is often already possible in pandas somehow; the feature request is about making it easier). For example, I think there is a feature request hidden in this issue. So as long as we don’t have a better place or discussion forum for such questions/requests (StackOverflow also doesn’t allow any discussion), I personally think we need to be more tolerant in accepting such questions here, or at least first ask for clarification and allow some discussion before closing the issue. It’s not very welcoming to be shut down immediately.

@dsstex Thank you for thinking about how pandas can be improved. That said, I have to admit that your question was not very clear to me either. It might be a bit late now, but I still wanted to give you some tips:

- Try to provide an actual reproducible example (Marco already gave a link above, and another one is https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports). Ideally this means some code that constructs an example DataFrame and can be copy-pasted (which should be possible in this case, and was done, for example, in this SO answer: https://stackoverflow.com/a/71935563).
- Try to avoid jargon, and don’t assume we know trade data (e.g. I don’t know what a “brick size” is). In that sense, it can also help to explain more clearly how you get from the example input data to the expected result (which steps, in logic or pseudo-code, are taken to obtain it).
- In general, for feature requests it is also good to think about how the feature “generalizes”. Currently it sounds very specific to finance, and if that is the case, that can actually be a reason not to include it in pandas (pandas already has a vast feature set, so the bar for adding yet another feature should be quite high). To be clear, this is not easy.

For your actual feature request, I think it has been answered on StackOverflow by @MarcoGorelli in the meantime. I also don’t think it is really a “resample” operation (in pandas terminology), because a resample groups together all data that fall into a certain (time) interval, regardless of the order of the rows in your DataFrame. After the cut step that creates the actual group key values, what you then want (as far as I understand) is logic of the form “group by this key, but only group together contiguous rows with the same key value”.
That can be partly solved with the shift+cumsum trick (as shown in the SO answer), but personally I think this is something we should try to make easier in pandas. It was reported a long time ago in https://github.com/pandas-dev/pandas/issues/5494 as well (which is now closed, I suppose in favor of https://github.com/pandas-dev/pandas/issues/4059).
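A minimal sketch of the cut + shift/cumsum approach described above, on a made-up price series. The bucket edges and the width of 10 are assumptions for illustration, and this is not the exact code from the linked SO answer. It contrasts a plain groupby on the bucket (which merges rows regardless of order, like a resample) with grouping only contiguous runs of the same bucket.

```python
import pandas as pd

# Made-up prices; they leave the [100, 110) bucket and later come back to it.
prices = pd.Series([101, 104, 112, 115, 108, 103, 117], name="Price")

# Step 1 ("cut"): map each price to a fixed-width bucket of size 10.
buckets = pd.cut(prices, bins=[100, 110, 120], right=False)

# A plain groupby merges all rows in the same bucket, regardless of order:
# 101, 104, 108 and 103 all land in [100, 110) even though 112/115 sit between them.
plain = prices.groupby(buckets, observed=True).agg(["first", "last", "size"])

# Step 2 (shift + cumsum): start a new run id whenever the bucket code changes,
# so only contiguous rows with the same bucket are grouped together.
codes = buckets.cat.codes
run_id = codes.ne(codes.shift()).cumsum()
runs = prices.groupby(run_id).agg(["first", "last", "size"])

print(plain)
print(runs)
```

The plain groupby yields two groups (sizes 4 and 3), while the run-based grouping yields four, one per uninterrupted stretch within a bucket.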