expected_cumulative_transactions will calculate actual transactions incorrectly
While using plot_cumulative_transactions, I think I found a bug in utils.expected_cumulative_transactions.
Let's say I want to build a beta_geo_fitter model with freq='D'. First I call utils.summary_data_from_transaction_data to summarize my transactions. When that function calculates frequency, it treats transactions that happened on the same day as a single transaction.
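To make the summary-side rule concrete, here is a minimal pandas sketch. It assumes that frequency means "number of distinct periods with a purchase, minus one" per customer. The helper repeat_frequency, the toy data, and the column names are hypothetical illustrations, not the lifetimes implementation.

```python
import pandas as pd

# Hypothetical toy data: customer "a" buys twice on Jan 1 and once on Jan 3.
transactions = pd.DataFrame(
    {
        "customer_id": ["a", "a", "a", "b"],
        "date": pd.to_datetime(
            ["2020-01-01 09:00", "2020-01-01 17:30", "2020-01-03 12:00", "2020-01-02 08:00"]
        ),
    }
)


def repeat_frequency(df, customer_id_col="customer_id", datetime_col="date", freq="D"):
    """Hypothetical helper: repeat purchases per customer, at most one per period."""
    periods = df[datetime_col].dt.to_period(freq)
    # Keep one row per (customer, period) pair, so same-day purchases collapse.
    distinct = df.assign(_period=periods).drop_duplicates([customer_id_col, "_period"])
    # The first purchase is not a repeat, hence the "- 1".
    return distinct.groupby(customer_id_col).size() - 1


frequency = repeat_frequency(transactions)
print(frequency.tolist())  # [1, 0]
```

Customer "a" has three rows but only two distinct days, so the sketch reports one repeat purchase, not two.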
Later, when I call plot_cumulative_transactions, I see that it calls utils.expected_cumulative_transactions, but that function counts multiple transactions on the same day as separate transactions (see the line below):
act_transactions = (transactions_current.groupby(customer_id_col).size() - 1).sum()
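The mismatch can be reproduced on a tiny hypothetical dataset. The sketch below runs the quoted line as-is and compares it with a per-day count. It assumes the date column already holds calendar dates with no time component; the column names and data are made up for illustration.

```python
import pandas as pd

# Hypothetical raw transactions: "a" buys twice on Jan 1 and once on Jan 3,
# "b" buys twice on Jan 2.
transactions_current = pd.DataFrame(
    {
        "customer_id": ["a", "a", "a", "b", "b"],
        "date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-03", "2020-01-02", "2020-01-02"]
        ),
    }
)
customer_id_col = "customer_id"
date_col = "date"

# The line quoted in the issue: every row counts, so same-day purchases inflate the total.
act_transactions = (transactions_current.groupby(customer_id_col).size() - 1).sum()

# Summary-style count: at most one transaction per customer per day.
per_day = transactions_current.drop_duplicates([customer_id_col, date_col])
summary_style = (per_day.groupby(customer_id_col).size() - 1).sum()

print(act_transactions, summary_style)  # 3 1
```

The quoted line reports three repeat transactions where the per-day rule sees only one, which is the gap that shows up between the actual and expected curves.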
As a result, the actual and expected lines from plot_cumulative_transactions look very different. I worked around this by calling df.drop_duplicates([customer_id_col, date_col]) on my transaction dataframe before calling plot_cumulative_transactions.
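One caveat with this workaround, assuming the date column holds full timestamps rather than plain dates: drop_duplicates only removes rows whose values match exactly, so two purchases on the same day at different times both survive. A minimal sketch (hypothetical data and column names) that normalizes timestamps to midnight before deduplicating:

```python
import pandas as pd

customer_id_col = "customer_id"
date_col = "date"

# Hypothetical raw data: two of customer "a"'s purchases fall on Jan 1 at different times.
transactions = pd.DataFrame(
    {
        customer_id_col: ["a", "a", "a"],
        date_col: pd.to_datetime(["2020-01-01 09:00", "2020-01-01 17:30", "2020-01-03 12:00"]),
    }
)

# Deduplicating on raw timestamps removes nothing, because the times differ.
naive = transactions.drop_duplicates([customer_id_col, date_col])

# Normalizing to midnight (matching freq='D') turns same-day rows into true duplicates.
deduped = transactions.assign(
    **{date_col: transactions[date_col].dt.normalize()}
).drop_duplicates([customer_id_col, date_col])

print(len(naive), len(deduped))  # 3 2
```

For a coarser freq such as weekly, the same idea would apply with the timestamps mapped to that period instead of to the day.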
Top GitHub Comments
@CamDavidsonPilon The raw transactions are needed to get the actual values. They cannot be extracted from the final summary matrix, unless you compute a summary for each time period. P.S. I'm in the process of writing tests for that function so that it reproduces the values in the BTYD walkthrough.
The expected_cumulative_transactions function is a bit strange: everywhere else we deal with a summary dataset, but here we ask for the raw transactions. Anyway, the function is broken, but I have a fix like the one @patng323 suggests.