Some problem with the calculation of monetary value
Hey, I’m confused about the calculation of monetary value. It seems that it is defined as average transaction value = total monetary value / transaction frequency. I don’t know whether this frequency means the number of distinct purchase dates (if I use daily data) or the number of purchases. For example, I tested the function calibration_and_holdout_data() with the following dataset.
    user_id  purchase_date  net_gmv
3         2     2017-01-01      110
4         2     2017-01-05      120
0         1     2017-11-19      100
1         1     2017-11-19      150
2         1     2017-12-19      200
7         1     2017-12-19      300
6         1     2017-12-20      250
5         3     2018-01-01      150
8         1     2018-01-01      500
9         1     2018-01-01      700
10        1     2018-02-01       50
11        2     2018-09-01      125
12        2     2018-09-02      100
And I run the function like this:
summary_cal_holdout = calibration_and_holdout_data(
    data,
    'user_id',
    'purchase_date',
    monetary_value_col='net_gmv',
    calibration_period_end='2017-12-31',
    observation_period_end='2018-09-25'
)
The result I got is like this:
user_id  frequency_cal  recency_cal  T_cal  monetary_value_cal  frequency_holdout  monetary_value_holdout  duration_holdout
1                  2.0         31.0   42.0               375.0                  2              416.666667               268
2                  1.0          4.0  364.0               120.0                  2              112.500000               268
For user_id = 1, frequency_cal is 2 and the total repeat purchase value in the calibration period is 200+300+250=750. However, there are 3 repeat transactions. The monetary value calculated by this function is 375.0, which equals 750/2. So for the calibration period, monetary value = total purchase value / frequency.
However, look at the holdout period. For user_id = 1, the frequency = 2, the total number of purchases = 3, and total_purchase_value = 500+700+50 = 1250. The monetary value = 416.67, which equals total purchase value / total number of purchases instead of total purchase value / frequency. Are we supposed to use a different equation for the calculation of monetary value in the calibration and holdout periods?
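The holdout figures reported for user_id = 1 can be reproduced by hand. The sketch below is plain Python over the three holdout rows from the dataset above (500, 700 and 50). It assumes, purely from the reported numbers, that frequency counts distinct purchase dates while the monetary value is a mean over individual rows. It is not taken from the library's source, and the variable names are illustrative only.

```python
# Holdout-period rows for user_id 1, copied from the dataset: (purchase_date, net_gmv).
holdout_rows = [
    ("2018-01-01", 500),
    ("2018-01-01", 700),
    ("2018-02-01", 50),
]

# Assumption: holdout frequency counts distinct purchase dates.
frequency_holdout = len({day for day, _ in holdout_rows})

# Assumption: holdout monetary value is the mean over individual rows,
# not over distinct dates.
total_value = sum(value for _, value in holdout_rows)
monetary_value_holdout = total_value / len(holdout_rows)

print(frequency_holdout, round(monetary_value_holdout, 6))  # 2 416.666667
```

Dividing the same total by the frequency instead would give 1250/2 = 625, which does not match the reported 416.67.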
With this confusion, I could not understand what exactly these two functions are predicting:
- bgf.conditional_expected_number_of_purchases_up_to_time()
  - Should this be frequency or the number of purchases? If one user purchased twice in one day, should this be counted as 2 or 1?
- ggf.conditional_expected_average_profit()
  - For the average profit here, should it be total purchase value / total number of transactions or total purchase value / frequency?
Thanks a lot for the explanation!!!
Top GitHub Comments
This doesn’t seem right. Seems like a bug!
This is because summary_data_from_transaction_data() aggregates rows that fall on the same day.
0 1 2017-11-19 100, 1 1 2017-11-19 150 -> 2017-11-19 250
2 1 2017-12-19 200, 7 1 2017-12-19 300 -> 2017-12-19 500
6 1 2017-12-20 250 -> 2017-12-20 250
The first transaction is always ignored for the RFM calculations (it is only used for T if it’s the only transaction).
Hence you’re only left with averaging 500 and 250, with a frequency of 2. Hence 750/2 = 375.
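The steps described in this comment can be checked by hand. The following is a minimal sketch in plain Python, not the library's actual implementation: it sums same-day rows, drops the first purchase day, and averages what remains. The rows are the calibration-period transactions for user_id = 1 from the issue, and all variable names are illustrative.

```python
from collections import defaultdict

# Calibration-period rows for user_id 1, copied from the issue: (purchase_date, net_gmv).
rows = [
    ("2017-11-19", 100), ("2017-11-19", 150),
    ("2017-12-19", 200), ("2017-12-19", 300),
    ("2017-12-20", 250),
]

# Step 1: collapse rows that share a date into one purchase per day.
daily_totals = defaultdict(int)
for day, value in rows:
    daily_totals[day] += value

# Step 2: drop the first purchase day; only repeat purchases are counted.
ordered_days = sorted(daily_totals)  # ISO-format dates sort chronologically
repeat_days = ordered_days[1:]

# Step 3: frequency is the number of repeat days; monetary value is their mean.
frequency = len(repeat_days)
monetary_value = sum(daily_totals[d] for d in repeat_days) / frequency

print(frequency, monetary_value)  # 2 375.0
```

This matches frequency_cal = 2 and monetary_value_cal = 375.0 in the reported output. A customer with a single purchase day would have a frequency of 0, so a real implementation would need to guard the division.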