Some problem with the calculation of monetary value
Hey, I’m confused about the calculation of monetary value. It seems to be defined as average transaction value = total monetary value / transaction frequency. I don’t know whether this frequency means the number of distinct purchase dates (if I use daily data) or the number of purchases. For example, I tested the function calibration_and_holdout_data() with the following dataset.
    user_id  purchase_date  net_gmv
 3        2       20170101      110
 4        2       20170105      120
 0        1       20171119      100
 1        1       20171119      150
 2        1       20171219      200
 7        1       20171219      300
 6        1       20171220      250
 5        3       20180101      150
 8        1       20180101      500
 9        1       20180101      700
10        1       20180201       50
11        2       20180901      125
12        2       20180902      100
And I run the function like this:
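from lifetimes.utils import calibration_and_holdout_data

# Split the transactions into a calibration period and a holdout period.
summary_cal_holdout = calibration_and_holdout_data(
    data,
    'user_id',
    'purchase_date',
    monetary_value_col='net_gmv',
    calibration_period_end='20171231',
    observation_period_end='20180925'
)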
The result I got is like this:
user_id  frequency_cal  recency_cal  T_cal  monetary_value_cal  frequency_holdout  monetary_value_holdout  duration_holdout
1                  2.0         31.0   42.0               375.0                  2              416.666667               268
2                  1.0          4.0  364.0               120.0                  2              112.500000               268
frequency_cal is 2, and the total repeat purchase value for user_id = 1 in the calibration period is 200+300+250=750. However, there are 3 transactions. The monetary value calculated by this function is 375.0, which equals 750/2. So for the calibration period, monetary value = total purchase value / frequency.
However, look at the holdout period. For user_id = 1, the frequency = 2, the total number of purchases = 3, and total_purchase_value=500+700+50=1250. The monetary value = 416.67, which equals total purchase value / total number of purchases instead of total purchase value / frequency. Are we supposed to use different equations for the monetary value in the calibration and holdout periods?
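To make the two candidate definitions concrete, here is a minimal pandas sketch that only checks the arithmetic for user_id = 1. It does not call the library; the frame name user1 and the variable names are made up for illustration. The holdout rows that belong to user_id 1 in the dataset above are 500, 700 and 50.

```python
import pandas as pd

# Rows for user_id 1, copied from the dataset above (hypothetical frame name).
user1 = pd.DataFrame({
    "purchase_date": pd.to_datetime(
        ["20171119", "20171119", "20171219", "20171219", "20171220",
         "20180101", "20180101", "20180201"],
        format="%Y%m%d",
    ),
    "net_gmv": [100, 150, 200, 300, 250, 500, 700, 50],
})

# Holdout period: everything after calibration_period_end.
holdout = user1[user1["purchase_date"] > pd.Timestamp("2017-12-31")]

total_value = holdout["net_gmv"].sum()
n_transactions = len(holdout)                         # every row counts
n_purchase_days = holdout["purchase_date"].nunique()  # same-day rows count once

# Candidate 1: average per transaction row.
per_transaction = total_value / n_transactions
# Candidate 2: average per distinct purchase day.
per_purchase_day = total_value / n_purchase_days

print(per_transaction, per_purchase_day)
```

Only the per-transaction average reproduces the reported monetary_value_holdout of 416.666667, while the count of distinct purchase days reproduces frequency_holdout = 2.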
Given this confusion, I can’t tell what exactly these two functions are predicting:

bgf.conditional_expected_number_of_purchases_up_to_time()
 Should this be the frequency or the number of purchases? If a user purchases twice on the same day, does that count as 2 or 1?

ggf.conditional_expected_average_profit()
 For the average profit here, should it be total purchase value / total number of transactions or total purchase value / frequency?
Thanks a lot for the explanation!!!
Top GitHub Comments
This doesn’t seem right. Seems like a bug!
This is because summary_data_from_transaction_data() aggregates rows that fall on the same day:
0 1 20171119 100
1 1 20171119 150  > 20171119 250
2 1 20171219 200
7 1 20171219 300  > 20171219 500
6 1 20171220 250  > 20171220 250
The first transaction is always ignored in the RFM calculations (it is only used for T if it’s the only transaction).
So you’re left averaging 500 and 250 with a frequency of 2: 750/2 = 375.
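The aggregation described above can be reproduced with plain pandas. This is a sketch of the arithmetic for user_id = 1 in the calibration period, not the library's actual implementation; the names cal, daily and repeat are made up for illustration.

```python
import pandas as pd

# Calibration-period rows for user_id 1, copied from the issue's dataset.
cal = pd.DataFrame({
    "purchase_date": pd.to_datetime(
        ["20171119", "20171119", "20171219", "20171219", "20171220"],
        format="%Y%m%d",
    ),
    "net_gmv": [100, 150, 200, 300, 250],
})

# Step 1: collapse same-day rows into one purchase per date.
daily = cal.groupby("purchase_date")["net_gmv"].sum().sort_index()

# Step 2: drop the first purchase date, which is not counted as a repeat purchase.
repeat = daily.iloc[1:]

frequency = len(repeat)          # number of repeat purchase days
monetary_value = repeat.mean()   # average value per repeat purchase day

print(daily.tolist(), frequency, monetary_value)
```

With the five calibration rows this gives daily totals of 250, 500 and 250, a frequency of 2 and a monetary value of 375.0, matching frequency_cal and monetary_value_cal in the output above.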