Some problem with the calculation of monetary value


Hey, I’m confused about how the monetary value is calculated. It seems to be defined as average transaction value = total monetary value / transaction frequency. I don’t know whether this frequency means the number of distinct purchase dates (if I use daily data) or the number of purchases. For example, I tested the function calibration_and_holdout_data() with the following dataset:
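To make the two readings concrete, here is a minimal sketch that counts user 1's rows dated in 2017 from the dataset in this issue. The helper name `purchase_counts` is hypothetical and is not part of lifetimes; it only counts rows, purchase days, and purchase days after the first one.

```python
from datetime import date

# User 1's transactions dated in 2017, copied from the dataset in this issue.
user1_cal = [
    (date(2017, 11, 19), 100),
    (date(2017, 11, 19), 150),
    (date(2017, 12, 19), 200),
    (date(2017, 12, 19), 300),
    (date(2017, 12, 20), 250),
]

def purchase_counts(transactions):
    """Hypothetical helper: count purchases vs. distinct purchase days."""
    n_purchases = len(transactions)
    distinct_days = len({day for day, _ in transactions})
    # Repeat purchase days: distinct days after the first one.
    repeat_days = distinct_days - 1
    return n_purchases, distinct_days, repeat_days

print(purchase_counts(user1_cal))  # (5, 3, 2)
```

The repeat-day count (2) is the number that appears as frequency_cal for user 1 in the result table.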

    user_id  purchase_date  net_gmv
3   2        2017-01-01     110
4   2        2017-01-05     120
0   1        2017-11-19     100
1   1        2017-11-19     150
2   1        2017-12-19     200
7   1        2017-12-19     300
6   1        2017-12-20     250
5   3        2018-01-01     150
8   1        2018-01-01     500
9   1        2018-01-01     700
10  1        2018-02-01     50
11  2        2018-09-01     125
12  2        2018-09-02     100

I ran the function like this:

summary_cal_holdout = calibration_and_holdout_data(

The result I got is like this:

user_id  frequency_cal  recency_cal  T_cal  monetary_value_cal  frequency_holdout  monetary_value_holdout  duration_holdout
1        2.0            31.0         42.0   375.0               2                  416.666667              268
2        1.0            4.0          364.0  120.0               2                  112.500000              268

frequency_cal is 2, and the total repeat-purchase value for user_id = 1 in the calibration period is 200 + 300 + 250 = 750, spread over 3 transactions. Yet the monetary value calculated by the function is 375.0, which equals 750/2. So for the calibration period, monetary value = total repeat-purchase value / frequency.
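The calibration arithmetic can be checked directly. A minimal sketch, using user 1's three repeat transactions and the reported frequency_cal of 2; the variable names are illustrative only.

```python
# User 1's calibration purchases after the first purchase day (2017-11-19).
repeat_values = [200, 300, 250]
frequency_cal = 2  # as reported by calibration_and_holdout_data()

total = sum(repeat_values)                    # 750
per_frequency = total / frequency_cal         # divide by repeat purchase days
per_transaction = total / len(repeat_values)  # divide by repeat transactions

print(total, per_frequency, per_transaction)  # 750 375.0 250.0
```

Only the per-frequency division matches the reported 375.0.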

In the holdout period, however, user_id = 1 has frequency = 2 but 3 purchases, with total purchase value 500 + 700 + 50 = 1250. The monetary value is 416.67, which equals total purchase value / number of purchases (1250/3) rather than total purchase value / frequency. Are we supposed to use a different equation for the monetary value in the calibration and holdout periods?
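A minimal sketch of the holdout numbers, assuming the holdout period covers the 2018 rows (a calibration end of 2017-12-31 is inferred from the reported T_cal values, not stated in the issue). It contrasts dividing by the number of raw transactions with dividing by the day-level frequency. The function name `holdout_summary` is hypothetical, and this illustrates the arithmetic only, not the lifetimes internals.

```python
from datetime import date

# 2018 rows from the dataset above: (user_id, purchase_date, net_gmv)
holdout = [
    (3, date(2018, 1, 1), 150),
    (1, date(2018, 1, 1), 500),
    (1, date(2018, 1, 1), 700),
    (1, date(2018, 2, 1), 50),
    (2, date(2018, 9, 1), 125),
    (2, date(2018, 9, 2), 100),
]

def holdout_summary(rows, user):
    """Hypothetical helper: holdout frequency and two candidate averages."""
    values = [v for u, _, v in rows if u == user]
    days = {d for u, d, _ in rows if u == user}
    frequency = len(days)                        # distinct purchase days
    per_transaction = sum(values) / len(values)  # mean over raw purchases
    per_frequency = sum(values) / frequency      # mean over purchase days
    return frequency, round(per_transaction, 6), per_frequency

print(holdout_summary(holdout, 1))  # (2, 416.666667, 625.0)
print(holdout_summary(holdout, 2))  # (2, 112.5, 112.5)
```

Only the per-transaction average reproduces the reported 416.666667 for user 1. User 2 gives 112.5 either way because each of their holdout days has a single purchase.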

Because of this, I can’t tell what exactly these two functions predict:

  1. bgf.conditional_expected_number_of_purchases_up_to_time()

    • Does this count frequency (repeat purchase days) or individual purchases? If a user purchases twice on the same day, does that count as 1 or 2?
  2. ggf.conditional_expected_average_profit()

    • Is the average profit here total purchase value / number of transactions, or total purchase value / frequency?

Thanks a lot for the explanation!!!

aggie13 commented, Oct 18, 2018

This doesn’t seem right. Seems like a bug!

Trollgeir commented, Sep 29, 2018

This is because summary_data_from_transaction_data() aggregates rows that fall on the same day:

0  1  2017-11-19  100
1  1  2017-11-19  150  ->  2017-11-19  250

2  1  2017-12-19  200
7  1  2017-12-19  300  ->  2017-12-19  500

6  1  2017-12-20  250  ->  2017-12-20  250
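The day-level aggregation described in this comment can be sketched in plain Python. This illustrates the grouping only (summing net_gmv per user and day); it is not the lifetimes source code, and `daily_totals` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

# User 1's 2017 rows: (row index, user_id, purchase_date, net_gmv)
rows = [
    (0, 1, date(2017, 11, 19), 100),
    (1, 1, date(2017, 11, 19), 150),
    (2, 1, date(2017, 12, 19), 200),
    (7, 1, date(2017, 12, 19), 300),
    (6, 1, date(2017, 12, 20), 250),
]

def daily_totals(rows):
    """Sum net_gmv per (user_id, purchase_date), sorted by date."""
    totals = defaultdict(int)
    for _, user, day, value in rows:
        totals[(user, day)] += value
    return sorted(totals.items(), key=lambda item: item[0][1])

for (user, day), value in daily_totals(rows):
    print(user, day.isoformat(), value)
# 1 2017-11-19 250
# 1 2017-12-19 500
# 1 2017-12-20 250
```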

The first purchase day is always ignored when computing frequency and monetary value; it is only used as the starting point for recency and T.
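Putting this rule together, here is a minimal sketch of the calibration summary for users 1 and 2. The calibration end date of 2017-12-31 is an assumption inferred from the reported T_cal values (42 and 364 days), and the function name `calibration_rfm` is hypothetical. Returning 0.0 for monetary value when there are no repeat days is also an assumed convention.

```python
from collections import defaultdict
from datetime import date

CAL_END = date(2017, 12, 31)  # assumed: reproduces T_cal = 42 and 364

# 2017 rows for users 1 and 2: (user_id, purchase_date, net_gmv)
rows = [
    (2, date(2017, 1, 1), 110),
    (2, date(2017, 1, 5), 120),
    (1, date(2017, 11, 19), 100),
    (1, date(2017, 11, 19), 150),
    (1, date(2017, 12, 19), 200),
    (1, date(2017, 12, 19), 300),
    (1, date(2017, 12, 20), 250),
]

def calibration_rfm(rows, user, end=CAL_END):
    # Aggregate to one total per purchase day.
    per_day = defaultdict(int)
    for u, day, value in rows:
        if u == user and day <= end:
            per_day[day] += value
    days = sorted(per_day)
    first, last = days[0], days[-1]
    repeat = days[1:]              # the first purchase day is excluded
    frequency = len(repeat)
    recency = (last - first).days  # first to last purchase day
    T = (end - first).days         # first purchase day to calibration end
    # Assumed convention: 0.0 when there are no repeat purchase days.
    monetary = sum(per_day[d] for d in repeat) / frequency if frequency else 0.0
    return frequency, recency, T, monetary

print(calibration_rfm(rows, 1))  # (2, 31, 42, 375.0)
print(calibration_rfm(rows, 2))  # (1, 4, 364, 120.0)
```

Both rows match frequency_cal, recency_cal, T_cal and monetary_value_cal in the result table.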

That leaves you averaging only 500 and 250 over a frequency of 2: 750/2 = 375.

