How to Recognize if your data is fit for (this library's) Customer Segmentation and CLV?
I'm trying to use this library with my data, which is real data covering more than 500,000 customers. However, I'm getting weird behavior when fitting the distributions. More specifically, my `alpha` is way too large (approaching 320 without any penalty). If I apply an L2 penalty of 0.5, I get a much smaller `alpha` (still very big), but `a` and `b` get frighteningly small:
| parameter | coef | se(coef) | lower 95% bound | upper 95% bound |
|---|---|---|---|---|
| r | 0.628972 | 0.000931 | 0.627148 | 0.630795 |
| alpha | 57.421640 | 0.148712 | 57.130166 | 57.713115 |
| a | 0.002099 | 0.000076 | 0.001950 | 0.002247 |
| b | 0.062845 | 0.001426 | 0.060050 | 0.065641 |
What is the intuition behind `r`, `alpha`, `a` and `b`? If you think these questions are not suitable for a discussion on GitHub, feel free to point me to another forum.
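One way to read these four numbers, as a rough sketch only: assuming the fitted model is BG/NBD (the thread does not name the model, but `r`, `alpha`, `a`, `b` match that parameterization), `r` and `alpha` are the shape and rate of a Gamma distribution over each customer's purchase rate, and `a` and `b` are the parameters of a Beta distribution over the probability of dropping out after a purchase. The variable names below are illustrative, and the time unit is whatever unit `T` is measured in.

```python
# Fitted values copied from the table above (L2 penalty of 0.5).
r, alpha = 0.628972, 57.421640
a, b = 0.002099, 0.062845

# Assumption: BG/NBD, purchase rate lambda ~ Gamma(shape=r, rate=alpha).
mean_rate = r / alpha          # average purchases per time unit while "alive"
rate_cv = 1 / r ** 0.5         # spread of purchase rates across customers (std / mean)

# Assumption: BG/NBD, dropout probability p ~ Beta(a, b), applied after each purchase.
mean_dropout = a / (a + b)     # average chance of dropping out after a purchase

print(f"mean purchase rate : {mean_rate:.5f} per time unit")
print(f"mean time between purchases: {1 / mean_rate:.1f} time units")
print(f"purchase rate CV   : {rate_cv:.2f}")
print(f"mean dropout prob. : {mean_dropout:.4f}")
```

Under that assumption, a large `alpha` relative to `r` simply means a low average purchase rate per time unit, so its size depends heavily on the time unit chosen. With `a` and `b` both far below 1, the Beta distribution is U-shaped: nearly all customers get a dropout probability close to 0 and a small share get one close to 1.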
I suspect that this problem comes from wrong assumptions about the distribution of my RF variables (perhaps the frequency distribution is too skewed). However, I haven't yet understood what those assumptions are specifically (are there any, really?), since what I've found so far in the literature mostly discusses assumptions about individual behavior (e.g. the number of transactions of a customer follows a Poisson distribution). In my case, the histograms for `x`, `t_x`, `T` give me:
Needless to say, my model fitting is pretty bad:
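For readers who want to reproduce this kind of check, here is a minimal sketch of how `x`, `t_x` and `T` are usually defined per customer: `x` is the number of repeat purchase days, `t_x` the time from first to last purchase, and `T` the time from first purchase to the end of the observation window. The function `rfm_summary` and the toy data are hypothetical; lifetimes ships its own helper for this (`summary_data_from_transaction_data` in `lifetimes.utils`), which should be preferred on real data.

```python
from collections import defaultdict
from datetime import date


def rfm_summary(transactions, period_end):
    """Return {customer_id: (x, t_x, T)} with t_x and T measured in days."""
    days = defaultdict(set)
    for customer_id, day in transactions:
        if day <= period_end:
            days[customer_id].add(day)  # several purchases on one day count once

    summary = {}
    for customer_id, purchase_days in days.items():
        first, last = min(purchase_days), max(purchase_days)
        x = len(purchase_days) - 1       # repeat purchases only
        t_x = (last - first).days        # "recency": age at last purchase
        T = (period_end - first).days    # customer "age" at end of window
        summary[customer_id] = (x, t_x, T)
    return summary


# Hypothetical toy transaction log: (customer_id, purchase_date).
transactions = [
    ("c1", date(2019, 1, 1)),
    ("c1", date(2019, 1, 15)),
    ("c1", date(2019, 1, 15)),
    ("c1", date(2019, 3, 1)),
    ("c2", date(2019, 2, 10)),
]
summary = rfm_summary(transactions, period_end=date(2019, 3, 31))
for customer_id, (x, t_x, T) in sorted(summary.items()):
    print(customer_id, x, t_x, T)
```

Histograms of the three columns of this summary are the plots being discussed here.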
Btw, I think you forgot to place a PayPal link on your main page so that we can donate money to you 😉
Top GitHub Comments
Dear Philippe,
I hope my comment below can help.
I tested lifetimes in Feb with 3 retail-related datasets that I found on the internet, before applying it to my company's data. In 2 cases the forecast performance was quite good, but in one case the forecast was as poor as the one you showed. In that case, I noticed that the frequency graph did not show an exponential-like distribution, so the forecast was not good.
But it looks like your frequency data is not the same as the case I encountered. Could you share more detail about your use case and the background of your dataset?
In my case, when I used lifetimes with my company's dataset, the forecast performed well.
hmikelee
The changes suggested in the interesting #280 issue didn’t amount to much unfortunately, but thanks for the suggestion.
How would you suggest I improve the fit of my plots, more specifically? As far as I'm aware, the only parameter that could make a difference is the model's `L2 Penalty`, and I couldn't get much change by varying it, especially in the last graph.

I'm starting to believe the issue might actually be related to my assumptions about the dataset. Besides the fact that this is a homologation dataset, these specific clients' behavior will most likely differ from a strictly non-contractual one, because there is some periodicity to their purchases: people do go to supermarkets on a regular basis. That seems inherently different from people who, for example, buy DVDs, which is something more sporadic.
That would more or less explain why the fitted model concludes that the clients are basically always alive. In the end, I might have to compare my results against simply using contractual formulas, which is something I've never studied T.T.
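A quick way to test the periodicity hunch, sketched here with the standard library only: if purchases follow a Poisson process, the gaps between a customer's purchases are exponentially distributed, so their coefficient of variation (standard deviation divided by mean) should be close to 1. Regular shoppers give values well below 1. The function name and the two toy customers are hypothetical, and with only a handful of purchases per customer the estimate is noisy, so it is better judged across many customers.

```python
from statistics import mean, pstdev


def interpurchase_cv(purchase_times):
    """CV of the gaps between consecutive purchases.

    Returns None when there are fewer than two gaps or the mean gap is zero.
    """
    times = sorted(purchase_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)


# Hypothetical purchase days, counted from each customer's first purchase.
weekly_shopper = [0, 7, 14, 22, 28, 35, 43]
sporadic_buyer = [0, 2, 40, 45, 130, 133, 260]

weekly_cv = interpurchase_cv(weekly_shopper)
sporadic_cv = interpurchase_cv(sporadic_buyer)
print(f"weekly shopper CV : {weekly_cv:.2f}")   # well below 1: regular purchasing
print(f"sporadic buyer CV : {sporadic_cv:.2f}") # near 1: compatible with Poisson
```

If most customers with enough purchases show values far below 1, the Poisson purchasing assumption is probably a poor match for this dataset, which would be consistent with the bad fit described above.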