Matchbox recommender message 7
Hi everyone,
I’ve been looking for where message 7 ($m_{*\to s}$, as labelled in the factor graph of the Matchbox paper [1]) is implemented in the Infer.NET code, but I can’t seem to find it. As a study exercise, @glandfried and I have been trying to implement the collaborative-filtering version of the Matchbox recommender algorithm, as described in the Matchbox paper. We have been able to implement almost everything. However, our implementation of the approximate message 7 differs greatly from the exact version of the message, which is bimodal. In contrast, we had no issues with the approximate message 2.
If anyone could help me find the Infer.NET implementation of the message so we can compare, I would appreciate it. So far I could only find the approximate message 2 ($m_{*\to z}$), at infer/src/Runtime/Factors/Product.cs#ProductAverageLogarithm.
As a reminder, I copy here the factor graph and the approximation equations (approximate messages 2 and 7, respectively) from the original Matchbox paper. In the notation below, $\mu_t$ and $\sigma_t^2$ denote the mean and variance of the (Gaussian) marginal distribution $p(t)$, with $\mu_s$ and $\sigma_s^2$ defined analogously, and $\mu_{z\to *}$ and $\sigma_{z\to *}^2$ denote the mean and variance of message 6. I interpret the two approximations as follows:
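(I transcribe the equations by hand here in case the images do not render; as far as we can tell these should coincide with the standard VMP messages for a product factor, but please flag any transcription slip.)

$$
m_{*\to z}(z) \;\approx\; \mathcal{N}\!\left(z;\;\; \mu_s\,\mu_t,\;\; \mu_s^2\,\sigma_t^2 + \mu_t^2\,\sigma_s^2 + \sigma_s^2\,\sigma_t^2\right)
$$

$$
m_{*\to s}(s) \;\approx\; \mathcal{N}\!\left(s;\;\; \frac{\mu_t\,\mu_{z\to *}}{\mu_t^2 + \sigma_t^2},\;\; \frac{\sigma_{z\to *}^2}{\mu_t^2 + \sigma_t^2}\right)
$$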
It would also be nice to get a hint on how to derive approximations 2 and 7 ourselves (or a reference).
Thanks in advance!
[1] Stern, D., Herbrich, R., Graepel, T., “Matchbox: Large Scale Online Bayesian Recommendations,” WWW 2009. https://www.microsoft.com/en-us/research/publication/matchbox-large-scale-bayesian-recommendations/
Top GitHub Comments
Message (7) is implemented in infer/src/Runtime/Factors/Product.cs#AAverageLogarithm. Messages (2) and (7) come from Variational Message Passing section 4.2. A concise form of those messages can be found in Table 1 of Gates: A Graphical Notation for Mixture Models.
Our original goal was to implement the Matchbox model from scratch, as an exercise to learn as much as possible from the methods you created. Matchbox is particularly interesting because it offers an analytical approximation (the proposed messages 2 and 7 in the paper) to a common problem: handling the product of two Gaussian variables.

The issue is that during the implementation we found that the approximate message 7 proposed by the Matchbox paper does not minimize the reverse KL divergence with respect to the exact message 7. Since we couldn’t find an error in our calculations, we decided to examine your original code. Thanks to your initial response, we were able to verify that the exact message 7 we calculated was correct: when we apply the average-log (VMP) projection to it, we obtain exactly the approximate Gaussian proposed in the Matchbox paper. The question now is: why does the approximate message 7 proposed by the paper not minimize the reverse KL divergence (at least according to our calculations)?
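For anyone following along, here is the derivation as we understand it (our own sketch, reconstructed from section 4.2 of Variational Message Passing, so take it with a grain of salt). Treating message 6 as a Gaussian pseudo-likelihood on $z = s\,t$, and taking the expectation of its logarithm over $t$ before exponentiating, gives

$$
m_{*\to s}(s) \;\propto\; \exp\Big( \mathbb{E}_{t}\big[ \log \mathcal{N}(\mu_{z\to *};\; s\,t,\; \sigma^2_{z\to *}) \big] \Big) \;\propto\; \exp\Big( \frac{\mu_{z\to *}\,\mu_t}{\sigma^2_{z\to *}}\, s \;-\; \frac{1}{2}\,\frac{\mu_t^2 + \sigma_t^2}{\sigma^2_{z\to *}}\, s^2 \Big),
$$

which is Gaussian in $s$ with precision $(\mu_t^2+\sigma_t^2)/\sigma^2_{z\to *}$ and mean $\mu_t\,\mu_{z\to *}/(\mu_t^2+\sigma_t^2)$, i.e. exactly the approximate message 7 from the paper. By contrast, the exact message takes the expectation of the density itself rather than of its logarithm, so the two need not agree. If this reading is right, the paper’s message is the VMP projection, not the reverse-KL projection of the exact message, which would be consistent with the discrepancy we observe.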
Exact analytic message
The following is the exact analytic message.
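(The image may not reproduce here, so for completeness, this is the closed form we work with. Integrating $t$ out of the product of message 6 and the marginal of $t$, under the constraint $z = s\,t$, gives

$$
m_{*\to s}(s) \;=\; \int \mathcal{N}(\mu_{z\to *};\; s\,t,\; \sigma^2_{z\to *})\; \mathcal{N}(t;\; \mu_t,\; \sigma_t^2)\; dt \;=\; \mathcal{N}\!\big(\mu_{z\to *};\;\; s\,\mu_t,\;\; \sigma^2_{z\to *} + s^2\,\sigma_t^2\big),
$$

read as a function of $s$.)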
Each value of message 7 at a point $s_k$ is an integral of the joint distribution under the isoline defined by $s_k$. In the following images, we show the joint distribution with four isolines on the left, and the corresponding areas under those isolines on the right.
Taken together over all $s_k$, these integrals make up the exact message 7 sent to the latent variable $s_k$: a likelihood which naturally exhibits two peaks and two very long tails.
The reverse KL divergence
To evaluate the reverse KL divergence, we implemented a simple numerical integration. For example, when approximating a mixture of Gaussians with a unimodal Gaussian under reverse KL divergence, we obtained the following result: there are two local minima, one for each peak, and a single global minimum, corresponding to the highest peak.
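For reference, here is a minimal sketch of the kind of numerical check we ran (illustrative only; the mixture weights, means, and grid sizes are arbitrary example values, not the ones from our actual experiment):

```python
import numpy as np

# Grid for simple quadrature
x = np.linspace(-15.0, 15.0, 4001)
dx = x[1] - x[0]

def gauss(x, m, v):
    """Gaussian density N(x; mean m, variance v)."""
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

# Target p: a two-component Gaussian mixture (arbitrary example values)
p = 0.6 * gauss(x, -2.0, 1.0) + 0.4 * gauss(x, 3.0, 1.0)

def reverse_kl(m, v):
    """KL(q || p) for q = N(m, v), by quadrature on the grid."""
    q = gauss(x, m, v)
    mask = q > 1e-300  # avoid 0 * log(0)
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx)

# Scan candidate means at a fixed variance: reverse KL shows one local
# minimum per mode of p, with the global minimum at the taller mode.
means = np.linspace(-6.0, 6.0, 241)
kls = [reverse_kl(m, 1.0) for m in means]
print("KL-optimal mean:", means[int(np.argmin(kls))])
```

We fix the variance here only to make the one-dimensional picture; the same scan can be done jointly over mean and variance.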
Reverse KL divergence between the approximate and the exact messages
The exact message 7 has a similar structure. However, unlike the mixture of Gaussians, its tails are extremely wide, which forces the reverse-KL-minimizing approximation to strike a very different compromise than in the mixture example.
It seems that the width of the tails has a greater impact than the two peaks, which are ultimately not that far apart.
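(A quick sanity check of the tail claim from the closed form above: for large $|s|$, $\sigma^2_{z\to *} + s^2\sigma_t^2 \approx s^2\sigma_t^2$, so $m_{*\to s}(s) \to \frac{1}{|s|\,\sigma_t\sqrt{2\pi}} \exp\!\big(-\mu_t^2/(2\sigma_t^2)\big)$; that is, the message decays only like $1/|s|$, far more slowly than any Gaussian. Assuming our derivation is right, this plausibly explains why the tail behaviour, rather than the two peaks, dominates the compromise that reverse-KL minimization finds.)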
Thanks for everything
We wrote up this explanation because we are genuinely interested in understanding the details of the valuable methodology you have developed. We will always be indebted to you. Thanks again, @tminka!