[Feature Request] custom fusion method in optimize_fusion
**Is your feature request related to a problem? Please describe.**
Hi, you've done a great job implementing plenty of different fusion algorithms, but a fixed set of built-in methods will always be a bottleneck. What would you think about letting the user define their own fusion method?
**Describe the solution you'd like**
For example, in `optimize_fusion`, allow `method` to be a callable and, in that case, skip the calls to `has_hyperparams` and `optimization_switch`.
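To make the request concrete, here is a minimal sketch of what such a user-defined fusion callable might look like. The signature (a list of per-query run dicts in, a fused dict out) and the function name are assumptions for illustration; they are not part of ranx's actual API.

```python
# Hypothetical sketch of a user-supplied fusion callable.
# Assumed (not ranx's real) contract: each run maps
# query_id -> {doc_id: score}; the callable returns a fused run.

def my_custom_fusion(runs):
    """Fuse several runs by simple score summation."""
    fused = {}
    for run in runs:
        for query_id, doc_scores in run.items():
            fused_q = fused.setdefault(query_id, {})
            for doc_id, score in doc_scores.items():
                fused_q[doc_id] = fused_q.get(doc_id, 0.0) + score
    return fused

run_a = {"q1": {"d1": 1.0, "d2": 0.25}}
run_b = {"q1": {"d2": 0.5, "d3": 0.25}}
print(my_custom_fusion([run_a, run_b]))
# {'q1': {'d1': 1.0, 'd2': 0.75, 'd3': 0.25}}
```

With something like this, `optimize_fusion(method=my_custom_fusion, ...)` could bypass the hyperparameter machinery entirely for callables.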
**Describe alternatives you've considered**
- Open a feature request every time I want to try out something new 😃
- Fork ranx and implement new fusion methods there
**My use case (Ma et al.)**
By the way, at the moment, my use case is the default-minimum trick of Ma et al.: when combining results from systems A and B, a document retrieved only by system B is assigned the minimum score among A's results (and vice versa).
Maybe this is already possible in ranx via some option/method named differently? Or maybe you’d like to add it in the core ranx fusion algorithms?
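The default-minimum trick described above can be sketched in a few lines. This is my own illustrative reading of the idea, operating on plain per-query dicts; the function name and input format are assumptions, not anything in ranx.

```python
# Sketch of the default-minimum trick (Ma et al.) for two runs of a
# single query, each a dict doc_id -> score. Illustrative only.

def default_minimum_fuse(run_a, run_b):
    """Sum-fuse two runs; a document missing from one run receives
    that run's minimum observed score instead of being ignored."""
    min_a = min(run_a.values())
    min_b = min(run_b.values())
    docs = set(run_a) | set(run_b)
    return {
        doc: run_a.get(doc, min_a) + run_b.get(doc, min_b)
        for doc in docs
    }

run_a = {"d1": 3.0, "d2": 1.0}   # minimum score in A is 1.0
run_b = {"d2": 2.0, "d3": 0.5}   # minimum score in B is 0.5
print(default_minimum_fuse(run_a, run_b))
# d3 was retrieved only by B, so its A-score defaults to 1.0:
# {'d1': 3.5, 'd2': 3.0, 'd3': 1.5}
```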
**Issue Analytics**
- State:
- Created: 10 months ago
- Comments: 14 (6 by maintainers)
**Top GitHub Comments**
I never used `ZMUV`, to be honest. I implemented it for completeness and tried it for comparison purposes, but I never got better results than with `min-max`, `max`, or `sum`, which sometimes works the best. In general, I prefer local normalization schemes because they are "unsupervised" and can be used out of the box. Without strong empirical evidence that `default-minimum` (w/ or w/o `ZMUV`) works better than `min-max`, `max`, or `sum`, I would not use it. Also, without a standardized way of normalizing/fusing results, it is often difficult to understand what brings improvements over the state of the art. Conducting in-depth ablation studies is costly, and we often lack enough space in conference papers to write about them.
Thank you very much, Paul!
I am happy to see that `max norm` outperforms `default-minimum`. To give you some context, I added/invented `max norm` because the minimum score is often unknown. We usually fuse only the top retrieved documents from each model, which makes `min-max` (in this specific context) not very sound to me. I did not do extensive experimentation, but in my experience `max norm` outperforms `min-max` very often.
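For readers unfamiliar with the two schemes being compared, here is a generic sketch of both normalizations applied to a single run's scores. These are the textbook formulas, not ranx's actual implementation.

```python
# Generic sketch of the two normalization schemes discussed,
# applied to one run's scores (doc_id -> score). Illustrative only.

def min_max_norm(scores):
    """(s - min) / (max - min): maps the observed range onto [0, 1].
    Treats the lowest retrieved score as the true minimum, which is
    questionable when only the top documents are fused."""
    lo, hi = min(scores.values()), max(scores.values())
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def max_norm(scores):
    """s / max: avoids relying on the (often unknown) true minimum."""
    hi = max(scores.values())
    return {d: s / hi for d, s in scores.items()}

scores = {"d1": 8.0, "d2": 6.0, "d3": 4.0}
print(min_max_norm(scores))  # {'d1': 1.0, 'd2': 0.5, 'd3': 0.0}
print(max_norm(scores))      # {'d1': 1.0, 'd2': 0.75, 'd3': 0.5}
```

Note how `min-max` forces the lowest-ranked fused document to score exactly 0, while `max norm` preserves its relative weight, which is the soundness concern raised above.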