Explanation of the RepeatFactorTrainingSampler.repeat_factors_from_category_frequency(`repeat_thresh`)
From the docs for repeat_factors_from_category_frequency:
repeat_thresh (float) – frequency threshold below which data is repeated. If the frequency is half of repeat_thresh, the image will be repeated twice.
In the source code I find these lines:
# 2. For each category c, compute the category-level repeat factor:
# r(c) = max(1, sqrt(t / f(c)))
Now if f(c) = frequency = 0.5 and t = repeat_thresh = 1, then r(c) = sqrt(1 / 0.5) ≈ 1.41.
Can someone explain the docstring “If the frequency is half of repeat_thresh, the image will be repeated twice.” to me? Based on the example above, I would expect every image in c to be repeated 1.41 times, not 2.0 as the doc suggests.
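To make the discrepancy concrete, here is a minimal sketch that evaluates the formula quoted from the source comments for a few frequencies. The helper name `category_repeat_factor` is made up for illustration and is not part of the library.

```python
import math


def category_repeat_factor(freq: float, repeat_thresh: float) -> float:
    """Category-level repeat factor r(c) = max(1, sqrt(t / f(c))).

    Hypothetical helper that mirrors the formula in the quoted comment.
    """
    return max(1.0, math.sqrt(repeat_thresh / freq))


t = 1.0  # repeat_thresh
for f in (1.0, 0.5, 0.25):
    # f = t      -> factor 1.00 (no oversampling)
    # f = t / 2  -> factor 1.41 (not 2, because of the square root)
    # f = t / 4  -> factor 2.00
    print(f"f(c) = {f:<4}  r(c) = {category_repeat_factor(f, t):.2f}")
```

With the square root in place, a factor of 2 is only reached when the frequency is a quarter of repeat_thresh, which is why the docstring's "half" does not match the formula.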
Top GitHub Comments
Thanks for the answer! In my case, I created a custom RepeatFactorTrainingSampler, just without the sqrt().
I still don't really understand the docstring, but this could be due to my limited understanding of the sampling method. The implementation references LVIS paper appendix B2, which gives a more in-depth description of Mask R-CNN with Data Resampling.
If you want, you can make the RepeatFactorTrainingSampler balanced by calculating repeat_factors yourself. Maybe that's of some help.
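As a rough sketch of what calculating the repeat factors yourself could look like, the function below drops the sqrt() so that a category at half of repeat_thresh gets a factor of 2, as the docstring describes. It assumes the dataset is a list of dicts with an "annotations" list whose entries carry a "category_id". The function name `linear_repeat_factors` is hypothetical, and taking the maximum category factor per image is an assumption about how image-level factors are derived.

```python
from collections import defaultdict


def linear_repeat_factors(dataset_dicts, repeat_thresh):
    """Image-level repeat factors using r(c) = max(1, t / f(c)), i.e. no sqrt.

    Hypothetical helper; assumes each record has record["annotations"],
    a list of dicts with a "category_id" key.
    """
    # Count, for each category, the number of images that contain it.
    images_per_category = defaultdict(int)
    for record in dataset_dicts:
        for cat_id in {ann["category_id"] for ann in record["annotations"]}:
            images_per_category[cat_id] += 1

    num_images = len(dataset_dicts)
    # f(c) = fraction of images containing c; r(c) = max(1, t / f(c)).
    category_rep = {
        cat_id: max(1.0, repeat_thresh / (count / num_images))
        for cat_id, count in images_per_category.items()
    }
    # Assumption: an image takes the largest factor among its categories.
    return [
        max((category_rep[ann["category_id"]] for ann in record["annotations"]),
            default=1.0)
        for record in dataset_dicts
    ]


# Toy data: category 0 is in every image, category 1 in one of four.
toy = [
    {"annotations": [{"category_id": 0}, {"category_id": 1}]},
    {"annotations": [{"category_id": 0}]},
    {"annotations": [{"category_id": 0}]},
    {"annotations": [{"category_id": 0}]},
]
factors = linear_repeat_factors(toy, repeat_thresh=0.5)
print(factors)
```

Here category 1 has frequency 0.25, half of the threshold 0.5, so the image containing it gets a factor of 2. The resulting list would presumably need converting to whatever type the sampler expects before being handed to RepeatFactorTrainingSampler; check that against the version you have installed.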