MetricsSaver Bug when filename is long
Describe the bug
When using the MetricsSaver handler, a list of filenames is first joined into a single string (see here) and then sent on to write the metric report.
In distributed mode, the filenames are joined into a string by connecting them directly with the configured delimiter (see here), and the resulting string has to be gathered via ignite’s all_gather function first. However, this function has a limitation: if the string to be gathered is longer than 1024 characters, it is truncated and only the first 1024 characters are kept (the ignite source code of this function is here).
Therefore, if the filename string is truncated, the number of metrics will be different from the number of filenames.
A joined string longer than 1024 characters is easy to reach. For instance, a filename in my current working dataset is:
'/workspace/data/medical/Task04_Hippocampus/imagesTr/hippocampus_033.nii.gz'
It has length 74, so 14 or more samples will trigger the bug (74 * 14 = 1036 > 1024).
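The arithmetic above can be checked with a small sketch. It does not call ignite; `simulate_truncating_gather` is a hypothetical stand-in that only mimics the reported 1024-character cut, and the tab delimiter and the generated filenames are assumptions chosen for illustration.

```python
MAX_LEN = 1024  # character limit reported in this issue for gathered strings


def make_names(n):
    # Hypothetical filenames shaped like the one quoted above (74 chars each).
    template = "/workspace/data/medical/Task04_Hippocampus/imagesTr/hippocampus_{:03d}.nii.gz"
    return [template.format(i) for i in range(n)]


def simulate_truncating_gather(s, max_len=MAX_LEN):
    # Stand-in for the truncation only; NOT ignite's implementation.
    return s[:max_len]


def recovered_names(names, delimiter="\t"):
    # Join, pass through the truncating stand-in, and split again.
    joined = delimiter.join(names)
    return simulate_truncating_gather(joined).split(delimiter)


if __name__ == "__main__":
    for n in (13, 14, 20):
        names = make_names(n)
        back = recovered_names(names)
        print(n, "names in,", len(back), "names out, last intact:", back[-1] == names[-1])
```

With 13 names the joined string still fits, with 14 the last name is cut short, and with more names whole entries are lost, which is how the number of filenames ends up different from the number of metrics.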
Hi @Nic-Ma @wyli , we may need to change how the metrics are saved and avoid gathering the string (tensor and float types do not have this limitation, see here). What do you think?
Issue Analytics
- Created 3 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
@Nic-Ma thanks for pinging !
Well, the limit of 1024 is artificial: it exists so that the data can be padded into a fixed-size tensor. In the case of MetricsSaver, it may be better to collect the names one by one in order to avoid going over 1024 chars.
@yiheng-wang-nv I do not quite understand your point here. The filepath is 74 chars, which is far smaller than 1024, right? I think there is also a max path length on unix systems, like 256 chars … to check that.
EDIT: OK, I get it: 74 * 14 = 1036 and this is > 1024.
Anyway, @yiheng-wang-nv thanks for reporting that. I think we can also look at how to fix this issue from the ignite side as well.
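The "collect names one by one" idea from the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: `truncating_gather` is a hypothetical local stand-in (a single process, returning a one-element list), not ignite's function, and `gather_names_one_by_one` is a made-up helper name rather than anything from MONAI or the linked PR.

```python
from typing import Callable, List

MAX_LEN = 1024  # character limit reported in this issue for gathered strings


def truncating_gather(s: str) -> List[str]:
    # Hypothetical stand-in: behaves like a one-process gather that cuts
    # each string at MAX_LEN. A real run would use a distributed gather.
    return [s[:MAX_LEN]]


def gather_names_one_by_one(
    names: List[str],
    gather_fn: Callable[[str], List[str]] = truncating_gather,
) -> List[str]:
    # Gather every filename in its own call, so no single gathered string
    # can exceed the limit unless one filename alone is too long.
    gathered: List[str] = []
    for name in names:
        if len(name) > MAX_LEN:
            raise ValueError(f"filename longer than {MAX_LEN} chars: {name!r}")
        gathered.extend(gather_fn(name))
    return gathered


if __name__ == "__main__":
    names = [f"/data/imagesTr/hippocampus_{i:03d}.nii.gz" for i in range(50)]
    assert gather_names_one_by_one(names) == names
    print("all", len(names), "names survived")
```

One caveat with this approach: in a real multi-process run every rank has to make the same number of gather calls, so ranks holding different numbers of filenames would need extra handling, and one call per name costs more communication than a single joined string.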
Hi @vfdev-5 ,
Thanks for your help here, hopefully you guys will update it in ignite 0.5. In order to fix it in MONAI first, I submitted a PR: https://github.com/Project-MONAI/MONAI/pull/1634.
Thanks.