Lack of a seed during random labeled-image selection may lead to better performance on training resumption
In the function x_u_split(…) in dataset/cifar.py, the labeled images are selected without a seed. If a training run consists of multiple starts and stops, the total number of unique labeled images the model sees can exceed the configured value. For instance, training on 40 labels with 2 stops can expose the model to up to 120 unique labeled images over the course of training, even though it only sees 40 labeled images at a time. I think this can explain the much higher accuracy obtained by this implementation, especially on the low-label tasks.
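To illustrate the concern, here is a minimal sketch of per-class labeled-index selection in the style of x_u_split(…) (the helper name and shapes are hypothetical, not the repository's actual code). Without a fixed seed, each fresh process draws a different subset, so the union of labeled images across restarts is almost certainly larger than the budget:

```python
import numpy as np

def pick_labeled_indices(num_labeled, num_classes, labels):
    """Hypothetical sketch of per-class labeled-index selection:
    without a fixed seed, each call draws a different subset."""
    per_class = num_labeled // num_classes
    labeled_idx = []
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        labeled_idx.extend(np.random.choice(idx, per_class, replace=False))
    return np.array(labeled_idx)

# Simulate a CIFAR-10-like label array and two training "starts".
labels = np.repeat(np.arange(10), 500)            # 10 classes, 500 images each
run1 = set(pick_labeled_indices(40, 10, labels))  # first start
run2 = set(pick_labeled_indices(40, 10, labels))  # resume: fresh RNG state
# The model sees 40 labels at a time, but the union across both
# runs is almost certainly larger than 40.
print(len(run1 | run2))
```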
A quick fix would be to add the snippet below before the random label selection in the x_u_split(…) function:

```python
np.random.seed(args.seed)
```
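As a sanity check, the effect of re-seeding before the selection can be sketched like this (the seed value, helper, and class layout are hypothetical, for illustration only):

```python
import numpy as np

labels = np.repeat(np.arange(10), 500)  # CIFAR-10-like: 10 classes x 500

def seeded_pick(seed):
    np.random.seed(seed)                # the proposed one-line fix
    idx = np.where(labels == 0)[0]      # one class, for brevity
    return np.random.choice(idx, 4, replace=False)

# Two "restarts" with the same seed draw the same labeled images.
print(np.array_equal(seeded_pick(5), seeded_pick(5)))  # True
```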
Issue Analytics
- State:
- Created: 2 years ago
- Reactions: 1
- Comments: 5
Top GitHub Comments
What I mean is that the seed has already been declared and set in the main() function, and x_u_split() is called through DATASET_GETTERS, which is invoked inside main(), so the seed declared in main() takes effect in x_u_split(). Thus, we don't need to add np.random.seed(args.seed) in the x_u_split(…) function.
@zou-yiqi Yes you’re right. The seed does indeed work in that function. Thanks!