Lack of a seed during random labeled-image selection may lead to better performance on training resumption
In the function x_u_split(…) in dataset/cifar.py, the labeled images are selected without a seed. If a training run consists of multiple starts and stops, the total number of unique labeled images the model sees can exceed the configured value. For instance, training on 40 labels with 2 stops can expose the model to up to 120 unique labeled images over the course of training, even though it only sees 40 labeled images at a time. I think this can explain the much higher accuracy obtained by this implementation, especially on the low-label tasks.
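To illustrate the concern, here is a minimal sketch of per-class labeled-index selection in the style of x_u_split(…) (the helper name and shapes are hypothetical, not the repository's actual code). Without a fixed seed, each fresh process draws a different subset, so the union of labeled images across restarts is almost certainly larger than the budget:

```python
import numpy as np

def pick_labeled_indices(num_labeled, num_classes, labels):
    """Hypothetical sketch of per-class labeled-index selection:
    without a fixed seed, each call draws a different subset."""
    per_class = num_labeled // num_classes
    labeled_idx = []
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        labeled_idx.extend(np.random.choice(idx, per_class, replace=False))
    return np.array(labeled_idx)

# Simulate a CIFAR-10-like label array and two training "starts".
labels = np.repeat(np.arange(10), 500)            # 10 classes, 500 images each
run1 = set(pick_labeled_indices(40, 10, labels))  # first start
run2 = set(pick_labeled_indices(40, 10, labels))  # resume: fresh RNG state
# The model sees 40 labels at a time, but the union across both
# runs is almost certainly larger than 40.
print(len(run1 | run2))
```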
A quick fix would be to add the snippet below before the random label selection in the x_u_split(…) function:

```python
np.random.seed(args.seed)
```
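As a sanity check, the effect of re-seeding before the selection can be sketched like this (the seed value, helper, and class layout are hypothetical, for illustration only):

```python
import numpy as np

labels = np.repeat(np.arange(10), 500)  # CIFAR-10-like: 10 classes x 500

def seeded_pick(seed):
    np.random.seed(seed)                # the proposed one-line fix
    idx = np.where(labels == 0)[0]      # one class, for brevity
    return np.random.choice(idx, 4, replace=False)

# Two "restarts" with the same seed draw the same labeled images.
print(np.array_equal(seeded_pick(5), seeded_pick(5)))  # True
```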
Issue Analytics
- State:
- Created: 2 years ago
- Reactions: 1
- Comments: 5
Top GitHub Comments
What I mean is that the seed has already been declared and set in the main() function, and x_u_split() is called through DATASET_GETTERS, which is invoked inside main(), so the seed declared in main() takes effect in x_u_split(). Thus, we don't need to add np.random.seed(args.seed) in the x_u_split(…) function.
@zou-yiqi Yes you’re right. The seed does indeed work in that function. Thanks!