
Lack of seed during random labeled image selection may be leading to better performance on training resumption

See original GitHub issue

In the function x_u_split(…) in dataset/cifar.py, the labeled images are selected without a seed. If a training run is stopped and resumed, a fresh random subset is drawn each time, so the total number of unique labeled images the model sees can exceed the configured value. For instance, training on 40 labels with 2 stops can expose the model to up to 120 unique labeled images over the course of training, even though it only sees 40 labeled images at a time. I think this can explain the much higher accuracy obtained by this implementation, especially on the low-label tasks.

A quick fix would be to add the snippet below before the random label selection in the x_u_split(…) function.

np.random.seed(args.seed)
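To illustrate the concern, here is a minimal sketch of the failure mode. The function pick_labeled and its arguments are hypothetical stand-ins for the selection logic in x_u_split(…), not the repo's exact API.

```python
import numpy as np

def pick_labeled(num_labeled, num_samples, seed=None):
    # Stand-in for the labeled-index selection in x_u_split(...):
    # draws a random subset of image indices, optionally seeded.
    if seed is not None:
        np.random.seed(seed)
    return np.random.choice(num_samples, num_labeled, replace=False)

# Two "resumed" runs without a seed will usually pick different images,
# so the union of labeled examples seen across restarts grows beyond
# the configured budget of 40.
run1 = set(pick_labeled(40, 50000))
run2 = set(pick_labeled(40, 50000))
print(len(run1 | run2))  # almost certainly greater than 40

# With a fixed seed, every restart reproduces the same subset.
seeded1 = pick_labeled(40, 50000, seed=5)
seeded2 = pick_labeled(40, 50000, seed=5)
assert np.array_equal(seeded1, seeded2)
```

Under this sketch, seeding before the selection pins the labeled subset across restarts, which is the intent of the proposed fix.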

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

1 reaction
zou-yiqi commented, Nov 5, 2021

What I mean is that the seed has already been set in the main() function, and x_u_split() is reached through DATASET_GETTERS, which is used inside main(), so the seed set in main() takes effect in x_u_split(). Thus, we don’t need to add np.random.seed(args.seed) in the x_u_split(…) function. ^ ^
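This resolution can be sketched with a toy version of the call chain. The names x_u_split_like and main_like are illustrative stand-ins, not the repo's functions; the point is that NumPy's legacy seeding mutates one global RNG state shared by caller and callee.

```python
import numpy as np

def x_u_split_like():
    # Stand-in for x_u_split(...): draws labeled indices from NumPy's
    # global RNG without seeding it itself.
    return np.random.choice(100, 10, replace=False)

def main_like(seed):
    # Stand-in for main(): seeds the global RNG once, up front,
    # before any dataset splitting happens.
    np.random.seed(seed)
    return x_u_split_like()

# Because both functions share the same global RNG state, seeding in
# the caller is enough to make the downstream split reproducible.
assert np.array_equal(main_like(seed=5), main_like(seed=5))
```

This only holds for the global np.random state; code using per-instance generators (np.random.default_rng) would need the seed passed down explicitly.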

0 reactions
dsouzinator commented, Nov 9, 2021

@zou-yiqi Yes you’re right. The seed does indeed work in that function. Thanks!

