BERT (sentence classification) output is non-deterministic (have checked previous issues, SET model.eval())
Environment info
- `transformers` version:
- Platform: Ubuntu 18.04
- Python version: 3.7.6
- PyTorch version (GPU?): 1.5.1
- Tensorflow version (GPU?): /
- Using GPU in script?: Yes for training; both GPU and CPU for the test scripts
- Using distributed or parallel set-up in script?: Yes for training
Who can help
Information
Model I am using (Bert, XLNet …): Bert
The problem arises when using:
- [ ] the official example scripts: (give details below)
- [x] my own modified scripts: (give details below)

The task I am working on is:
- [ ] an official GLUE/SQuAD task: (give the name)
- [x] my own task or dataset: (give details below)
I’m using a Chinese BERT to match similar tags and reduce the size of the database. I use a set of manually merged tags as the dataset, training BERT to take two tags as input and output the probability that they are similar. It did well after training when called from the test() function I wrote (with model.eval(), of course). But when I save the model to a .pth file and load it in another script, the output is non-deterministic.
To reproduce
The whole test script is too long, but here is a short snippet that should cover the core of this issue.
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-bert-wwm-ext")
model = AutoModelForSequenceClassification.from_pretrained("hfl/chinese-bert-wwm-ext")
# as posted; note that nn.Module.state_dict() does not load weights (load_state_dict() would)
model.state_dict(torch.load('./weights/best_bert.pth', map_location='cpu'))
# model.cuda()

for i in range(100):  # used to control when model.eval() is called
    foo = 1

# model = model.eval()
model.eval()

with torch.no_grad():
    srcText = '春天'  # 'spring'
    tgtText = '春季'  # 'spring time'
    predict = model(
        **tokenizer(text=srcText, text_pair=tgtText,
                    truncation=True, return_tensors='pt', max_length=256)
    )
    # NON-DETERMINISTIC
    print(torch.softmax(predict.logits, dim=1))
```
Steps to reproduce the behavior:
1. Run the script above.
2. Change the number of iterations of the foo = 1 loop, or change nothing at all.
3. Run the script again.
4. Observe different output logits and probabilities.
Expected behavior
Identical outputs in steps 1 and 3.
Additional information
I have read issue #4769 and some other similar issues, but I checked again and confirmed that eval() is indeed called.
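As an aside (this check is mine, not from the original report): hfl/chinese-bert-wwm-ext is a plain BERT checkpoint with no sequence-classification head, so AutoModelForSequenceClassification.from_pretrained has to initialize the classifier layer randomly every time the model is instantiated. A quick way to see that the randomness lives in that layer rather than in the forward pass:

```python
import torch
from transformers import AutoModelForSequenceClassification

# Instantiate the same checkpoint twice in one process.
m1 = AutoModelForSequenceClassification.from_pretrained("hfl/chinese-bert-wwm-ext")
m2 = AutoModelForSequenceClassification.from_pretrained("hfl/chinese-bert-wwm-ext")

# The encoder weights come from the checkpoint, so they match.
print(torch.equal(m1.bert.embeddings.word_embeddings.weight,
                  m2.bert.embeddings.word_embeddings.weight))   # True

# The classification head is newly (randomly) initialized, so it differs.
print(torch.equal(m1.classifier.weight, m2.classifier.weight))  # False
```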
I finally managed to get deterministic results. If you are still struggling, see https://discuss.huggingface.co/t/initializing-the-weights-of-the-final-layer-of-e-g-bertfortokenclassification-with-a-manual-seed/1377/3
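In case it helps later readers, here is a minimal sketch of the approach the linked thread describes (as I understand it; the exact code there may differ): seed PyTorch before from_pretrained so the randomly initialized classification head is identical on every run, then load the fine-tuned weights on top. The checkpoint path is the one from the snippet above and is assumed to hold the full fine-tuned state_dict.

```python
import torch
from transformers import AutoModelForSequenceClassification

torch.manual_seed(0)  # makes the random head initialization reproducible across runs
model = AutoModelForSequenceClassification.from_pretrained("hfl/chinese-bert-wwm-ext")

# Loading a checkpoint that already contains the classifier weights removes the
# randomness entirely (note load_state_dict, not state_dict, actually loads them).
model.load_state_dict(torch.load('./weights/best_bert.pth', map_location='cpu'))
model.eval()
```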
Thank you, I was struggling to figure out why this was happening. I assumed that “random initialization” just meant it was randomly initialized once when the model was instantiated, not every time it’s called. Do you know why it has that behavior? Why wouldn’t it just be initialized randomly once? What tells it to stop being random? (A round of training? A flag?)
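My own summary, not an authoritative answer: the head is initialized once per from_pretrained call, but each new process gets a different random draw, which is why separate runs disagree. It stops being random as soon as the classifier weights come from disk, e.g. by saving the fine-tuned model with save_pretrained and reloading from that directory. In the sketch below, finetuned_model stands for your trained model and the directory path is just an example.

```python
from transformers import AutoModelForSequenceClassification

# Training script, after fine-tuning: persist the config and all weights,
# including the classifier head.
finetuned_model.save_pretrained("./weights/best_bert")

# Inference script: every weight is found in the saved files, so nothing is
# randomly initialized and the outputs are reproducible across runs.
model = AutoModelForSequenceClassification.from_pretrained("./weights/best_bert")
model.eval()
```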