Unexpected sequences_scores in BeamSearchDecoderOnlyOutput
Environment info
- transformers version: 4.11.3
- Platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.12
- PyTorch version (GPU?): 1.9.0+cu111 (False)
- Tensorflow version (GPU?): 2.6.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help
Information
This is a follow-up to issue https://github.com/huggingface/transformers/issues/14065. Following the suggestion in that issue, I calculate the probability of each generated token conditional on the previously generated tokens:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2", return_dict_in_generate=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

input_ids = tokenizer("Today is a nice day", return_tensors="pt").input_ids
generated_outputs = model.generate(input_ids, num_beams=2, max_length=8, output_scores=True)

gen_ids = generated_outputs["sequences"][0, input_ids.shape[-1]:]
vocab_size = generated_outputs["scores"][0].shape[-1]
print(gen_ids)  # tensor([ 11, 475, 314])

# Here we find out, at each time-step, which beam each generated id belongs to.
values, indices = torch.topk(generated_outputs["scores"][0].view(-1), k=2)
print(values, (indices % vocab_size), (indices / vocab_size).long())
# tensor([-1.5148, -1.8792]) tensor([329, 11]) tensor([0, 0])
values, indices = torch.topk(generated_outputs["scores"][1].view(-1), k=2)
print(values, (indices % vocab_size), (indices / vocab_size).long())
# tensor([-3.6481, -3.9508]) tensor([475, 262]) tensor([1, 0])
values, indices = torch.topk(generated_outputs["scores"][2].view(-1), k=2)
print(values, (indices % vocab_size), (indices / vocab_size).long())
# tensor([-5.6957, -5.8748]) tensor([314, 340]) tensor([0, 0])

# So we know that at time-step 1 the token id 475 belongs to beam 1; the others belong to beam 0.
logprob_gen_token0 = generated_outputs["scores"][0][0, gen_ids[0]]
logprob_gen_token1 = generated_outputs["scores"][1][1, gen_ids[1]] - logprob_gen_token0
logprob_gen_token2 = generated_outputs["scores"][2][0, gen_ids[2]] - logprob_gen_token0 - logprob_gen_token1
print(logprob_gen_token0.exp(), logprob_gen_token1.exp(), logprob_gen_token2.exp())
# tensor(0.1527) tensor(0.1705) tensor(0.1290)
```
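As an aside, newer transformers releases (not the 4.11.3 used here) provide model.compute_transition_scores, which is meant to recover the per-step token scores without this manual beam bookkeeping. A minimal sketch, assuming that newer API and reusing the `model` and `input_ids` from above:

```python
# Sketch for newer transformers releases only (not 4.11.3): compute_transition_scores
# and the beam_indices field of the generate output are assumed to be available.
outputs = model.generate(input_ids, num_beams=2, max_length=8, output_scores=True)
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, beam_indices=outputs.beam_indices
)
print(transition_scores.exp())  # per-token conditional probabilities of the returned beam
```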
Now I look at the `sequences_scores`:

```python
generated_outputs["sequences_scores"].exp()
# tensor([0.4907])
```
I would have expected `sequences_scores` to equal the product of the probabilities of each generated token conditional on the previous generated tokens: 0.1527 * 0.1705 * 0.1290 ≈ 0.0034. I'm probably misunderstanding something. Thanks in advance for your help!
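One guess, which I have not verified against the beam search implementation: `sequences_scores` might be the sum of these conditional log-probabilities normalized by the total sequence length (prompt plus generated tokens), i.e. the default length_penalty=1.0 behaviour. A quick check with the numbers above:

```python
# Guess: sequences_scores == sum(conditional log-probs) / total sequence length,
# with length_penalty defaulting to 1.0 and the length including the prompt tokens.
sum_logprobs = logprob_gen_token0 + logprob_gen_token1 + logprob_gen_token2
seq_len = generated_outputs["sequences"].shape[-1]  # 8 = 5 prompt tokens + 3 generated
print((sum_logprobs / seq_len).exp())  # tensor(0.4907), matching sequences_scores.exp()
```

If that guess is right, the 0.4907 above would be a length-normalized beam score rather than the raw product of the per-token probabilities.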
To reproduce
See this colab: https://colab.research.google.com/drive/11rRAFuNycLLDiDDwU02mBgXjpBOXCe4P#scrollTo=55CjTHwLc_gE
Expected behavior
I would have expected `sequences_scores` to equal the product of the probabilities of each generated token conditional on the previous generated tokens: 0.1527 * 0.1705 * 0.1290 ≈ 0.0034.