
Unexpected sequences_scores in BeamSearchDecoderOnlyOutput


Environment info

  • transformers version: 4.11.3
  • Platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.12
  • PyTorch version (GPU?): 1.9.0+cu111 (False)
  • Tensorflow version (GPU?): 2.6.0 (False)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Who can help

@qqaatw

Information

This is a follow-up to issue https://github.com/huggingface/transformers/issues/14065. Following the suggestion there, I calculate the probability of each generated token conditional on the previously generated tokens:

import torch

from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2", return_dict_in_generate=True)
tokenizer = AutoTokenizer.from_pretrained("gpt2")

input_ids = tokenizer("Today is a nice day", return_tensors="pt").input_ids

generated_outputs = model.generate(input_ids, num_beams=2, max_length=8, output_scores=True)
gen_ids = generated_outputs["sequences"][0, input_ids.shape[-1]:]
vocab_size = generated_outputs["scores"][0].shape[-1]

print(gen_ids)  # tensor([ 11, 475, 314])

# Here we find out at each time-step, which beam each generated id belongs to.
values, indices = torch.topk(generated_outputs["scores"][0].view(-1), k=2)
print(values, (indices % vocab_size), (indices / vocab_size).long()) # tensor([-1.5148, -1.8792]) tensor([329,  11]) tensor([0, 0])
values, indices = torch.topk(generated_outputs["scores"][1].view(-1), k=2)
print(values, (indices % vocab_size), (indices / vocab_size).long()) # tensor([-3.6481, -3.9508]) tensor([475, 262]) tensor([1, 0])
values, indices = torch.topk(generated_outputs["scores"][2].view(-1), k=2)
print(values, (indices % vocab_size), (indices / vocab_size).long()) # tensor([-5.6957, -5.8748]) tensor([314, 340]) tensor([0, 0])

# So at time-step 1 the generated id `475` came from beam 1; the ids at time-steps 0 and 2 came from beam 0.

# The scores at each step are the *cumulative* log probs of the beam, so
# subtracting the running total recovers each token's conditional log prob.
logprob_gen_token0 = generated_outputs["scores"][0][0, gen_ids[0]]
logprob_gen_token1 = generated_outputs["scores"][1][1, gen_ids[1]] - logprob_gen_token0
logprob_gen_token2 = generated_outputs["scores"][2][0, gen_ids[2]] - logprob_gen_token0 - logprob_gen_token1

print(logprob_gen_token0.exp(), logprob_gen_token1.exp(), logprob_gen_token2.exp())

# Outputs:
# tensor(0.1527) tensor(0.1705) tensor(0.1290)
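
For what it's worth, these three conditional probabilities can be cross-checked with a plain forward pass over the generated sequence. This is a minimal sketch of my own; it assumes `generate` applied nothing beyond a log-softmax to the logits, which should hold for the defaults used here:

import torch.nn.functional as F

full_ids = generated_outputs["sequences"]  # (1, 8): 5 prompt ids + 3 generated ids
with torch.no_grad():
    logits = model(full_ids).logits        # (1, 8, vocab_size)
log_probs = F.log_softmax(logits, dim=-1)

# The logits at position i predict the token at position i + 1.
for i in range(input_ids.shape[-1] - 1, full_ids.shape[-1] - 1):
    token_id = full_ids[0, i + 1]
    print(token_id.item(), log_probs[0, i, token_id].exp().item())

# Should print roughly 0.1527, 0.1705, and 0.1290 again.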

Now I look at the sequences_scores:

generated_outputs["sequences_scores"].exp()
# tensor([0.4907])

I would have expected `sequences_scores` to equal the product of the conditional probabilities of the generated tokens: 0.1527 * 0.1705 * 0.1290 ≈ 0.0034.

I’m probably misunderstanding something. Thanks in advance for your help!

To reproduce

See this colab: https://colab.research.google.com/drive/11rRAFuNycLLDiDDwU02mBgXjpBOXCe4P#scrollTo=55CjTHwLc_gE

Expected behavior

I would have expected `sequences_scores` to equal the product of the conditional probabilities of the generated tokens: 0.1527 * 0.1705 * 0.1290 ≈ 0.0034.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

1 reaction
qqaatw commented, Oct 21, 2021

# Get the average of the conditional log probs, normalized by the total
# sequence length (the default length_penalty is 1.0).
sum_logprob = sum((logprob_gen_token0, logprob_gen_token1, logprob_gen_token2))
seq_scores = sum_logprob / generated_outputs["sequences"].shape[-1]

print(seq_scores == generated_outputs["sequences_scores"][0])  # True
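
In other words, with the default `length_penalty=1.0` the `sequences_scores` are the summed conditional log probs divided by the total sequence length (prompt included, 8 here), not the raw sum. In probability space that makes them the product the question expected, raised to the power 1/length. A quick check with the numbers above:

product = 0.1527 * 0.1705 * 0.1290  # ~0.0034, the expected product
print(product ** (1 / 8))           # ~0.4907, matching sequences_scores.exp()

(Later transformers releases also added a `model.compute_transition_scores(...)` helper that recovers the per-token scores without this manual beam bookkeeping; it is not available in the 4.11.3 used here.)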
0 reactions
github-actions[bot] commented, Dec 31, 2021

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
