
Large model (762M) keeps repeating the same utterance


I tried running the large model (in a Colab notebook) with the Hugging Face transformers library, using the approach described in the model card:

from transformers import AutoModelWithLMHead, AutoTokenizer
import torch


tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelWithLMHead.from_pretrained("microsoft/DialoGPT-large")

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response, limiting the total sequence (history + reply) to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # pretty print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

Unfortunately, the output quickly starts repeating the same sentence over and over. In examples 1 and 2, it repeats the same sentence from the very beginning. In the third case, the model starts out fine, but as the conversation progresses it begins repeating the same thing (or parts of it).

Is this intended behavior?

Example 1

>> User:Today, I bought some ice cream.
DialoGPT: I'm so sorry.
>> User:Why are you sorry?
DialoGPT: I'm sorry for your loss.
>> User:What loss? I didn't lose anything.
DialoGPT: I'm sorry for your loss.
>> User:I'm very confused by what you are saying.
DialoGPT: I'm sorry for your loss.
>> User:Ok, nevermind.
DialoGPT: I'm sorry for your loss.

Example 2

>> User:Can't wait to upvote this 17 different times later this week.
DialoGPT: I'm going to upvote it every time.
>> User:What are you going to upvote every time?
DialoGPT: I'm going to upvote it every time.
>> User:Uh sure, if that's what you want.
DialoGPT: I'm going to upvote it every time.
>> User:I'm very confused by what you are saying.
DialoGPT: I'm going to upvote it every time.
>> User:What's your name?
DialoGPT: I'm going to upvote it every time.

Example 3

>> User:I like to walk my dog in the park.
DialoGPT: I like to walk my dog in the park.
>> User:That's great. What's your dog breed?
DialoGPT: He's a mix of some sort.
>> User:That's funny, you don't even know what mix it is.
DialoGPT: I know what mix it is.
>> User:What is it, then?
DialoGPT: A mix of some sort.
>> User:What sort? "Sort" is not a type of dog.
DialoGPT: I know what mix it is.
>> User:That's not the issue. I'm talking about your dog's breed.
DialoGPT: I know what mix it is.
>> User:Then tell me!
DialoGPT: I know what mix it is.
>> User:What are you waiting for?
DialoGPT: I know what mix?
>> User:Are you serious?
DialoGPT: I know what?
>> User:What?
DialoGPT: I know what?
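
A possible contributing factor in example 3: the snippet never truncates the chat history. max_length=1000 caps the total sequence length, but bot_input_ids grows with every turn, and DialoGPT is a GPT-2 model with a 1024-token context window. A minimal sketch that keeps only the most recent context (max_history_tokens is an illustrative name, not part of the original code):

# keep only the most recent tokens of the concatenated history before generating
max_history_tokens = 256  # illustrative budget, well under GPT-2's 1024-token window

bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids
bot_input_ids = bot_input_ids[:, -max_history_tokens:]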

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 3
  • Comments: 10 (2 by maintainers)

Top GitHub Comments

17 reactions
xhluca commented, Aug 26, 2020

I just tried your method, as well as the top-p/top-k method from the Hugging Face tutorial. Here are the results.

Greedy

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt').to('cuda')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response, limiting the total sequence (history + reply) to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # pretty print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

Result:

>> User:Today, I bought some ice cream.
DialoGPT: I'm going to buy some ice cream.
>> User:Ok, but hwy?
DialoGPT: I'm going to buy some ice cream.
>> User:Fine. You already said that
DialoGPT: I'm going to buy some ice cream.
>> User:Ok, goodbye!
DialoGPT: I'm going to buy some ice cream.
>> User:Stop
DialoGPT: I'm going to buy some ice cream.
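
Greedy decoding always picks the single most likely next token, so once the model enters a high-probability loop it can never escape it. Besides sampling, generate() also supports n-gram blocking via no_repeat_ngram_size; a minimal variant of the call above (the value 3 is illustrative):

# forbid any 3-gram from occurring twice in the sequence
chat_history_ids = model.generate(
    bot_input_ids,
    max_length=1000,
    pad_token_id=tokenizer.eos_token_id,
    no_repeat_ngram_size=3,
)

Note that the blocking applies to the full sequence, history included, so it can also stop the bot from legitimately repeating a phrase from earlier in the chat.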

Temperature + repetition penalty (from @pablonm3)

for step in range(5):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt').to('cuda')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response, limiting the total sequence (history + reply) to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id, temperature=0.6, repetition_penalty=1.3)

    # pretty print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

Result:

>> User:Today, I bought some ice cream.
DialoGPT: I'm going to buy a boat.
>> User:Wow that was random! but happy for you
DialoGPT: It's not like he has anything else to do!
>> User:What has what?
DialoGPT: A boat?
>> User:You said he?
DialoGPT: No, I didn't say him. He is the one who said it.
>> User:Ok fine. Whatever you say
DialoGPT: He's just saying that because of his name.
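
One caveat about the call above: generate() only applies temperature when do_sample=True, so without it the search is still greedy and the variation comes from repetition_penalty alone. A sketch that actually samples with the temperature:

# sample instead of greedy-decoding so that temperature takes effect
chat_history_ids = model.generate(
    bot_input_ids,
    max_length=1000,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,          # required for temperature to be applied
    temperature=0.6,         # < 1.0 sharpens the sampling distribution
    repetition_penalty=1.3,  # down-weights tokens already present in the sequence
)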

Top-p + top-k (from the tutorial)

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt').to('cuda')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response, limiting the total sequence (history + reply) to 1000 tokens
    chat_history_ids = model.generate(
        bot_input_ids,
        do_sample=True, 
        max_length=1000,
        top_k=50, 
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id
    )

    # pretty print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

Result:

>> User:Today, I bought some ice cream.
DialoGPT: Me too! :D
>> User:Nice! What brand did you buy?
DialoGPT: Strawberry and Vanilla
>> User:That's not a brand!
DialoGPT: Yup :P
>> User:Ok fine, anyway. What else did you do?
DialoGPT: I ate candy bars
>> User:cool! Were they good?
DialoGPT: They were. It was kinda like a snickerdoodle from my younger years.

5 reactions
pablonm3 commented, Aug 25, 2020

I tinkered a bit with the temperature and repetition_penalty parameters and got decent results; this is my code:

for step in range(50):
    # encode the new user input, add the eos_token and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response, limiting the total sequence (history + reply) to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id, temperature=0.6, repetition_penalty=1.3)

    # pretty print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))

