
PhraseConstraints appearing only directly after input or at the end of the generated sentence

See original GitHub issue

System Info

  • transformers version: 4.22.0
  • Platform: Linux-3.10.0-1160.25.1.el7.x86_64-x86_64-with-glibc2.17
  • Python version: 3.9.12
  • Huggingface_hub version: 0.9.1
  • PyTorch version (GPU?): 1.12.1 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@patrickvonplaten @Narsil @cwkeam

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Overview

In the PR that introduced word constraints to the generation function there is an example script (Example 2: A Mix of Strong Constraint and a Disjunctive Constraint). Below you see it slightly modified, but the modifications should not have an impact on the output:

  • I added the import for GPT2LMHeadModel and GPT2Tokenizer
  • I removed the .to(torch_device) for me to run the script
  • I redid the assertions so the script can run on its own (removing self.....)

from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

force_word = "scared"
force_flexible = ["scream", "screams", "screaming", "screamed"]

force_words_ids = [
    tokenizer([force_word], add_prefix_space=True, add_special_tokens=False).input_ids,
    tokenizer(force_flexible, add_prefix_space=True, add_special_tokens=False).input_ids,
]

starting_text = ["The soldiers", "The child"]

input_ids = tokenizer(starting_text, return_tensors="pt").input_ids

outputs = model.generate(
    input_ids,
    force_words_ids=force_words_ids,
    num_beams=10,
    num_return_sequences=1,
    no_repeat_ngram_size=1,
    remove_invalid_values=True,
)

generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)

assert generated_text[0] == "The soldiers, who were all scared and screaming at each other as they tried to get out of the"
assert generated_text[1] == "The child was taken to a local hospital where she screamed and scared for her life, police said."
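For context on the script above: force_words_ids mixes the two constraint shapes that generate() accepts. A flat list of token ids would trigger a phrasal constraint (the exact phrase must appear), while a list of token-id lists marks a disjunctive constraint (any one of the alternatives must appear). A minimal sketch of the expected nesting, with made-up token ids so no tokenizer download is needed:

```python
# Hypothetical token ids standing in for real tokenizer output.
scared_ids = [[26844]]                    # one alternative: " scared"
scream_variants = [[32325], [32325, 82]]  # alternatives: " scream", " screams", ...

# Shape passed to model.generate(force_words_ids=...):
force_words_ids = [
    scared_ids,       # list-of-lists with one member: disjunctive set of size 1
    scream_variants,  # disjunctive: any one of these sequences must appear
]

# Every entry is a list of token-id sequences, and every token id is an int.
assert all(isinstance(seq, list) for group in force_words_ids for seq in group)
assert all(isinstance(t, int)
           for group in force_words_ids for seq in group for t in seq)
```

This mirrors what `tokenizer([...], ...).input_ids` returns in the script: a list of token-id lists per entry.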

ToDo

  • Run the script on transformers==4.20.1: it works perfectly well
  • Run the script on any version above 4.20.1: it will not pass the assertions
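The two steps above can be run side by side in fresh virtual environments. The version numbers come from the report; the file name repro.py is a placeholder for the script above:

```shell
# Working baseline: the assertions pass here, per the report.
python -m venv venv-good && . venv-good/bin/activate
pip install "transformers==4.20.1" torch
python repro.py   # repro.py = the reproduction script above
deactivate

# Broken: any version above 4.20.1, e.g. the reporter's 4.22.0.
python -m venv venv-bad && . venv-bad/bin/activate
pip install "transformers==4.22.0" torch
python repro.py   # expected to fail the assertions
deactivate
```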

Expected behavior

Problem

The constraining algorithm seems to be somewhat broken in versions above 4.20.1. For example, on version 4.22.0 the script generates the following outputs:

The soldiers, who had been stationed at the base for more than a year before being evacuated screaming scared
The child was taken to a local hospital where he died.\n 'I don’t think screaming scared

You can see that the constraints just get appended to the end of the generated sentence. In fact, when experimenting with constraints, I found that they are either placed right after the input (the following example is made up to show what happens):

The soldiers screaming scared, who had been stationed at the base for more than a year before being evacuated
The child screaming scared was taken to a local hospital where he died.\n 'I don’t think

or at the end of the generated sentence:

The soldiers, who had been stationed at the base for more than a year before being evacuated screaming scared
The child was taken to a local hospital where he died.\n 'I don’t think screaming scared


  • I expect the constraints to appear naturally within the generated sentence (as in the testing script). On versions above 4.20.1 they are just appended in a senseless manner.

  • Hope that helps
  • Please ask me if you have further questions, though I am a beginner myself
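To make the two failure modes above checkable rather than eyeballed, a rough string-level heuristic (not part of transformers; the function name is made up) can classify where the forced words ended up relative to the prompt:

```python
def constraint_placement(prompt: str, output: str, forced: list) -> str:
    """Classify where the forced words appear in the generated text.

    Returns "prefix" if the forced words sit immediately after the prompt,
    "suffix" if they all sit at the very end, and "interior" otherwise.
    Rough word-level heuristic; assumes output starts with the prompt.
    """
    continuation = output[len(prompt):].strip()
    words = continuation.split()
    n = len(forced)
    if words[:n] and all(w.strip(",.") in forced for w in words[:n]):
        return "prefix"
    if words[-n:] and all(w.strip(",.") in forced for w in words[-n:]):
        return "suffix"
    return "interior"


# Healthy output from 4.20.1: constraints interleaved naturally.
good = ("The soldiers, who were all scared and screaming at each other "
        "as they tried to get out of the")
print(constraint_placement("The soldiers", good, ["scared", "screaming"]))
# -> interior

# Broken output from 4.22.0: constraints dumped at the end.
bad = ("The soldiers, who had been stationed at the base for more than "
       "a year before being evacuated screaming scared")
print(constraint_placement("The soldiers", bad, ["screaming", "scared"]))
# -> suffix
```

Running this over a batch of generations would show whether every output degenerates to "prefix" or "suffix" on the affected versions.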

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 1
  • Comments: 13 (7 by maintainers)

Top GitHub Comments

3 reactions
gante commented, Nov 25, 2022

Reopened (it’s still on my generate task queue, which sadly is quite long) 😃

1 reaction
patrickvonplaten commented, Sep 27, 2022

@gante more generally should we maybe mark the disjunctive decoding as experimental and state that we don’t actively maintain them? It’s simply too time-consuming to look into this at the moment IMO
