PhraseConstraints appearing only directly after the input or at the end of the generated sentence
System Info
- transformers version: 4.22.0
- Platform: Linux-3.10.0-1160.25.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.9.12
- Huggingface_hub version: 0.9.1
- PyTorch version (GPU?): 1.12.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
@patrickvonplaten @Narsil @cwkeam
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Overview
The PR that introduced word constraints to the generation function includes an example script (Example 2: A Mix of Strong Constraint and a Disjunctive Constraint). Below is a slightly modified version; the modifications should not affect the output:
- I added the imports for GPT2LMHeadModel and GPT2Tokenizer.
- I removed the .to(torch_device) calls so I could run the script.
- I rewrote the assertions (removing self.....) so the script can run on its own.
from transformers import GPT2LMHeadModel, GPT2Tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
# A strong (phrasal) constraint: this word must appear in every output
force_word = "scared"
# A disjunctive constraint: at least one of these words must appear
force_flexible = ["scream", "screams", "screaming", "screamed"]
force_words_ids = [
    tokenizer([force_word], add_prefix_space=True, add_special_tokens=False).input_ids,
    tokenizer(force_flexible, add_prefix_space=True, add_special_tokens=False).input_ids,
]
starting_text = ["The soldiers", "The child"]
input_ids = tokenizer(starting_text, return_tensors="pt").input_ids
outputs = model.generate(
input_ids,
force_words_ids=force_words_ids,
num_beams=10,
num_return_sequences=1,
no_repeat_ngram_size=1,
remove_invalid_values=True,
)
generated_text = tokenizer.batch_decode(outputs, skip_special_tokens=True)
assert generated_text[0] == "The soldiers, who were all scared and screaming at each other as they tried to get out of the"
assert generated_text[1] == "The child was taken to a local hospital where she screamed and scared for her life, police said."
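For reference, the two entries of `force_words_ids` differ in meaning even though they have the same nesting depth. A minimal sketch of that structure follows; the token ids below are hypothetical placeholders for illustration, not real GPT-2 ids:

```python
# Illustrative sketch of the nesting that force_words_ids uses.
# Token ids are made-up placeholders, not real GPT-2 ids.
phrasal = [[26844]]                  # from tokenizer([force_word], ...).input_ids:
                                     # this exact token sequence must appear
disjunctive = [[12524], [1416, 82]]  # from tokenizer(force_flexible, ...).input_ids:
                                     # at least ONE of these sequences must appear
force_words_ids = [phrasal, disjunctive]

# Each constraint is a list of token-id lists.
for constraint in force_words_ids:
    assert all(isinstance(seq, list) and all(isinstance(t, int) for t in seq)
               for seq in constraint)
```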
ToDo
- Run the script on transformers==4.20.1: it works perfectly well.
- Run the script on any version above 4.20.1: it will not pass the assertions.
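When switching between installs to check both sides of the 4.20.1 boundary, it can help to have the repro script report which side it is on. A minimal stdlib-only sketch (the naive `parse` helper is my own, not part of transformers):

```python
# Sketch: report whether the installed transformers release predates the
# regression; the issue reports 4.20.1 as the last version that passes.
from importlib.metadata import version, PackageNotFoundError

def parse(v: str) -> tuple:
    # naive parse keeping only leading numeric components: "4.22.0" -> (4, 22, 0)
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

LAST_GOOD = (4, 20, 1)

try:
    v = parse(version("transformers"))
    status = "expected to pass" if v <= LAST_GOOD else "expected to fail"
    print(f"transformers {v}: constraint assertions {status}")
except PackageNotFoundError:
    print("transformers is not installed")
```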
Expected behavior
Problem
The constraining algorithm seems to be broken in versions above 4.20.1. For example, on version 4.22.0 the script generates the following outputs:
The soldiers, who had been stationed at the base for more than a year before being evacuated screaming scared The child was taken to a local hospital where he died.\n 'I don’t think screaming scared
You can see that the constraints just get appended to the end of the generated sentence. In fact, when experimenting with constraints, I found that they are placed either right after the input (the following example is made up to show what happens):
The soldiers screaming scared, who had been stationed at the base for more than a year before being evacuated The child screaming scared was taken to a local hospital where he died.\n 'I don’t think
or at the end of the generated sentence:
The soldiers, who had been stationed at the base for more than a year before being evacuated screaming scared The child was taken to a local hospital where he died.\n 'I don’t think screaming scared
- I expect the constraints to appear naturally within the generated sentence (as in the testing script). On versions above 4.20.1 they are just appended in a senseless manner.
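The failure mode described above can be caught with a rough, word-level heuristic; the helper below is my own sketch (not part of transformers) and simply checks that a forced word is neither the first nor the last word of the generated continuation:

```python
# Sketch of a check for the failure mode described above: a constraint that is
# satisfied only by gluing the forced word to the start or end of the output.
def constraint_is_natural(prompt: str, generated: str, word: str) -> bool:
    continuation = generated[len(prompt):].strip()
    tokens = continuation.split()
    # "natural" here means the forced word appears, but not as the very
    # first or very last word of the continuation
    return word in tokens and tokens[0] != word and tokens[-1] != word

# Passes for the 4.20.1-style output, where "scared" appears mid-sentence:
assert constraint_is_natural(
    "The soldiers",
    "The soldiers, who were all scared and screaming at each other",
    "scared",
)
```

The buggy outputs, where the forced words are appended at the very end, would fail this check.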
- I hope that helps.
- Please ask me if you have further questions, though I am a beginner myself.
Issue Analytics
- State:
- Created a year ago
- Reactions: 1
- Comments: 13 (7 by maintainers)
Top GitHub Comments
Reopened (it’s still on my generate task queue, which sadly is quite long) 😃
@gante more generally, should we maybe mark the disjunctive decoding as experimental and state that we don’t actively maintain it? It’s simply too time-consuming to look into this at the moment IMO