Generating text with Transformer XL
Hi everyone,
I am trying to generate text with the pre-trained Transformer-XL model, similar to how we do it with GPT-2. I adapted the `sample_sequence` function to the Transformer-XL architecture, but the generated text is completely random, both in general and with respect to the context, so I suspect there is a bug somewhere.
The core sampling loop looks very similar to the GPT-2 one:
```python
with torch.no_grad():
    for i in trange(length):
        # Feed only the newest token; earlier context is carried in `mems`
        logits, past = model(prev, mems=past)
        logits = logits[:, -1, :] / temperature
        logits = top_k_logits(logits, k=top_k)
        # Probabilities over the vocabulary (name kept from the GPT-2 script)
        log_probs = F.softmax(logits, dim=-1)
        if sample:
            prev = torch.multinomial(log_probs, num_samples=1)
        else:
            _, prev = torch.topk(log_probs, k=1, dim=-1)
        output = torch.cat((output, prev), dim=1)
```
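In case the problem is there, `top_k_logits` is the same helper as in the GPT-2 example script; roughly this (a sketch from memory, not verbatim):

```python
def top_k_logits(logits, k):
    """Mask everything but the k largest logits with -1e10 (approx. -inf)."""
    if k == 0:
        return logits
    values, _ = torch.topk(logits, k)
    min_values = values[:, -1].view(-1, 1)  # k-th largest logit per row
    return torch.where(logits < min_values, torch.full_like(logits, -1e10), logits)
```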
What is the bug that I’m missing?
Issue Analytics
- Created 4 years ago
- Comments: 6 (3 by maintainers)
Here’s an example of text generation that picks the second most likely token at each step.
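A minimal sketch of that approach (assuming the same `model` / `prev` / `past` / `output` setup as the loop above):

```python
with torch.no_grad():
    for i in trange(length):
        logits, past = model(prev, mems=past)
        logits = logits[:, -1, :]
        # Take the two largest logits, then keep the second one
        _, top2 = torch.topk(logits, k=2, dim=-1)
        prev = top2[:, 1].unsqueeze(-1)  # second most likely token id
        output = torch.cat((output, prev), dim=1)
```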
@gussmith you could do it this way, but empirically the results are very bad. The model is trained to maximize the probability of next-token prediction. What looks like a loss over the whole sequence is actually a parallelization trick to compute many next-token-prediction losses in a single pass.
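To make that concrete, here is a minimal sketch of the trick (generic PyTorch with hypothetical shapes, not tied to any particular model class): the logits at position t are scored against the token at position t+1, so one forward pass yields a separate next-token loss for every position.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: logits from one forward pass over a full sequence
batch, seq_len, vocab = 2, 8, 100
logits = torch.randn(batch, seq_len, vocab)       # model(input_ids) output
input_ids = torch.randint(vocab, (batch, seq_len))

# Shift so that position t predicts token t+1
shift_logits = logits[:, :-1, :]                  # (batch, seq_len-1, vocab)
shift_labels = input_ids[:, 1:]                   # (batch, seq_len-1)

# One cross-entropy term per position, all computed in a single pass
loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab),
    shift_labels.reshape(-1),
)
```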