
Generating text with Transformer XL

See original GitHub issue

Hi everyone,

I am trying to generate text with the pre-trained Transformer-XL model, similar to how we do it with the GPT-2 model, but I guess there is a bug in the sample_sequence function after I adapted it to the Transformer-XL architecture: the generated text is completely random, both in general and with respect to the context. The core sampling loop looks very similar to the GPT-2 one:

import torch
import torch.nn.functional as F
from tqdm import trange

# Excerpt: model, prev, past, output, length, temperature, top_k, sample, and
# the top_k_logits helper are defined in the surrounding script (adapted from
# the GPT-2 sampling example).
with torch.no_grad():
    for i in trange(length):
        logits, past = model(prev, mems=past)
        logits = logits[:, -1, :] / temperature
        logits = top_k_logits(logits, k=top_k)
        log_probs = F.softmax(logits, dim=-1)  # despite the name, these are probabilities
        if sample:
            prev = torch.multinomial(log_probs, num_samples=1)
        else:
            _, prev = torch.topk(log_probs, k=1, dim=-1)
        output = torch.cat((output, prev), dim=1)
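
For reference, top_k_logits is not shown above; it is a small helper from the GPT-2 sample script. A minimal sketch of the usual implementation (an assumption, not code quoted from this thread):

import torch

def top_k_logits(logits, k):
    # Mask every logit outside the k largest so sampling is restricted
    # to the top-k tokens; k == 0 conventionally disables the filter.
    if k == 0:
        return logits
    values, _ = torch.topk(logits, k)            # k largest logits per row
    min_values = values[:, -1].unsqueeze(-1)     # k-th largest, broadcastable
    return torch.where(logits < min_values,
                       torch.full_like(logits, -1e10),
                       logits)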

What is the bug that I’m missing?

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments:6 (3 by maintainers)

Top GitHub Comments

4 reactions
yaroslavvb commented, Apr 17, 2019

Here’s an example of text generation; it picks the second most likely word at each step:

import torch
# At the time of this answer these classes lived in pytorch_pretrained_bert;
# in current transformers releases Transformer-XL is deprecated, so this
# import assumes a version that still ships it.
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLLMHeadModel.from_pretrained('transfo-xl-wt103')
model.to(device)
model.eval()

line = "Cars were invented in"
line_tokenized = tokenizer.tokenize(line)
line_indexed = tokenizer.convert_tokens_to_ids(line_tokenized)
tokens_tensor = torch.tensor([line_indexed])
tokens_tensor = tokens_tensor.to(device)

max_predictions = 50
mems = None
for i in range(max_predictions):
    # Tuple unpacking assumes the 2019-era API, where the model returns
    # (prediction scores, new mems); on recent versions pass return_dict=False.
    predictions, mems = model(tokens_tensor, mems=mems)
    # [1] selects the indices from topk; the second [1] takes the 2nd most likely token.
    predicted_index = torch.topk(predictions[0, -1, :], 5)[1][1].item()
    predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
    print(predicted_token)
    predicted_index = torch.tensor([[predicted_index]]).to(device)
    tokens_tensor = torch.cat((tokens_tensor, predicted_index), dim=1)

Should produce

Britain
and
America
,
but
the
first
two
cars
had
to
have
been
a
"
Turbo
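
If you want stochastic output instead of always taking the second choice, the same loop can sample with temperature and top-k filtering. A sketch reusing the names above (temperature and top_k are assumed settings, not values from this thread):

temperature, top_k = 1.0, 40    # assumed settings, tune to taste
mems = None
for _ in range(max_predictions):
    with torch.no_grad():
        predictions, mems = model(tokens_tensor, mems=mems)
    # Scores for the next token; in the 2019-era API these are log-probabilities.
    logits = predictions[0, -1, :] / temperature
    top_values, top_indices = torch.topk(logits, top_k)
    probs = torch.softmax(top_values, dim=-1)           # renormalize over the top-k
    choice = top_indices[torch.multinomial(probs, 1)]   # sample one token id
    tokens_tensor = torch.cat((tokens_tensor, choice.view(1, 1).to(device)), dim=1)

Sampling from the renormalized top-k distribution tends to avoid the repetitive output that greedy decoding produces.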
1 reaction
yaroslavvb commented, Jul 20, 2020

@gussmith you could do it this way, but empirically the results are very bad. The model is trained to maximize the probability of the next token. What looks like a loss over the whole sequence is actually a parallelization trick to compute many “next token prediction” losses in a single pass.
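
To make the parallelization trick concrete, here is a minimal generic sketch (not Transformer-XL’s actual training code): the “sequence loss” is just the mean of per-position next-token cross-entropies, all computed from one forward pass of logits.

import torch
import torch.nn.functional as F

batch, seq_len, vocab = 2, 8, 100
logits = torch.randn(batch, seq_len, vocab)      # one prediction per position
tokens = torch.randint(vocab, (batch, seq_len))  # the input sequence itself

# Shift so that position t predicts token t+1.
shift_logits = logits[:, :-1, :]
shift_targets = tokens[:, 1:]

# One cross-entropy call evaluates every next-token prediction in parallel.
per_token_loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab),
    shift_targets.reshape(-1),
    reduction="none",
).view(batch, seq_len - 1)

loss = per_token_loss.mean()   # what looks like a whole-sequence loss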

Read more comments on GitHub >

Top Results From Across the Web

  • Music and text generation with Transformer-XL (GitHub)
    Description. The goal of this project is to generate long and coherent sequences of data using Transformer architectures based on the following ...
  • Source code for transformers.pipelines.text_generation
    This language generation pipeline can currently be loaded from ... Prefix text to help Transformer-XL and XLNet with short prompts as proposed by...
  • Generating text with a transformer language model (O'Reilly)
    We'll use a pretrained transformer-XL model to generate new text based on an initial input sequence. The goal is to give you a...
  • Transformer-XL: Attentive Language ... (Papers With Code)
    Leaderboard excerpt: Language Modelling, enwik8, Transformer-XL (24 layers), Bit per Character (BPC) 0.99; Language Modelling, enwik8, Transformer-XL (24 layers), Number...
  • Text Generation (Happy Transformer)
    Initialization Arguments: ... We recommend using “HappyTextGeneration(“GPT2”, “gpt2-xl”)” for the best performance. If you are using Google Colab on a free ...
