Generating text with Transformer XL
Hi everyone,
I am trying to generate text with the pre-trained Transformer-XL model, similar to how we do it with GPT-2. I adapted the `sample_sequence` function to the Transformer-XL architecture, but the generated text is completely random, both in general and with respect to the context, so I suspect there is a bug somewhere.
The core sampling loop looks very similar to the GPT-2 one:
```python
with torch.no_grad():
    for i in trange(length):
        # Feed only the newest token; earlier context is carried in `mems`
        logits, past = model(prev, mems=past)
        logits = logits[:, -1, :] / temperature
        logits = top_k_logits(logits, k=top_k)
        # Probabilities over the vocabulary (name kept from the GPT-2 script)
        log_probs = F.softmax(logits, dim=-1)
        if sample:
            prev = torch.multinomial(log_probs, num_samples=1)
        else:
            _, prev = torch.topk(log_probs, k=1, dim=-1)
        output = torch.cat((output, prev), dim=1)
```
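In case the problem is there, `top_k_logits` is the same helper as in the GPT-2 example script; roughly this (a sketch from memory, not verbatim):

```python
def top_k_logits(logits, k):
    """Mask everything but the k largest logits with -1e10 (approx. -inf)."""
    if k == 0:
        return logits
    values, _ = torch.topk(logits, k)
    min_values = values[:, -1].view(-1, 1)  # k-th largest logit per row
    return torch.where(logits < min_values, torch.full_like(logits, -1e10), logits)
```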
What is the bug that I’m missing?
Issue Analytics
- Created 4 years ago
- Comments: 6 (3 by maintainers)
Here’s an example of text generation that picks the second most likely token at each step.
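A minimal sketch of that approach (assuming the same `model` / `prev` / `past` / `output` setup as the loop above):

```python
with torch.no_grad():
    for i in trange(length):
        logits, past = model(prev, mems=past)
        logits = logits[:, -1, :]
        # Take the two largest logits, then keep the second one
        _, top2 = torch.topk(logits, k=2, dim=-1)
        prev = top2[:, 1].unsqueeze(-1)  # second most likely token id
        output = torch.cat((output, prev), dim=1)
```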
@gussmith you could do it this way, but empirically the results are very bad. The model is trained to maximize the probability of next-token prediction. What looks like a loss over the whole sequence is actually a parallelization trick to compute many next-token-prediction losses in a single pass.
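To make that concrete, here is a minimal sketch of the trick (generic PyTorch with hypothetical shapes, not tied to any particular model class): the logits at position t are scored against the token at position t+1, so one forward pass yields a separate next-token loss for every position.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: logits from one forward pass over a full sequence
batch, seq_len, vocab = 2, 8, 100
logits = torch.randn(batch, seq_len, vocab)       # model(input_ids) output
input_ids = torch.randint(vocab, (batch, seq_len))

# Shift so that position t predicts token t+1
shift_logits = logits[:, :-1, :]                  # (batch, seq_len-1, vocab)
shift_labels = input_ids[:, 1:]                   # (batch, seq_len-1)

# One cross-entropy term per position, all computed in a single pass
loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab),
    shift_labels.reshape(-1),
)
```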