
Is it possible to use AutoregressiveWrapper in combination with ContinuousTransformerWrapper

See original GitHub issue

Can I combine the AutoregressiveWrapper with the ContinuousTransformerWrapper? ignore_index and pad_value are scalars, but for continuous outputs I believe they would need to be tensors.

Should I create a custom ContinuousAutoregressiveWrapper for this?
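
For context, the scalar interface exists because a token-level autoregressive wrapper computes a cross-entropy loss over integer token ids, where ignore_index can only be a single class id. A simplified stand-in (not the library's exact code) illustrating the mismatch:

import torch
import torch.nn.functional as F

# Token targets are integer class ids, so a scalar ignore_index suffices:
logits = torch.randn(2, 10, 256)           # (batch, seq, num_tokens)
targets = torch.randint(0, 256, (2, 10))   # integer token ids
loss = F.cross_entropy(logits.transpose(1, 2), targets, ignore_index=0)

# Continuous targets are float vectors of shape (batch, seq, dim), so a
# pad "value" would itself have to be a dim-sized vector -- something the
# scalar ignore_index / pad_value interface cannot express.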

I’m trying to use it like this:

from x_transformers import (
    AutoregressiveWrapper,
    ContinuousTransformerWrapper,
    Decoder,
)

model = AutoregressiveWrapper(
    ContinuousTransformerWrapper(
        max_seq_len=self.max_sequence_length,
        dim_in=self.vector_dimension,   # dimension of the continuous input vectors
        dim_out=self.vector_dimension,
        emb_dim=self.embedding_dimension,
        use_pos_emb=True,
        attn_layers=Decoder(
            dim=self.embedding_dimension,
            depth=self.depth,
            heads=self.heads,
            attn_dropout=self.dropout,
            ff_dropout=self.dropout,
            rotary_pos_emb=True,
        ),
    ),
    pad_value=[0] * 16,  # for example: one entry per vector dimension
).to(utils.get_device())  # utils.get_device() is the author's own helper
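
For reference, here is a minimal sketch of what such a continuous wrapper could look like, assuming an MSE loss over float vectors and a boolean padding mask in place of the scalar ignore_index / pad_value. The class name and all details below are hypothetical, not the library's code:

import torch
from torch import nn
import torch.nn.functional as F

class ContinuousAutoregressiveSketch(nn.Module):
    # Hypothetical wrapper: next-vector prediction with an MSE loss,
    # masking padded positions with a boolean mask rather than a
    # scalar ignore_index.

    def __init__(self, net):
        super().__init__()
        # net is assumed to map (batch, seq, dim) -> (batch, seq, dim)
        # and accept a boolean mask kwarg, e.g. a ContinuousTransformerWrapper
        self.net = net

    def forward(self, x, mask=None):
        # x: (batch, seq, dim) float vectors; shift by one for teacher forcing
        inp, target = x[:, :-1], x[:, 1:]
        if mask is not None:
            mask = mask[:, :-1]
        pred = self.net(inp, mask=mask)
        loss = F.mse_loss(pred, target, reduction='none').mean(dim=-1)
        if mask is not None:
            loss = loss[mask]  # drop padded positions instead of using ignore_index
        return loss.mean()

    @torch.no_grad()
    def generate(self, prompt, seq_len):
        # prompt: (batch, prefix_len, dim); append one predicted vector at a time
        out = prompt
        for _ in range(seq_len):
            next_vec = self.net(out)[:, -1:]
            out = torch.cat((out, next_vec), dim=1)
        return out[:, prompt.shape[1]:]

(x-transformers has since added a ContinuousAutoregressiveWrapper covering this use case, which is worth checking before rolling your own.)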

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

1 reaction
wingedsheep commented, Dec 14, 2021

It works! I trained a model on the MIDI of “Californication”, converted to encodings by the auto-encoder, and it actually learns to generate the song 😄 I’m curious how it will do when trained on more data.

1 reaction
lucidrains commented, Dec 13, 2021

Very cool! The FaceFormer paper has some interesting twists on the ALiBi encoding, too, to make their deep net work. Maybe you can draw some inspiration from their architecture if you get stuck.

Read more comments on GitHub
