
Is it possible to use AutoregressiveWrapper in combination with ContinuousTransformerWrapper

See original GitHub issue

Can I combine the AutoregressiveWrapper with the ContinuousTransformerWrapper? ignore_index and pad_value are scalars, but for continuous outputs I believe they would need to be tensors.

Should I create a custom ContinuousAutoregressiveWrapper for this?
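
For context, the scalar interface exists because a token-level autoregressive wrapper computes a cross-entropy loss over integer token ids, where ignore_index can only be a single class id. A simplified stand-in (not the library's exact code) illustrating the mismatch:

import torch
import torch.nn.functional as F

# Token targets are integer class ids, so a scalar ignore_index suffices:
logits = torch.randn(2, 10, 256)           # (batch, seq, num_tokens)
targets = torch.randint(0, 256, (2, 10))   # integer token ids
loss = F.cross_entropy(logits.transpose(1, 2), targets, ignore_index=0)

# Continuous targets are float vectors of shape (batch, seq, dim), so a
# pad "value" would itself have to be a dim-sized vector -- something the
# scalar ignore_index / pad_value interface cannot express.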

I’m trying to use it like this:

from x_transformers import (
    AutoregressiveWrapper,
    ContinuousTransformerWrapper,
    Decoder,
)

model = AutoregressiveWrapper(
    ContinuousTransformerWrapper(
        max_seq_len=self.max_sequence_length,
        dim_in=self.vector_dimension,   # dimension of the continuous input vectors
        dim_out=self.vector_dimension,
        emb_dim=self.embedding_dimension,
        use_pos_emb=True,
        attn_layers=Decoder(
            dim=self.embedding_dimension,
            depth=self.depth,
            heads=self.heads,
            attn_dropout=self.dropout,
            ff_dropout=self.dropout,
            rotary_pos_emb=True,
        ),
    ),
    pad_value=[0] * 16,  # for example: one entry per vector dimension
).to(utils.get_device())  # utils.get_device() is the author's own helper
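
For reference, here is a minimal sketch of what such a continuous wrapper could look like, assuming an MSE loss over float vectors and a boolean padding mask in place of the scalar ignore_index / pad_value. The class name and all details below are hypothetical, not the library's code:

import torch
from torch import nn
import torch.nn.functional as F

class ContinuousAutoregressiveSketch(nn.Module):
    # Hypothetical wrapper: next-vector prediction with an MSE loss,
    # masking padded positions with a boolean mask rather than a
    # scalar ignore_index.

    def __init__(self, net):
        super().__init__()
        # net is assumed to map (batch, seq, dim) -> (batch, seq, dim)
        # and accept a boolean mask kwarg, e.g. a ContinuousTransformerWrapper
        self.net = net

    def forward(self, x, mask=None):
        # x: (batch, seq, dim) float vectors; shift by one for teacher forcing
        inp, target = x[:, :-1], x[:, 1:]
        if mask is not None:
            mask = mask[:, :-1]
        pred = self.net(inp, mask=mask)
        loss = F.mse_loss(pred, target, reduction='none').mean(dim=-1)
        if mask is not None:
            loss = loss[mask]  # drop padded positions instead of using ignore_index
        return loss.mean()

    @torch.no_grad()
    def generate(self, prompt, seq_len):
        # prompt: (batch, prefix_len, dim); append one predicted vector at a time
        out = prompt
        for _ in range(seq_len):
            next_vec = self.net(out)[:, -1:]
            out = torch.cat((out, next_vec), dim=1)
        return out[:, prompt.shape[1]:]

(x-transformers has since added a ContinuousAutoregressiveWrapper covering this use case, which is worth checking before rolling your own.)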

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Reactions: 1
  • Comments: 14 (8 by maintainers)

Top GitHub Comments

1 reaction
wingedsheep commented, Dec 14, 2021

It works! I trained a model on the MIDI of “Californication”, converted to encodings by the auto-encoder, and it actually learns to generate the song 😄 I’m curious how it will do when trained on more data.

1 reaction
lucidrains commented, Dec 13, 2021

Very cool! The FaceFormer paper has some interesting twists on the ALiBi encoding, too, to make their deep net work. Maybe you can draw some inspiration from their architecture if you get stuck.

Read more comments on GitHub
