
OpenAI VAE implementation

See original GitHub issue

Do I see it correctly that the code fragments provided by OpenAI, and the way you bound them in the vae.py file, mean that there is no actual codebook in the form of an explicit nn.Parameter or nn.Embedding, and that the very first layer of the decoder serves as the vocabulary?

(decoder): Decoder(
    (blocks): Sequential(
      (input): Conv2d(n_in=8192, n_out=128, kw=1, use_float16=False, device=device(type='cpu'), requires_grad=False)

This would explain why I couldn't find any. oO
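
For intuition, here is a minimal sketch (my own, not OpenAI's actual code) of why a 1x1 convolution over one-hot token maps is exactly an embedding lookup, so the decoder's input layer can double as the codebook. The sizes follow the printed module above.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes taken from the printed module: 8192 tokens -> 128 channels.
n_tokens, d_model = 8192, 128
conv = nn.Conv2d(n_tokens, d_model, kernel_size=1, bias=False)

tokens = torch.randint(0, n_tokens, (1, 4, 4))                     # (B, H, W) code indices
one_hot = F.one_hot(tokens, n_tokens).permute(0, 3, 1, 2).float()  # (B, N, H, W)
out_conv = conv(one_hot)                                           # (B, D, H, W)

# The same result via an explicit embedding whose weight is the conv kernel.
emb = nn.Embedding(n_tokens, d_model)
with torch.no_grad():
    emb.weight.copy_(conv.weight.squeeze(-1).squeeze(-1).t())      # (N, D)
out_emb = emb(tokens).permute(0, 3, 1, 2)                          # (B, D, H, W)

assert torch.allclose(out_conv, out_emb, atol=1e-6)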

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 27 (4 by maintainers)

Top GitHub Comments

3 reactions
sidml commented, May 6, 2021

@CDitzel In the PyTorch implementation, they seem to be directly adding the logits to the sample from the Gumbel distribution.

I believe they divide the logits by the temperature before sampling from the categorical distribution in Figure 1 of the paper.
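
To make the distinction concrete, here is a minimal sketch (hypothetical, not the repo's actual code) of the two variants being discussed; `logits` stands in for the encoder output.

import torch
import torch.nn.functional as F

logits = torch.randn(2, 8192)
tau = 0.9

# (a) What F.gumbel_softmax computes internally: perturb the logits with
# Gumbel(0, 1) noise, then take a temperature-scaled softmax.
gumbel = -torch.log(-torch.log(torch.rand_like(logits)))
soft_sample = F.softmax((logits + gumbel) / tau, dim=-1)
# equivalent call: F.gumbel_softmax(logits, tau=tau, hard=False, dim=-1)

# (b) The Figure-1 reading: scale the logits by the temperature first, then
# sample from the resulting categorical distribution.
cat_sample = torch.distributions.Categorical(logits=logits / tau).sample()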

3 reactions
CDitzel commented, Apr 1, 2021

All right, so this is what I have come up with so far. It closely resembles Lucid's implementation, but parameterizes the Gumbel-Softmax with the distances of the encoder output (the logits) to the codebook vectors (as described in this paper), akin to VQ-VAEs; in contrast, Lucid's implementation uses the logits directly as input to the Gumbel. Phil's (and Karpathy's) implementation never worked for me when I rightfully included the KL loss, i.e. with a KL loss > 0. With this implementation the KL loss can be included, as it should be, with a uniform prior. However, the results on a larger dataset are still underwhelming and not really satisfying in terms of reconstruction quality. Maybe someone can take a look at it and assess the correctness of this implementation?

import math

import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
from torch import einsum


class SoftDiscretizer(nn.Module):
    def __init__(
        self,
        nTokens,
        dTokens,
        temperature,
        kl_weight,
        **kwargs,
    ):
        super().__init__()
        self.nTokens = nTokens
        self.dTokens = dTokens
        self.temperature = temperature
        self.kl_weight = kl_weight

        self.embedding = nn.Embedding(nTokens, dTokens)

    def forward(self, z):
        B, C, H, W = z.size()
        N, D = self.embedding.weight.shape

        # squared Euclidean distances between every spatial feature vector
        # and every codebook vector: ||e||^2 + ||z||^2 - 2 * z @ e^T
        z_flat = rearrange(z, "b c h w -> (b h w) c")
        distances = (
            torch.sum(self.embedding.weight ** 2, dim=1)
            + torch.sum(z_flat ** 2, dim=1, keepdim=True)
            - 2 * torch.matmul(z_flat, self.embedding.weight.t())
        )
        distances = rearrange(distances, "(b h w) n -> b h w n", h=H, w=W)

        # negated so that closer codebook vectors get higher probability
        samples = F.gumbel_softmax(-distances, tau=self.temperature, hard=False, dim=-1)

        if not self.training:
            tokens = samples.argmax(dim=-1)
            return tokens.flatten(start_dim=1)

        # soft codebook lookup: a convex combination of codebook vectors
        z_q = einsum("b h w n, n d -> b d h w", samples, self.embedding.weight)

        # KL divergence to the uniform prior: KL(q || U) = sum_n q_n log q_n + log N
        logits = F.log_softmax(-distances, dim=-1)
        probs = torch.exp(logits)  # numerically more stable than softmax alone
        neg_entropy = torch.sum(probs * (logits + math.log(self.nTokens)), dim=(1, 2, 3))
        kl_loss = self.kl_weight * torch.mean(neg_entropy)

        return z_q, kl_loss
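
For reference, a hypothetical usage of the module above (shapes and hyperparameter values are illustrative, not from the issue). In training mode it returns the softly quantized features plus the KL penalty; in eval mode it returns hard token indices. Note that the encoder's channel count must equal dTokens for the distance computation to work.

import torch

disc = SoftDiscretizer(nTokens=8192, dTokens=128, temperature=0.9, kl_weight=0.02)

z = torch.randn(4, 128, 16, 16)  # encoder output: (B, C=dTokens, H, W)

disc.train()
z_q, kl_loss = disc(z)           # (4, 128, 16, 16) soft-quantized features, scalar KL

disc.eval()
tokens = disc(z)                 # (4, 256) hard token indices, H*W per image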
Read more comments on GitHub

