
OpenAI VAE implementation

See original GitHub issue

Do I see it correctly that the code fragments provided by OpenAI, and the way you bound them in the vae.py file, mean that there is no actual codebook in the form of an explicit nn.Parameter or nn.Embedding, and that the very first layer of the decoder serves as the vocabulary?

(decoder): Decoder(
    (blocks): Sequential(
      (input): Conv2d(n_in=8192, n_out=128, kw=1, use_float16=False, device=device(type='cpu'), requires_grad=False)

This would explain why I couldn't find any. oO
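
For intuition, here is a minimal sketch (my own, not OpenAI's actual code) of why a 1x1 convolution over one-hot token maps is exactly an embedding lookup, so the decoder's input layer can double as the codebook. The sizes follow the printed module above.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes taken from the printed module: 8192 tokens -> 128 channels.
n_tokens, d_model = 8192, 128
conv = nn.Conv2d(n_tokens, d_model, kernel_size=1, bias=False)

tokens = torch.randint(0, n_tokens, (1, 4, 4))                     # (B, H, W) code indices
one_hot = F.one_hot(tokens, n_tokens).permute(0, 3, 1, 2).float()  # (B, N, H, W)
out_conv = conv(one_hot)                                           # (B, D, H, W)

# The same result via an explicit embedding whose weight is the conv kernel.
emb = nn.Embedding(n_tokens, d_model)
with torch.no_grad():
    emb.weight.copy_(conv.weight.squeeze(-1).squeeze(-1).t())      # (N, D)
out_emb = emb(tokens).permute(0, 3, 1, 2)                          # (B, D, H, W)

assert torch.allclose(out_conv, out_emb, atol=1e-6)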

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 27 (4 by maintainers)

Top GitHub Comments

3 reactions
sidml commented, May 6, 2021

@CDitzel In the PyTorch implementation, they seem to be directly adding the logits to the sample from the Gumbel distribution.

I believe they divide the logits by the temperature before sampling from the categorical distribution in Figure 1 of the paper.
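
To make the distinction concrete, here is a minimal sketch (hypothetical, not the repo's actual code) of the two variants being discussed; `logits` stands in for the encoder output.

import torch
import torch.nn.functional as F

logits = torch.randn(2, 8192)
tau = 0.9

# (a) What F.gumbel_softmax computes internally: perturb the logits with
# Gumbel(0, 1) noise, then take a temperature-scaled softmax.
gumbel = -torch.log(-torch.log(torch.rand_like(logits)))
soft_sample = F.softmax((logits + gumbel) / tau, dim=-1)
# equivalent call: F.gumbel_softmax(logits, tau=tau, hard=False, dim=-1)

# (b) The Figure-1 reading: scale the logits by the temperature first, then
# sample from the resulting categorical distribution.
cat_sample = torch.distributions.Categorical(logits=logits / tau).sample()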

3 reactions
CDitzel commented, Apr 1, 2021

All right, so this is what I have come up with so far. It closely resembles Lucid's implementation, but parameterizes the Gumbel-Softmax with the distances of the encoder output (the logits) to the codebook vectors (as described in this paper), akin to VQ-VAEs; in contrast, Lucid's implementation uses the logits directly as input to the Gumbel. Phil's (and Karpathy's) implementation never worked for me when I rightfully included the KL loss, i.e. with a KL loss > 0. With this implementation the KL loss can be included, as it should be, with a uniform prior. However, the results on a larger dataset are still underwhelming and not really satisfying in terms of reconstruction quality. Maybe someone can take a look at it and assess the correctness of this implementation?

import math

import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
from torch import einsum


class SoftDiscretizer(nn.Module):
    def __init__(
        self,
        nTokens,
        dTokens,
        temperature,
        kl_weight,
        **kwargs,
    ):
        super().__init__()
        self.nTokens = nTokens
        self.dTokens = dTokens
        self.temperature = temperature
        self.kl_weight = kl_weight

        self.embedding = nn.Embedding(nTokens, dTokens)

    def forward(self, z):
        B, C, H, W = z.size()
        N, D = self.embedding.weight.shape

        # squared Euclidean distances between every spatial feature vector
        # and every codebook vector: ||e||^2 + ||z||^2 - 2 * z @ e^T
        z_flat = rearrange(z, "b c h w -> (b h w) c")
        distances = (
            torch.sum(self.embedding.weight ** 2, dim=1)
            + torch.sum(z_flat ** 2, dim=1, keepdim=True)
            - 2 * torch.matmul(z_flat, self.embedding.weight.t())
        )
        distances = rearrange(distances, "(b h w) n -> b h w n", h=H, w=W)

        # negated so that closer codebook vectors get higher probability
        samples = F.gumbel_softmax(-distances, tau=self.temperature, hard=False, dim=-1)

        if not self.training:
            tokens = samples.argmax(dim=-1)
            return tokens.flatten(start_dim=1)

        # soft codebook lookup: a convex combination of codebook vectors
        z_q = einsum("b h w n, n d -> b d h w", samples, self.embedding.weight)

        # KL divergence to the uniform prior: KL(q || U) = sum_n q_n log q_n + log N
        logits = F.log_softmax(-distances, dim=-1)
        probs = torch.exp(logits)  # numerically more stable than softmax alone
        neg_entropy = torch.sum(probs * (logits + math.log(self.nTokens)), dim=(1, 2, 3))
        kl_loss = self.kl_weight * torch.mean(neg_entropy)

        return z_q, kl_loss
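
For reference, a hypothetical usage of the module above (shapes and hyperparameter values are illustrative, not from the issue). In training mode it returns the softly quantized features plus the KL penalty; in eval mode it returns hard token indices. Note that the encoder's channel count must equal dTokens for the distance computation to work.

import torch

disc = SoftDiscretizer(nTokens=8192, dTokens=128, temperature=0.9, kl_weight=0.02)

z = torch.randn(4, 128, 16, 16)  # encoder output: (B, C=dTokens, H, W)

disc.train()
z_q, kl_loss = disc(z)           # (4, 128, 16, 16) soft-quantized features, scalar KL

disc.eval()
tokens = disc(z)                 # (4, 256) hard token indices, H*W per image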
Read more comments on GitHub

