
DeepSpeed and nn.Embedding issue

See original GitHub issue

Hi Lucidrains, first of all thanks for the contribution. You are doing an awesome job here.

I’m trying to implement a Seq2Seq model using DeepSpeed, since I will have a 32k seq_len as input. This is my code:

# Imports reconstructed for completeness; helpers such as indexesFromSentence,
# the language objects and the hyperparameters come from the rest of the original script.
import random
import torch
from torch.utils.data import Dataset

import deepspeed
from reformer_pytorch import ReformerLM
from reformer_pytorch.generative_tools import TrainingWrapper
# RangerLars and apex.amp come from their own packages (optimizer library / NVIDIA Apex)

class GenomeToMolDataset(Dataset):
    def __init__(self, data, src_lang, trg_lang):
        super().__init__()
        self.data = data
        self.src_lang = src_lang
        self.trg_lang = trg_lang

    def __getitem__(self, index):
        #print(index)
        pair = self.data[index]
        #print('src:',pair[0])
        #print('\n\ntrg:',pair[1])
        src = torch.tensor(indexesFromSentence(self.src_lang,pair[0]))
        trg = torch.tensor(indexesFromSentence(self.trg_lang,pair[1]))
        print('src:', src)
        print('trg:', trg)
        return src,trg

    def __len__(self):
        return len(self.data)

train_dataset = GenomeToMolDataset(tr_pairs, input_lang, target_lang)
test_dataset = GenomeToMolDataset(ts_pairs, input_lang, target_lang)

encoder = ReformerLM(
    num_tokens = input_lang.n_words,
    emb_dim = emb_dim,#128,
    dim = dim,#512,
    bucket_size = bucket_size, # 16,
    depth = depth, # 6,
    heads = heads, # 8,
    n_hashes= n_hashes,
    max_seq_len = VIR_SEQ_LEN,
    ff_chunks = ff_chunks, #400,      # number of chunks for feedforward layer, make higher if there are memory issues
    attn_chunks = attn_chunks, #16,    # process lsh attention in chunks, only way for memory to fit when scaling to 16k tokens
    #weight_tie = True,
    fixed_position_emb = True,
    return_embeddings = True # return output of last attention layer
).cuda()

decoder = ReformerLM(
    num_tokens = target_lang.n_words,
    emb_dim = emb_dim, # 128,
    dim = dim, # 512,
    bucket_size = bucket_size, #16,
    depth = depth, #6,
    heads = heads, #8,
    n_hashes= n_hashes,
    ff_chunks = ff_chunks, # 400,      # number of chunks for feedforward layer, make higher if there are memory issues
    attn_chunks = attn_chunks, # 16,    # process lsh attention in chunks, only way for memory to fit when scaling to 16k tokens
    max_seq_len = MOL_SEQ_LEN,
    fixed_position_emb = True,
    causal = True
).cuda()

encoder_optimizer = RangerLars(encoder.parameters()) # torch.optim.Adam(encoder.parameters(), lr=learning_rate)
decoder_optimizer = RangerLars(decoder.parameters()) # torch.optim.Adam(decoder.parameters(), lr=learning_rate)

if use_apex:
    encoder, encoder_optimizer = amp.initialize(encoder, encoder_optimizer, opt_level='O1')
    decoder, decoder_optimizer = amp.initialize(decoder, decoder_optimizer, opt_level='O1')

encoder = TrainingWrapper(encoder).cuda()
#encoder.cuda()

decoder = TrainingWrapper(decoder).cuda()
#decoder.cuda()

encoder_params = filter(lambda p: p.requires_grad, encoder.parameters())
decoder_params = filter(lambda p: p.requires_grad, decoder.parameters())

encoder_engine, encoder_optimizer, trainloader, _ = deepspeed.initialize(args=cmd_args, model=encoder, optimizer=encoder_optimizer, model_parameters=encoder_params, training_data=train_dataset, dist_init_required=True)
decoder_engine, decoder_optimizer, _, _ = deepspeed.initialize(args=cmd_args, model=decoder, optimizer=decoder_optimizer, model_parameters=decoder_params, dist_init_required=False)

# training
VALIDATE_EVERY = 1
SAVE_EVERY = 10
SAVE_DIR = './saved_model/'
_, encoder_client_sd = encoder_engine.load_checkpoint(SAVE_DIR+'encoder/', None)
_, decoder_client_sd = decoder_engine.load_checkpoint(SAVE_DIR+'decoder/', None) #args.ckpt_id 
for i, pair in enumerate(trainloader):
    src = pair[0]
    trg = pair[1]
    encoder_engine.train()
    decoder_engine.train()
    src = src.to(encoder_engine.local_rank)
    trg = trg.to(decoder_engine.local_rank)
    
    print(src.shape)
    print(src.dtype)
    print(trg.shape)
    print(trg.dtype)

    enc_keys = encoder_engine(src)
    loss = decoder_engine(trg, keys = enc_keys, return_loss = True)   # (1, 4096, 20000)
    encoder_engine.backward(loss)
    decoder_engine.backward(loss)
    encoder_engine.step()
    decoder_engine.step()
    print('Training Loss:',loss.item())       

    if i % VALIDATE_EVERY == 0:
        encoder.eval()
        decoder.eval()
        with torch.no_grad():
            ts_src, ts_trg = random.choice(test_dataset)
            enc_keys = encoder(ts_src.to(device))
            loss = decoder(ts_trg, keys=enc_keys, return_loss = True)
            print(f'\tValidation Loss: {loss.item()}')

    if i % SAVE_EVERY == 0:
        encoder_client_sd['step'] = i
        decoder_client_sd['step'] = i
        ckpt_id = loss.item()
        encoder_engine.save_checkpoint(SAVE_DIR+'encoder/', ckpt_id, client_sd = encoder_client_sd)
        decoder_engine.save_checkpoint(SAVE_DIR+'decoder/', ckpt_id, client_sd = decoder_client_sd)

The issue I’m having is with the nn.Embedding layer, since it wants Long integers as input but DeepSpeed works only with floats. It prompts this error: RuntimeError: expected device cuda:0 and dtype Float but got device cuda:0 and dtype Long

If I cast the inputs to float, then the Embedding layer raises the opposite error.

How can I use your ReformerLM as an encoder-decoder with DeepSpeed in this case? Is there any way I can work around the Embedding issue?

Thank you, Cal
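For illustration, one common way around this kind of mismatch is to keep the token indices as integers and cast them back to torch.long just before they reach the embedding, so that only floating-point tensors are affected by the FP16 conversion. A minimal sketch of that idea (the LongInputWrapper name is made up here and is not part of reformer_pytorch or DeepSpeed):

import torch
import torch.nn as nn

class LongInputWrapper(nn.Module):
    # Hypothetical wrapper: re-casts token-index inputs to torch.long so that an
    # inner model using nn.Embedding still receives integer indices even if the
    # surrounding engine has converted the batch to float/half.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x, **kwargs):
        if x.is_floating_point():
            x = x.long()
        return self.model(x, **kwargs)

# Usage sketch: wrap the model before handing it to deepspeed.initialize, e.g.
# encoder = LongInputWrapper(TrainingWrapper(encoder))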

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

1 reaction
CalogeroZarbo commented, Mar 11, 2020

@lucidrains this is the repo for the virus project: https://github.com/CalogeroZarbo/bioshield

I checked the new version of the library with the positional embedding and it works like a charm. Thank you for the fix!

1 reaction
lucidrains commented, Mar 11, 2020

@CalogeroZarbo Thank you for the trace! I believe you caught a bug with my sinusoidal positional encoding implementation, and it has been fixed in the latest version (I hope, please let me know).
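For context, the kind of fix described here amounts to making the fixed sinusoidal positional embedding follow the dtype of the token embeddings it is added to, so it no longer clashes with half-precision tensors under DeepSpeed's FP16 mode. A rough sketch of that idea (illustrative only, not the actual reformer_pytorch implementation):

import torch
import torch.nn as nn

class FixedSinusoidalEmbedding(nn.Module):
    # Illustrative sinusoidal positional embedding that matches the dtype of
    # the token embeddings it is added to (e.g. float16 when FP16 is enabled).
    def __init__(self, dim):
        super().__init__()
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer('inv_freq', inv_freq)

    def forward(self, x):
        # x: token embeddings of shape (batch, seq_len, dim)
        positions = torch.arange(x.shape[1], device=x.device, dtype=self.inv_freq.dtype)
        sinusoid = torch.einsum('i,j->ij', positions, self.inv_freq)
        pos_emb = torch.cat((sinusoid.sin(), sinusoid.cos()), dim=-1)
        return pos_emb.to(dtype=x.dtype)[None, ...]  # cast to match x, add batch dim

# Usage sketch: x = x + FixedSinusoidalEmbedding(dim)(x)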

That doesn’t sound silly at all, and I think we are largely on the same page. Research is trickling in that attention may work well for chemicals and molecules. There’s a lot left to explore. https://arxiv.org/abs/2002.08264 and https://twitter.com/EricTopol/status/1229150936028733440?s=19

Please share the database if you can! I would love to get involved. I played around with SMILES myself and have a generative model for chemicals up at https://thischemicaldoesnotexist.com using Reformer.

Finally, as a fellow practitioner, I’ve been thinking about how deep learning can be applied to this crisis. Evidence shows that deep learning can greatly speed up simulations (https://arxiv.org/abs/2001.08055), and I was wondering if perhaps it would be fruitful to train a differentiable docking function, perhaps specific to the Spike protein of Covid. Such a module could eventually be used in some end-to-end pipeline for evaluating candidates? Anyway, I am very much an amateur in this arena, but those are my thoughts.

Read more comments on GitHub >

Top Results From Across the Web

DeepSpeed Configuration JSON
Enable sparse compression of torch.nn.Embedding gradients. This feature is essentially deprecated as we don't see use cases for it as much anymore.
Read more >
Source code for deepspeed.runtime.pipe.module
Module): raise RuntimeError('LayerSpec only supports torch.nn. ... This is a problem in DeepSpeed because we often allocate tensors using slices of large ...
Read more >
transformers.modeling_utils — transformers 4.11.3 documentation
Returns: :obj:`nn.Module`: A torch module mapping hidden states to vocabulary. """ return None # Overwrite for models with output embeddings.
Read more >
benchmark assessment for deepspeed optimization library
deal with DL complexity and efficiency issues. ... Keywords Machine Learning · Neural Networks · Deep Learning Models · Optimization Models.
Read more >
revlib - PyPI
Simple and efficient RevNet-Library with DeepSpeed support. ... AbsolutePositionalEmbedding import revlib class Reformer(torch.nn.
Read more >
