DeepSpeed and nn.Embedding issue
Hi Lucidrains, first of all thanks for the contribution. You are doing an awesome job here.
I’m trying to implement a Seq2Seq model with DeepSpeed, since I will have a 32k seq_len as input. This is my code:

```python
# (imports reconstructed for readability; exact paths may differ in my script.
#  Helpers such as indexesFromSentence, RangerLars, amp, the *_lang objects and
#  the hyper-parameters are defined elsewhere in my project.)
import random
import torch
import deepspeed
from torch.utils.data import Dataset
from reformer_pytorch import ReformerLM
from reformer_pytorch.generative_tools import TrainingWrapper


class GenomeToMolDataset(Dataset):
    def __init__(self, data, src_lang, trg_lang):
        super().__init__()
        self.data = data
        self.src_lang = src_lang
        self.trg_lang = trg_lang

    def __getitem__(self, index):
        pair = self.data[index]
        src = torch.tensor(indexesFromSentence(self.src_lang, pair[0]))
        trg = torch.tensor(indexesFromSentence(self.trg_lang, pair[1]))
        print('src:', src)
        print('trg:', trg)
        return src, trg

    def __len__(self):
        return len(self.data)


train_dataset = GenomeToMolDataset(tr_pairs, input_lang, target_lang)
test_dataset = GenomeToMolDataset(ts_pairs, input_lang, target_lang)

encoder = ReformerLM(
    num_tokens = input_lang.n_words,
    emb_dim = emb_dim,              # 128
    dim = dim,                      # 512
    bucket_size = bucket_size,      # 16
    depth = depth,                  # 6
    heads = heads,                  # 8
    n_hashes = n_hashes,
    max_seq_len = VIR_SEQ_LEN,
    ff_chunks = ff_chunks,          # 400, number of chunks for feedforward layer, make higher if there are memory issues
    attn_chunks = attn_chunks,      # 16, process lsh attention in chunks, only way for memory to fit when scaling to 16k tokens
    #weight_tie = True,
    fixed_position_emb = True,
    return_embeddings = True        # return output of last attention layer
).cuda()

decoder = ReformerLM(
    num_tokens = target_lang.n_words,
    emb_dim = emb_dim,              # 128
    dim = dim,                      # 512
    bucket_size = bucket_size,      # 16
    depth = depth,                  # 6
    heads = heads,                  # 8
    n_hashes = n_hashes,
    ff_chunks = ff_chunks,          # 400
    attn_chunks = attn_chunks,      # 16
    max_seq_len = MOL_SEQ_LEN,
    fixed_position_emb = True,
    causal = True
).cuda()

encoder_optimizer = RangerLars(encoder.parameters())  # torch.optim.Adam(encoder.parameters(), lr=learning_rate)
decoder_optimizer = RangerLars(decoder.parameters())  # torch.optim.Adam(decoder.parameters(), lr=learning_rate)

if use_apex:
    encoder, encoder_optimizer = amp.initialize(encoder, encoder_optimizer, opt_level='O1')
    decoder, decoder_optimizer = amp.initialize(decoder, decoder_optimizer, opt_level='O1')

encoder = TrainingWrapper(encoder).cuda()
decoder = TrainingWrapper(decoder).cuda()

encoder_params = filter(lambda p: p.requires_grad, encoder.parameters())
decoder_params = filter(lambda p: p.requires_grad, decoder.parameters())

encoder_engine, encoder_optimizer, trainloader, _ = deepspeed.initialize(
    args=cmd_args, model=encoder, optimizer=encoder_optimizer,
    model_parameters=encoder_params, training_data=train_dataset, dist_init_required=True)
decoder_engine, decoder_optimizer, _, _ = deepspeed.initialize(
    args=cmd_args, model=decoder, optimizer=decoder_optimizer,
    model_parameters=decoder_params, dist_init_required=False)

# training
VALIDATE_EVERY = 1
SAVE_EVERY = 10
SAVE_DIR = './saved_model/'

_, encoder_client_sd = encoder_engine.load_checkpoint(SAVE_DIR + 'encoder/', None)
_, decoder_client_sd = decoder_engine.load_checkpoint(SAVE_DIR + 'decoder/', None)  # args.ckpt_id

for i, pair in enumerate(trainloader):
    src = pair[0]
    trg = pair[1]

    encoder_engine.train()
    decoder_engine.train()

    src = src.to(encoder_engine.local_rank)
    trg = trg.to(decoder_engine.local_rank)

    print(src.shape)
    print(src.dtype)
    print(trg.shape)
    print(trg.dtype)

    enc_keys = encoder_engine(src)
    loss = decoder_engine(trg, keys = enc_keys, return_loss = True)  # (1, 4096, 20000)
    encoder_engine.backward(loss)
    decoder_engine.backward(loss)
    encoder_engine.step()
    decoder_engine.step()
    print('Training Loss:', loss.item())

    if i % VALIDATE_EVERY == 0:
        encoder.eval()
        decoder.eval()
        with torch.no_grad():
            ts_src, ts_trg = random.choice(test_dataset)
            enc_keys = encoder(ts_src.to(device))
            loss = decoder(ts_trg.to(device), keys=enc_keys, return_loss=True)
            print(f'\tValidation Loss: {loss.item()}')

    if i % SAVE_EVERY == 0:
        encoder_client_sd['step'] = i
        decoder_client_sd['step'] = i
        ckpt_id = loss.item()
        encoder_engine.save_checkpoint(SAVE_DIR + 'encoder/', ckpt_id, client_sd=encoder_client_sd)
        decoder_engine.save_checkpoint(SAVE_DIR + 'decoder/', ckpt_id, client_sd=decoder_client_sd)
```
The issue I’m having is with the nn.Embedding layer: it wants Long integers as input, but DeepSpeed works only with floats, so it prompts this error:

```
RuntimeError: expected device cuda:0 and dtype Float but got device cuda:0 and dtype Long
```

If I cast the inputs to float, then the Embedding layer prompts the opposite error (Float where Long is expected).
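Purely for illustration (this is not my training script), here is my understanding of the constraint, reduced to a minimal, DeepSpeed-free snippet:

```python
import torch
import torch.nn as nn

# nn.Embedding performs an integer table lookup, so it only accepts
# Long/Int index tensors and rejects float inputs.
emb = nn.Embedding(num_embeddings=100, embedding_dim=8)
idx = torch.randint(0, 100, (1, 16))   # dtype torch.int64 (Long)

print(emb(idx).shape)                  # works: torch.Size([1, 16, 8])

try:
    emb(idx.float())                   # same indices, cast to float32
except RuntimeError as err:
    print('float input fails:', err)
```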
How can I use your ReformerLM as an encoder-decoder with DeepSpeed in this case? Is there any way I can work around the Embedding issue?
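For example, would a thin wrapper along these lines be a reasonable direction? (Untested sketch; `LongCastWrapper` is a hypothetical helper of mine, not part of reformer-pytorch or DeepSpeed.)

```python
import torch.nn as nn

class LongCastWrapper(nn.Module):
    """Cast inputs back to long right before the wrapped model, so that an
    upstream float cast does not break the embedding lookup."""
    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, x, **kwargs):
        return self.net(x.long(), **kwargs)

# e.g. encoder = LongCastWrapper(TrainingWrapper(encoder)).cuda()
```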
Thank you, Cal
Top GitHub Comments
@lucidrains this is the repo for the virus project: https://github.com/CalogeroZarbo/bioshield
I checked the new version of the library with the positional embedding and it works like a charm. Thank you for the fix!
@CalogeroZarbo Thank you for the trace! I believe you caught a bug with my sinusoidal positional encoding implementation, and it has been fixed in the latest version (I hope, please let me know).
That doesn’t sound silly at all, and I think we are largely on the same page. Research is trickling in that attention may work well for chemicals and molecules. There’s a lot left to explore. https://arxiv.org/abs/2002.08264 and https://twitter.com/EricTopol/status/1229150936028733440?s=19
Please share the database if you can! I would love to get involved. I played around with SMILES myself and have a generative model for chemicals up at https://thischemicaldoesnotexist.com using Reformer.
Finally, as a fellow practitioner, I’ve been thinking about how deep learning can be applied to this crisis. Evidence shows that deep learning can greatly speed up simulations (https://arxiv.org/abs/2001.08055), and I was wondering whether it would be fruitful to train a differentiable docking function, perhaps specific to the Spike protein of Covid. Such a module could eventually be used in some end-to-end pipeline for evaluating candidates. Anyway, I am very much an amateur in this arena, but those are my thoughts.