MSA tensor format

In the Usage section, there is code like seq = torch.randint(0, 21, (1, 128)).cuda() and msa = torch.randint(0, 21, (1, 5, 64)).cuda(). If I have an a3m MSA file, how do I encode the file into this tensor? And why is the seq length 128 while the MSA is 5 x 64 (5 rows of half the seq length)? Could you give an example of how to use it, or of how to generate that MSA tensor?

Issue Analytics

Created 2 years ago
Reactions: 1
Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction

panganqi commented, Apr 1, 2021

@panganqi are you working with templates by any chance?

No, I’m working with Free Modelling mode

1 reaction

panganqi commented, Apr 1, 2021

@panganqi Hi! I just wanted to demonstrate that the MSA and the primary sequence do not have to be the same length (although they would probably be aligned in practice).

The framework is in a good enough place that I’ll start thinking about how to tackle data preprocessing! (I’d like to make it as seamless and easy as possible) How is the data laid out in your directory at the moment?

I use the combined sidechainnet data, which does not contain MSAs, and we run hhblits on the CASP data to get the MSA files. I want to combine the two into a new dataset. In my case, the MSA and the primary sequence are the same length.
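
For readers with the same question, here is a minimal sketch of one way to turn an a3m file (such as hhblits output) into integer rows shaped like the msa tensor in the Usage snippet. It is not taken from the library: the helper names read_a3m and encode_msa and the token mapping (20 amino acids in alphabetical order, index 20 for gaps and unknown residues) are assumptions made for illustration, so check them against the vocabulary the model actually expects. The sketch relies on the a3m convention that lowercase letters mark insertions relative to the query, so dropping them leaves every row the same length as the query.

```python
import io

# Hypothetical vocabulary: 20 standard amino acids (indices 0-19) plus one
# shared index (20) for gaps and unknown residues. This is an assumption;
# use whatever mapping the model was actually trained with.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_OF = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
GAP_OR_UNKNOWN = len(AMINO_ACIDS)  # 20


def read_a3m(handle):
    """Return the aligned sequences from an a3m file-like object, query first."""
    sequences, chunks = [], []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if chunks:
                sequences.append("".join(chunks))
                chunks = []
            continue
        # Lowercase letters (and '.') mark insertions relative to the query;
        # dropping them leaves every row aligned to the query's length.
        chunks.append("".join(c for c in line if not c.islower() and c != "."))
    if chunks:
        sequences.append("".join(chunks))
    return sequences


def encode_msa(sequences, max_rows=None):
    """Map aligned sequences to rows of integer tokens."""
    rows = sequences[:max_rows] if max_rows else sequences
    if not rows:
        raise ValueError("no sequences found")
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned rows differ in length")
    return [[TOKEN_OF.get(c, GAP_OR_UNKNOWN) for c in row] for row in rows]


# Tiny in-memory a3m: a query, a hit with a gap, and a hit with an insertion.
example = io.StringIO(">query\nACDE\n>hit1\nAC-E\n>hit2\nAkCDE\n")
msa = encode_msa(read_a3m(example))
print(msa)  # [[0, 1, 2, 3], [0, 1, 20, 3], [0, 1, 2, 3]]
```

With PyTorch installed, torch.tensor(msa).unsqueeze(0) would give a tensor of shape (1, rows, length), which appears to match the layout of the (1, 5, 64) example. The first row is the query itself, so the same encoding could also produce the seq tensor when the MSA and the primary sequence are the same length.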