stats.parquet file for ESM Metagenomic Atlas has duplicate entries


Here is an example of duplicate entries in the stats.parquet file that share the same MGnify id. They are identical except for differing ptm values. I would not expect duplicates, since I believe each MGnify id has only one structure prediction in the atlas.

                         id    ptm  plddt  num_conf  len
521608462  MGYP000000011531  0.679  0.842        28   33
577478634  MGYP000000011531  0.490  0.842        28   33
577528622  MGYP000000011531  0.622  0.842        28   33
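
A minimal sketch of how such duplicates could be surfaced with pandas. The helper name `find_duplicate_ids` is hypothetical, and the small in-memory table only stands in for the real file: its first three rows are the ones quoted above, and the fourth row is made up for contrast.

```python
import pandas as pd


def find_duplicate_ids(stats: pd.DataFrame, id_col: str = "id") -> pd.DataFrame:
    """Return every row whose id occurs more than once, grouped by id."""
    # keep=False marks all members of a duplicate group, not just the repeats
    dup_mask = stats.duplicated(subset=id_col, keep=False)
    return stats[dup_mask].sort_values(id_col, kind="stable")


# Stand-in for the real table: three rows from the report plus one invented row
sample = pd.DataFrame(
    {
        "id": ["MGYP000000011531"] * 3 + ["HYPOTHETICAL_UNIQUE_ID"],
        "ptm": [0.679, 0.490, 0.622, 0.500],
        "plddt": [0.842, 0.842, 0.842, 0.700],
        "num_conf": [28, 28, 28, 10],
        "len": [33, 33, 33, 50],
    },
    index=[521608462, 577478634, 577528622, 0],
)

dups = find_duplicate_ids(sample)
print(dups)
print("ids with more than one row:", dups["id"].nunique())
```

To run this against the downloaded file, `sample` could be replaced with `pd.read_parquet("stats.parquet")`, assuming the file is in the working directory and a parquet engine such as pyarrow is installed.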

The stats.parquet file I used is

https://dl.fbaipublicfiles.com/esmatlas/v0/stats.parquet

The link to the stats.parquet file on the Atlas API web page (https://esmatlas.com/about#api),

https://dl.fbaipublicfiles.com/esmatlas/v0.0/stats.parquet

is broken and gives an Access Denied error.

Issue Analytics

  • State: closed
  • Created 10 months ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
tomgoddard commented, Nov 22, 2022

Thanks! My main interest in stats.parquet was to get the list of MGnify identifiers for the database so that I could create a file of all the sequences for searching. You provided the sequence file in #366, which has solved that problem. But I may still use stats.parquet to filter by model scores.
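
As an illustration of the two uses mentioned in this comment, here is a rough sketch with pandas. The helper names, the thresholds, and the toy rows are all assumptions made for the example; only the column names `id`, `ptm`, and `plddt` come from the table quoted in the issue.

```python
import pandas as pd


def unique_ids(stats: pd.DataFrame, id_col: str = "id") -> list:
    """List each identifier once, in order of first appearance."""
    return stats[id_col].drop_duplicates().tolist()


def filter_by_scores(
    stats: pd.DataFrame, min_plddt: float = 0.7, min_ptm: float = 0.5
) -> pd.DataFrame:
    """Keep rows whose plddt and ptm both meet the (hypothetical) thresholds."""
    keep = (stats["plddt"] >= min_plddt) & (stats["ptm"] >= min_ptm)
    return stats[keep]


# Toy rows with placeholder ids; "A" is duplicated with differing ptm values
toy = pd.DataFrame(
    {
        "id": ["A", "A", "B", "C"],
        "ptm": [0.679, 0.490, 0.800, 0.550],
        "plddt": [0.842, 0.842, 0.600, 0.900],
    }
)

ids = unique_ids(toy)
good = filter_by_scores(toy)
print(ids)
print(good)
```

Note that while duplicate rows with different ptm values are present, whether an id passes a ptm filter depends on which of its rows is considered, so it may be worth resolving the duplicates before filtering.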

0 reactions
tomsercu commented, Nov 23, 2022

Ah yes happy we have the right data in place now! 😃
