stats.parquet file for ESM Metagenomic Atlas has duplicate entries
Here is an example of duplicate entries in the stats.parquet file with the same MGnify id. They are the same except for differing ptm values. It seems there should not be duplicates, since I believe each MGnify id has only one structure prediction in the atlas.
index      id                ptm    plddt  num_conf  len
521608462  MGYP000000011531  0.679  0.842  28        33
577478634  MGYP000000011531  0.490  0.842  28        33
577528622  MGYP000000011531  0.622  0.842  28        33
The stats.parquet file I used is
https://dl.fbaipublicfiles.com/esmatlas/v0/stats.parquet
The link to the stats.parquet file
https://dl.fbaipublicfiles.com/esmatlas/v0.0/stats.parquet
on the Atlas API web page
https://esmatlas.com/about#api
is broken and gives an Access Denied error.
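For anyone who wants to check for this, here is a minimal sketch using pandas. It rebuilds the three rows shown above in memory and flags every row whose id occurs more than once. The helper name `duplicated_ids` is hypothetical, and the commented-out lines at the end assume a local copy of stats.parquet and an installed parquet engine such as pyarrow.

```python
import pandas as pd


def duplicated_ids(stats: pd.DataFrame, id_col: str = "id") -> pd.DataFrame:
    """Return every row whose id appears more than once, grouped by id.

    Hypothetical helper for illustration; not part of the ESM code base.
    """
    # keep=False marks all members of a duplicate group, not just the repeats
    mask = stats[id_col].duplicated(keep=False)
    return stats[mask].sort_values(id_col, kind="stable")


# The three rows reported in this issue, with the same row index values
sample = pd.DataFrame(
    {
        "id": ["MGYP000000011531"] * 3,
        "ptm": [0.679, 0.490, 0.622],
        "plddt": [0.842] * 3,
        "num_conf": [28] * 3,
        "len": [33] * 3,
    },
    index=[521608462, 577478634, 577528622],
)

dupes = duplicated_ids(sample)

# Number of distinct ptm values per duplicated id
n_ptm_per_id = dupes.groupby("id")["ptm"].nunique()

print(dupes)
print(n_ptm_per_id)

# On the real file (assumption: a local copy and a parquet engine are available):
# stats = pd.read_parquet("stats.parquet")
# print(duplicated_ids(stats))
```

Grouping the flagged rows by id and counting distinct ptm values shows whether the copies are exact duplicates or, as here, differ only in ptm.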
Issue Analytics
- State:
- Created 10 months ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Thanks! My main interest in stats.parquet was to get the list of MGnify identifiers for the database so that I could create a file of all the sequences for searching. You provided the sequence file in #366, which has solved that problem. But I may still use stats.parquet to filter by model scores.
Ah yes happy we have the right data in place now! 😃