Asian training dataset (from glint) discussion.
See original GitHub issue
- Download the dataset from http://trillionpairs.deepglint.com/data (after signup). msra is a cleaned subset of MS1M from glint, while celebrity is the Asian dataset.
- Generate the lst file by calling src/data/glint2lst.py. For example:
  python glint2lst.py /data/glint_data msra,celebrity > glint.lst
  or generate the Asian dataset only by:
  python glint2lst.py /data/glint_data celebrity > glint_cn.lst
- Call face2rec2.py to generate the .rec file.
- Merge the dataset with an existing one by calling src/data/dataset_merge.py without setting the param model, which will combine all IDs from those two datasets.
Finally, you will get a dataset that contains about 180K IDs.
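
Merging without a model, as described above, amounts to concatenating the identities of both datasets with no deduplication. Here is a minimal sketch of that idea, assuming each dataset is a list of (image path, label) pairs with integer labels starting at 0. The function name merge_labels and the data layout are hypothetical illustrations and are not taken from dataset_merge.py.

```python
from typing import List, Tuple

Sample = Tuple[str, int]  # (image path, integer identity label)


def merge_labels(first: List[Sample], second: List[Sample]) -> List[Sample]:
    """Concatenate two datasets, shifting the second one's labels to avoid collisions.

    No identity deduplication is attempted (hypothetical helper, for illustration only).
    """
    # The offset is one past the largest label in the first dataset (0 if it is empty).
    offset = max((label for _, label in first), default=-1) + 1
    shifted = [(path, label + offset) for path, label in second]
    return first + shifted


# Toy example: two identities in each dataset.
existing = [("a/0.jpg", 0), ("a/1.jpg", 0), ("b/0.jpg", 1)]
glint = [("x/0.jpg", 0), ("y/0.jpg", 1)]
merged = merge_labels(existing, glint)
num_ids = len({label for _, label in merged})
print(num_ids)
```

Because nothing is deduplicated, any person who appears in both datasets ends up with two separate IDs in the merged result.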
Use src/eval/gen_glint.py to prepare the test feature file using a pretrained insightface model.
You can also post your private testing results here.
Issue Analytics
- Created 5 years ago
- Reactions: 86
- Comments: 160
Top GitHub Comments
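Is anyone from Glint here? I'm from Bitmain, on the floor below. The download is too slow; could I just come upstairs and copy the data directly?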
After testing, we found this dataset is pretty clean, but it still contains 0.3% to 0.8% noise. We also found that their ms1m and Asian parts still have about 15-30 overlaps, though I guess it doesn't matter when the scale is already so large. Another finding is that this dataset suffers heavily from a long tail. Take the Asian part for example: only 18K identities out of roughly 94K have over 25 images per class, and only a few thousand identities have over 60 images.
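
The long-tail check described in this comment can be reproduced by counting images per identity. Below is a minimal sketch, assuming the labels are available as a flat sequence of integer identity IDs, one per image. The function name tail_summary is a hypothetical helper, and the thresholds 25 and 60 are simply the ones mentioned in the comment.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def tail_summary(labels: Iterable[int], thresholds: Sequence[int] = (25, 60)) -> Dict[int, int]:
    """For each threshold, count the identities that have more than that many images."""
    images_per_id = Counter(labels)
    return {t: sum(1 for n in images_per_id.values() if n > t) for t in thresholds}


# Toy data: identity 0 has 70 images, identity 1 has 30, identity 2 has 5.
toy_labels = [0] * 70 + [1] * 30 + [2] * 5
summary = tail_summary(toy_labels)
print(summary)
```

On a real lst file the labels would first have to be parsed out of each line; that parsing is left out here because the exact column layout is not given in this thread.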