Asian training dataset (from glint) discussion.

  1. Download the dataset from http://trillionpairs.deepglint.com/data (registration required). msra is a cleaned subset of MS1M provided by Glint, while celebrity is the Asian dataset.
  2. Generate an lst file by calling src/data/glint2lst.py. For example:
python glint2lst.py /data/glint_data msra,celebrity > glint.lst

Or generate only the Asian dataset:

python glint2lst.py /data/glint_data celebrity > glint_cn.lst
  3. Call face2rec2.py to generate the .rec file.
  4. Merge the dataset with the existing one by calling src/data/dataset_merge.py without setting the model parameter; this combines all IDs from the two datasets.
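
Step 2 can be sketched roughly as follows. This is not the code of src/data/glint2lst.py. It assumes (1) each subset, such as msra or celebrity, is a directory under the data root with one sub-directory per identity, and (2) each lst line has the form "1<TAB>image_path<TAB>label", where the leading 1 marks an already-aligned image. The function name glint_to_lst is hypothetical.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def glint_to_lst(root, subsets):
    """Yield hypothetical lst lines of the form '1<TAB>path<TAB>label'.

    Assumes each subset (e.g. 'msra', 'celebrity') is a directory under
    `root` containing one sub-directory per identity. Labels are assigned
    consecutively across subsets, so identities never share a label.
    """
    label = 0
    for subset in subsets:
        subset_dir = os.path.join(root, subset)
        for identity in sorted(os.listdir(subset_dir)):
            id_dir = os.path.join(subset_dir, identity)
            if not os.path.isdir(id_dir):
                continue
            images = sorted(
                f for f in os.listdir(id_dir)
                if os.path.splitext(f)[1].lower() in IMAGE_EXTS
            )
            if not images:
                continue  # skip empty identity folders without using a label
            for name in images:
                # Leading "1" = image already aligned (assumed lst convention).
                yield "1\t%s\t%d" % (os.path.join(id_dir, name), label)
            label += 1
```

Printing every line yielded by glint_to_lst("/data/glint_data", ["msra", "celebrity"]) would give output analogous to glint.lst; passing ["celebrity"] alone corresponds to the Asian-only variant. Numbering labels consecutively across subsets is what lets the combined list act as a single classification set.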

Finally, you will get a dataset containing about 180K IDs.
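
Because the merge keeps every identity from both datasets as its own class, the total ID count is roughly the sum of the two. A minimal sketch of that label-offset idea on lst-style lines follows, assuming the same hypothetical "aligned<TAB>path<TAB>label" format with labels starting at 0. It does not reproduce dataset_merge.py itself, and merge_lst is a hypothetical name.

```python
def merge_lst(lines_a, lines_b):
    """Merge two lst-style line lists into one label space (hypothetical).

    Labels from the second list are shifted past the largest label in the
    first, so every identity from both sets stays a distinct class.
    No de-duplication of overlapping identities is attempted.
    """
    merged = []
    max_label = -1
    for line in lines_a:
        aligned, path, label = line.rstrip("\n").split("\t")[:3]
        max_label = max(max_label, int(label))
        merged.append("%s\t%s\t%d" % (aligned, path, int(label)))
    offset = max_label + 1  # first free label after dataset A
    for line in lines_b:
        aligned, path, label = line.rstrip("\n").split("\t")[:3]
        merged.append("%s\t%s\t%d" % (aligned, path, int(label) + offset))
    return merged
```

Since nothing here removes duplicates, any identity present in both datasets ends up as two classes.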

Use src/eval/gen_glint.py to prepare the test feature file with a pretrained InsightFace model.
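
A common convention for face embeddings is to L2-normalize each vector so that a dot product equals cosine similarity. The sketch below assumes a hypothetical file layout (raw little-endian float32 vectors, no header, one vector per test image in list order); the actual format written by src/eval/gen_glint.py may differ, and the embeddings themselves would come from the pretrained model, which is not shown. The helper names are hypothetical.

```python
import math
import struct


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def write_features(path, features):
    """Write normalized features as consecutive little-endian float32 rows.

    Hypothetical layout: no header, one row per test image, same order as
    the test image list. All vectors must share one dimension.
    """
    dim = len(features[0])
    with open(path, "wb") as f:
        for vec in features:
            if len(vec) != dim:
                raise ValueError("inconsistent feature dimension")
            f.write(struct.pack("<%df" % dim, *l2_normalize(vec)))


def read_features(path, dim):
    """Read back rows written by write_features, given the feature dimension."""
    with open(path, "rb") as f:
        data = f.read()
    n = len(data) // (4 * dim)
    return [list(struct.unpack_from("<%df" % dim, data, i * 4 * dim))
            for i in range(n)]
```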

You can also post your private testing results here.

Issue Analytics

  • State: open
  • Created 5 years ago
  • Reactions: 86
  • Comments: 160

Top GitHub Comments

135 reactions
meanmee commented, Jun 19, 2018

Is anyone from Glint here? I'm from Bitmain downstairs, and the download speed is too slow. Can I just come upstairs and copy it directly?

31 reactions
JianbangZ commented, Jun 19, 2018

After testing, we found this dataset is pretty clean, but it still contains 0.3% to 0.8% noise. We also found that their MS1M and Asian parts still share about 15 to 30 overlapping identities, though I guess that doesn't matter when the scale is already so large. Another finding is that this dataset suffers heavily from a long tail. Take the Asian part for example: only about 18K of its roughly 94K identities have over 25 images per class, and only a few thousand identities have over 60 images.
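
Long-tail figures like these come from a per-identity image count. The sketch below assumes labels are read from the third tab-separated column of an lst file (an assumed format) and counts identities with strictly more than 25 and 60 images; the function names are hypothetical.

```python
from collections import Counter


def labels_from_lst(lines):
    """Extract integer labels from 'aligned<TAB>path<TAB>label' lines."""
    return [int(line.rstrip("\n").split("\t")[2]) for line in lines if line.strip()]


def long_tail_stats(labels, thresholds=(25, 60)):
    """Return (number of identities, {threshold: identities with > threshold images})."""
    per_id = Counter(labels)  # images per identity
    stats = {t: sum(1 for c in per_id.values() if c > t) for t in thresholds}
    return len(per_id), stats
```

Running long_tail_stats(labels_from_lst(open("glint_cn.lst"))) on the Asian-only list would give the total identity count and how many identities clear each threshold.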


Top Results From Across the Web

Trillionpairs
Register and download our training and testing data sets. ... This dataset has been excluded from both LFW and Asian-Celeb. Asian-Celeb 93,979 ids/2,830,146 ......
Glint360K Dataset - Papers With Code
The largest and cleanest face recognition dataset Glint360K, which contains 17,091,657 images of 360,232 individuals, baseline models trained on Glint360K ...