Asian training dataset (from Glint) discussion

Download the dataset from http://trillionpairs.deepglint.com/data (sign-up required). msra is a cleaned subset of MS1M from Glint, while celebrity is the Asian dataset.
Generate the .lst file by calling src/data/glint2lst.py. For example:

python glint2lst.py /data/glint_data msra,celebrity > glint.lst

or generate the Asian dataset only:

python glint2lst.py /data/glint_data celebrity > glint_cn.lst

Call face2rec2.py to generate the .rec file.
Merge the dataset with an existing one by calling src/data/dataset_merge.py without setting the model param, which combines all IDs from the two datasets.

Finally, you will get a dataset that contains about 180K IDs.
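To illustrate what "combine all IDs" amounts to, here is a minimal sketch of merging two label spaces by offsetting the second dataset's labels. This is not the code of dataset_merge.py; it assumes that merging without a model means no identity de-duplication, so every identity from both datasets is kept as its own class. The function name `merge_label_spaces` and the `(path, label)` sample layout are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (image path, identity label); hypothetical layout


def _densify(samples: List[Sample], offset: int) -> Tuple[List[Sample], int]:
    """Remap labels to a contiguous range starting at `offset`."""
    mapping: Dict[int, int] = {}
    out: List[Sample] = []
    for path, label in samples:
        if label not in mapping:
            mapping[label] = offset + len(mapping)
        out.append((path, mapping[label]))
    return out, len(mapping)


def merge_label_spaces(base: List[Sample], extra: List[Sample]) -> List[Sample]:
    """Concatenate two datasets, keeping every identity as a distinct class.

    Labels of `extra` are shifted past those of `base`, so identities that
    appear in both datasets are NOT merged (no de-duplication is attempted).
    """
    merged, n_base = _densify(base, 0)
    shifted, _ = _densify(extra, n_base)
    return merged + shifted


base = [("a/0.jpg", 0), ("a/1.jpg", 0), ("b/0.jpg", 5)]
extra = [("c/0.jpg", 0), ("d/0.jpg", 1)]
merged = merge_label_spaces(base, extra)
print(len({label for _, label in merged}))  # number of distinct IDs after merging
```

Under this scheme the ID count of the result is simply the sum of the two ID counts, which is why overlapping identities between the datasets stay as duplicates.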

Use src/eval/gen_glint.py to prepare the test feature file with a pretrained insightface model.

You can also post your private testing results here.

Issue Analytics

Created 5 years ago
Reactions: 86
Comments: 160

Top GitHub Comments

135 reactions

meanmee commented, Jun 19, 2018

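Is anyone from Glint here? I'm from Bitmain, on the floor below you. The download is too slow; can I just come upstairs and copy the data directly?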

31 reactions

JianbangZ commented, Jun 19, 2018

After testing, this dataset is pretty clean, but it still contains 0.3%~0.8% noise. Also, we found that their MS1M and Asian parts still have about 15-30 overlaps, though I guess it doesn't matter when the scale is already so large. Another finding is that this dataset suffers from a heavy long tail. Take the Asian part for example: only 18K identities out of about 94K have over 25 images per class, and only a few thousand identities have over 60 images.
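A long-tail check like the one described in this comment can be sketched as follows. This is only an illustration, not the commenter's actual script: it assumes you already have one identity label per image, and the function name `count_ids_above` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def count_ids_above(labels: Iterable[int], thresholds: Sequence[int]) -> Dict[int, int]:
    """For each threshold t, count identities with strictly more than t images."""
    images_per_id = Counter(labels)  # identity label -> number of images
    return {
        t: sum(1 for n in images_per_id.values() if n > t)
        for t in thresholds
    }


# Toy data: identity 0 has 3 images, identity 1 has 1, identity 2 has 5.
labels = [0] * 3 + [1] * 1 + [2] * 5
print(count_ids_above(labels, thresholds=(2, 4)))
```

On a real dataset, the labels would come from the generated .lst file, and thresholds such as 25 and 60 would reproduce the kind of figures quoted above.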
