VarGFaceNet + ArcFace: infinite (NaN) loss from the first batch
I am trying to retrain VarGFaceNet as in the LFR Challenge, but training shows an infinite loss immediately. Can someone please help? For context, I train with a smaller batch size and only one GPU, though I doubt that is the problem. Log:
CUDA_VISIBLE_DEVICES='0' python -u train.py --network vargfacenet --loss arcface --dataset retina
gpu num: 1
prefix ./models/vargfacenet-arcface-retina/model
image_size [112, 112]
num_classes 93431
Called with argument: Namespace(batch_size=32, ckpt=3, ctx_num=1, dataset='retina', frequent=20, image_channel=3, kvstore='device', loss='arcface', lr=0.1, lr_steps='100000,160000,220000', models_root='./models', mom=0.9, network='vargfacenet', per_batch_size=32, pretrained='', pretrained_epoch=1, rescale_threshold=0, verbose=2000, wd=0.0005) {'bn_mom': 0.9, 'workspace': 256, 'emb_size': 512, 'ckpt_embedding': True, 'net_se': 0, 'net_act': 'prelu', 'net_unit': 3, 'net_input': 1, 'net_blocks': [1, 4, 6, 2], 'net_output': 'J', 'net_multiplier': 1.25, 'val_targets': ['lfw', 'cfp_fp', 'agedb_30'], 'ce_loss': True, 'fc7_lr_mult': 1.0, 'fc7_wd_mult': 1.0, 'fc7_no_bias': False, 'max_steps': 0, 'data_rand_mirror': True, 'data_cutoff': False, 'data_color': 0, 'data_images_filter': 0, 'count_flops': True, 'memonger': False, 'loss_name': 'margin_softmax', 'loss_s': 64.0, 'loss_m1': 1.0, 'loss_m2': 0.5, 'loss_m3': 0.0, 'net_name': 'vargfacenet', 'dataset': 'retina', 'dataset_path': '../datasets/ms1m-retinaface-t1', 'num_classes': 93431, 'image_shape': [112, 112, 3], 'loss': 'arcface', 'network': 'vargfacenet', 'num_workers': 1, 'batch_size': 32, 'per_batch_size': 32}
Network FLOPs: 1.0G
INFO:root:loading recordio ../datasets/ms1m-retinaface-t1/train.rec...
header0 label [5179511. 5272942.]
id2range 93431
5179510
rand_mirror True
[13:21:19] src/engine/engine.cc:55: MXNet start using engine: ThreadedEnginePerDevice
loading bin 0
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
(12000, 3, 112, 112)
ver lfw
loading bin 0
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
loading bin 12000
loading bin 13000
(14000, 3, 112, 112)
ver cfp_fp
loading bin 0
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
(12000, 3, 112, 112)
ver agedb_30
lr_steps [100000, 160000, 220000]
call reset()
/home/vdx/csenv/lib/python3.7/site-packages/mxnet/module/base_module.py:504: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (1.0 vs. 0.03125). Is this intended?
optimizer_params=optimizer_params)
[13:22:01] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:97: Running performance tests to find the best convolution algorithm, this can take a while... (set the environment variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
INFO:root:Epoch[0] Batch [0-20] Speed: 62.45 samples/sec acc=0.000000 lossvalue=nan
INFO:root:Epoch[0] Batch [20-40] Speed: 61.56 samples/sec acc=0.000000 lossvalue=nan
INFO:root:Epoch[0] Batch [40-60] Speed: 58.71 samples/sec acc=0.000000 lossvalue=nan
INFO:root:Epoch[0] Batch [60-80] Speed: 44.77 samples/sec acc=0.000000 lossvalue=nan
INFO:root:Epoch[0] Batch [80-100] Speed: 74.33 samples/sec acc=0.000000 lossvalue=nan
INFO:root:Epoch[0] Batch [100-120] Speed: 26.56 samples/sec acc=0.000000 lossvalue=nan
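For reference, the loss being trained here (loss_m1=1.0, loss_m2=0.5, loss_m3=0.0, loss_s=64.0, i.e. plain ArcFace) is finite on random inputs when computed with the usual clipping and a max-subtracted softmax, so a NaN from batch 0 usually points at the optimizer setup (e.g. lr=0.1 with per_batch_size=32, together with the rescale_grad warning above) rather than the loss formula itself. A minimal NumPy sketch of the loss — hypothetical function names, not the repo's code — assuming the standard cos(θ + m) formulation:

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """Additive angular margin (ArcFace) logits: s * cos(theta + m) on the target class."""
    # L2-normalize features and class centers so the dot product is cos(theta)
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = e @ w.T  # (N, C)
    # Clip before sqrt so rounding just outside [-1, 1] cannot produce NaN
    cos = np.clip(cos, -1.0 + 1e-7, 1.0 - 1e-7)
    idx = np.arange(len(labels))
    target = cos[idx, labels]
    # cos(theta + m) = cos(theta)cos(m) - sin(theta)sin(m)
    sin = np.sqrt(1.0 - target ** 2)
    logits = cos.copy()
    logits[idx, labels] = target * np.cos(m) - sin * np.sin(m)
    return s * logits

def softmax_xent(logits, labels):
    # Subtract the row max before exponentiating for numerical stability
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
emb = rng.normal(size=(32, 512))       # batch of 32, emb_size 512 as in the config
w = rng.normal(size=(1000, 512))       # 1000 classes for the sketch
labels = rng.integers(0, 1000, size=32)
loss = softmax_xent(arcface_logits(emb, w, labels), labels)
print(np.isfinite(loss))  # True: the loss itself is well-conditioned
```

If this stable computation is finite but training still emits NaN, the usual suspects are a learning rate too high for the effective batch size (try scaling lr down roughly in proportion to batch size) or gradients not being rescaled by 1/batch_size, as the UserWarning in the log suggests.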
Issue Analytics: Created 4 years ago · Comments: 9
Go to the LFR Challenge page; they link to the retina dataset there.
Thank you for the quick reply. I will try it immediately.