
Parameter settings for `20170511-185253`

See original GitHub issue

Hi –

I’m trying to reproduce the performance of the 20170511-185253 model, training on the CASIA dataset. Using the following parameter settings from the wiki, I get LFW accuracy of 0.979667 for the final model.

Are these the parameters that were used to train 20170511-185253? If not, are those parameters available somewhere?

Thanks!

```bash
export PYTHONPATH="./src"

# Align: start four alignment processes in the background
for N in {1..4}; do
    python src/align/align_dataset_mtcnn.py \
        ./data/CASIA-WebFace-clean/ \
        ./data/casia_maxpy_mtcnnpy_182 \
        --image_size 182 --margin 44 --random_order --gpu-id 0 --gpu_memory_fraction 0.2 &
done
# Block until every background alignment job has finished before training
wait

# Train
python src/train_softmax.py \
    --logs_base_dir ./logs \
    --models_base_dir ./models \
    --data_dir ./data/casia_maxpy_mtcnnpy_182 \
    --image_size 160 \
    --model_def models.inception_resnet_v1 \
    --optimizer RMSPROP \
    --learning_rate -1 \
    --max_nrof_epochs 80 \
    --keep_probability 0.8 \
    --random_crop \
    --random_flip \
    --learning_rate_schedule_file ./data/learning_rate_schedule_classifier_casia.txt \
    --weight_decay 5e-5 \
    --center_loss_factor 1e-2 \
    --center_loss_alfa 0.9
```
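In the training command, `--learning_rate -1` combined with `--learning_rate_schedule_file` appears to mean the rate is read from the schedule file instead of being fixed. A minimal sketch of that lookup, useful for checking which rate a given epoch would get. The file format here (`epoch: rate` lines, `#` comments, ascending epochs, last applicable entry wins) is an assumption; compare it with the schedule reader in the repository's `src/facenet.py` before relying on it. `lr_for_epoch` is a hypothetical helper name.

```bash
# Hypothetical helper (not part of facenet): print the learning rate that a
# schedule file assigns to a given epoch.
# Assumed file format: one "epoch: learning_rate" pair per line, sorted by
# ascending epoch, with '#' starting a comment.
lr_for_epoch() {
  local schedule_file="$1" epoch="$2"
  awk -v epoch="$epoch" '
    { sub(/#.*/, "") }              # strip comments (also re-splits the fields)
    NF == 0 { next }                # skip blank and comment-only lines
    {
      split($0, kv, ":")            # kv[1] = start epoch, kv[2] = rate
      rate = kv[2]
      gsub(/[ \t]/, "", rate)
      # Keep the last entry whose start epoch is <= the requested epoch.
      if (kv[1] + 0 <= epoch + 0) lr = rate
    }
    END {
      if (lr == "") exit 1          # no entry applies to this epoch
      print lr
    }
  ' "$schedule_file"
}
```

For example, `lr_for_epoch ./data/learning_rate_schedule_classifier_casia.txt 70` prints the rate the schedule assigns to epoch 70, or exits with status 1 if no entry applies.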

(Also, FWIW, in the last epoch of training the loss is approx. 1.35 and regloss is approx. 1.06.)

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 11 (5 by maintainers)

Top GitHub Comments

1 reaction
bkj commented, Jun 27, 2017

OK – as far as I can tell, the issue is related to using the “fused batch norm” option in TensorFlow:

https://www.tensorflow.org/performance/performance_guide#use_fused_batch_norm

That page makes it sound like fused=True uses a newer, faster, but otherwise identical batch norm implementation. Before I ran training for the first time, I was looking for trivial ways to speed it up, so I set fused=True. However, I trained ~3 models w/ fused batch norm and none broke acc=0.979, whereas the one model I trained w/o fused batch norm got to 0.983.

I recognize there’s variability between runs, so I guess take this as preliminary evidence. I haven’t looked at the batch norm vs. fused batch norm implementations to see if it’s even possible that they give different results, but will post here with more info if I figure anything out.

EDIT: FWIW, training w/ fused batch norm gives lower training loss (w/ higher variance?) for the whole run:

[Screenshot, 2017-06-27: training loss with vs. without fused batch norm]
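bkj's caveat about run-to-run variability can be made concrete by comparing the spread of LFW accuracies within each configuration against the gap between configurations. A small sketch, assuming each run's final LFW accuracy has been collected one per line into a text file per configuration. The `acc_stats` helper and the file names in the usage line are hypothetical.

```bash
# Hypothetical helper: summarize accuracies read from stdin (one per line) as
# run count, mean and sample standard deviation, using Welford's one-pass
# update to avoid cancellation when computing the variance.
acc_stats() {
  awk '
    NF {
      n++
      delta = $1 - mean
      mean += delta / n
      m2 += delta * ($1 - mean)     # running sum of squared deviations
    }
    END {
      if (n == 0) exit 1            # no input lines
      if (n > 1) {
        printf "n=%d mean=%.4f sd=%.4f\n", n, mean, sqrt(m2 / (n - 1))
      } else {
        printf "n=%d mean=%.4f sd=NA\n", n, mean   # spread needs >= 2 runs
      }
    }
  '
}
```

Running `acc_stats < fused_runs.txt` and `acc_stats < unfused_runs.txt` prints one summary line per configuration. With a single unfused run the standard deviation is reported as NA, which is one way to see why the comparison above is only preliminary evidence.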
0 reactions
AlexeyAL1e commented, May 18, 2022

Okay, this repository has a huge number of problems, but I’m gradually working through them.

First, I built my dataset of 30 people, with 34 face photos per person. After that, I used the commands above to align the images and trained with the `softmax.py` script to get the loss.

Second, I ran this script to get the .pb file: `python src/freeze_graph.py .model_dir output_file.pb`

Third, I got a .pkl file with: `python src/classifier.py TRAIN ~/datasets/my_dataset/train/ ~/models/model-20170216-091149.pb ~/models/my_classifier.pkl --batch_size 34`

Now I use my model to identify faces in my test dataset by running: `python src/classifier.py CLASSIFY ~/datasets/my_dataset/test/ ~/models/model-20170216-091149.pb ~/models/my_classifier.pkl --batch_size 34`

But I get the error `Found array with 0 sample(s) (shape=(0, 512)) while a minimum of 1 is required`. It looks like the .pkl file has no entries for the embedded images. If these are all the steps, what am I doing wrong? @bkj @MaartenBloemen @jerryhouuu
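The `Found array with 0 sample(s)` message comes from scikit-learn's input validation: it means the embedding array passed to the classifier had zero rows, so a likely (though not certain) cause is that no images were picked up under the test directory. A quick check, assuming `classifier.py` expects the same layout as the training set, `<dataset_dir>/<person_name>/<image files>`, in which case photos placed directly in `test/` without per-person subfolders would not be found. `count_images_per_class` is a hypothetical helper, and it counts every regular file rather than checking image formats.

```bash
# Hypothetical helper: count files in each per-person subfolder of a dataset,
# assuming the layout <dataset_dir>/<person_name>/<image files>.
# Returns a non-zero status if no files were found in any subfolder.
count_images_per_class() {
  local dataset_dir="$1" class_dir n total=0 classes=0
  for class_dir in "$dataset_dir"/*/; do
    [ -d "$class_dir" ] || continue        # glob matched no subfolders
    n=$(find "$class_dir" -maxdepth 1 -type f | wc -l)
    n=$((n))                               # strip padding some wc versions add
    printf '%s: %d\n' "$(basename "$class_dir")" "$n"
    total=$((total + n))
    classes=$((classes + 1))
  done
  printf 'classes=%d files=%d\n' "$classes" "$total"
  [ "$total" -gt 0 ]
}
```

For example, `count_images_per_class ~/datasets/my_dataset/test/` prints one line per person folder plus a total, and returns a non-zero status if it found no files at all.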
