
Some details about the training parameters.


I trained the network from scratch but got poor results. Below are my training parameters (the same as in the training guide you provide):

--program_name=twingan
--dataset_name="image_only"
--dataset_dir="dir to celeba tfrecord files"
--unpaired_target_dataset_name="anime_faces"
--unpaired_target_dataset_dir="dir to the anime tfrecord you provided"
--train_dir="dir to save results"
--dataset_split_name=train
--preprocessing_name="danbooru"
--resize_mode=RANDOM_CROP
--do_random_cropping=True
--learning_rate=0.0001
--learning_rate_decay_type=fixed
--is_training=True
--generator_network="pggan"
--loss_architecture=dragan
--pggan_max_num_channels=256
--generator_norm_type=batch_renorm
--use_ttur=True
--num_images_per_resolution=50000

Compared with the official PGGAN repo, I found some differences:

  1. num_images_per_resolution is 50000 in TwinGAN, while PGGAN uses 600000.
  2. TwinGAN uses RANDOM_CROP, while PGGAN resizes the whole image directly (see the sketch below).
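
For context, the two modes behave roughly like this (a minimal TF 1.x sketch; preprocess, target_hw, and mode are illustrative names, not TwinGAN's actual preprocessing code):

import tensorflow as tf

def preprocess(image, target_hw, mode):
    # Illustrative contrast between the two resize modes discussed above.
    if mode == 'RANDOM_CROP':
        # Keeps only a random target_hw x target_hw window of the source
        # image, so the network may never see the whole face at once.
        return tf.random_crop(image, size=[target_hw, target_hw, 3])
    else:  # RESIZE / RESHAPE
        # Rescales the entire image, preserving its global structure.
        return tf.image.resize_images(image, size=[target_hw, target_hw])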

Could you please help me out here?

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

1 reaction
jerryli27 commented, Aug 17, 2018

Hi @lionel3, I updated the training documentation. There was indeed a bug in my default parameters; after fixing it I am able to reproduce my previous results.

Please sync to the latest version and see https://github.com/jerryli27/TwinGAN/blob/master/docs/training.md.

The parameters I added are:

--do_pixel_norm=True
--l_content_weight=0.1
--l_cycle_weight=1.0
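
For context, --do_pixel_norm presumably enables PGGAN-style pixelwise feature normalization; --l_cycle_weight most likely weights the cycle-consistency term (map to the other domain and back, then compare to the input). Roughly, pixel norm does the following (a sketch, not the exact code in the repo):

import tensorflow as tf

def pixel_norm(x, epsilon=1e-8):
    # Rescales each pixel's feature vector (across channels) to roughly
    # unit length, as in the PGGAN paper, which applies it after each
    # generator convolution to keep activation magnitudes in check.
    return x * tf.rsqrt(
        tf.reduce_mean(tf.square(x), axis=-1, keepdims=True) + epsilon)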

Please reopen this issue if you cannot reproduce. Thanks!

1 reaction
jerryli27 commented, Aug 8, 2018

I added two flags to the training script; it should work now: --gradient_penalty_lambda=0.25 and --use_unet=True.

The whole script now looks like:

# Assumes you have data like
# ./data/celeba/train-00000-of-00100.tfrecord,
# ./data/celeba/train-00001-of-00100.tfrecord ...
# hw_to_batch_size maps output resolution to batch size.
python pggan_runner.py \
  --program_name=twingan \
  --dataset_name="image_only" \
  --dataset_dir="./data/celeba/" \
  --unpaired_target_dataset_name="anime_faces" \
  --unpaired_target_dataset_dir="./data/anime_faces/" \
  --train_dir="./checkpoints/twingan_faces/" \
  --dataset_split_name=train \
  --preprocessing_name="danbooru" \
  --resize_mode=RESHAPE \
  --do_random_cropping=True \
  --learning_rate=0.0001 \
  --learning_rate_decay_type=fixed \
  --is_training=True \
  --generator_network="pggan" \
  --use_unet=True \
  --num_images_per_resolution=300000 \
  --loss_architecture=dragan \
  --gradient_penalty_lambda=0.25 \
  --pggan_max_num_channels=256 \
  --generator_norm_type=batch_renorm \
  --hw_to_batch_size="{4: 8, 8: 8, 16: 8, 32: 8, 64: 8, 128: 4, 256: 3, 512: 2}"
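
For reference, --loss_architecture=dragan with --gradient_penalty_lambda=0.25 adds a DRAGAN-style gradient penalty on the discriminator. A rough sketch of the standard DRAGAN term (TF 1.x style; names are illustrative, not the exact code in the repo):

import tensorflow as tf

def dragan_gradient_penalty(discriminator, real_images, penalty_lambda=0.25):
    # DRAGAN perturbs real samples with noise proportional to their
    # standard deviation, then penalizes discriminator gradients whose
    # norm at the perturbed points deviates from 1.
    mean = tf.reduce_mean(real_images)
    std = tf.sqrt(tf.reduce_mean(tf.square(real_images - mean)))
    perturbed = real_images + 0.5 * std * tf.random_uniform(tf.shape(real_images))
    grads = tf.gradients(discriminator(perturbed), [perturbed])[0]
    grad_norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-8)
    return penalty_lambda * tf.reduce_mean(tf.square(grad_norm - 1.0))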

I haven’t tested the multi-GPU setting thoroughly yet due to hardware limits, so yes, there may be bugs, but you can try adding the following flags.

--sync_replicas=False
--replicas_to_aggregate=1
--num_clones=2
--worker_replicas=1

I updated the training readme with the comments above.


