Number of channels in Generator is incorrect at higher resolutions, causing the model to have far too many parameters
See original GitHub issueAt 1024x1024 (default settings other than image_size), the model from this repo’s channels look like this:
Gen block channels
64 -> 8192
Gen block channels
8192 -> 4096
Gen block channels
4096 -> 2048
Gen block channels
2048 -> 1024
Gen block channels
1024 -> 512
Gen block channels
512 -> 256
Gen block channels
256 -> 128
Gen block channels
128 -> 64
Gen block channels
64 -> 32
number of parameters: 1237966560
For contrast, the original tensorflow implementation looks like this for 1024x1024:
G Params OutputShape WeightShape
--- --- --- ---
latents_in - (?, 512) -
labels_in - (?, 0) -
lod - () -
dlatent_avg - (512,) -
G_mapping/latents_in - (?, 512) -
G_mapping/labels_in - (?, 0) -
G_mapping/Normalize - (?, 512) -
G_mapping/Dense0 262656 (?, 512) (512, 512)
G_mapping/Dense1 262656 (?, 512) (512, 512)
G_mapping/Dense2 262656 (?, 512) (512, 512)
G_mapping/Dense3 262656 (?, 512) (512, 512)
G_mapping/Dense4 262656 (?, 512) (512, 512)
G_mapping/Dense5 262656 (?, 512) (512, 512)
G_mapping/Dense6 262656 (?, 512) (512, 512)
G_mapping/Dense7 262656 (?, 512) (512, 512)
G_mapping/Broadcast - (?, 18, 512) -
G_mapping/dlatents_out - (?, 18, 512) -
Truncation/Lerp - (?, 18, 512) -
G_synthesis/dlatents_in - (?, 18, 512) -
G_synthesis/4x4/Const 8192 (?, 512, 4, 4) (1, 512, 4, 4)
G_synthesis/4x4/Conv 2622465 (?, 512, 4, 4) (3, 3, 512, 512)
G_synthesis/4x4/ToRGB 264195 (?, 3, 4, 4) (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Conv1 2622465 (?, 512, 8, 8) (3, 3, 512, 512)
G_synthesis/8x8/Upsample - (?, 3, 8, 8) -
G_synthesis/8x8/ToRGB 264195 (?, 3, 8, 8) (1, 1, 512, 3)
G_synthesis/16x16/Conv0_up 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Conv1 2622465 (?, 512, 16, 16) (3, 3, 512, 512)
G_synthesis/16x16/Upsample - (?, 3, 16, 16) -
G_synthesis/16x16/ToRGB 264195 (?, 3, 16, 16) (1, 1, 512, 3)
G_synthesis/32x32/Conv0_up 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Conv1 2622465 (?, 512, 32, 32) (3, 3, 512, 512)
G_synthesis/32x32/Upsample - (?, 3, 32, 32) -
G_synthesis/32x32/ToRGB 264195 (?, 3, 32, 32) (1, 1, 512, 3)
G_synthesis/64x64/Conv0_up 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Conv1 2622465 (?, 512, 64, 64) (3, 3, 512, 512)
G_synthesis/64x64/Upsample - (?, 3, 64, 64) -
G_synthesis/64x64/ToRGB 264195 (?, 3, 64, 64) (1, 1, 512, 3)
G_synthesis/128x128/Conv0_up 1442561 (?, 256, 128, 128) (3, 3, 512, 256)
G_synthesis/128x128/Conv1 721409 (?, 256, 128, 128) (3, 3, 256, 256)
G_synthesis/128x128/Upsample - (?, 3, 128, 128) -
G_synthesis/128x128/ToRGB 132099 (?, 3, 128, 128) (1, 1, 256, 3)
G_synthesis/256x256/Conv0_up 426369 (?, 128, 256, 256) (3, 3, 256, 128)
G_synthesis/256x256/Conv1 213249 (?, 128, 256, 256) (3, 3, 128, 128)
G_synthesis/256x256/Upsample - (?, 3, 256, 256) -
G_synthesis/256x256/ToRGB 66051 (?, 3, 256, 256) (1, 1, 128, 3)
G_synthesis/512x512/Conv0_up 139457 (?, 64, 512, 512) (3, 3, 128, 64)
G_synthesis/512x512/Conv1 69761 (?, 64, 512, 512) (3, 3, 64, 64)
G_synthesis/512x512/Upsample - (?, 3, 512, 512) -
G_synthesis/512x512/ToRGB 33027 (?, 3, 512, 512) (1, 1, 64, 3)
G_synthesis/1024x1024/Conv0_up 51297 (?, 32, 1024, 1024) (3, 3, 64, 32)
G_synthesis/1024x1024/Conv1 25665 (?, 32, 1024, 1024) (3, 3, 32, 32)
G_synthesis/1024x1024/Upsample - (?, 3, 1024, 1024) -
G_synthesis/1024x1024/ToRGB 16515 (?, 3, 1024, 1024) (1, 1, 32, 3)
G_synthesis/images_out - (?, 3, 1024, 1024) -
G_synthesis/noise0 - (1, 1, 4, 4) -
G_synthesis/noise1 - (1, 1, 8, 8) -
G_synthesis/noise2 - (1, 1, 8, 8) -
G_synthesis/noise3 - (1, 1, 16, 16) -
G_synthesis/noise4 - (1, 1, 16, 16) -
G_synthesis/noise5 - (1, 1, 32, 32) -
G_synthesis/noise6 - (1, 1, 32, 32) -
G_synthesis/noise7 - (1, 1, 64, 64) -
G_synthesis/noise8 - (1, 1, 64, 64) -
G_synthesis/noise9 - (1, 1, 128, 128) -
G_synthesis/noise10 - (1, 1, 128, 128) -
G_synthesis/noise11 - (1, 1, 256, 256) -
G_synthesis/noise12 - (1, 1, 256, 256) -
G_synthesis/noise13 - (1, 1, 512, 512) -
G_synthesis/noise14 - (1, 1, 512, 512) -
G_synthesis/noise15 - (1, 1, 1024, 1024) -
G_synthesis/noise16 - (1, 1, 1024, 1024) -
images_out - (?, 3, 1024, 1024) -
--- --- --- ---
Total 30370060
In the original implementation, the highest number of channels a convolution has is 512, but in this implementation, it grows to 8192?
Issue Analytics
- State:
- Created 3 years ago
- Comments:9 (5 by maintainers)
Top Results From Across the Web
CS230: Deep Learning - Stanford University
Solution: Overfitting: the model fits the training set so much that it does not generalize well. Low training error and high dev error...
Read more >Release 12 Major Bug Fixes - Electrical and Computer Engineering
A MATLAB 5.2 bug in DELAUNAY was fixed for MATLAB 5.3 but this caused different ... The complex Rayleigh fading channels have incorrect...
Read more >COMMON ERRORS IN KRONOS AND HOW TO FIX THEM
Employee Error Details. Entering work hours before they have actually been worked. Employees are not allowed to record their work time until they...
Read more >A survey on Image Data Augmentation for Deep Learning
Deep convolutional neural networks have performed remarkably well on many Computer Vision tasks. However, these networks are heavily reliant ...
Read more >Makoto's preprocessing pipeline - SCCN
Choose 'use MNI coordinate file for BEM dipfit model' unless you have special reason not to do so. This will automatically import channel...
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Also, the 4x4 constant should have the same number of channels as the first convolution (in the 1024x1024 case 512 channels).
It appears that the cap is separate from the latents:
Cap: https://github.com/NVlabs/stylegan2/blob/master/training/networks_stylegan2.py#L315
Latents: https://github.com/NVlabs/stylegan2/blob/master/training/networks_stylegan2.py#L254 https://github.com/NVlabs/stylegan2/blob/master/training/networks_stylegan2.py#L256 https://github.com/NVlabs/stylegan2/blob/master/training/networks_stylegan2.py#L259 https://github.com/NVlabs/stylegan2/blob/master/training/networks_stylegan2.py#L309