I'm implementing the image embedding in your paper, what am I doing wrong?
Hey, so let me start by saying this is some impressive work! I've decided to take on the challenge of implementing the image embedding + editing from your paper. I feel like I understand what I need to do, but I'm not getting good results.
Let me say that I have the W+ state for an embedded image. I used the dataset from your code to do the embedding with the system provided by Stylegan Encoder (https://github.com/Puzer/stylegan-encoder.git). It's in the shape of [1, 14, 512], and I get the correct image out after running it through the synthesis function Gs.components.synthesis.get_output_for(dlatents, randomize_noise=False).
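For reference, that check is essentially the following (a rough sketch; it assumes the Gs network and the tf.Session it was loaded into, here called sess, are already set up, and the filename is just my own choice):

```python
import numpy as np
import tensorflow as tf

# W+ code produced by the Stylegan Encoder run (assumed filename).
w_plus = np.load('embedded.npy').reshape([1, 14, 512]).astype(np.float32)
dlatents = tf.constant(w_plus)

# Feed W+ directly into the synthesis network, bypassing the mapping network.
img = Gs.components.synthesis.get_output_for(dlatents, randomize_noise=False)
img = tf.transpose(tf.clip_by_value((img + 1) * 127.5, 0, 255), perm=[0, 2, 3, 1])

reconstruction = sess.run(img)  # [1, H, W, 3], values in 0..255
```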
So if I understand your paper and the Image2StyleGAN paper correctly, what I have is the W+ state of the image. According to your paper, I should be able to implement the following:
[inline equation images not shown; in the paper's notation:]
- one symbol is the image I'm trying to encode
- one is the process I went through to encode said image with Stylegan Encoder
- one is the [1, 14, 512]-shaped code that I got from said Stylegan Encoder process
- one is the Gs.components.synthesis.get_output_for function

Now, for my implementation: I'm currently trying to manipulate one of the theta (ROT) variables.
In order to do this, I have hijacked the truncate_generation function; I felt it was the best spot to implement this.
Here is how I have implemented it:
```python
def truncate_generation(Gs, inputcoeff, rate=0.7, dlatent_average_id=None):
    # Load the W+ code of the embedded image (shape [1, 14, 512]).
    embedded_w_plus_space = np.reshape(np.load('embedded.npy'), [1, 14, 512]).astype(np.float32)
    embedded_w_plus_space = tf.constant(embedded_w_plus_space)

    # INPUTcoeff = tf.concat([IDcoeff, EXPcoeff, Rotcoeff, GAMMAcoeff], axis=1)
    # INPUTcoeff = tf.concat([INPUTcoeff, noise_], axis=1)
    # final shape = [160, 64, 3, 27, 32]  (ID, expression, rotation, gamma, noise)

    # Coefficient vector A: everything zeroed except the first rotation component,
    # which is taken from the incoming inputcoeff.
    id_exp_zero = tf.zeros([1, 160 + 64])
    rand_rot_isolate = tf.concat([inputcoeff[:, 160 + 64:160 + 64 + 1], tf.zeros([1, 2])], axis=1)
    gamma_noise_zero = tf.zeros([1, 27 + 32])
    lambda_i_equals_a = tf.concat([id_exp_zero, rand_rot_isolate, gamma_noise_zero], axis=1)
    # Coefficient vector B: all zeros (the rotation zeroed as well).
    lambda_i_equals_b = lambda_i_equals_a * 0.0

    # Map both coefficient vectors through the mapping network and take the
    # difference as the latent offset for the edit.
    w_i_equals_a = Gs.components.mapping.get_output_for(lambda_i_equals_a, None, is_training=False, is_validation=True)
    w_i_equals_b = Gs.components.mapping.get_output_for(lambda_i_equals_b, None, is_training=False, is_validation=True)
    delta_w_iab = w_i_equals_a - w_i_equals_b  # w(a) - w(b)

    # Apply the offset to the embedded W+ code and synthesize the edited image.
    x_t_pre_synth = embedded_w_plus_space + delta_w_iab
    x_t = Gs.components.synthesis.get_output_for(x_t_pre_synth, randomize_noise=False)
    x_t = tf.clip_by_value((x_t + 1) * 127.5, 0, 255)  # scale to 0..255
    x_t = tf.transpose(x_t, perm=[0, 2, 3, 1])         # NCHW -> NHWC
    return x_t
```
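For reference, evaluating and saving the output looks roughly like this (the zero coefficient vector, the sess handle, and the PIL save are placeholders for illustration, not code from your repo):

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Placeholder coefficients: 160 ID + 64 expression + 3 rotation + 27 gamma + 32 noise.
coeff = tf.constant(np.zeros([1, 160 + 64 + 3 + 27 + 32], dtype=np.float32))
x_t = truncate_generation(Gs, coeff)

img = sess.run(x_t)[0].astype(np.uint8)  # [H, W, 3], values in 0..255
Image.fromarray(img).save('edited.png')
```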
Am I on the right track? Can you give me some hints? Thanks! Let me know if you want me to explain anything further.
Top GitHub Comments
The backprop process is simple. We use a method similar to [Image2StyleGAN](https://arxiv.org/abs/1904.03189). We start from an average latent vector in W+ space (1x14x512), where the average is computed over 50k random samples. We use this latent vector as the input to the generator and obtain a generated image. We then compute the L2 loss and perceptual loss between the generated image and the real image we intend to embed, backpropagate the gradient to the input latent vector in W+ space, and update it directly with an Adam optimizer. After around 3000 iterations, we obtain the final embedding vector we want.
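For anyone trying to reproduce this, here is a minimal sketch of that optimization loop in the same TF1 style as the code above. The target_image array, the w_avg.npy file, the sess handle, the learning rate, and the perceptual_loss() helper (some VGG feature distance) are all assumptions for illustration, not part of the repo.

```python
import numpy as np
import tensorflow as tf

# Target image in the generator's output format: [1, 3, H, W], values in [-1, 1].
target = tf.constant(target_image)

# Average W+ code, precomputed as the mean of the mapping output over 50k
# random samples and assumed to be saved to disk beforehand.
w_avg = np.load('w_avg.npy').reshape([1, 14, 512]).astype(np.float32)
w_opt = tf.Variable(w_avg)  # the W+ code we optimize directly

# Generate an image from the current W+ estimate.
gen = Gs.components.synthesis.get_output_for(w_opt, randomize_noise=False)

# Pixel-wise L2 loss plus a perceptual (VGG feature) loss against the real image.
loss = tf.reduce_mean(tf.square(gen - target)) + perceptual_loss(gen, target)

# Backprop into the latent code only and update it with Adam.
opt = tf.train.AdamOptimizer(learning_rate=0.01)
step = opt.minimize(loss, var_list=[w_opt])

sess.run(tf.variables_initializer([w_opt] + opt.variables()))
for i in range(3000):  # roughly 3000 iterations, as described above
    sess.run(step)

embedded_w_plus = sess.run(w_opt)  # final [1, 14, 512] embedding
```

The key point is that only w_opt appears in var_list, so the generator weights stay frozen and only the W+ code is updated by the optimizer.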
Good news!
I switched to a different encoder and it works PHENOMENALLY!!! I will post my results tomorrow.
You did an amazing job writing this paper and building the system! Congratulations.