
[Help wanted] Not satisfying image_stylization result

See original GitHub issue

Hello,

Based on the info at https://github.com/tensorflow/magenta/tree/master/magenta/models/image_stylization I tried to train new models from scratch, but when transferring the style I get bad results, probably due to the same cause as #935 and #1285.

I'm reporting as much information as I can here to help solve the issue:

  1. I ran
image_stylization_create_dataset \
      --vgg_checkpoint=vgg/vgg_16.ckpt \
      --style_files=STYLE_IMAGE.jpg \
      --output_file=TRAIN_RECORD.tfrecord

then

image_stylization_train \
      --train_dir=TRAIN_DIR \
      --style_dataset_file=TRAIN_RECORD.tfrecord \
      --num_styles=1 \
      --vgg_checkpoint=vgg/vgg_16.ckpt \
      --imagenet_data_dir=imagenet_tf_output

After the model was generated, I used the following command to transfer the style:

image_stylization_transform \
      --num_styles=1 \
      --checkpoint=MODEL.CKPT-XXXX \
      --input_image=INPUT_IMAGE.jpg \
      --which_styles="[0]" \
      --output_dir=OUTPUT_DIR \
      --output_basename="stylized"
  2. Attached you can find the style image I used for training, starry_night_1280.jpg; its resolution is 1280 × 1014 (I get OOM errors during image_stylization_create_dataset if I use higher-resolution images).

starry_night_1280

  3. I used the default content/style loss hyperparameters from the scripts; I didn't change them.

  4. Attached you can find the content image I used, home_1600.jpg; its resolution is 1600 × 992.

home_1600

  5. Attached you can find first the style transfer result obtained using the provided Varied model, which includes the Starry Night style, and then the result I got using the model I trained. If you look at the one obtained with the Varied model (home_varied_model_result.jpg), you can clearly see the Starry Night style has been applied; if you look at the one obtained using my model (home_custom_model_result.jpg), you can see there is “something” missing.

home_varied_model_result

home_custom_model_result

  6. Environment info: TensorFlow v1.12.0, Nvidia K80, Nvidia driver v390.77, CUDA v9.0, installed using the conda package https://anaconda.org/anaconda/tensorflow-gpu.

Could you please point me in the right direction to discover what’s wrong in the process I follow? Thank you very much.

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Reactions: 1
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
vdumoulin commented, Dec 21, 2018

Hi Emanuele!

The resolution of the content and style images you are using is probably too large for the model. We usually train on square content images of size 256px (see here). I don’t recall what exact style image sizes we used, but I remember them being smaller than the resolutions you trained at (this is also the case for other commonly-used fast stylization models; see here and here).

This has several consequences which could explain your observations:

  • The VGG network which is used to compute the style transfer loss was trained on 224x224 inputs, and feeding in much larger inputs runs the risk of getting OOM errors. Likewise, the style transfer network architecture proposed by Johnson et al. and adopted for this model was built with 256x256 inputs in mind.
  • The Gram matrices computed from the VGG feature maps are not input-scale invariant, meaning that feeding in the same style image at different resolutions will produce different Gram matrices, and consequently will lead to the style loss emphasizing different kinds of visual textures.
  • Once the style transfer network is trained, there’s an inherent spatial scale to the visual textures that get embedded onto the stylized image. Given that the model is trained with 256x256 content images, feeding a much larger content image at evaluation time will result in the brush strokes and “swirls” appearing comparatively smaller (this is visible in the image you included in point 5). Also, note that stylizing large content images is non-trivial, and this is true even for the optimization-based procedure (see this paper for more details).
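The scale dependence of the Gram matrices is easy to see with a toy example. The sketch below (an illustration only: it uses finite-difference "feature maps" in place of real VGG activations, and a synthetic checkerboard in place of a style image) shows that the same texture rendered at two resolutions produces different Gram statistics:

```python
import numpy as np

def checkerboard(size, period):
    # Synthetic "style image": a size x size checkerboard with the given period.
    y, x = np.mgrid[0:size, 0:size]
    return (((x // period) + (y // period)) % 2).astype(float)

def gram(img):
    # Toy feature maps: horizontal and vertical finite differences,
    # standing in for VGG activations.
    fx = np.diff(img, axis=1)[:-1, :]
    fy = np.diff(img, axis=0)[:, :-1]
    F = np.stack([fx.ravel(), fy.ravel()])  # 2 x N feature matrix
    return F @ F.T / F.shape[1]             # Gram matrix, normalized by pixel count

g_hi = gram(checkerboard(256, 16))  # texture at "high" resolution
g_lo = gram(checkerboard(64, 4))    # the same texture, downsampled 4x
print(g_hi)
print(g_lo)  # noticeably different entries despite the identical pattern
```

The diagonal entries (average squared edge response) grow as the pattern shrinks in pixels, so a style loss built from these matrices would target different textures depending on the training resolution.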

With all of that in mind, I would suggest that you lower the resolution of the style image you use to train as well as the resolution of the content image you use for evaluation.
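One simple way to do that downscaling is sketched below using Pillow; the helper `resize_max` and the output filenames are hypothetical, not part of the magenta tooling:

```python
from PIL import Image

def resize_max(path, out_path, max_side=256):
    """Downscale an image so its longer side is at most max_side, keeping aspect ratio."""
    img = Image.open(path)
    scale = max_side / max(img.size)
    if scale < 1:
        img = img.resize(
            (round(img.size[0] * scale), round(img.size[1] * scale)),
            Image.LANCZOS,
        )
    img.save(out_path)

# e.g., before running image_stylization_create_dataset / image_stylization_transform:
# resize_max("starry_night_1280.jpg", "starry_night_256.jpg")  # style image
# resize_max("home_1600.jpg", "home_256.jpg")                  # content image
```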

Please don’t hesitate to reach out again if you have further questions!

0 reactions
reavcn commented, Jan 28, 2019

I have similar issues to the ones @ema987 mentioned in his latest post. I can't get a satisfying output from a 256x256 input image, even though I followed every step in the link from the original post.

The content loss appears to be consistently smaller than the style loss.
