Example for how to use the grow option?
I am trying to use the progressive growth option, but I am getting an error when using it the way I think it is supposed to be used:
I have a trained 32x32 checkpoint which I am now trying to grow to a 64x64 one, so I am using the following arguments:
python3 train.py --config configs/config_64x64.json --name chkpt_64_1 --batch-size 100 --grow chkpt_32_2.pth --grow-config configs/config_32x32.json
The config_32x32.json is the default one from the repository; the config_64x64.json uses the additional layers and changed values mentioned in #9:
# from 32x32:
"model": {
    "type": "image_v1",
    "input_channels": 3,
    "input_size": [32, 32],
    "patch_size": 1,
    "mapping_out": 256,
    "depths": [2, 4, 4],
    "channels": [128, 256, 512],
    "self_attn_depths": [false, true, true],
    "dropout_rate": 0.05,
    "augment_prob": 0.12,
    "sigma_data": 0.5,
    "sigma_min": 1e-2,
    "sigma_max": 80,
    "sigma_sample_density": {
        "type": "lognormal",
        "mean": -1.2,
        "std": 1.2
    }
},

# from 64x64:
"model": {
    "type": "image_v1",
    "input_channels": 3,
    "input_size": [64, 64],
    "patch_size": 1,
    "mapping_out": 256,
    "depths": [2, 2, 4, 4],
    "channels": [128, 256, 256, 512],
    "self_attn_depths": [false, false, true, true],
    "dropout_rate": 0.05,
    "augment_prob": 0.12,
    "sigma_data": 0.5,
    "sigma_min": 1e-2,
    "sigma_max": 80,
    "sigma_sample_density": {
        "type": "lognormal",
        "mean": -1.2,
        "std": 1.2
    }
},
But when I try to run train.py, I get a whole lot of "missing key" and "size mismatch" errors from
inner_model.load_state_dict(old_inner_model.state_dict())
Missing key(s) in state_dict: "inner_model.u_net.d_blocks.1.2.main.0.mapper.weight", "inner_model.u_net.d_blocks.1.2.main.0.mapper.bias", "inner_model.u_net.d_blocks.1.2.main.2.weight", "inner_model.u_net.d_blocks.1.2.main.2.bias", "inn....
So I am wondering whether I am doing something wrong here, or if this is just one of those "work in progress" issues.
I suspect that I might instead have to do something that involves patch_size and skip_stages, since those are used in the wrapper, but I have no idea what they do.
Top GitHub Comments
From what I have understood, you first have to decide on the maximum size you want to train at and create the config for that. For 128 that would be something like:
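(A sketch only — I am extending the pattern from the 64x64 config in #9, so the extra stage and the exact depths/channels values for 128 are my guess, only the model section is shown, and skip_stages is the field I assume controls which stages are skipped:)

"model": {
    "type": "image_v1",
    "input_channels": 3,
    "input_size": [128, 128],
    "patch_size": 1,
    "mapping_out": 256,
    "depths": [2, 2, 2, 4, 4],
    "channels": [128, 128, 256, 256, 512],
    "self_attn_depths": [false, false, false, true, true],
    "skip_stages": 0,
    "dropout_rate": 0.05,
    "augment_prob": 0.12,
    "sigma_data": 0.5,
    "sigma_min": 1e-2,
    "sigma_max": 80,
    "sigma_sample_density": {
        "type": "lognormal",
        "mean": -1.2,
        "std": 1.2
    }
},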
Now you make a copy of that conf and for the first stage (assuming you start with 32x32) you change the values in the copy to:
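(Again just a sketch of what I mean — as far as I understand it, only input_size and skip_stages change, while depths/channels stay those of the final 128 model, so the architecture, and therefore the state dict keys, never changes between stages:)

    "input_size": [32, 32],
    "skip_stages": 2,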
(I don’t know if the sigma_max value has to be reduced here to 80?)
In the first stage you do not use the grow argument yet. When the 32x32 model has finished training you create another conf for the 64x64 step:
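(Same pattern — my assumption is that config_64x64_skip.json only differs from the 128 config in these two values:)

    "input_size": [64, 64],
    "skip_stages": 1,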
This time you have to use the --grow argument:
python3 train.py --config configs/config_64x64_skip.json --name chkpt_64 --grow chkpt_32.pth --grow-config configs/config_32x32_skip.json
And once that has finished you can use the 128 conf:
python3 train.py --config configs/config_128.json --name chkpt_128 --grow chkpt_64.pth --grow-config configs/config_64x64_skip.json
Oh yes, that's a possibility of course. I have only just started diving into training my own diffusion models, but one observation I have made with my toy models is that the old rule still applies: well-aligned datasets of similar things converge better than datasets that are all over the place in zoom level or composition. That is why I currently try to keep my data within a certain theme or scale level (and why I try to use the grow method).