Error when training two unets

Hello, I want to train two unets. This is my training code:

def train(img):
    global LOSS
    Loss=[0,0]
    for i in range(2):
        name = "image_checkpoint.pth"
        trainer.load(name)
        loss = trainer(img,unet_number =i+1)
        Loss[i] = Loss[i]+loss
        trainer.update(unet_number=i+1)
        trainer.save(name)
    wandb.log({"u1_loss":Loss[0],"u2_loss":Loss[1]},step=step)
    LOSS[0]=LOSS[0]+Loss[0]
    LOSS[1]=LOSS[1]+Loss[1]

But an error occurs:

you cannot only train on one unet at a time. you will need to save the trainer into a checkpoint, and resume training on a new unet

So I changed my code to use a separate checkpoint file for each unet:

def train(img):
    global LOSS
    Loss=[0,0]
    for i in range(2):
        name = "image_checkpoint"+str(i+1)+".pth"
        trainer.load(name)
        loss = trainer(img,unet_number =i+1)
        Loss[i] = Loss[i]+loss
        trainer.update(unet_number=i+1)
        trainer.save(name)
    wandb.log({"u1_loss":Loss[0],"u2_loss":Loss[1]},step=step)
    LOSS[0]=LOSS[0]+Loss[0]
    LOSS[1]=LOSS[1]+Loss[1]

But I still get the same error.

How can I train two unets? Thanks.

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

TheFusion21 commented, Oct 17, 2022 (1 reaction)

As far as I know, you have to re-instantiate the trainer and/or the imagen model and then load the checkpoint.
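A minimal sketch of that suggestion, assuming the trainer exposes the same calls used in the question (`load`, `trainer(img, unet_number=...)`, `update`, `save`). The names `make_trainer`, `train_unet_stage`, and `train_all_unets` are hypothetical: `make_trainer` stands for whatever code builds a fresh imagen model and trainer in your script. The idea is that each unet gets its own newly built trainer, which picks up the previous stage's weights from the checkpoint instead of reusing a trainer that has already trained a different unet.

```python
import os
from typing import Callable, Dict, Iterable, List


def train_unet_stage(make_trainer: Callable[[], object],
                     batches: Iterable,
                     unet_number: int,
                     checkpoint_path: str) -> List[float]:
    """Train one unet with a freshly built trainer, then save a checkpoint."""
    # Build a new trainer for this stage (hypothetical factory supplied by the caller).
    trainer = make_trainer()

    # Resume from the previous stage or run if a checkpoint already exists.
    if os.path.exists(checkpoint_path):
        trainer.load(checkpoint_path)

    losses = []
    for img in batches:
        # Same calls as in the question, but always with the same unet_number.
        loss = trainer(img, unet_number=unet_number)
        trainer.update(unet_number=unet_number)
        losses.append(float(loss))

    # Save once per stage so the next stage can load these weights.
    trainer.save(checkpoint_path)
    return losses


def train_all_unets(make_trainer: Callable[[], object],
                    batches: Iterable,
                    checkpoint_path: str,
                    num_unets: int = 2) -> Dict[int, List[float]]:
    """Run one stage per unet, in order, sharing a single checkpoint file."""
    batches = list(batches)  # reuse the same data for every stage
    return {
        n: train_unet_stage(make_trainer, batches, n, checkpoint_path)
        for n in range(1, num_unets + 1)
    }
```

Compared with the code in the question, the loop over unets moves outside the loop over batches, so a given trainer instance only ever sees one `unet_number`.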

dinhdungz commented, Oct 17, 2022 (0 reactions)

@TheFusion21 Thank you, that error is gone.
