Cannot train caffe.CaffeFunction

Hi, I am using Chainer 4.0.0b3 to load a Caffe model from shicai (DenseNet_121.caffemodel), and I want to fine-tune it like so:

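import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import cuda, optimizers
from chainer.links import caffe

def truncate_bn(sym):
    # Raise any batch-norm eps below 1e-5 up to 1e-5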
    for layer in list(sym._children):
        if "bn" in layer:
            if sym.__dict__[layer].eps < 1e-5:
                sym.__dict__[layer].eps = 1e-5

# Load base model
base_symbol = caffe.CaffeFunction("DenseNet_121.caffemodel")
# Truncate
truncate_bn(base_symbol)

class DenseNet121(chainer.Chain):
    # Class to wrap base (up to pool5 output)
    def __init__(self, base_symbol, n_classes=16):
        super(DenseNet121, self).__init__()
        self.base_symbol = base_symbol
        self.base_symbol.to_gpu()
        with self.init_scope():
            self.fc = L.Linear(1024, n_classes)
    
    def __call__(self, x):
        with chainer.using_config('train', True):
            h = self.base_symbol(inputs={'data':cuda.to_gpu(x)}, outputs=['pool5'])[0]
        return self.fc(h)

def init_model(m, lr=0.001, momentum=0.9):
    optimizer = optimizers.MomentumSGD(lr, momentum)
    optimizer.setup(m)
    return optimizer

# Create symbol
chainer.cuda.get_device(0).use()  # Make a specified GPU current
sym = DenseNet121(base_symbol = base_symbol)
sym.to_gpu()  # Copy the model to the GPU

optimizer = init_model(sym)

# Random data
data = np.random.rand(32, 3, 224, 224).astype('float32')
target = np.ones((32, 16)).astype('int32')

# Run a few training iterations (forward, backward, update)
with chainer.using_config('train', True), chainer.using_config('enable_backprop', True):
    for _ in range(10):
        # Data
        data = cuda.to_gpu(data)
        target = cuda.to_gpu(target)
        # Forward pass
        output = sym(data)
        # Loss
        loss = F.sigmoid_cross_entropy(output, target)
        sym.cleargrads()
        # Optimiser
        loss.backward()
        optimizer.update()
        # Log
        print(loss)
        print("Sum of conv1:", np.sum(sym.base_symbol['conv1'].W))
        print("Sum of fc:", np.sum(sym.fc.W))

However, the FC weights update on every iteration, while the weights of the base model loaded from Caffe do not:

variable(0.8340064)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-7.717535)
variable(0.82841897)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-7.3210917)
variable(0.8178877)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-6.757659)
variable(0.80306095)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-6.046247)
variable(0.784577)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-5.2045937)

Is CaffeFunction not trainable?
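
One possible cause, offered as a reading of the code above rather than a diagnosis confirmed in this thread: `self.base_symbol = base_symbol` is assigned outside `with self.init_scope():`. In Chainer, a `Chain` registers a child link only when it is assigned inside `init_scope()`, so an optimizer set up on `DenseNet121` would see `fc` but not the Caffe-loaded base. The sketch below is plain Python, not Chainer code; `ToyLink`, `ToyChain`, and `sgd_update` are hypothetical names that only imitate the registration rule.

```python
from contextlib import contextmanager


class ToyLink:
    """Hypothetical stand-in for a link holding a single weight."""

    def __init__(self, weight):
        self.weight = weight
        self.grad = 1.0  # pretend a backward pass produced this gradient

    def params(self):
        yield self


class ToyChain:
    """Toy container: only links assigned inside init_scope() are registered."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the bookkeeping fields.
        object.__setattr__(self, "_children", [])
        object.__setattr__(self, "_in_scope", False)

    @contextmanager
    def init_scope(self):
        object.__setattr__(self, "_in_scope", True)
        try:
            yield
        finally:
            object.__setattr__(self, "_in_scope", False)

    def __setattr__(self, name, value):
        # Register the attribute as a child only while inside init_scope().
        if self._in_scope and isinstance(value, ToyLink):
            self._children.append(name)
        object.__setattr__(self, name, value)

    def params(self):
        # Walk registered children only; plain attributes are invisible here.
        for name in self._children:
            yield from getattr(self, name).params()


def sgd_update(model, lr=0.1):
    """Hypothetical optimizer step: updates only what params() yields."""
    for p in model.params():
        p.weight -= lr * p.grad


class Unregistered(ToyChain):
    def __init__(self, base):
        super().__init__()
        self.base = base  # outside init_scope: not registered
        with self.init_scope():
            self.fc = ToyLink(1.0)


class Registered(ToyChain):
    def __init__(self, base):
        super().__init__()
        with self.init_scope():
            self.base = base  # inside init_scope: registered
            self.fc = ToyLink(1.0)


frozen = Unregistered(ToyLink(5.0))
tuned = Registered(ToyLink(5.0))
sgd_update(frozen)
sgd_update(tuned)
print("unregistered base:", frozen.base.weight, "fc:", frozen.fc.weight)
print("registered base:", tuned.base.weight, "fc:", tuned.fc.weight)
```

If this is indeed the cause, moving the `self.base_symbol = base_symbol` assignment inside the `with self.init_scope():` block of `DenseNet121.__init__` should let `optimizer.setup(m)` pick up the base model's parameters. Iterating over `sym.namedparams()` is one way to check which parameters are actually registered.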

Issue Analytics

Created 6 years ago
Comments: 8 (3 by maintainers)

Top GitHub Comments

beam2d commented, Mar 28, 2018

We need to fix CaffeFunction to not keep unnecessary references to the intermediate variables so that memory is effectively reused during forward propagation. Deleting references after forward may partially solve the situation (esp. memory usage of backward), but it does not shrink the memory usage of forward propagation. You may also try #4301 to further reduce the memory usage during backward.

ilkarman commented, Jun 28, 2018

Sorry yes!