
Cannot train caffe.CaffeFunction

See original GitHub issue

Hi, I am using Chainer 4.0.0b3 to load the DenseNet-121 Caffe model from shicai, and I want to fine-tune it like so:

# Imports needed by the code below
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import cuda, optimizers
from chainer.links import caffe

def truncate_bn(sym):
    # Clamp BatchNormalization eps to at least 1e-5 (the Caffe model may use smaller values)
    for layer in list(sym._children):
        if "bn" in layer:
            if sym.__dict__[layer].eps < 1e-5:
                sym.__dict__[layer].eps = 1e-5

# Load base model
base_symbol = caffe.CaffeFunction("DenseNet_121.caffemodel")
# Truncate
truncate_bn(base_symbol)

class DenseNet121(chainer.Chain):
    # Class to wrap base (up to pool5 output)
    def __init__(self, base_symbol, n_classes=16):
        super(DenseNet121, self).__init__()
        self.base_symbol = base_symbol
        self.base_symbol.to_gpu()
        with self.init_scope():
            self.fc = L.Linear(1024, n_classes)
    
    def __call__(self, x):
        with chainer.using_config('train', True):
            h = self.base_symbol(inputs={'data':cuda.to_gpu(x)}, outputs=['pool5'])[0]
        return self.fc(h)

def init_model(m, lr=0.001, momentum=0.9):
    optimizer = optimizers.MomentumSGD(lr, momentum)
    optimizer.setup(m)
    return optimizer

# Create symbol
chainer.cuda.get_device(0).use()  # Make a specified GPU current
sym = DenseNet121(base_symbol = base_symbol)
sym.to_gpu()  # Copy the model to the GPU

optimizer = init_model(sym)

# Random data
data = np.random.rand(32, 3, 224, 224).astype('float32')
target = np.ones((32, 16)).astype('int32')

# Training loop: forward, loss, backward, update
with chainer.using_config('train', True), chainer.using_config('enable_backprop', True):
    for _ in range(10):
        # Data
        data = cuda.to_gpu(data)
        target = cuda.to_gpu(target)
        # Forward pass
        output = sym(data)
        # Loss
        loss = F.sigmoid_cross_entropy(output, target)
        sym.cleargrads()
        # Optimiser
        loss.backward()
        optimizer.update()
        #Log
        print(loss)
        print("Sum of conv1:", np.sum(sym.base_symbol['conv1'].W))
        print("Sum of fc:", np.sum(sym.fc.W))

The FC weights update, but the weights of the base model loaded from Caffe do not:

variable(0.8340064)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-7.717535)
variable(0.82841897)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-7.3210917)
variable(0.8178877)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-6.757659)
variable(0.80306095)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-6.046247)
variable(0.784577)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-5.2045937)

Is CaffeFunction not trainable?
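
One possible explanation, not confirmed in the comments shown below: in Chainer, a Chain only tracks the parameters of links registered inside init_scope() (or added with add_link), and the snippet above assigns base_symbol to the chain outside init_scope(), so optimizer.setup(sym) never sees the Caffe weights. A minimal sketch of registering the CaffeFunction (which is itself a Chain) as a child link instead, reusing the names from the code above:

class DenseNet121(chainer.Chain):
    def __init__(self, base_symbol, n_classes=16):
        super(DenseNet121, self).__init__()
        with self.init_scope():
            # Registering the CaffeFunction here makes its parameters visible
            # to params(), cleargrads() and the optimizer
            self.base_symbol = base_symbol
            self.fc = L.Linear(1024, n_classes)

    def __call__(self, x):
        h = self.base_symbol(inputs={'data': x}, outputs=['pool5'])[0]
        return self.fc(h)

sym = DenseNet121(base_symbol=base_symbol)
sym.to_gpu()              # moves both the Caffe layers and fc to the GPU
optimizer = optimizers.MomentumSGD(lr=0.001, momentum=0.9)
optimizer.setup(sym)      # now includes the Caffe weights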

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

1 reaction
beam2d commented, Mar 28, 2018

We need to fix CaffeFunction to not keep unnecessary references to the intermediate variables so that memory is effectively reused during forward propagation. Deleting references after forward may partially solve the situation (esp. memory usage of backward), but it does not shrink the memory usage of forward propagation. You may also try #4301 to further reduce the memory usage during backward.
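
As a rough user-level illustration of the "deleting references" idea (a sketch of the general principle only, not the internal CaffeFunction change described above, and reusing the names from the question): intermediate arrays stay alive as long as something still references the computation graph, so dropping the output and loss variables once the update is done lets the previous iteration's graph be garbage-collected before the next forward pass.

# Sketch only -- the fix beam2d describes lives inside CaffeFunction itself
for _ in range(10):
    output = sym(data)
    loss = F.sigmoid_cross_entropy(output, target)
    sym.cleargrads()
    loss.backward()
    optimizer.update()
    del output, loss  # release the graph so its intermediate arrays can be freed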

0 reactions
ilkarman commented, Jun 28, 2018

Sorry yes!

Read more comments on GitHub >

