Cannot train caffe.CaffeFunction
Hi, I am using Chainer 4.0.0b3 to load a Caffe model from shicai. However, I want to fine-tune the model like so:
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import cuda, optimizers
from chainer.links import caffe

def truncate_bn(sym):
    # Need to truncate batchnorm - eps
    for layer in list(sym._children):
        if "bn" in layer:
            if sym.__dict__[layer].eps < 1e-5:
                sym.__dict__[layer].eps = 1e-5

# Load base model
base_symbol = caffe.CaffeFunction("DenseNet_121.caffemodel")
# Truncate
truncate_bn(base_symbol)

class DenseNet121(chainer.Chain):
    # Class to wrap base (up to pool5 output)
    def __init__(self, base_symbol, n_classes=16):
        super(DenseNet121, self).__init__()
        self.base_symbol = base_symbol
        self.base_symbol.to_gpu()
        with self.init_scope():
            self.fc = L.Linear(1024, n_classes)

    def __call__(self, x):
        with chainer.using_config('train', True):
            h = self.base_symbol(inputs={'data': cuda.to_gpu(x)}, outputs=['pool5'])[0]
        return self.fc(h)

def init_model(m, lr=0.001, momentum=0.9):
    optimizer = optimizers.MomentumSGD(lr, momentum)
    optimizer.setup(m)
    return optimizer

# Create symbol
chainer.cuda.get_device(0).use()  # Make a specified GPU current
sym = DenseNet121(base_symbol=base_symbol)
sym.to_gpu()  # Copy the model to the GPU
optimizer = init_model(sym)

# Random data
data = np.random.rand(32, 3, 224, 224).astype('float32')
target = np.ones((32, 16)).astype('int32')

# Try test-forward
with chainer.using_config('train', True), chainer.using_config('enable_backprop', True):
    for _ in range(10):
        # Data
        data = cuda.to_gpu(data)
        target = cuda.to_gpu(target)
        # Forward pass
        output = sym(data)
        # Loss
        loss = F.sigmoid_cross_entropy(output, target)
        sym.cleargrads()
        # Optimiser
        loss.backward()
        optimizer.update()
        # Log
        print(loss)
        print("Sum of conv1:", np.sum(sym.base_symbol['conv1'].W))
        print("Sum of fc:", np.sum(sym.fc.W))
However, the weights of the FC layer update, but those of the base model loaded from Caffe do not:
variable(0.8340064)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-7.717535)
variable(0.82841897)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-7.3210917)
variable(0.8178877)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-6.757659)
variable(0.80306095)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-6.046247)
variable(0.784577)
Sum of conv1: variable(-3.52687)
Sum of fc: variable(-5.2045937)
Is CaffeFunction not trainable?
Issue Analytics
- State:
- Created 6 years ago
- Comments:8 (3 by maintainers)
Top GitHub Comments
We need to fix CaffeFunction to not keep unnecessary references to the intermediate variables so that memory is effectively reused during forward propagation. Deleting references after forward may partially solve the situation (esp. memory usage of backward), but it does not shrink the memory usage of forward propagation. You may also try #4301 to further reduce the memory usage during backward.
Sorry yes!