
Getting started - basic network won't classify two distinct patterns

See original GitHub issue

Hello, I’m trying to get to a working starting point and build up from there, so I set up a basic net. I then created a simple problem: two test cases, each initialized with a different constant value. However, after 1000 training iterations on both patterns, forwarding the patterns back through the net yields exactly the same classification probabilities for both (increasing the number of iterations doesn’t help). Initially I had a single class, but then changed it to two, thinking that might somehow help; of course it didn’t. I’m sure I must be doing something very basic wrong. If you would be willing to take a look at my code (below), that would be much appreciated. Thanks!



using System;
using ConvNetSharp.Core;
using ConvNetSharp.Core.Layers.Double;
using ConvNetSharp.Core.Training.Double;
using ConvNetSharp.Volume;
using ConvNetSharp.Volume.Double;

namespace ClusterNN
{
    internal class Program
    {
        private static void Main()
        {
            var net = new Net();

            net.AddLayer(new InputLayer(7, 9, 6));

            // Convolution layer #1: 2x2 kernel, 6 filters
            net.AddLayer(new ConvLayer(2, 2, 6) { Stride = 1 });
            // Result after applying the filter is (6 x 8) x 6 filters = 288 elements

            // ReLU (rectified linear unit) non-linearity
            net.AddLayer(new ReluLayer());

            // Convolution layer #2: 2x2 kernel, 6 filters
            net.AddLayer(new ConvLayer(2, 2, 6) { Stride = 1 });
            // Result after applying the filter is (5 x 7) x 6 filters = 210 elements

            // ReLU non-linearity
            net.AddLayer(new ReluLayer());

            // Fully connected layer #1
            net.AddLayer(new FullyConnLayer(100));
            // Fully connected layer #2
            net.AddLayer(new FullyConnLayer(2));

            // Linear classifier on top of the previous hidden layer
            net.AddLayer(new SoftmaxLayer(2));

            // Construct a trainer for the network
            var trainer = new SgdTrainer(net) { LearningRate = 0.001, Momentum = 0.0, BatchSize = 1 };

            var evenInput = BuilderInstance.Volume.SameAs(new Shape(7, 9, 6));
            var oddInput = BuilderInstance.Volume.SameAs(new Shape(7, 9, 6));

            MakeEvenOddVolumes(evenInput, oddInput);

            // One-hot label for the "even" pattern
            var label0_2 = BuilderInstance.Volume.SameAs(new Shape(1, 1, 2));
            label0_2.Set(0, 0, 0, 0.0);
            label0_2.Set(0, 0, 1, 1.0);

            // One-hot label for the "odd" pattern
            var label1_2 = BuilderInstance.Volume.SameAs(new Shape(1, 1, 2));
            label1_2.Set(0, 0, 0, 1.0);
            label1_2.Set(0, 0, 1, 0.0);

            for (int i = 0; i < 1000; i++)
            {
                Console.WriteLine("iteration: " + i);

                trainer.Train(evenInput, label0_2);
                trainer.Train(oddInput, label1_2);
            }

            // Forward each data point through the trained network
            var vscore = net.Forward(evenInput);
            Console.WriteLine("probability that x is even: " + vscore.Get(0, 0, 0) + " " + vscore.Get(0, 0, 1));

            var vscore2 = net.Forward(oddInput);
            Console.WriteLine("probability that x is odd: " + vscore2.Get(0, 0, 0) + " " + vscore2.Get(0, 0, 1));
        }

        private static void MakeEvenOddVolumes(Volume<double> evenVolume, Volume<double> oddVolume)
        {
            // Fill each volume with a different constant value
            for (int i = 0; i < 7; i++)
                for (int j = 0; j < 9; j++)
                    for (int k = 0; k < 6; k++)
                    {
                        evenVolume.Set(i, j, k, 100.0);
                        oddVolume.Set(i, j, k, 11.0);
                    }
        }
    }
}
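One thing worth noting about the code above: the two inputs are filled with large raw constants (100.0 and 11.0), and the maintainer's reply below mentions that the gradients of the even sample were much bigger than those of the odd sample. A common mitigation is to scale inputs into a small range before training. This is a hypothetical variant of `MakeEvenOddVolumes`, not part of the original code, and the normalization constant is an illustrative choice:

```csharp
// Hypothetical variant: same two constant patterns, but scaled into [0, 1]
// so early-layer activations and gradients stay moderate.
private static void MakeEvenOddVolumesScaled(Volume<double> evenVolume, Volume<double> oddVolume)
{
    const double scale = 100.0; // illustrative normalization constant

    for (int i = 0; i < 7; i++)
        for (int j = 0; j < 9; j++)
            for (int k = 0; k < 6; k++)
            {
                evenVolume.Set(i, j, k, 100.0 / scale); // 1.0
                oddVolume.Set(i, j, k, 11.0 / scale);   // 0.11
            }
}
```

The same idea applies to any real data fed into the net: keeping inputs roughly in [0, 1] (or zero-mean, unit-variance) tends to make SGD with a small learning rate much better behaved.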
Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

gschmidt958 commented, Apr 30, 2020

Thank you so much for your awesome support. I think I’ve reached a point where I can make real progress now, and I’m really looking forward to that!

cbovar commented, Apr 30, 2020

  1. It should work with Relu / LeakyRelu as well. Initially I noticed that the gradients of the even sample were much bigger than those of the odd sample, so I thought of using a tanh or sigmoid layer to prevent large values flowing into the network.

  2. If you want to produce a single output value, you should use a RegressionLayer as the last layer rather than a SoftmaxLayer:

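The code snippet that originally followed this comment did not survive extraction. A minimal sketch of what such a single-output head could look like, assuming ConvNetSharp's `RegressionLayer` (the layer sizes and target values here are illustrative, not from the original reply); it would replace the final `FullyConnLayer(2)` + `SoftmaxLayer(2)` in the program above:

```csharp
// Single-value output head: one fully connected unit followed by a
// regression (L2) loss instead of a two-class softmax.
net.AddLayer(new FullyConnLayer(1));
net.AddLayer(new RegressionLayer());

// Targets then become 1x1x1 volumes rather than one-hot 1x1x2 labels,
// e.g. 0.0 for the "even" pattern and 1.0 for the "odd" pattern:
var target = BuilderInstance.Volume.SameAs(new Shape(1, 1, 1));
target.Set(0, 0, 0, 1.0);
trainer.Train(oddInput, target);
```

With a regression head, `net.Forward(input).Get(0, 0, 0)` returns the raw predicted value rather than a probability, so thresholding (e.g. at 0.5) is up to the caller.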


