
Training loss and Testing accuracy

See original GitHub issue

Hello, when I try to reproduce the results in your paper, or train a modified model with a different learning rate schedule (for example, starting from base_lr=0.1 and training until the training loss no longer decreases), I find that while the training loss decreases as expected, the testing loss blows up and the testing accuracy drops. I don't think the model is overfitting, because the training loss is still around 1. An example log file is below: epochs 5 and 10 look normal, but the testing accuracy and loss at epoch 15 are not in the normal range.

[ Sun Apr  8 21:42:15 2018 ] Training epoch: 1
[ Sun Apr  8 21:42:29 2018 ] 	Batch(0/589) done. Loss: 5.3065  lr:0.100000
[ Sun Apr  8 21:44:10 2018 ] 	Batch(100/589) done. Loss: 3.5391  lr:0.100000
[ Sun Apr  8 21:45:52 2018 ] 	Batch(200/589) done. Loss: 3.4271  lr:0.100000
[ Sun Apr  8 21:47:34 2018 ] 	Batch(300/589) done. Loss: 3.1172  lr:0.100000
[ Sun Apr  8 21:49:16 2018 ] 	Batch(400/589) done. Loss: 2.8495  lr:0.100000
[ Sun Apr  8 21:50:58 2018 ] 	Batch(500/589) done. Loss: 2.9964  lr:0.100000
[ Sun Apr  8 21:52:27 2018 ] 	Mean training loss: 3.3161.
[ Sun Apr  8 21:52:27 2018 ] 	Time consumption: [Data]01%, [Network]99%
[ Sun Apr  8 21:52:27 2018 ] Training epoch: 2
[ Sun Apr  8 21:52:41 2018 ] 	Batch(0/589) done. Loss: 3.0124  lr:0.100000
[ Sun Apr  8 21:54:22 2018 ] 	Batch(100/589) done. Loss: 2.9106  lr:0.100000
[ Sun Apr  8 21:56:03 2018 ] 	Batch(200/589) done. Loss: 2.4281  lr:0.100000
[ Sun Apr  8 21:57:44 2018 ] 	Batch(300/589) done. Loss: 2.3935  lr:0.100000
[ Sun Apr  8 21:59:26 2018 ] 	Batch(400/589) done. Loss: 2.3242  lr:0.100000
[ Sun Apr  8 22:01:08 2018 ] 	Batch(500/589) done. Loss: 2.2797  lr:0.100000
[ Sun Apr  8 22:02:36 2018 ] 	Mean training loss: 2.4595.
[ Sun Apr  8 22:02:36 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 22:02:36 2018 ] Training epoch: 3
[ Sun Apr  8 22:02:50 2018 ] 	Batch(0/589) done. Loss: 2.0113  lr:0.100000
[ Sun Apr  8 22:04:31 2018 ] 	Batch(100/589) done. Loss: 1.9469  lr:0.100000
[ Sun Apr  8 22:06:13 2018 ] 	Batch(200/589) done. Loss: 2.0902  lr:0.100000
[ Sun Apr  8 22:07:56 2018 ] 	Batch(300/589) done. Loss: 1.9241  lr:0.100000
[ Sun Apr  8 22:09:38 2018 ] 	Batch(400/589) done. Loss: 1.6968  lr:0.100000
[ Sun Apr  8 22:11:20 2018 ] 	Batch(500/589) done. Loss: 1.6265  lr:0.100000
[ Sun Apr  8 22:12:49 2018 ] 	Mean training loss: 1.8767.
[ Sun Apr  8 22:12:49 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 22:12:49 2018 ] Training epoch: 4
[ Sun Apr  8 22:13:03 2018 ] 	Batch(0/589) done. Loss: 1.5664  lr:0.100000
[ Sun Apr  8 22:14:45 2018 ] 	Batch(100/589) done. Loss: 1.2361  lr:0.100000
[ Sun Apr  8 22:16:27 2018 ] 	Batch(200/589) done. Loss: 1.9590  lr:0.100000
[ Sun Apr  8 22:18:08 2018 ] 	Batch(300/589) done. Loss: 1.4472  lr:0.100000
[ Sun Apr  8 22:19:50 2018 ] 	Batch(400/589) done. Loss: 1.7926  lr:0.100000
[ Sun Apr  8 22:21:32 2018 ] 	Batch(500/589) done. Loss: 1.6678  lr:0.100000
[ Sun Apr  8 22:23:00 2018 ] 	Mean training loss: 1.5810.
[ Sun Apr  8 22:23:00 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 22:23:00 2018 ] Training epoch: 5
[ Sun Apr  8 22:23:14 2018 ] 	Batch(0/589) done. Loss: 1.3737  lr:0.100000
[ Sun Apr  8 22:24:55 2018 ] 	Batch(100/589) done. Loss: 1.6322  lr:0.100000
[ Sun Apr  8 22:26:37 2018 ] 	Batch(200/589) done. Loss: 1.2826  lr:0.100000
[ Sun Apr  8 22:28:20 2018 ] 	Batch(300/589) done. Loss: 1.7919  lr:0.100000
[ Sun Apr  8 22:30:02 2018 ] 	Batch(400/589) done. Loss: 1.5371  lr:0.100000
[ Sun Apr  8 22:31:42 2018 ] 	Batch(500/589) done. Loss: 1.3910  lr:0.100000
[ Sun Apr  8 22:33:11 2018 ] 	Mean training loss: 1.4091.
[ Sun Apr  8 22:33:11 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 22:33:11 2018 ] Eval epoch: 5
[ Sun Apr  8 22:35:05 2018 ] 	Mean test loss of 296 batches: 1.38761536334012.
[ Sun Apr  8 22:35:06 2018 ] 	Top1: 57.94%
[ Sun Apr  8 22:35:06 2018 ] 	Top5: 90.33%
[ Sun Apr  8 22:35:06 2018 ] Training epoch: 6
[ Sun Apr  8 22:35:19 2018 ] 	Batch(0/589) done. Loss: 1.4409  lr:0.100000
[ Sun Apr  8 22:37:00 2018 ] 	Batch(100/589) done. Loss: 1.3341  lr:0.100000
[ Sun Apr  8 22:38:42 2018 ] 	Batch(200/589) done. Loss: 1.0841  lr:0.100000
[ Sun Apr  8 22:40:23 2018 ] 	Batch(300/589) done. Loss: 1.2607  lr:0.100000
[ Sun Apr  8 22:42:05 2018 ] 	Batch(400/589) done. Loss: 1.3300  lr:0.100000
[ Sun Apr  8 22:43:46 2018 ] 	Batch(500/589) done. Loss: 1.1257  lr:0.100000
[ Sun Apr  8 22:45:15 2018 ] 	Mean training loss: 1.2766.
[ Sun Apr  8 22:45:15 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 22:45:15 2018 ] Training epoch: 7
[ Sun Apr  8 22:45:29 2018 ] 	Batch(0/589) done. Loss: 1.4653  lr:0.100000
[ Sun Apr  8 22:47:10 2018 ] 	Batch(100/589) done. Loss: 1.2261  lr:0.100000
[ Sun Apr  8 22:48:51 2018 ] 	Batch(200/589) done. Loss: 1.1842  lr:0.100000
[ Sun Apr  8 22:50:33 2018 ] 	Batch(300/589) done. Loss: 1.2471  lr:0.100000
[ Sun Apr  8 22:52:15 2018 ] 	Batch(400/589) done. Loss: 1.1583  lr:0.100000
[ Sun Apr  8 22:53:56 2018 ] 	Batch(500/589) done. Loss: 0.9828  lr:0.100000
[ Sun Apr  8 22:55:24 2018 ] 	Mean training loss: 1.1803.
[ Sun Apr  8 22:55:24 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 22:55:24 2018 ] Training epoch: 8
[ Sun Apr  8 22:55:39 2018 ] 	Batch(0/589) done. Loss: 1.0015  lr:0.100000
[ Sun Apr  8 22:57:20 2018 ] 	Batch(100/589) done. Loss: 1.0679  lr:0.100000
[ Sun Apr  8 22:59:02 2018 ] 	Batch(200/589) done. Loss: 1.2700  lr:0.100000
[ Sun Apr  8 23:00:43 2018 ] 	Batch(300/589) done. Loss: 1.0391  lr:0.100000
[ Sun Apr  8 23:02:24 2018 ] 	Batch(400/589) done. Loss: 0.8358  lr:0.100000
[ Sun Apr  8 23:04:06 2018 ] 	Batch(500/589) done. Loss: 0.7021  lr:0.100000
[ Sun Apr  8 23:05:34 2018 ] 	Mean training loss: 1.1058.
[ Sun Apr  8 23:05:34 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 23:05:34 2018 ] Training epoch: 9
[ Sun Apr  8 23:05:48 2018 ] 	Batch(0/589) done. Loss: 1.4356  lr:0.100000
[ Sun Apr  8 23:07:28 2018 ] 	Batch(100/589) done. Loss: 0.9781  lr:0.100000
[ Sun Apr  8 23:09:10 2018 ] 	Batch(200/589) done. Loss: 1.1352  lr:0.100000
[ Sun Apr  8 23:10:51 2018 ] 	Batch(300/589) done. Loss: 0.8561  lr:0.100000
[ Sun Apr  8 23:12:33 2018 ] 	Batch(400/589) done. Loss: 1.0276  lr:0.100000
[ Sun Apr  8 23:14:15 2018 ] 	Batch(500/589) done. Loss: 1.3473  lr:0.100000
[ Sun Apr  8 23:15:43 2018 ] 	Mean training loss: 1.0431.
[ Sun Apr  8 23:15:43 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 23:15:43 2018 ] Training epoch: 10
[ Sun Apr  8 23:15:57 2018 ] 	Batch(0/589) done. Loss: 1.2543  lr:0.100000
[ Sun Apr  8 23:17:39 2018 ] 	Batch(100/589) done. Loss: 0.8085  lr:0.100000
[ Sun Apr  8 23:19:20 2018 ] 	Batch(200/589) done. Loss: 1.0412  lr:0.100000
[ Sun Apr  8 23:21:02 2018 ] 	Batch(300/589) done. Loss: 0.9332  lr:0.100000
[ Sun Apr  8 23:22:43 2018 ] 	Batch(400/589) done. Loss: 1.0560  lr:0.100000
[ Sun Apr  8 23:24:25 2018 ] 	Batch(500/589) done. Loss: 0.9087  lr:0.100000
[ Sun Apr  8 23:25:53 2018 ] 	Mean training loss: 0.9881.
[ Sun Apr  8 23:25:53 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 23:25:54 2018 ] Eval epoch: 10
[ Sun Apr  8 23:27:47 2018 ] 	Mean test loss of 296 batches: 1.0697445980197675.
[ Sun Apr  8 23:27:48 2018 ] 	Top1: 68.29%
[ Sun Apr  8 23:27:48 2018 ] 	Top5: 94.53%
[ Sun Apr  8 23:27:48 2018 ] Training epoch: 11
[ Sun Apr  8 23:28:01 2018 ] 	Batch(0/589) done. Loss: 0.6880  lr:0.100000
[ Sun Apr  8 23:29:42 2018 ] 	Batch(100/589) done. Loss: 1.1329  lr:0.100000
[ Sun Apr  8 23:31:23 2018 ] 	Batch(200/589) done. Loss: 0.9698  lr:0.100000
[ Sun Apr  8 23:33:05 2018 ] 	Batch(300/589) done. Loss: 0.6172  lr:0.100000
[ Sun Apr  8 23:34:47 2018 ] 	Batch(400/589) done. Loss: 0.9810  lr:0.100000
[ Sun Apr  8 23:36:31 2018 ] 	Batch(500/589) done. Loss: 0.8487  lr:0.100000
[ Sun Apr  8 23:38:01 2018 ] 	Mean training loss: 0.9404.
[ Sun Apr  8 23:38:01 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 23:38:01 2018 ] Training epoch: 12
[ Sun Apr  8 23:38:15 2018 ] 	Batch(0/589) done. Loss: 0.8225  lr:0.100000
[ Sun Apr  8 23:39:57 2018 ] 	Batch(100/589) done. Loss: 0.9550  lr:0.100000
[ Sun Apr  8 23:41:40 2018 ] 	Batch(200/589) done. Loss: 0.9237  lr:0.100000
[ Sun Apr  8 23:43:23 2018 ] 	Batch(300/589) done. Loss: 0.7804  lr:0.100000
[ Sun Apr  8 23:45:06 2018 ] 	Batch(400/589) done. Loss: 0.7944  lr:0.100000
[ Sun Apr  8 23:46:51 2018 ] 	Batch(500/589) done. Loss: 0.6681  lr:0.100000
[ Sun Apr  8 23:48:20 2018 ] 	Mean training loss: 0.9031.
[ Sun Apr  8 23:48:20 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 23:48:20 2018 ] Training epoch: 13
[ Sun Apr  8 23:48:34 2018 ] 	Batch(0/589) done. Loss: 1.0019  lr:0.100000
[ Sun Apr  8 23:50:15 2018 ] 	Batch(100/589) done. Loss: 1.1436  lr:0.100000
[ Sun Apr  8 23:51:57 2018 ] 	Batch(200/589) done. Loss: 0.9631  lr:0.100000
[ Sun Apr  8 23:53:39 2018 ] 	Batch(300/589) done. Loss: 0.8120  lr:0.100000
[ Sun Apr  8 23:55:21 2018 ] 	Batch(400/589) done. Loss: 1.2053  lr:0.100000
[ Sun Apr  8 23:57:02 2018 ] 	Batch(500/589) done. Loss: 0.6185  lr:0.100000
[ Sun Apr  8 23:58:30 2018 ] 	Mean training loss: 0.8703.
[ Sun Apr  8 23:58:30 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Sun Apr  8 23:58:30 2018 ] Training epoch: 14
[ Sun Apr  8 23:58:44 2018 ] 	Batch(0/589) done. Loss: 0.7425  lr:0.100000
[ Mon Apr  9 00:00:25 2018 ] 	Batch(100/589) done. Loss: 0.8590  lr:0.100000
[ Mon Apr  9 00:02:07 2018 ] 	Batch(200/589) done. Loss: 0.7516  lr:0.100000
[ Mon Apr  9 00:03:49 2018 ] 	Batch(300/589) done. Loss: 0.8640  lr:0.100000
[ Mon Apr  9 00:05:30 2018 ] 	Batch(400/589) done. Loss: 0.6930  lr:0.100000
[ Mon Apr  9 00:07:11 2018 ] 	Batch(500/589) done. Loss: 0.9798  lr:0.100000
[ Mon Apr  9 00:08:40 2018 ] 	Mean training loss: 0.8336.
[ Mon Apr  9 00:08:40 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Mon Apr  9 00:08:40 2018 ] Training epoch: 15
[ Mon Apr  9 00:08:54 2018 ] 	Batch(0/589) done. Loss: 0.9048  lr:0.100000
[ Mon Apr  9 00:10:34 2018 ] 	Batch(100/589) done. Loss: 0.7716  lr:0.100000
[ Mon Apr  9 00:12:16 2018 ] 	Batch(200/589) done. Loss: 0.4784  lr:0.100000
[ Mon Apr  9 00:13:57 2018 ] 	Batch(300/589) done. Loss: 0.6179  lr:0.100000
[ Mon Apr  9 00:15:39 2018 ] 	Batch(400/589) done. Loss: 0.9232  lr:0.100000
[ Mon Apr  9 00:17:20 2018 ] 	Batch(500/589) done. Loss: 0.7198  lr:0.100000
[ Mon Apr  9 00:18:49 2018 ] 	Mean training loss: 0.7999.
[ Mon Apr  9 00:18:49 2018 ] 	Time consumption: [Data]03%, [Network]97%
[ Mon Apr  9 00:18:49 2018 ] Eval epoch: 15
[ Mon Apr  9 00:20:43 2018 ] 	Mean test loss of 296 batches: 6.906595945358276.
[ Mon Apr  9 00:20:44 2018 ] 	Top1: 22.58%
[ Mon Apr  9 00:20:44 2018 ] 	Top5: 46.53%
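
For reference, the "Mean test loss", "Top1" and "Top5" numbers reported at each "Eval epoch" above are the usual mean cross-entropy over test batches and top-k accuracy. Below is a minimal sketch of how such an evaluation is typically computed in PyTorch; the model and test_loader names are placeholders, not the repository's actual code.

import torch
import torch.nn.functional as F

def evaluate(model, test_loader, device="cuda"):
    """Mean cross-entropy loss over test batches plus Top-1/Top-5 accuracy."""
    model.eval()
    batch_losses, top1_hits, top5_hits, total = [], 0, 0, 0
    with torch.no_grad():
        for data, label in test_loader:
            data, label = data.to(device), label.to(device)
            logits = model(data)                          # (batch, num_classes)
            batch_losses.append(F.cross_entropy(logits, label).item())
            _, pred = logits.topk(5, dim=1)               # top-5 class indices per sample
            correct = pred.eq(label.view(-1, 1))          # (batch, 5) boolean matches
            top1_hits += correct[:, 0].sum().item()
            top5_hits += correct.any(dim=1).sum().item()
            total += label.size(0)
    mean_loss = sum(batch_losses) / len(batch_losses)     # averaged over batches, as in the log
    return mean_loss, 100.0 * top1_hits / total, 100.0 * top5_hits / total

Note that the loss in the log is averaged over the 296 test batches rather than over individual samples, which is why it is reported as "Mean test loss of 296 batches".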

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9

Top GitHub Comments

3 reactions
zhujiagang commented, Apr 20, 2018

I’ve recently tried a lot of data preprocessing techniques. I found that when each sample is centered by subtracting its frame center, as in https://github.com/hongsong-wang/RNN-for-skeletons/blob/7a90f8969ac00f24fd2578a7ea4e5b5b3bce6555/rnn_model.py#L76, or its starting-frame center, as in the view-adaptive LSTM, training and testing become stable and the testing accuracy no longer decreases. I believe that normalizing the samples to the same spatial position benefits training and could bridge the gap between the different distributions of the training and testing sets of NTU RGB-D and other datasets. However, the training of ST-GCN is stable without any data preprocessing. A great model!
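
For concreteness, here is a minimal sketch of this kind of centering, assuming NTU-style skeleton arrays of shape (3, frames, joints, persons); the center-joint index and the array layout are illustrative assumptions, not the exact code from the linked repository.

import numpy as np

def center_skeleton(sample, center_joint=1, use_first_frame=True):
    # sample: float array of shape (3, T, V, M) -- (xyz, frames, joints, persons)
    # center_joint: index of the joint treated as the body center (assumed here)
    if use_first_frame:
        # Subtract the starting-frame center (view-adaptive-LSTM style):
        # reference shape (3, 1, 1, M) broadcasts over all frames and joints.
        ref = sample[:, 0:1, center_joint:center_joint + 1, :]
    else:
        # Subtract the per-frame center: reference shape (3, T, 1, M).
        ref = sample[:, :, center_joint:center_joint + 1, :]
    return sample - ref

Either variant simply translates every skeleton to a common origin, which is the normalization the comment credits with stabilizing training and testing.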

0 reactions
ttthappy commented, Jan 10, 2019

@zhujiagang thank you for sharing your experience. I want to train the ST-GCN model on the NTU RGB+D dataset. I used the old version of the code with default parameters, but after more than 80 epochs of training I got poor results. Then I modified the learning rate according to the paper, but the accuracy is still poor (Top1: 1.67%, Top5: 8.27%). As a new student in this field, I am puzzled as to whether I missed some details. Hoping for your reply. Thanks a lot.

[ Sat Dec 22 18:59:51 2018 ] Training epoch: 80
[ Sat Dec 22 19:00:03 2018 ] 	Batch(0/589) done. Loss: 4.1821  lr:0.001000
[ Sat Dec 22 19:03:31 2018 ] 	Batch(100/589) done. Loss: 4.1529  lr:0.001000
[ Sat Dec 22 19:06:57 2018 ] 	Batch(200/589) done. Loss: 4.1635  lr:0.001000
[ Sat Dec 22 19:10:27 2018 ] 	Batch(300/589) done. Loss: 4.1190  lr:0.001000
[ Sat Dec 22 19:13:55 2018 ] 	Batch(400/589) done. Loss: 4.1429  lr:0.001000
[ Sat Dec 22 19:17:22 2018 ] 	Batch(500/589) done. Loss: 4.1611  lr:0.001000
[ Sat Dec 22 19:20:24 2018 ] 	Mean training loss: 4.1409.
[ Sat Dec 22 19:20:24 2018 ] 	Time consumption: [Data]01%, [Network]99%
[ Sat Dec 22 19:20:24 2018 ] Eval epoch: 80
[ Sat Dec 22 19:24:07 2018 ] 	Mean test loss of 296 batches: 4.106204550008516.
[ Sat Dec 22 19:24:08 2018 ] 	Top1: 1.26%
[ Sat Dec 22 19:24:08 2018 ] 	Top5: 9.44%

After modifying the learning rate according to the paper:

[ Tue Dec 25 06:20:27 2018 ] Training epoch: 20
[ Tue Dec 25 06:20:48 2018 ] 	Batch(0/589) done. Loss: 4.5612  lr:0.001000
[ Tue Dec 25 06:25:36 2018 ] 	Batch(100/589) done. Loss: 4.8982  lr:0.001000
[ Tue Dec 25 06:30:21 2018 ] 	Batch(200/589) done. Loss: 4.5673  lr:0.001000
[ Tue Dec 25 06:35:04 2018 ] 	Batch(300/589) done. Loss: 4.6968  lr:0.001000
[ Tue Dec 25 06:39:46 2018 ] 	Batch(400/589) done. Loss: 4.5363  lr:0.001000
[ Tue Dec 25 06:44:30 2018 ] 	Batch(500/589) done. Loss: 4.4839  lr:0.001000
[ Tue Dec 25 06:48:41 2018 ] 	Mean training loss: 4.6989.
[ Tue Dec 25 06:48:41 2018 ] 	Time consumption: [Data]01%, [Network]99%
[ Tue Dec 25 06:48:41 2018 ] Eval epoch: 20
[ Tue Dec 25 06:53:54 2018 ] 	Mean test loss of 296 batches: 4.28557998747439.
[ Tue Dec 25 06:53:54 2018 ] 	Top1: 1.52%
[ Tue Dec 25 06:53:55 2018 ] 	Top5: 8.95%
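
As an aside, "modifying the learning rate" in this kind of setup usually means a stepwise decay. The sketch below is only an illustration of such a schedule in PyTorch; the milestones, epoch count and the dummy model are placeholders, not the schedule from the paper.

import torch
from torch import nn, optim

model = nn.Linear(10, 60)  # dummy stand-in for the real network
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# Multiply the learning rate by 0.1 after the 10th and 50th epochs (illustrative milestones)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 50], gamma=0.1)

for epoch in range(80):
    # ... one epoch of training (forward, loss, backward, optimizer.step()) would go here ...
    scheduler.step()
    print(f"epoch {epoch + 1}: lr = {optimizer.param_groups[0]['lr']}")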


Top Results From Across the Web

What is the relationship between the accuracy and the loss in ...
That's why loss is mostly used to debug your training. Accuracy better represents the real-world application and is much more interpretable. But,...
Read more >
Interpretation of Loss and Accuracy for a Machine Learning ...
Loss and accuracy are essential values to take into account when training models. Let's take a closer look at their meaning.
Read more >
How to interpret loss and accuracy for a machine learning model
The loss is calculated on training and validation, and its interpretation is how well the model is doing on these two sets. Unlike...
Read more >
Accuracy and Loss - AI Wiki
Unlike accuracy, loss is not a percentage — it is a summation of the errors made for each sample in training or validation...
Read more >
Accuracy vs Loss Conflict | Data Science and Machine Learning
By definition, Accuracy score is the number of correct predictions obtained. Loss values are the values indicating the difference from the desired target...
Read more >
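
To make the distinction in the snippets above concrete: accuracy only counts whether the argmax is right, while cross-entropy loss heavily penalises confident mistakes, so the two can easily disagree. A tiny self-contained example, with values chosen purely for illustration:

import torch
import torch.nn.functional as F

labels = torch.tensor([0, 0, 0, 0])  # four samples, true class is 0 out of 3 classes

# All predictions correct but barely confident: 100% accuracy, loss around 0.97
barely_correct = torch.tensor([[1.2, 1.0, 1.0]] * 4)

# Three very confident correct predictions and one very confident mistake:
# 75% accuracy, yet the single confident error pushes the mean loss above 2.
one_confident_mistake = torch.tensor([[9.0, 0.0, 0.0],
                                      [9.0, 0.0, 0.0],
                                      [9.0, 0.0, 0.0],
                                      [0.0, 9.0, 0.0]])

for name, logits in [("barely correct", barely_correct),
                     ("one confident mistake", one_confident_mistake)]:
    loss = F.cross_entropy(logits, labels).item()
    acc = (logits.argmax(dim=1) == labels).float().mean().item()
    print(f"{name}: loss = {loss:.2f}, accuracy = {acc:.0%}")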
