Training loss and testing accuracy
Hello, when I try to reproduce the results in your paper, or train a modified model with a different learning rate schedule (for example, starting from base_lr=0.1 and training until the training loss no longer decreases; see the sketch after the log below), I find that while the training loss decreases as expected, the testing loss blows up and the testing accuracy drops. I do not think the model is overfitting, because the training loss is still around 1. An example log file follows: epochs 5 and 10 look normal, but the testing loss and accuracy at epoch 15 are far outside the normal range.
[ Sun Apr 8 21:42:15 2018 ] Training epoch: 1
[ Sun Apr 8 21:42:29 2018 ] Batch(0/589) done. Loss: 5.3065 lr:0.100000
[ Sun Apr 8 21:44:10 2018 ] Batch(100/589) done. Loss: 3.5391 lr:0.100000
[ Sun Apr 8 21:45:52 2018 ] Batch(200/589) done. Loss: 3.4271 lr:0.100000
[ Sun Apr 8 21:47:34 2018 ] Batch(300/589) done. Loss: 3.1172 lr:0.100000
[ Sun Apr 8 21:49:16 2018 ] Batch(400/589) done. Loss: 2.8495 lr:0.100000
[ Sun Apr 8 21:50:58 2018 ] Batch(500/589) done. Loss: 2.9964 lr:0.100000
[ Sun Apr 8 21:52:27 2018 ] Mean training loss: 3.3161.
[ Sun Apr 8 21:52:27 2018 ] Time consumption: [Data]01%, [Network]99%
[ Sun Apr 8 21:52:27 2018 ] Training epoch: 2
[ Sun Apr 8 21:52:41 2018 ] Batch(0/589) done. Loss: 3.0124 lr:0.100000
[ Sun Apr 8 21:54:22 2018 ] Batch(100/589) done. Loss: 2.9106 lr:0.100000
[ Sun Apr 8 21:56:03 2018 ] Batch(200/589) done. Loss: 2.4281 lr:0.100000
[ Sun Apr 8 21:57:44 2018 ] Batch(300/589) done. Loss: 2.3935 lr:0.100000
[ Sun Apr 8 21:59:26 2018 ] Batch(400/589) done. Loss: 2.3242 lr:0.100000
[ Sun Apr 8 22:01:08 2018 ] Batch(500/589) done. Loss: 2.2797 lr:0.100000
[ Sun Apr 8 22:02:36 2018 ] Mean training loss: 2.4595.
[ Sun Apr 8 22:02:36 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 22:02:36 2018 ] Training epoch: 3
[ Sun Apr 8 22:02:50 2018 ] Batch(0/589) done. Loss: 2.0113 lr:0.100000
[ Sun Apr 8 22:04:31 2018 ] Batch(100/589) done. Loss: 1.9469 lr:0.100000
[ Sun Apr 8 22:06:13 2018 ] Batch(200/589) done. Loss: 2.0902 lr:0.100000
[ Sun Apr 8 22:07:56 2018 ] Batch(300/589) done. Loss: 1.9241 lr:0.100000
[ Sun Apr 8 22:09:38 2018 ] Batch(400/589) done. Loss: 1.6968 lr:0.100000
[ Sun Apr 8 22:11:20 2018 ] Batch(500/589) done. Loss: 1.6265 lr:0.100000
[ Sun Apr 8 22:12:49 2018 ] Mean training loss: 1.8767.
[ Sun Apr 8 22:12:49 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 22:12:49 2018 ] Training epoch: 4
[ Sun Apr 8 22:13:03 2018 ] Batch(0/589) done. Loss: 1.5664 lr:0.100000
[ Sun Apr 8 22:14:45 2018 ] Batch(100/589) done. Loss: 1.2361 lr:0.100000
[ Sun Apr 8 22:16:27 2018 ] Batch(200/589) done. Loss: 1.9590 lr:0.100000
[ Sun Apr 8 22:18:08 2018 ] Batch(300/589) done. Loss: 1.4472 lr:0.100000
[ Sun Apr 8 22:19:50 2018 ] Batch(400/589) done. Loss: 1.7926 lr:0.100000
[ Sun Apr 8 22:21:32 2018 ] Batch(500/589) done. Loss: 1.6678 lr:0.100000
[ Sun Apr 8 22:23:00 2018 ] Mean training loss: 1.5810.
[ Sun Apr 8 22:23:00 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 22:23:00 2018 ] Training epoch: 5
[ Sun Apr 8 22:23:14 2018 ] Batch(0/589) done. Loss: 1.3737 lr:0.100000
[ Sun Apr 8 22:24:55 2018 ] Batch(100/589) done. Loss: 1.6322 lr:0.100000
[ Sun Apr 8 22:26:37 2018 ] Batch(200/589) done. Loss: 1.2826 lr:0.100000
[ Sun Apr 8 22:28:20 2018 ] Batch(300/589) done. Loss: 1.7919 lr:0.100000
[ Sun Apr 8 22:30:02 2018 ] Batch(400/589) done. Loss: 1.5371 lr:0.100000
[ Sun Apr 8 22:31:42 2018 ] Batch(500/589) done. Loss: 1.3910 lr:0.100000
[ Sun Apr 8 22:33:11 2018 ] Mean training loss: 1.4091.
[ Sun Apr 8 22:33:11 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 22:33:11 2018 ] Eval epoch: 5
[ Sun Apr 8 22:35:05 2018 ] Mean test loss of 296 batches: 1.38761536334012.
[ Sun Apr 8 22:35:06 2018 ] Top1: 57.94%
[ Sun Apr 8 22:35:06 2018 ] Top5: 90.33%
[ Sun Apr 8 22:35:06 2018 ] Training epoch: 6
[ Sun Apr 8 22:35:19 2018 ] Batch(0/589) done. Loss: 1.4409 lr:0.100000
[ Sun Apr 8 22:37:00 2018 ] Batch(100/589) done. Loss: 1.3341 lr:0.100000
[ Sun Apr 8 22:38:42 2018 ] Batch(200/589) done. Loss: 1.0841 lr:0.100000
[ Sun Apr 8 22:40:23 2018 ] Batch(300/589) done. Loss: 1.2607 lr:0.100000
[ Sun Apr 8 22:42:05 2018 ] Batch(400/589) done. Loss: 1.3300 lr:0.100000
[ Sun Apr 8 22:43:46 2018 ] Batch(500/589) done. Loss: 1.1257 lr:0.100000
[ Sun Apr 8 22:45:15 2018 ] Mean training loss: 1.2766.
[ Sun Apr 8 22:45:15 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 22:45:15 2018 ] Training epoch: 7
[ Sun Apr 8 22:45:29 2018 ] Batch(0/589) done. Loss: 1.4653 lr:0.100000
[ Sun Apr 8 22:47:10 2018 ] Batch(100/589) done. Loss: 1.2261 lr:0.100000
[ Sun Apr 8 22:48:51 2018 ] Batch(200/589) done. Loss: 1.1842 lr:0.100000
[ Sun Apr 8 22:50:33 2018 ] Batch(300/589) done. Loss: 1.2471 lr:0.100000
[ Sun Apr 8 22:52:15 2018 ] Batch(400/589) done. Loss: 1.1583 lr:0.100000
[ Sun Apr 8 22:53:56 2018 ] Batch(500/589) done. Loss: 0.9828 lr:0.100000
[ Sun Apr 8 22:55:24 2018 ] Mean training loss: 1.1803.
[ Sun Apr 8 22:55:24 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 22:55:24 2018 ] Training epoch: 8
[ Sun Apr 8 22:55:39 2018 ] Batch(0/589) done. Loss: 1.0015 lr:0.100000
[ Sun Apr 8 22:57:20 2018 ] Batch(100/589) done. Loss: 1.0679 lr:0.100000
[ Sun Apr 8 22:59:02 2018 ] Batch(200/589) done. Loss: 1.2700 lr:0.100000
[ Sun Apr 8 23:00:43 2018 ] Batch(300/589) done. Loss: 1.0391 lr:0.100000
[ Sun Apr 8 23:02:24 2018 ] Batch(400/589) done. Loss: 0.8358 lr:0.100000
[ Sun Apr 8 23:04:06 2018 ] Batch(500/589) done. Loss: 0.7021 lr:0.100000
[ Sun Apr 8 23:05:34 2018 ] Mean training loss: 1.1058.
[ Sun Apr 8 23:05:34 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 23:05:34 2018 ] Training epoch: 9
[ Sun Apr 8 23:05:48 2018 ] Batch(0/589) done. Loss: 1.4356 lr:0.100000
[ Sun Apr 8 23:07:28 2018 ] Batch(100/589) done. Loss: 0.9781 lr:0.100000
[ Sun Apr 8 23:09:10 2018 ] Batch(200/589) done. Loss: 1.1352 lr:0.100000
[ Sun Apr 8 23:10:51 2018 ] Batch(300/589) done. Loss: 0.8561 lr:0.100000
[ Sun Apr 8 23:12:33 2018 ] Batch(400/589) done. Loss: 1.0276 lr:0.100000
[ Sun Apr 8 23:14:15 2018 ] Batch(500/589) done. Loss: 1.3473 lr:0.100000
[ Sun Apr 8 23:15:43 2018 ] Mean training loss: 1.0431.
[ Sun Apr 8 23:15:43 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 23:15:43 2018 ] Training epoch: 10
[ Sun Apr 8 23:15:57 2018 ] Batch(0/589) done. Loss: 1.2543 lr:0.100000
[ Sun Apr 8 23:17:39 2018 ] Batch(100/589) done. Loss: 0.8085 lr:0.100000
[ Sun Apr 8 23:19:20 2018 ] Batch(200/589) done. Loss: 1.0412 lr:0.100000
[ Sun Apr 8 23:21:02 2018 ] Batch(300/589) done. Loss: 0.9332 lr:0.100000
[ Sun Apr 8 23:22:43 2018 ] Batch(400/589) done. Loss: 1.0560 lr:0.100000
[ Sun Apr 8 23:24:25 2018 ] Batch(500/589) done. Loss: 0.9087 lr:0.100000
[ Sun Apr 8 23:25:53 2018 ] Mean training loss: 0.9881.
[ Sun Apr 8 23:25:53 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 23:25:54 2018 ] Eval epoch: 10
[ Sun Apr 8 23:27:47 2018 ] Mean test loss of 296 batches: 1.0697445980197675.
[ Sun Apr 8 23:27:48 2018 ] Top1: 68.29%
[ Sun Apr 8 23:27:48 2018 ] Top5: 94.53%
[ Sun Apr 8 23:27:48 2018 ] Training epoch: 11
[ Sun Apr 8 23:28:01 2018 ] Batch(0/589) done. Loss: 0.6880 lr:0.100000
[ Sun Apr 8 23:29:42 2018 ] Batch(100/589) done. Loss: 1.1329 lr:0.100000
[ Sun Apr 8 23:31:23 2018 ] Batch(200/589) done. Loss: 0.9698 lr:0.100000
[ Sun Apr 8 23:33:05 2018 ] Batch(300/589) done. Loss: 0.6172 lr:0.100000
[ Sun Apr 8 23:34:47 2018 ] Batch(400/589) done. Loss: 0.9810 lr:0.100000
[ Sun Apr 8 23:36:31 2018 ] Batch(500/589) done. Loss: 0.8487 lr:0.100000
[ Sun Apr 8 23:38:01 2018 ] Mean training loss: 0.9404.
[ Sun Apr 8 23:38:01 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 23:38:01 2018 ] Training epoch: 12
[ Sun Apr 8 23:38:15 2018 ] Batch(0/589) done. Loss: 0.8225 lr:0.100000
[ Sun Apr 8 23:39:57 2018 ] Batch(100/589) done. Loss: 0.9550 lr:0.100000
[ Sun Apr 8 23:41:40 2018 ] Batch(200/589) done. Loss: 0.9237 lr:0.100000
[ Sun Apr 8 23:43:23 2018 ] Batch(300/589) done. Loss: 0.7804 lr:0.100000
[ Sun Apr 8 23:45:06 2018 ] Batch(400/589) done. Loss: 0.7944 lr:0.100000
[ Sun Apr 8 23:46:51 2018 ] Batch(500/589) done. Loss: 0.6681 lr:0.100000
[ Sun Apr 8 23:48:20 2018 ] Mean training loss: 0.9031.
[ Sun Apr 8 23:48:20 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 23:48:20 2018 ] Training epoch: 13
[ Sun Apr 8 23:48:34 2018 ] Batch(0/589) done. Loss: 1.0019 lr:0.100000
[ Sun Apr 8 23:50:15 2018 ] Batch(100/589) done. Loss: 1.1436 lr:0.100000
[ Sun Apr 8 23:51:57 2018 ] Batch(200/589) done. Loss: 0.9631 lr:0.100000
[ Sun Apr 8 23:53:39 2018 ] Batch(300/589) done. Loss: 0.8120 lr:0.100000
[ Sun Apr 8 23:55:21 2018 ] Batch(400/589) done. Loss: 1.2053 lr:0.100000
[ Sun Apr 8 23:57:02 2018 ] Batch(500/589) done. Loss: 0.6185 lr:0.100000
[ Sun Apr 8 23:58:30 2018 ] Mean training loss: 0.8703.
[ Sun Apr 8 23:58:30 2018 ] Time consumption: [Data]03%, [Network]97%
[ Sun Apr 8 23:58:30 2018 ] Training epoch: 14
[ Sun Apr 8 23:58:44 2018 ] Batch(0/589) done. Loss: 0.7425 lr:0.100000
[ Mon Apr 9 00:00:25 2018 ] Batch(100/589) done. Loss: 0.8590 lr:0.100000
[ Mon Apr 9 00:02:07 2018 ] Batch(200/589) done. Loss: 0.7516 lr:0.100000
[ Mon Apr 9 00:03:49 2018 ] Batch(300/589) done. Loss: 0.8640 lr:0.100000
[ Mon Apr 9 00:05:30 2018 ] Batch(400/589) done. Loss: 0.6930 lr:0.100000
[ Mon Apr 9 00:07:11 2018 ] Batch(500/589) done. Loss: 0.9798 lr:0.100000
[ Mon Apr 9 00:08:40 2018 ] Mean training loss: 0.8336.
[ Mon Apr 9 00:08:40 2018 ] Time consumption: [Data]03%, [Network]97%
[ Mon Apr 9 00:08:40 2018 ] Training epoch: 15
[ Mon Apr 9 00:08:54 2018 ] Batch(0/589) done. Loss: 0.9048 lr:0.100000
[ Mon Apr 9 00:10:34 2018 ] Batch(100/589) done. Loss: 0.7716 lr:0.100000
[ Mon Apr 9 00:12:16 2018 ] Batch(200/589) done. Loss: 0.4784 lr:0.100000
[ Mon Apr 9 00:13:57 2018 ] Batch(300/589) done. Loss: 0.6179 lr:0.100000
[ Mon Apr 9 00:15:39 2018 ] Batch(400/589) done. Loss: 0.9232 lr:0.100000
[ Mon Apr 9 00:17:20 2018 ] Batch(500/589) done. Loss: 0.7198 lr:0.100000
[ Mon Apr 9 00:18:49 2018 ] Mean training loss: 0.7999.
[ Mon Apr 9 00:18:49 2018 ] Time consumption: [Data]03%, [Network]97%
[ Mon Apr 9 00:18:49 2018 ] Eval epoch: 15
[ Mon Apr 9 00:20:43 2018 ] Mean test loss of 296 batches: 6.906595945358276.
[ Mon Apr 9 00:20:44 2018 ] Top1: 22.58%
[ Mon Apr 9 00:20:44 2018 ] Top5: 46.53%
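To be clear, the plateau-based schedule I mean is roughly the following. This is only a minimal PyTorch sketch with a stand-in model and dummy data, not the actual training code of this repository:

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(10, 60)  # stand-in for the real ST-GCN model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
# Divide the learning rate by 10 whenever the mean training loss
# stops decreasing, instead of at fixed epochs.
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5)

for epoch in range(80):
    x = torch.randn(32, 10)          # dummy batch
    y = torch.randint(0, 60, (32,))  # dummy labels
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # feed the epoch's mean training loss
```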
I've recently tried a lot of data preprocessing techniques. I found that when a sample is centered by subtracting its per-frame body center, as in https://github.com/hongsong-wang/RNN-for-skeletons/blob/7a90f8969ac00f24fd2578a7ea4e5b5b3bce6555/rnn_model.py#L76, or the body center of its starting frame, as in view adaptive LSTM, training and testing become stable and the testing accuracy no longer decreases. I believe that normalizing the samples to the same spatial position benefits training and could bridge the gap between the different distributions of the training and testing sets of NTU RGB-D and other datasets. However, ST-GCN trains stably without any of these preprocessing techniques. A great model!
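In case it is useful to others, the two centering variants I tried look roughly like this. This is a minimal sketch assuming skeleton samples shaped (T, V, C) for frames, joints, and coordinates, with a hypothetical center_joint index; it is not the exact code from either repository:

```python
import numpy as np

def center_per_frame(sample, center_joint=1):
    # sample: (T, V, C) array of T frames, V joints, C coordinates.
    # Subtract each frame's center joint from every joint in that frame.
    center = sample[:, center_joint:center_joint + 1, :]  # (T, 1, C)
    return sample - center

def center_by_first_frame(sample, center_joint=1):
    # Subtract the center joint of the starting frame from the whole
    # sequence, in the spirit of view adaptive LSTM.
    center = sample[0:1, center_joint:center_joint + 1, :]  # (1, 1, C)
    return sample - center
```

For NTU RGB-D, the middle-of-the-spine joint is a natural choice for center_joint, but the right index depends on how the joints are ordered in your preprocessed data.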
@zhujiagang thank you for sharing your experiences. I want to train the st-gcn model on the NTU RGB+D dataset. I used the old version of the code with the default parameters, but after more than 80 epochs of training I got a poor result. I then modified the learning rate according to the paper, but the accuracy is still poor (Top1: 1.67%, Top5: 8.27%). As a new student in this field, I am puzzled about which detail I have missed. I hope for your reply. Thanks a lot.
[ Sat Dec 22 18:59:51 2018 ] Training epoch: 80
[ Sat Dec 22 19:00:03 2018 ] Batch(0/589) done. Loss: 4.1821 lr:0.001000
[ Sat Dec 22 19:03:31 2018 ] Batch(100/589) done. Loss: 4.1529 lr:0.001000
[ Sat Dec 22 19:06:57 2018 ] Batch(200/589) done. Loss: 4.1635 lr:0.001000
[ Sat Dec 22 19:10:27 2018 ] Batch(300/589) done. Loss: 4.1190 lr:0.001000
[ Sat Dec 22 19:13:55 2018 ] Batch(400/589) done. Loss: 4.1429 lr:0.001000
[ Sat Dec 22 19:17:22 2018 ] Batch(500/589) done. Loss: 4.1611 lr:0.001000
[ Sat Dec 22 19:20:24 2018 ] Mean training loss: 4.1409.
[ Sat Dec 22 19:20:24 2018 ] Time consumption: [Data]01%, [Network]99%
[ Sat Dec 22 19:20:24 2018 ] Eval epoch: 80
[ Sat Dec 22 19:24:07 2018 ] Mean test loss of 296 batches: 4.106204550008516.
[ Sat Dec 22 19:24:08 2018 ] Top1: 1.26%
[ Sat Dec 22 19:24:08 2018 ] Top5: 9.44%
After modifying the learning rate according to your paper:
[ Tue Dec 25 06:20:27 2018 ] Training epoch: 20
[ Tue Dec 25 06:20:48 2018 ] Batch(0/589) done. Loss: 4.5612 lr:0.001000
[ Tue Dec 25 06:25:36 2018 ] Batch(100/589) done. Loss: 4.8982 lr:0.001000
[ Tue Dec 25 06:30:21 2018 ] Batch(200/589) done. Loss: 4.5673 lr:0.001000
[ Tue Dec 25 06:35:04 2018 ] Batch(300/589) done. Loss: 4.6968 lr:0.001000
[ Tue Dec 25 06:39:46 2018 ] Batch(400/589) done. Loss: 4.5363 lr:0.001000
[ Tue Dec 25 06:44:30 2018 ] Batch(500/589) done. Loss: 4.4839 lr:0.001000
[ Tue Dec 25 06:48:41 2018 ] Mean training loss: 4.6989.
[ Tue Dec 25 06:48:41 2018 ] Time consumption: [Data]01%, [Network]99%
[ Tue Dec 25 06:48:41 2018 ] Eval epoch: 20
[ Tue Dec 25 06:53:54 2018 ] Mean test loss of 296 batches: 4.28557998747439.
[ Tue Dec 25 06:53:54 2018 ] Top1: 1.52%
[ Tue Dec 25 06:53:55 2018 ] Top5: 8.95%
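I also notice that my loss of ~4.1 is almost exactly the chance-level cross-entropy for the 60 NTU RGB+D classes, and my Top1 accuracies of 1.26-1.67% sit right at 1/60, so the model seems to be guessing at random; maybe this points to a data or label problem rather than the learning rate. A quick check of the arithmetic in plain Python, just for illustration:

```python
import math

num_classes = 60                     # NTU RGB+D action classes
chance_loss = math.log(num_classes)  # cross-entropy of a uniform prediction
print(f"{chance_loss:.4f}")          # 4.0943, matches the ~4.1 losses above
print(f"{100 / num_classes:.2f}%")   # 1.67%, matches the chance-level Top1
```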