
Unified Validation Scheme


In my opinion this package still needs a unified way to evaluate DL models.

Background

As everyone knows, there are usually three different sets: training, validation, and testing. One trains on the training set, validates and tunes the model (i.e., the hyperparameters) on the validation set, and finally evaluates on the unseen test set.

As it is arguably the most popular BCI dataset, I will use the BCI Competition IV 2a dataset for my examples. As a starting point I want to discuss within-subject validation. For BCI IV 2a the train-test split is quite obvious: session_T for training and session_E for testing (a minimal sketch of this split follows below). The problems come with the validation set.
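For reference, here is a minimal sketch of that within-subject session split, assuming the MOABB API for BCI IV 2a (BNCI2014001). Note that the session labels differ across MOABB versions (“session_T”/“session_E” in older releases, “0train”/“1test” in newer ones), so the mask below is an assumption intended to cover both:

```python
from moabb.datasets import BNCI2014001  # BCI Competition IV 2a
from moabb.paradigms import MotorImagery

# Load all trials of subject 1; meta is a DataFrame with a "session" column.
paradigm = MotorImagery(n_classes=4)
X, y, meta = paradigm.get_data(dataset=BNCI2014001(), subjects=[1])

# session_T (or "0train") goes to training, session_E (or "1test") to testing.
is_train = meta["session"].str.contains("T|train").to_numpy()
X_train, y_train = X[is_train], y[is_train]
X_test, y_test = X[~is_train], y[~is_train]  # held out for the final evaluation
```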

Examples

  1. MOABB: Here the dataset is split into session_T and session_E; the classifier is trained on session_T and validated on session_E. There is no separate test set (the validation set is the test set). This leads to better results because the “test set” is used to tune the model (hyperparameters, early stopping, etc.). The final model therefore benefits from the test data during training, and the final “test” result is positively biased. In addition, a 2-fold cross-validation is performed (the same training is repeated with the sessions interchanged).

  2. braindecode example: same as 1., but without the 2-fold cross-validation.

  3. Schirrmeister et al. 2017 (appendix): Split the data into train (session_T) and test (session_E) sets, then train in two phases: a. The train set is split into a train and a validation split (probably a random split?). The model is trained on the train split and the hyperparameters are tuned via the validation split; the final training loss of the best model is saved. b. The best model is retrained on the train and validation splits together (the complete session_T) until the training loss reaches the previously saved best training loss, to prevent overfitting. All results are then obtained on the unseen test set (session_E). A sketch of this two-phase scheme follows the list.

  4. “Conventional” approach: same as 3., but without the second training phase.
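For concreteness, here is a minimal, runnable sketch of the two-phase scheme from method 3, on synthetic placeholder data. The model, split sizes, and epoch budgets are illustrative assumptions, not the implementation from Schirrmeister et al.; dropping phase b gives method 4.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

torch.manual_seed(0)

# Synthetic stand-ins for BCI IV 2a: 288 trials per session, flattened features.
X_T, y_T = torch.randn(288, 220), torch.randint(0, 4, (288,))  # session_T
X_E, y_E = torch.randn(288, 220), torch.randint(0, 4, (288,))  # session_E

def make_model():
    return nn.Sequential(nn.Linear(220, 64), nn.ReLU(), nn.Linear(64, 4))

def run_epoch(model, loader, opt=None):
    """One pass over loader; trains if an optimizer is given. Returns mean loss."""
    loss_fn, total, n = nn.CrossEntropyLoss(), 0.0, 0
    for xb, yb in loader:
        loss = loss_fn(model(xb), yb)
        if opt is not None:
            opt.zero_grad()
            loss.backward()
            opt.step()
        total, n = total + loss.item() * len(xb), n + len(xb)
    return total / n

# Phase a: random train/validation split of session_T; remember the training
# loss of the model with the best validation loss.
train_ds, val_ds = random_split(TensorDataset(X_T, y_T), [230, 58])
model, best_val, target_train_loss = make_model(), float("inf"), None
opt = torch.optim.Adam(model.parameters())
for _ in range(20):
    train_loss = run_epoch(model, DataLoader(train_ds, batch_size=32, shuffle=True), opt)
    val_loss = run_epoch(model, DataLoader(val_ds, batch_size=32))
    if val_loss < best_val:  # always true in epoch 1, so target_train_loss is set
        best_val, target_train_loss = val_loss, train_loss

# Phase b: retrain from scratch on the complete session_T until the training
# loss reaches the remembered value, then stop (to prevent overfitting).
model = make_model()
opt = torch.optim.Adam(model.parameters())
full_loader = DataLoader(TensorDataset(X_T, y_T), batch_size=32, shuffle=True)
for _ in range(200):
    if run_epoch(model, full_loader, opt) <= target_train_loss:
        break

# Final result: evaluate once on the unseen session_E.
print("test loss:", run_epoch(model, DataLoader(TensorDataset(X_E, y_E), batch_size=32)))
```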

Opinion

In my opinion, either method 3 or method 4 should be used to get reasonable results. The braindecode/BCI community would really benefit from a unified implementation of one (or both) of these methods. Since method 3 adds complexity on top of method 4, it would be interesting to know how big the performance boost of method 3 over method 4 is (@robintibor: is it worth it?).

What is your opinion on this topic?

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 13 (1 by maintainers)

Top GitHub Comments

1 reaction
bruAristimunha commented, Aug 1, 2022

Hi @martinwimpff, I agree! If you need any help getting started with a PR, please let me know!

1 reaction
martinwimpff commented, Aug 1, 2022

Thanks for the valuable input! So to wrap this up (and to make it as simple & fast as possible):

There should be a clear separation between “normal” train_test evaluation and HP tuning. As @agramfort pointed out, early stopping/the number of epochs is not a big issue, so we should just use a fixed number of epochs to keep everything simple.

So for “normal” training/for the final evaluation:

  • train_test: simple division into session_T (train) and session_E (test), no validation set. This procedure should only be used for the final evaluation.

For HP tuning:

  • k-fold cross-validation: use only session_T, split into k folds (not shuffled in time), with k-1 folds for training and 1 fold for validation. Search (grid, random, Bayesian, whatever) for the best HP configuration using the average score over the k folds.
  • fast HP tuning: same as above, but use only the first split of the k-fold CV to speed up the tuning process. This method should be preferred over full k-fold CV if either a) the training duration is long or b) the HP search space is very large (i.e., in preliminary experiments). Both procedures are sketched below.
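To make the two tuning procedures concrete, here is a minimal scikit-learn sketch on placeholder data; the classifier, grid, and fold count are illustrative assumptions, and (X_T, y_T) is a synthetic stand-in for session_T:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.default_rng(0)
X_T, y_T = rng.normal(size=(288, 100)), rng.integers(0, 4, size=288)  # session_T

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# k-fold CV: shuffle=False keeps the folds contiguous in time.
cv = KFold(n_splits=5, shuffle=False)
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=cv)
search.fit(X_T, y_T)
print("k-fold best HP:", search.best_params_)

# Fast variant: reuse only the first of the k splits. GridSearchCV accepts an
# explicit list of (train_idx, val_idx) pairs, so no custom splitter is needed.
first_split = [next(iter(cv.split(X_T, y_T)))]
fast_search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=first_split)
fast_search.fit(X_T, y_T)
print("fast best HP:", fast_search.best_params_)
```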

The best HP configuration can then be evaluated by the train_test procedure above.
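A matching sketch of that final train_test step, again on placeholder data; best_params is a stand-in for whatever the HP search above selected:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_T, y_T = rng.normal(size=(288, 100)), rng.integers(0, 4, size=288)  # session_T
X_E, y_E = rng.normal(size=(288, 100)), rng.integers(0, 4, size=288)  # session_E

best_params = {"C": 1.0}  # placeholder for the tuned configuration
final_model = LogisticRegression(max_iter=1000, **best_params).fit(X_T, y_T)
print("held-out test accuracy:", final_model.score(X_E, y_E))
```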

These are options 1. and 3. from above, but split into three separate procedures. @bruAristimunha: do you agree?
