Shuffling train ids
I am a bit confused about this line: https://github.com/jiesutd/NCRFpp/blob/ab82b6868bb81bedbac3e231d8e09f7341321e6f/main.py#L393
Doesn’t shuffling the data throw away useful information about the sequence? For example, the data in sample_data consists of sequences of sentences delimited by document delimiter tokens (-DOCSTART-). If the sentences are shuffled, the model cannot learn from sequential relationships between sentences.
I would imagine that for sentence classification tasks, shuffling in this way is probably worse still?
Issue Analytics
- Created 5 years ago
- Comments: 6 (2 by maintainers)

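This is a good question. The model only takes a single sentence into consideration at a time and does not use any information between sentences. The shuffle only changes the order in which sentences are seen during training, not the order of words inside a sentence, so it does not take away anything the model could use.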
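To make this concrete, here is a minimal sketch of sentence-level shuffling. It is not the NCRFpp code: `train_ids`, `shuffled_copy`, and the toy id values are made up for illustration, and it assumes each training instance is one sentence stored as a pair of word ids and label ids.

```python
import random

# Hypothetical toy data: each item is one sentence as (word_ids, label_ids).
train_ids = [
    ([1, 2, 3], [0, 1, 0]),
    ([4, 5], [0, 0]),
    ([6, 7, 8, 9], [1, 0, 0, 1]),
]


def shuffled_copy(instances, seed):
    """Return the sentences in a random order; each sentence is left as-is."""
    rng = random.Random(seed)
    shuffled = list(instances)  # shallow copy, so the input order is untouched
    rng.shuffle(shuffled)       # permutes whole sentences, never the tokens
    return shuffled


shuffled_ids = shuffled_copy(train_ids, seed=0)

# The same sentences are present, each with its word order and its
# word/label alignment intact; only their position in the list may differ.
same_sentences = sorted(shuffled_ids) == sorted(train_ids)
```

If the model did read context across sentence boundaries (for example, document-level features), this kind of shuffle would break that context, and one would have to shuffle whole documents instead.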
I’m confused about this question, especially in the sequence labeling task. Can you give me more of a hint?