Shuffling train ids

I am a bit confused about the line: https://github.com/jiesutd/NCRFpp/blob/ab82b6868bb81bedbac3e231d8e09f7341321e6f/main.py#L393

Doesn’t shuffling the data throw away useful information about the sequence? For example, the data in sample_data consists of sequences of sentences delimited by document delimiter tokens (-DOCSTART-). If the sentences are shuffled, the model cannot learn from sequential relationships between sentences.

I would imagine that for sentence classification tasks, shuffling in this way is probably worse still?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
jiesutd commented, Mar 12, 2019

This is a good question. The model only takes a single sentence into consideration. It will not use the information between sentences. So this shuffle does not affect the model.
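
To make the point above concrete, here is a minimal sketch of sentence-level shuffling. It is not the NCRFpp code itself: the `train_ids` layout (one `(token_ids, label_ids)` pair per sentence) and the helper `shuffled_epoch` are assumptions made for illustration. It shows that shuffling reorders whole sentences only, so the token order and the token-to-label alignment inside each sentence are left untouched.

```python
import random

# Hypothetical toy data: each instance is one sentence, stored as
# (token_ids, label_ids). Names and values are illustrative only.
train_ids = [
    ([1, 2, 3], [0, 1, 0]),
    ([4, 5], [1, 1]),
    ([6, 7, 8, 9], [0, 0, 1, 0]),
]


def shuffled_epoch(instances, seed):
    """Return a shuffled copy of the sentence list; each sentence is untouched."""
    rng = random.Random(seed)
    order = list(instances)  # copy, so the original list is not modified
    rng.shuffle(order)       # reorders whole sentences, never tokens
    return order


epoch = shuffled_epoch(train_ids, seed=0)

# The same sentences are present, only their order across the epoch may differ.
assert sorted(epoch) == sorted(train_ids)
# Inside every sentence, tokens are still aligned one-to-one with labels.
assert all(len(tokens) == len(labels) for tokens, labels in epoch)
```

A model that scores each sentence independently sees exactly the same training instances either way; only a model that reads across sentence boundaries would lose information from this kind of shuffle.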

0 reactions
cpmss521 commented, Nov 15, 2020

I’m confused about this question, especially for the sequence labeling task. Can you give me more hints?
