Shuffling train ids

I am a bit confused about the line: https://github.com/jiesutd/NCRFpp/blob/ab82b6868bb81bedbac3e231d8e09f7341321e6f/main.py#L393

Doesn’t shuffling the data throw away useful information about the sequence? For example, the data in sample_data consists of sequences of sentences which are delimited by document delimiter tokens (-DOCSTART-). If the sentences are shuffled the model cannot learn from sequential relationships between sentences.

I would imagine that for sentence classification tasks, shuffling in this way is probably worse still?
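To make the concern concrete, here is a minimal sketch of the alternative the question implies: shuffling whole documents while keeping the sentences inside each document in their original order. It is not code from NCRFpp. The helper names `group_by_document` and `shuffle_documents` are hypothetical, and the sketch assumes each sentence is a list of tokens and that a document boundary is a sentence whose first token is `-DOCSTART-`.

```python
import random

DOC_DELIMITER = "-DOCSTART-"  # document boundary token mentioned in the question


def group_by_document(sentences):
    """Group consecutive sentences into documents, dropping delimiter sentences."""
    documents = []
    for sentence in sentences:
        if sentence and sentence[0] == DOC_DELIMITER:
            documents.append([])           # a delimiter starts a new document
        elif not documents:
            documents.append([sentence])   # sentences seen before any delimiter
        else:
            documents[-1].append(sentence)
    return [doc for doc in documents if doc]


def shuffle_documents(sentences, seed=None):
    """Shuffle the order of documents; sentence order inside each is preserved."""
    documents = group_by_document(sentences)
    random.Random(seed).shuffle(documents)
    return documents


sample = [["-DOCSTART-"], ["EU", "rejects"], ["Peter", "Blackburn"],
          ["-DOCSTART-"], ["BRUSSELS"]]
shuffled = shuffle_documents(sample, seed=1)
```

This only matters for a model that actually reads context across sentence boundaries. A model that treats each sentence independently gains nothing from the grouping.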

Issue Analytics

State:
Created 5 years ago
Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction

jiesutd commented, Mar 12, 2019

This is a good question. The model only takes a single sentence into consideration. It will not use the information between sentences. So this shuffle does not affect the model.
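A minimal sketch of why per-sentence shuffling is harmless in that setting, assuming the training data is a list of independent sentence instances and that the linked line shuffles that list once per epoch. The function `epoch_batches` and the toy data are hypothetical, not NCRFpp's API.

```python
import random


def epoch_batches(instances, batch_size, seed=None):
    """Yield mini-batches of sentence instances in a fresh random order."""
    rng = random.Random(seed)
    order = list(range(len(instances)))
    rng.shuffle(order)  # only the visiting order changes, not the sentences
    for start in range(0, len(order), batch_size):
        yield [instances[i] for i in order[start:start + batch_size]]


# Each instance is one sentence; the model never sees its neighbours.
sentences = [["EU", "rejects"], ["Peter", "Blackburn"], ["BRUSSELS"], ["The", "end"]]
batches = list(epoch_batches(sentences, batch_size=2, seed=0))
```

Every sentence still appears exactly once per epoch with its tokens and labels intact, so the only thing shuffling changes is which sentences share a batch and in what order the batches arrive.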

0 reactions

cpmss521 commented, Nov 15, 2020

I’m confused about this question, especially in the sequence labeling task. Can you give me more hints?

Top Results From Across the Web

What is the advantage of shuffling data in train-test split?

It's not uncommon that real world data is sorted in some manner. For example it could be sorted by: user id; timestamp of...

Random Shuffle Strategy To Split Your Full Dataset

In Python it is easy to split the FULL dataset into TRAIN and TEST datasets based on pure random sampling. You can download...

Partially Shuffling the Training Data to Improve Language ...

Here we present a method that partially shuffles the training data between epochs. This method makes each batch random, while keeping most ...

How to generate a train-test-split based on a group id?

I figured out the answer. This seems to work: from sklearn.model_selection import GroupShuffleSplit splitter ...
