
How to train on part of the dataset


❓ Questions/Help/Support

Hi @vfdev-5 ,

We have a requirement for very big datasets. Suppose we have 100 iterations in 1 epoch: we want to call run() to train on the data of the first 10 iterations, do some other things with the model, then call run() again to train on the data of the next 10 iterations, and so on. Is this possible in ignite now? I found that the iterator over the dataloader in ignite always starts from the beginning: https://github.com/pytorch/ignite/blob/v0.4.2/ignite/engine/engine.py#L771

Thanks.
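In plain Python terms, the behaviour the question describes can be sketched as follows. This is an illustrative stand-in, not the ignite API: a list plays the role of the dataloader, and building a fresh iterator on every call mirrors the engine creating a new dataloader iterator at the start of each run().

```python
data = list(range(100))  # stand-in for 100 batches in one epoch

def run_10_iterations():
    # A fresh iter() on every call, as the engine does at the start of run(),
    # so each call restarts from batch 0 instead of continuing.
    it = iter(data)
    return [next(it) for _ in range(10)]

first = run_10_iterations()
second = run_10_iterations()
print(first == second)  # True: both runs see batches 0..9
```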

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 17 (8 by maintainers)

Top GitHub Comments

1 reaction
vfdev-5 commented, Oct 28, 2020

Hi @Nic-Ma ,

Yes, you are right about that 👍 This can be an interesting approach. Thanks! Currently, a problem with calling trainer.run() multiple times is that it triggers events like STARTED, EPOCH_STARTED, etc. every time, which may not be what we expect. This is more or less a conceptual problem of what a run is, and something we are discussing with the team.

There is another approach (workaround) to keep control of the dataflow:


dataloader = ...

def cycle(dataloader):
    # Loop over the dataloader forever, keeping the iterator's
    # position between successive trainer.run() calls.
    while True:
        for i in dataloader:
            yield i

dataloader_iter = cycle(dataloader)

@trainer.on(Events.ITERATION_STARTED)
def prepare_batch(engine):
    # Override the batch the engine would take from `data` with the
    # next batch from the persistent iterator.
    engine.state.batch = next(dataloader_iter)

# FLhandler: the user's sync handler, run every 10 iterations
trainer.add_event_handler(Events.ITERATION_COMPLETED(every=10), FLhandler)

data = list(range(len(dataloader) * num_epochs))  # dummy data; only its length matters, real batches come from dataloader_iter
# round 1
trainer.run(data, max_epochs=num_epochs)
# FL sync
# ...
# round 2
trainer.run(data)
# FL sync
# ...
# round 3
trainer.run(data)
# etc
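The cycle trick above can be checked without ignite at all. Here is a minimal, self-contained sketch under illustrative assumptions: a plain list stands in for the dataloader, and a hypothetical run_round() helper stands in for one trainer.run() call whose ITERATION_STARTED handler pulls from the shared iterator.

```python
def cycle(iterable):
    # Restart the underlying iterable forever; position is kept between rounds.
    while True:
        for item in iterable:
            yield item

dataset = list(range(100))   # stand-in for a dataloader yielding 100 batches
batches = cycle(dataset)

def run_round(num_iterations):
    # Consume the next `num_iterations` batches, like one trainer.run() round.
    return [next(batches) for _ in range(num_iterations)]

round1 = run_round(10)
round2 = run_round(10)
print(round1[-1], round2[0])  # 9 10 — round 2 resumes where round 1 stopped
```

Because the generator is created once and shared across rounds, each round consumes the next chunk of data instead of restarting from the beginning, which is exactly the property the workaround relies on.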
1 reaction
Nic-Ma commented, Oct 19, 2020

Cool, your trick seems very useful, let me make a demo to verify first. Thanks.
