NotImplementedError Using BucketIterator

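import numpy as np
import spacy
import random
import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.datasets import Multi30k  # new-style (non-legacy) dataset API
from torchtext.legacy.data import Field, BucketIterator  # legacy iterator API

# Multi30k() from torchtext.datasets returns iterable-style datasets of raw
# sentence pairs, not legacy Dataset objects built from Fields.
train_data, valid_data, test_data = Multi30k()
batch_size = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# The legacy BucketIterator expects a legacy torchtext Dataset.
train_iterator = BucketIterator(
    train_data,
    batch_size=batch_size,
    sort_within_batch=True,
    sort_key=lambda x: len(x.src),  # bucket examples by source-sentence length
    device=device,
)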

When I try to iterate over train_iterator using

for data in train_iterator:
    print(data)

I get a NotImplementedError with the following traceback:

 NotImplementedError                       Traceback (most recent call last)
<ipython-input-18-7f4717f8ef1c> in <module>
----> 1 for i in train_iterator:
      2     print(i)

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in __iter__(self)
    143     def __iter__(self):
    144         while True:
--> 145             self.init_epoch()
    146             for idx, minibatch in enumerate(self.batches):
    147                 # fast-forward if loaded from state

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in init_epoch(self)
    119             self._random_state_this_epoch = self.random_shuffler.random_state
    120 
--> 121         self.create_batches()
    122 
    123         if self._restored_from_state:

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in create_batches(self)
    250                                  self.batch_size_fn)
    251         else:
--> 252             self.batches = pool(self.data(), self.batch_size,
    253                                 self.sort_key, self.batch_size_fn,
    254                                 random_shuffler=self.random_shuffler,

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in data(self)
    106             xs = sorted(self.dataset, key=self.sort_key)
    107         elif self.shuffle:
--> 108             xs = [self.dataset[i] for i in self.random_shuffler(range(len(self.dataset)))]
    109         else:
    110             xs = self.dataset

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in <listcomp>(.0)
    106             xs = sorted(self.dataset, key=self.sort_key)
    107         elif self.shuffle:
--> 108             xs = [self.dataset[i] for i in self.random_shuffler(range(len(self.dataset)))]
    109         else:
    110             xs = self.dataset

~\anaconda3\envs\dl\lib\site-packages\torch\utils\data\dataset.py in __getitem__(self, index)
     32 
     33     def __getitem__(self, index) -> T_co:
---> 34         raise NotImplementedError
     35 
     36     def __add__(self, other: 'Dataset[T_co]') -> 'ConcatDataset[T_co]':

NotImplementedError: 

Please help me find out what's going wrong in this case.

Thanks,

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

2 reactions
parmeet commented, Sep 9, 2021

BucketIterator won’t work with new datasets. You may need to use legacy datasets.
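The last frames of the traceback show the mechanism: with shuffling on, the legacy iterator's data() method indexes self.dataset[i] for every i in range(len(self.dataset)), and the new-style dataset object falls back to the base torch.utils.data.Dataset.__getitem__, which only raises NotImplementedError. The stdlib-only sketch below reproduces that failure in isolation; IterableOnlyDataset and shuffled_by_index are hypothetical stand-ins written for illustration, not torchtext or PyTorch classes.

```python
import random


class IterableOnlyDataset:
    # Hypothetical stand-in for an iterable-style dataset: it supports len()
    # and iteration, but indexing raises, like the base
    # torch.utils.data.Dataset.__getitem__ seen in the traceback.
    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __len__(self):
        return len(self._pairs)

    def __iter__(self):
        return iter(self._pairs)

    def __getitem__(self, index):
        raise NotImplementedError


def shuffled_by_index(dataset, seed=0):
    # Hypothetical, simplified version of the failing step: shuffle the
    # positions 0..len(dataset)-1, then index the dataset at each position.
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    return [dataset[i] for i in indices]


pairs = [("ein Hund", "a dog"), ("zwei Katzen", "two cats"), ("ein Haus", "a house")]
iterable_ds = IterableOnlyDataset(pairs)

try:
    shuffled_by_index(iterable_ds)
except NotImplementedError:
    print("indexing the iterable-only dataset raised NotImplementedError")

# Materializing the examples into a list restores random access.
map_style = list(iterable_ds)
print(sorted(shuffled_by_index(map_style)) == sorted(pairs))  # True
```

Materializing the examples into a list restores indexing, but that alone is unlikely to make the legacy BucketIterator work: the sort_key in the question reads x.src, and raw (source, target) tuples like the ones in the sketch have no such attribute. That is why the comment points to the legacy datasets, whose examples expose attributes such as src.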

Please note that legacy code is no longer maintained and will be removed in upcoming releases. You may refer to the migration tutorial to help you move away from the legacy code base.
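To move off the legacy code, one common approach is to drop BucketIterator and batch the new datasets with a plain torch.utils.data.DataLoader, reproducing the bucketing by hand. Below is a simplified, stdlib-only sketch of that bucketing idea, assuming the examples have been materialized into a list so each has an index and a known length. bucket_batches and its pool_factor parameter are hypothetical names, not torchtext API, and this is not torchtext's own implementation.

```python
import random


def bucket_batches(lengths, batch_size, pool_factor=100, seed=0):
    # Hypothetical helper: simplified BucketIterator-style bucketing
    # (not torchtext's implementation). Returns lists of example indices:
    # 1. shuffle all indices,
    # 2. cut them into pools of pool_factor * batch_size indices,
    # 3. sort each pool by example length and split it into batches,
    # 4. shuffle the order of the batches.
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)
    pool_size = pool_factor * batch_size
    batches = []
    for start in range(0, len(indices), pool_size):
        pool = sorted(indices[start:start + pool_size], key=lambda i: lengths[i])
        for b in range(0, len(pool), batch_size):
            batches.append(pool[b:b + batch_size])
    rng.shuffle(batches)
    return batches


# Hypothetical source-sentence lengths for ten examples.
src_lengths = [5, 12, 3, 8, 12, 4, 7, 3, 9, 11]
batches = bucket_batches(src_lengths, batch_size=3, pool_factor=2)
for batch in batches:
    print([src_lengths[i] for i in batch])
```

Since DataLoader accepts an iterable of index lists through its batch_sampler argument, a result like this could be passed as batch_sampler together with a collate_fn that tokenizes and pads each batch; treat that wiring as an assumption to check against the migration tutorial. If longest-first order within a batch is needed, as sort_within_batch=True requests from the legacy iterator, reverse each batch.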

1 reaction
hudeven commented, Sep 8, 2021

@parmeet could you help take a look?
