NotImplementedError Using BucketIterator

import numpy as np
import spacy
import random
import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.datasets import Multi30k
from torchtext.legacy.data import Field, BucketIterator

train_data, valid_data, test_data = Multi30k()

batch_size = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

train_iterator = BucketIterator(
    train_data, 
    batch_size=batch_size,
    sort_within_batch=True,
    sort_key=lambda x: len(x.src),
    device=device,
)

When I try to iterate over train_iterator using

for data in train_iterator:
    print(data)

I get the following NotImplementedError:

 NotImplementedError                       Traceback (most recent call last)
<ipython-input-18-7f4717f8ef1c> in <module>
----> 1 for i in train_iterator:
      2     print(i)

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in __iter__(self)
    143     def __iter__(self):
    144         while True:
--> 145             self.init_epoch()
    146             for idx, minibatch in enumerate(self.batches):
    147                 # fast-forward if loaded from state

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in init_epoch(self)
    119             self._random_state_this_epoch = self.random_shuffler.random_state
    120 
--> 121         self.create_batches()
    122 
    123         if self._restored_from_state:

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in create_batches(self)
    250                                  self.batch_size_fn)
    251         else:
--> 252             self.batches = pool(self.data(), self.batch_size,
    253                                 self.sort_key, self.batch_size_fn,
    254                                 random_shuffler=self.random_shuffler,

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in data(self)
    106             xs = sorted(self.dataset, key=self.sort_key)
    107         elif self.shuffle:
--> 108             xs = [self.dataset[i] for i in self.random_shuffler(range(len(self.dataset)))]
    109         else:
    110             xs = self.dataset

~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in <listcomp>(.0)
    106             xs = sorted(self.dataset, key=self.sort_key)
    107         elif self.shuffle:
--> 108             xs = [self.dataset[i] for i in self.random_shuffler(range(len(self.dataset)))]
    109         else:
    110             xs = self.dataset

~\anaconda3\envs\dl\lib\site-packages\torch\utils\data\dataset.py in __getitem__(self, index)
     32 
     33     def __getitem__(self, index) -> T_co:
---> 34         raise NotImplementedError
     35 
     36     def __add__(self, other: 'Dataset[T_co]') -> 'ConcatDataset[T_co]':

NotImplementedError:

Please help me find out what's going wrong in this case.

Thanks.

Issue Analytics

State:
Created 2 years ago
Comments:6 (5 by maintainers)

Top GitHub Comments

2 reactions

parmeet commented, Sep 9, 2021

BucketIterator won’t work with the new datasets. You may need to use the legacy datasets.

Please note that the legacy code is no longer maintained and will be removed in upcoming releases. You may refer to the migration tutorial to help you move off the legacy code base.
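
The traceback above ends in `self.dataset[i]`, so the legacy iterator needs a dataset that supports indexing, which the object returned by the new `Multi30k()` does not implement. For readers moving off the legacy code, the length-bucketing idea can be sketched in plain Python. This is a minimal sketch and not torchtext's implementation: `bucket_batches`, `pool_factor`, and `seed` are hypothetical names, and it assumes the examples have already been materialized into a list of tokenized (source, target) pairs.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Assumption: each example is a (source tokens, target tokens) pair.
Pair = Tuple[List[str], List[str]]


def bucket_batches(
    examples: Sequence[Pair],
    batch_size: int,
    sort_key: Callable[[Pair], int],
    pool_factor: int = 100,  # hypothetical knob: pool size = batch_size * pool_factor
    seed: int = 0,
) -> List[List[Pair]]:
    """Group examples of similar length into batches to reduce padding."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)  # randomize which examples share a pool

    pool_size = batch_size * pool_factor
    batches: List[List[Pair]] = []
    for start in range(0, len(shuffled), pool_size):
        # Sort within the pool so neighbouring examples have similar lengths.
        pool = sorted(shuffled[start:start + pool_size], key=sort_key)
        for i in range(0, len(pool), batch_size):
            batches.append(pool[i:i + batch_size])

    rng.shuffle(batches)  # shuffle batch order, keep each batch sorted
    return batches


# Toy data standing in for tokenized sentence pairs of varying length.
pairs = [(["w"] * n, ["v"] * n) for n in (5, 1, 9, 3, 7, 2, 8, 4, 6, 10)]
batches = bucket_batches(pairs, batch_size=4, sort_key=lambda p: len(p[0]))
for batch in batches:
    print([len(src) for src, _ in batch])
```

Each resulting batch would still need to be numericalized and padded before it reaches a model. That step is left out here because it depends on the vocabulary and tokenizer in use.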

1 reaction

hudeven commented, Sep 8, 2021

@parmeet could you help to take a look?
