NotImplementedError Using BucketIterator
import numpy as np
import spacy
import random
import torch
import torch.nn as nn
import torch.optim as optim
from torchtext.datasets import Multi30k
from torchtext.legacy.data import Field, BucketIterator
train_data, valid_data, test_data = Multi30k()
batch_size = 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_iterator = BucketIterator(
    train_data,
    batch_size=batch_size,
    sort_within_batch=True,
    sort_key=lambda x: len(x.src),
    device=device,
)
When I try to iterate over train_iterator using
for data in train_iterator:
    print(data)
I get a NotImplementedError like this:
NotImplementedError Traceback (most recent call last)
<ipython-input-18-7f4717f8ef1c> in <module>
----> 1 for i in train_iterator:
2 print(i)
~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in __iter__(self)
143 def __iter__(self):
144 while True:
--> 145 self.init_epoch()
146 for idx, minibatch in enumerate(self.batches):
147 # fast-forward if loaded from state
~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in init_epoch(self)
119 self._random_state_this_epoch = self.random_shuffler.random_state
120
--> 121 self.create_batches()
122
123 if self._restored_from_state:
~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in create_batches(self)
250 self.batch_size_fn)
251 else:
--> 252 self.batches = pool(self.data(), self.batch_size,
253 self.sort_key, self.batch_size_fn,
254 random_shuffler=self.random_shuffler,
~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in data(self)
106 xs = sorted(self.dataset, key=self.sort_key)
107 elif self.shuffle:
--> 108 xs = [self.dataset[i] for i in self.random_shuffler(range(len(self.dataset)))]
109 else:
110 xs = self.dataset
~\anaconda3\envs\dl\lib\site-packages\torchtext\legacy\data\iterator.py in <listcomp>(.0)
106 xs = sorted(self.dataset, key=self.sort_key)
107 elif self.shuffle:
--> 108 xs = [self.dataset[i] for i in self.random_shuffler(range(len(self.dataset)))]
109 else:
110 xs = self.dataset
~\anaconda3\envs\dl\lib\site-packages\torch\utils\data\dataset.py in __getitem__(self, index)
32
33 def __getitem__(self, index) -> T_co:
---> 34 raise NotImplementedError
35
36 def __add__(self, other: 'Dataset[T_co]') -> 'ConcatDataset[T_co]':
NotImplementedError:
Please help me find out what's going wrong in this particular case.
Thanks,
Top GitHub Comments
BucketIterator won't work with the new datasets. You may need to use the legacy datasets instead.
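To illustrate why the two don't mix, here is a minimal pure-Python sketch. It assumes the new-style Multi30k() returns iterable-style datasets that do not implement __getitem__, which is what the last traceback frame suggests. The classes below are stand-ins written for illustration, not the actual torch or torchtext classes.

```python
class Dataset:
    """Stand-in for torch.utils.data.Dataset: indexing is left to subclasses."""

    def __getitem__(self, index):
        raise NotImplementedError


class IterableStyleDataset(Dataset):
    """Hypothetical iterable-style dataset: defines __iter__ and __len__ only."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __iter__(self):
        return iter(self._pairs)

    def __len__(self):
        return len(self._pairs)


def shuffled_examples(dataset, order):
    # Mirrors the indexing step shown in the traceback: dataset[i] for each i.
    return [dataset[i] for i in order]


pairs = [("ein Hund", "a dog"), ("eine Katze", "a cat")]
dataset = IterableStyleDataset(pairs)

# Plain iteration works, because __iter__ is defined.
assert list(dataset) == pairs

# Indexing fails, because __getitem__ falls through to the base class.
try:
    shuffled_examples(dataset, range(len(dataset)))
    indexing_failed = False
except NotImplementedError:
    indexing_failed = True
```

The legacy iterator indexes into the dataset when it shuffles, so it needs a map-style dataset. In releases that still ship torchtext.legacy, the legacy datasets live under torchtext.legacy.datasets.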
Please note that the legacy code is no longer maintained and will be removed in upcoming releases. You may refer to the migration tutorial to help you move off the legacy code base.
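For anyone moving off the legacy code, the length-bucketing idea behind BucketIterator can be sketched in plain Python. This is not the migration tutorial's code and not a torchtext API. The helper name bucket_batches and its parameters (length_fn, pool_factor, seed) are made up for illustration.

```python
import random


def bucket_batches(examples, batch_size, length_fn=len, pool_factor=100, seed=0):
    """Group examples of similar length into batches (hypothetical helper)."""
    rng = random.Random(seed)
    examples = list(examples)
    rng.shuffle(examples)

    # Sort within large pools rather than globally, so batches stay varied.
    pool_size = batch_size * pool_factor
    batches = []
    for start in range(0, len(examples), pool_size):
        pool = sorted(examples[start:start + pool_size], key=length_fn)
        for i in range(0, len(pool), batch_size):
            batches.append(pool[i:i + batch_size])

    # Shuffle the batch order so training does not see lengths in sorted order.
    rng.shuffle(batches)
    return batches


# Toy "sentences" of different lengths, as lists of tokens.
sentences = [["w"] * n for n in (3, 9, 4, 10, 2, 8)]
batches = bucket_batches(sentences, batch_size=2)
```

Each batch then holds sequences of similar length, which keeps padding small. In a real pipeline these batches would still need to be numericalized and padded before being fed to the model.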
@parmeet could you help to take a look?