__getitem__() not implemented?
See original GitHub issue
❓ Questions and Help
Description
For some reason, calling __getitem__() on the torchtext Multi30k dataset raises a NotImplementedError for me, even though the dataset downloads correctly and calling next(iter()) on it returns valid output. Can someone help me understand this? I need the method because I'm wrapping the dataset in a larger dataset class and will have to call __getitem__() explicitly to perform joint pre-processing with other dataset products.
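For context, here is a minimal sketch of the map-style vs. iterable-style distinction in torch.utils.data that likely explains this behavior (assuming a recent PyTorch, where the Dataset base class raises NotImplementedError on indexing); the class names are illustrative only:

from torch.utils.data import Dataset, IterableDataset

# Map-style: implements __getitem__ (and usually __len__), so ds[0] works.
class MapStyleExample(Dataset):
    def __init__(self, items):
        self.items = items
    def __len__(self):
        return len(self.items)
    def __getitem__(self, idx):
        return self.items[idx]

# Iterable-style: implements only __iter__; indexing falls through to the
# Dataset base class, which raises NotImplementedError.
class IterableStyleExample(IterableDataset):
    def __init__(self, items):
        self.items = items
    def __iter__(self):
        return iter(self.items)

ds = IterableStyleExample(['a', 'b'])
next(iter(ds))  # 'a' -- iteration works
ds[0]           # NotImplementedError, just like Multi30k here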
Sample
import torchtext

m30k = torchtext.datasets.Multi30k(root=r'.\Data', split='test', language_pair=('en', 'de'))
m30k.__getitem__(0)  # raises NotImplementedError
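One possible workaround is to materialize the split into a map-style dataset first. This is a sketch, assuming a torchtext version (0.9+) that ships torchtext.data.functional.to_map_style_dataset; note it drains the whole split into memory, which is fine at Multi30k's scale but not for very large corpora:

import torchtext
from torchtext.data.functional import to_map_style_dataset

m30k = torchtext.datasets.Multi30k(root=r'.\Data', split='test', language_pair=('en', 'de'))
m30k_map = to_map_style_dataset(m30k)  # wraps the iterator's items in an indexable container
m30k_map[0]  # returns the first (English, German) sentence pair instead of raising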
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (4 by maintainers)
Top Results From Across the Web
How to solve error "Operator 'getitem' is not supported on this ...
case takes a list of when criteria as its first argument, so you need to bracket accordingly: sa.case([(Model.column == value, then_output)]).
Classes that implement __getitem__ are reported as incompatible ...
Bug Report: Mypy erroneously reports that a class that implements __getitem__ is not suitable to be used with enumerate.
I got NotImplementedError when trying to create __getitem ...
I'm trying to solve a Cat vs. Dog classification problem using PyTorch. So I started by creating a Dataset class using the following code: ...
HD58847: PROBLEM WITH GETITEM, IMPLEMENTED ... - IBM
Problem with GetItem, implemented several times (and not well implemented for the VBExtensions) on a GSMGeom. Local fix. Problem summary.
Operator 'getitem' is not supported on this expression ... - Reddit
SQLAlchemy 'case' won't work - NotImplementedError: Operator 'getitem' is not supported on this expression ... It's for a pilot logbook programme.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Can I understand the reasoning behind implementing torchtext datasets as iterable-style instead of map-style? Many significantly larger image datasets (such as ImageNet and CIFAR-10) are implemented as map-style in torchvision (indeed, loading the entire dataset into memory is not a requirement of map-style anyway), and I'm not sure why batch size would be element dependent in this case. Those are really the only two cases where convention seems to dictate that an iterable-style dataset be used.
Some datasets in torchtext are modestly sized, but others (like CC100 soon) are significantly larger, and iterable-style is the only realistic way to consume them. Additionally, datapipes in the PyTorch ecosystem prefer iterable-style, which enables slightly cleaner, more intent-revealing semantics at the dataset level (versus at the loader level).
Batch size can certainly be element dependent in NLP cases where you may want to form batches based on the length of examples (like max-token post-pad batching); see the sketch after these comments.
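To make the max-token point concrete, here is a minimal, hypothetical sketch of length-dependent batching over an iterable-style dataset; max_token_batches and its parameters are illustrative, not part of the torchtext API:

def max_token_batches(pairs, max_tokens=256):
    # Group (src, tgt) pairs into batches capped by a source-token budget,
    # so the number of examples per batch varies with example length --
    # one reason batch size can be element dependent in NLP.
    batch, used = [], 0
    for src, tgt in pairs:
        n = len(src.split())
        if batch and used + n > max_tokens:
            yield batch
            batch, used = [], 0
        batch.append((src, tgt))
        used += n
    if batch:
        yield batch

# Usage over an iterable-style dataset such as Multi30k:
# for batch in max_token_batches(m30k): ...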