__getitem__() not implemented?

❓ Questions and Help

Description

For some reason, calling __getitem__() on the torchtext Multi30k dataset raises a NotImplementedError for me, even though the dataset downloads properly and calling next(iter()) on it returns valid output. Can someone help me understand this? I need the method because I’m wrapping the dataset in a larger dataset class and will have to call __getitem__() explicitly to perform joint pre-processing with other dataset products.

Sample

import torchtext

m30k = torchtext.datasets.Multi30k(root=r'.\Data', split='test', language_pair=('en', 'de'))
m30k.__getitem__(0)  # raises NotImplementedError
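One possible workaround, given that the dataset is iterable-style (as the maintainer comment in this thread indicates), is to consume the iterator once and keep the items in a list, which then supports indexing. The sketch below is only an illustration: `fake_multi30k` is a hypothetical stand-in generator so the example runs without downloading anything, and `MaterializedDataset` is a made-up name, not a torchtext class. It assumes the split fits comfortably in memory.

```python
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def fake_multi30k() -> Iterator[Pair]:
    """Hypothetical stand-in that yields (en, de) pairs like an iterable-style dataset."""
    yield ("Two dogs are playing.", "Zwei Hunde spielen.")
    yield ("A man is reading.", "Ein Mann liest.")
    yield ("The child is laughing.", "Das Kind lacht.")


class MaterializedDataset:
    """Map-style view built by consuming an iterable exactly once."""

    def __init__(self, source: Iterable[Pair]) -> None:
        # Everything is held in memory, so this only suits modestly sized data.
        self._items: List[Pair] = list(source)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index: int) -> Pair:
        return self._items[index]


indexed = MaterializedDataset(fake_multi30k())
first_pair = indexed[0]
```

In real code the generator would be replaced by the object returned from `torchtext.datasets.Multi30k(...)`, and the wrapper could subclass `torch.utils.data.Dataset` so that it plugs into a larger map-style dataset class.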

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
erip commented, Jan 25, 2022

Batch size can certainly be element dependent in NLP cases where you may want to form batches based on the length of examples (like max-token post-pad batching).
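To make the point about element-dependent batch sizes concrete, here is a rough sketch of one reading of max-token batching: examples are grouped greedily so that the padded size of a batch (number of examples times the longest example) stays within a token budget. The function name `max_token_batches` and the `max_tokens` parameter are hypothetical, chosen for illustration rather than taken from torchtext.

```python
from typing import Iterable, Iterator, List


def max_token_batches(
    examples: Iterable[List[str]], max_tokens: int
) -> Iterator[List[List[str]]]:
    """Greedily group tokenized examples under a padded-token budget."""
    batch: List[List[str]] = []
    longest = 0
    for tokens in examples:
        new_longest = max(longest, len(tokens))
        # Padded cost if this example joined the current batch.
        if batch and new_longest * (len(batch) + 1) > max_tokens:
            yield batch
            batch, new_longest = [], len(tokens)
        batch.append(tokens)
        longest = new_longest
    if batch:
        yield batch


sentences = [
    ["a", "b"],
    ["c", "d"],
    ["e", "f", "g", "h", "i"],
    ["j", "k", "l"],
]
batch_sizes = [len(b) for b in max_token_batches(sentences, max_tokens=6)]
# -> [2, 1, 1]: the number of examples per batch depends on their lengths
```

Because the batcher only needs to see examples in order, it works directly on an iterator and never needs indexed access. In this sketch, an example longer than the budget is emitted as a batch of its own.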

Some datasets in torchtext are modestly sized, but others (like CC100 soon) are significantly larger, and iterable-style is the only way to realistically consume them. Additionally, datapipes in the PyTorch ecosystem prefer iterable-style, which enables slightly cleaner and more intent-revealing semantics at the dataset level (vs. at the loader level).
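As a minimal illustration of what iterable-style consumption looks like, the sketch below defines a class that only implements `__iter__` and produces items lazily. `StreamingCorpus` is a hypothetical name, and the generated strings stand in for lines that would really come from disk or the network.

```python
import itertools
from typing import Iterator


class StreamingCorpus:
    """Iterable-style sketch: items are produced on demand, never stored."""

    def __init__(self, num_lines: int) -> None:
        self._num_lines = num_lines

    def __iter__(self) -> Iterator[str]:
        # range() is lazy, so even a huge corpus costs no memory up front.
        for i in range(self._num_lines):
            yield f"line {i}"


corpus = StreamingCorpus(num_lines=10**12)
first_three = list(itertools.islice(corpus, 3))
supports_indexing = hasattr(corpus, "__getitem__")  # False: no random access
```

The trade-off is that there is no `corpus[i]`: the only way to reach an item is to iterate up to it.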

0 reactions
ShairozS commented, Jan 24, 2022

Can I understand the reasoning behind implementing torchtext datasets as iterable-style instead of map-style? Many significantly larger image datasets (such as ImageNet and CIFAR-10) are implemented as map-style in torchvision (indeed, loading the entire dataset into memory is not a requirement of map-style anyway), and I’m not sure why batch size would be element dependent in this case. Those are really the only two cases where convention seems to call for an iterable-style dataset.
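The remark that map-style does not require loading everything into memory can be sketched as follows: the dataset records only the byte offset of each line up front and reads a single line from disk inside `__getitem__`. `LazyLineDataset` and the temporary `corpus.txt` file are hypothetical and exist only for this illustration.

```python
import os
import tempfile
from typing import List


class LazyLineDataset:
    """Map-style sketch that keeps only line offsets in memory."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._offsets: List[int] = []
        offset = 0
        with open(path, "rb") as f:
            for line in f:
                self._offsets.append(offset)
                offset += len(line)

    def __len__(self) -> int:
        return len(self._offsets)

    def __getitem__(self, index: int) -> str:
        # Seek straight to the requested line and read just that line.
        with open(self._path, "rb") as f:
            f.seek(self._offsets[index])
            return f.readline().decode("utf-8").rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.txt")
    with open(path, "wb") as f:
        f.write(b"first sentence\nsecond sentence\nthird sentence\n")
    dataset = LazyLineDataset(path)
    size = len(dataset)
    third = dataset[2]
```

This still needs one pass over the file to build the offset index, which may itself be impractical for the very large corpora mentioned elsewhere in the thread.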
