Using a field representing real numbers with the iterator


I am trying to train a regressor on text data. I use torchtext for all my other tasks, but I have run into a problem using it for this one.

I define the field for targets as follows:

TARGETS = data.Field(
            sequential=False, tensor_type=torch.DoubleTensor, batch_first=True)
self.fields = [('targets', TARGETS), ('text', TEXT)]
self.train, self.val, self.test = data.TabularDataset.splits(
            path=self.path,
            train=self.train_suffix,
            validation=self.val_suffix,
            test=self.test_suffix,
            format=formatting,
            fields=self.fields)
TEXT.build_vocab(self.train)

I have a tab-separated file in which each line has the form <value>\t<text>.

When I make iterators out of it,

train_iter, val_iter, test_iter = data.Iterator.splits(
                (self.train, self.val, self.test),
                batch_sizes=(self.batch_size, self.test_batch_size,
                             self.test_batch_size),
                sort_key=lambda x: len(x.text),
                shuffle=True)
print(next(iter(train_iter)))

it gives me an error when getting the next batch:

AttributeError: 'Field' object has no attribute 'vocab'

I know this is because I didn’t run .build_vocab for the TARGETS field. But why do I really need to do this? What if I just want to get real numbers and compute losses on them?

Any workaround is appreciated. If I am doing something wrong, please let me know too.

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 11 (3 by maintainers)

Top GitHub Comments

nelson-liu commented, Jul 21, 2017 (7 reactions)

Thanks for the issue.

Torchtext needs to convert the numeric string to an int or float somewhere down the line, and it currently doesn’t do this. A quick fix is to pass a pipeline as the postprocessing argument that converts every value in the TARGETS field to a float, and to set use_vocab=False so that the field does not need a vocabulary. With a slightly modified version of your code:

Edit: I just noticed that your example uses doubles, so I changed my code accordingly.

(tab-separated file)

$ cat test.txt
1.1   test string
1.2   test string2
1.3   test string3

The following works on my machine in the meantime while we patch this:

In [1]: import torch

In [2]: from torchtext import data

In [3]: TEXT = data.Field(batch_first=True)

In [4]: TARGETS = data.Field(sequential=False, tensor_type=torch.DoubleTensor, batch_first=True, use_vocab=False, postprocessing=data.Pipeline(lambda x: float(x)))

In [5]: fields = [('targets', TARGETS), ('text', TEXT)]

In [6]: dataset = data.TabularDataset(path="test.txt", format="tsv", fields=fields)

In [7]: TEXT.build_vocab(dataset)

In [8]: train_iter = data.Iterator(dataset, batch_size=1, sort_key=lambda x: len(x.text), shuffle=True)

In [9]: batch = next(iter(train_iter))

In [10]: batch.targets
Out[10]: 
Variable containing:
 1.3000
[torch.cuda.DoubleTensor of size 1 (GPU 0)]

Hope that helps.

greed2411 commented, Jun 19, 2018 (3 reactions)

For me, the above didn’t work. If anyone is still wondering, change postprocessing=data.Pipeline(lambda x: float(x)) to preprocessing=lambda x: float(x). That made it work for me (pytorch 0.4 and torchtext 0.2.3).
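
For anyone unsure what the per-example conversion in these two suggestions amounts to, here is a minimal, library-free sketch. It is not torchtext's implementation: the function names `load_regression_rows` and `batch_targets` are hypothetical, and the sample data simply mirrors the test.txt file shown earlier in the thread. It only illustrates the idea that each target string is converted to a float as the example is read, so batches of targets are plain numbers and no vocabulary is involved.

```python
import csv
import io


def load_regression_rows(tsv_text):
    """Parse '<value>\\t<text>' lines into (float target, token list) pairs."""
    rows = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if not row:
            continue  # skip blank lines
        value, text = row
        # Per-example conversion of the target string to a float,
        # analogous in spirit to passing a float converter to the field.
        rows.append((float(value), text.split()))
    return rows


def batch_targets(rows, batch_size):
    """Yield one list of float targets per batch; no vocabulary lookup needed."""
    for start in range(0, len(rows), batch_size):
        yield [target for target, _ in rows[start:start + batch_size]]


# Sample data mirroring the tab-separated test.txt above (illustrative only).
sample = "1.1\ttest string\n1.2\ttest string2\n1.3\ttest string3\n"
rows = load_regression_rows(sample)
batches = list(batch_targets(rows, batch_size=2))
```

With real torchtext code, the resulting numbers would then be turned into a tensor by the field itself; the sketch stops at plain Python floats so that it does not depend on any particular torchtext version.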
