Using a field representing real numbers with the iterator
I am trying to train a regressor on text data. I use torchtext for all my other tasks, but I have run into a problem using it for this use case.
I define the field for targets as follows:
TARGETS = data.Field(
    sequential=False, tensor_type=torch.DoubleTensor, batch_first=True)
self.fields = [('targets', TARGETS), ('text', TEXT)]
self.train, self.val, self.test = data.TabularDataset.splits(
    path=self.path,
    train=self.train_suffix,
    validation=self.val_suffix,
    test=self.test_suffix,
    format=formatting,
    fields=self.fields)
TEXT.build_vocab(self.train)
I have a tab-separated file in which each line has the form <value>\t<text>.
When I make iterators out of it,
train_iter, val_iter, test_iter = data.Iterator.splits(
    (self.train, self.val, self.test),
    batch_sizes=(self.batch_size, self.test_batch_size,
                 self.test_batch_size),
    sort_key=lambda x: len(x.text),
    shuffle=True)
print(next(iter(train_iter)))
it gives me an error when getting the next batch:
AttributeError: 'Field' object has no attribute 'vocab'
I know this is because I didn’t run .build_vocab for the TARGETS field. But why do I really need to do this? What if I just want to get real numbers and compute losses on them?
Any workaround is appreciated. If I am doing something wrong, please let me know too.
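To make the failure mode concrete, here is a toy stand-in written in plain Python. It is not torchtext's implementation: `ToyField` and its methods are hypothetical names used only for illustration. The assumption it encodes is that a field which uses a vocabulary by default (torchtext's `Field` has a `use_vocab` flag that defaults to True) will look up `self.vocab` when a batch is numericalized, so skipping `build_vocab` surfaces as an AttributeError only once the first batch is requested.

```python
class ToyField:
    """Hypothetical stand-in for a field; NOT torchtext's actual code."""

    def __init__(self, use_vocab=True):
        self.use_vocab = use_vocab
        # Deliberately no `vocab` attribute here: it only exists
        # after build_vocab() has been called.

    def build_vocab(self, values):
        # Map each distinct raw value to an integer index.
        self.vocab = {v: i for i, v in enumerate(sorted(set(values)))}

    def numericalize(self, batch):
        if self.use_vocab:
            # Fails with AttributeError if build_vocab() was never called.
            return [self.vocab[v] for v in batch]
        # Without a vocabulary, values are passed through unchanged.
        return list(batch)


raw_targets = ["0.5", "1.25", "3.0"]

# Default behaviour: a vocabulary lookup is attempted and fails.
default_field = ToyField()
try:
    default_field.numericalize(raw_targets)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# With the lookup disabled, the raw values come back untouched
# (still strings, so they would need converting to numbers).
passthrough = ToyField(use_vocab=False).numericalize(raw_targets)
```

In this sketch the error disappears once the vocabulary lookup is switched off, but the targets are still strings, which is why a string-to-number conversion step is also needed.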
Issue Analytics
- State:
- Created 6 years ago
- Comments:11 (3 by maintainers)
Top GitHub Comments
Thanks for the issue.
Torchtext needs to convert the string number to an int or float somewhere down the line, and it currently doesn’t do this. A quick fix would be to manually add a pipeline to the postprocessing argument that converts everything in the TARGETS field to int. With a slightly modified version of your code:
Edit: just noticed that your example uses doubles. Changed my code accordingly.
(tab separated file)
The following works on my machine in the meantime while we patch this:
Hope that helps.
The above didn’t work for me. If anyone is still wondering, change
postprocessing=data.Pipeline(lambda x: float(x))
to
preprocessing=lambda x: float(x)
That made it work for me (PyTorch 0.4 and torchtext 0.2.3).
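As a rough illustration of what such a preprocessing hook does to each example, the sketch below applies a float conversion to the first column of tab-separated `<value>\t<text>` lines. It uses only the standard library; `to_float` and `read_examples` are hypothetical helper names, and the torchtext call in the trailing comment is an untested assumption about the legacy 0.2.x API rather than a verified fix.

```python
import csv
import io


def to_float(raw):
    """Preprocessing hook: turn a raw target string into a Python float."""
    return float(raw)


def read_examples(handle):
    """Yield (target, text) pairs from tab-separated `<value>\\t<text>` lines."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        target, text = row[0], row[1]
        # Convert the target per example, before any batching happens.
        yield to_float(target), text


# Small in-memory sample standing in for the real file.
sample = io.StringIO("0.5\tgood movie\n-1.25\tterrible plot\n")
examples = list(read_examples(sample))

# Assumed (untested here) equivalent with legacy torchtext 0.2.x:
#   TARGETS = data.Field(sequential=False, use_vocab=False,
#                        tensor_type=torch.DoubleTensor,
#                        preprocessing=to_float)
```

Converting per example means the values are already numeric by the time a batch is assembled, so no vocabulary is involved for the targets.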