Difficulty using the package due to outdated documentation and lack of examples
Documentation
Description
I'm trying to use torchtext for review rating prediction, but the new API is not well documented yet. I tried to learn from the migration Jupyter notebook, but it fails on cell #10 if I change the torchtext version to 0.10.0.
A few things remain unclear to me:
- I would imagine Vectors require a specific tokenizer, as they might encode special symbols differently. Is this correct? How does GloVe treat special characters?
- Are vectors fixed, or do they receive gradients as well?
- Is there a current roadmap for reviewing the documentation?
Thanks, Pedro
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:6 (6 by maintainers)
Top GitHub Comments
Actually, we have it for the new vocab as well; refer here for documentation. Also refer to the sections "creating Vocab from text file" and "Backward Incompatible changes" in the release notes for additional details and usage.
Thanks @pedropgusmao for bringing this up. Yes, you are right: this cell does not work with version 0.10.0 because of the update to Vocab. I will update it.
This is a good question. I would suggest referring to the original source of the vectors to learn how to tokenize; for example, refer here for GloVe and FastText. We do not explicitly encode any special symbols; we only provide a wrapper around what is available in the original source vectors. For unknown-token queries, we return a zero tensor by default (or a tensor initialized with a value provided by the user) with the same dimension as the original source vectors.
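The unknown-token behavior described above can be illustrated with a small sketch. This is not torchtext's Vectors implementation: the class name `ToyVectors`, its constructor arguments, and the tiny lookup table are all hypothetical, chosen only to show a token-to-vector container that falls back to a zero tensor (or a user-supplied initializer) of the same dimension.

```python
import torch


class ToyVectors:
    """Hypothetical stand-in for a pretrained-vector container (not torchtext's class)."""

    def __init__(self, table, dim, unk_init=torch.zeros):
        self.table = table        # maps token -> 1-D tensor of length `dim`
        self.dim = dim
        self.unk_init = unk_init  # callable that builds the vector for unknown tokens

    def __getitem__(self, token):
        # Known tokens return their stored vector; unknown tokens get a
        # freshly initialized vector with the same dimension.
        if token in self.table:
            return self.table[token]
        return self.unk_init(self.dim)


# Assumed toy data: one known token with a 3-dimensional vector.
toy = ToyVectors({"good": torch.tensor([0.1, 0.2, 0.3])}, dim=3)
known = toy["good"]
unknown = toy["zzz-not-in-vocab"]  # falls back to a zero tensor of length 3
```

Passing a different callable as `unk_init` (for example `torch.ones`) changes the fallback value, which mirrors the "initialized with a value provided by the user" option mentioned in the comment.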
Vectors are simply containers that map tokens to their corresponding vector representations. If you want your vectors to be trainable, I would suggest using nn.Embedding.
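As a minimal sketch of the trainable option, the following uses `torch.nn.Embedding.from_pretrained`, whose `freeze` argument controls whether the weights receive gradients. The small `pretrained` matrix is an assumed placeholder standing in for real pretrained vectors (such as GloVe), and the token indices are made up for illustration.

```python
import torch
import torch.nn as nn

# Assumed stand-in for pretrained vectors: 4 tokens, 3 dimensions each.
pretrained = torch.tensor([
    [0.0, 0.0, 0.0],  # index 0, e.g. an unknown-token row
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
])

# freeze=True keeps the weights fixed; freeze=False lets them receive gradients.
frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)
trainable = nn.Embedding.from_pretrained(pretrained.clone(), freeze=False)

# Look up two tokens and backpropagate a dummy loss through the trainable layer.
token_ids = torch.tensor([1, 3])
loss = trainable(token_ids).sum()
loss.backward()

# Only the rows that were looked up (1 and 3) get a non-zero gradient.
grad = trainable.weight.grad
```

With `freeze=True` the layer behaves like a fixed lookup table, which is the closer analogue of a plain vector container; with `freeze=False` an optimizer can fine-tune the rows during training.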
update: Related issue #1350
I really appreciate you bringing this up. Please do suggest improvements, or feel free to raise issues wherever you find the documentation lacking. We will try our best to address them. Please note that with this new release (0.10.0), we have deprecated the legacy vocab and replaced it with the new Vocab module. You can find additional details in the release notes and refer to the documentation here. I would also suggest learning more through the tutorials here, which are already updated for the latest features such as iterable datasets and the new Vocab module.