Batch does not carry index
Use Case:
replace_unk: most strategies for replacing <unk> tokens rely on aligning with the source sequence from before numericalization.
Problem: Using the Batch object, you are unable to retrieve the original text from before padding and numericalization. No indices are stored with the batch that would let you look up the original text in the dataset.
Quick workaround: define an 'index' field in the dataset. While building the dataset, pass in an index for each item. The Batch object will then expose an index attribute you can use to look up the original example.
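The workaround above can be sketched framework-free. This is not torchtext's actual API (the field and function names here are illustrative); it only shows the pattern: each example stores its dataset position, batching carries those positions through padding and numericalization, and the indices let you recover the original, unpadded tokens.

```python
# Illustrative sketch of the "index field" workaround; names are
# hypothetical, not torchtext's real API.
PAD, UNK = 0, 1
vocab = {"<pad>": PAD, "<unk>": UNK, "the": 2, "cat": 3, "sat": 4}

# Each dataset item carries an explicit "index" field.
dataset = [
    {"index": i, "tokens": toks}
    for i, toks in enumerate([["the", "cat"], ["the", "cat", "sat"]])
]

def make_batch(examples):
    max_len = max(len(ex["tokens"]) for ex in examples)
    ids = [
        [vocab.get(t, UNK) for t in ex["tokens"]]
        + [PAD] * (max_len - len(ex["tokens"]))
        for ex in examples
    ]
    # The index field survives batching alongside the padded ids.
    return {"ids": ids, "index": [ex["index"] for ex in examples]}

batch = make_batch(dataset)
# Recover the original, pre-padding text via the carried indices:
originals = [dataset[i]["tokens"] for i in batch["index"]]
```

With the indices in hand, an unk-replacement step can align each padded row back to its source tokens.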
Issue Analytics
- State:
- Created 6 years ago
- Comments: 5 (3 by maintainers)
Top GitHub Comments
Tokenization is fully reversible if you have (orth_id, has_space) pairs. If you wanted a single sequence of ints, you would double the number of entries in the vocab in theory. Of course the extra bit introduces little extra entropy given the word ID.
So, spaCy’s tokenizers are already fully reversible. You could use them as an internal mechanism to solve this, if you like 😃. It doesn’t have to change your user-facing API, I don’t think.
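The reversibility described above can be demonstrated with a tiny standalone detokenizer. spaCy exposes the same information per token (text via Token.text, trailing whitespace via Token.whitespace_); this sketch uses plain (text, has_space) tuples instead of orth IDs so it runs without spaCy.

```python
# Detokenization from (token_text, has_space) pairs: appending a single
# space after each token that was followed by one reconstructs the
# original string exactly.
def detokenize(pairs):
    return "".join(text + (" " if has_space else "") for text, has_space in pairs)

pairs = [("Hello", False), (",", True), ("world", False), ("!", False)]
assert detokenize(pairs) == "Hello, world!"
```

Swapping orth IDs for the raw text (plus a vocab lookup) gives the fully reversible int-sequence encoding the comment describes.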
I’m planning to add PyTorch tensors as a back-end option for thinc, in addition to Cupy. I also need to write examples of hooking PyTorch models into spaCy.
While I’m here: is it easy to pass a gradient back to a PyTorch model? Most libraries seem to communicate by loss, which makes it harder to compose them with models outside the library.
For passing a gradient back to PyTorch, var.backward has an optional grad_output argument that allows you to inject a gradient in a specific place in the computation graph. If you want to inject several gradients, you can use torch.autograd.backward((var_1, var_2), (grad_1, grad_2)), I believe.
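A minimal check of the gradient-injection mechanism described above, assuming PyTorch is installed. In the current API the argument is called gradient on Tensor.backward (grad_output was the older Variable-era name), and torch.autograd.backward takes a tuple of tensors plus a matching tuple of gradients.

```python
# Injecting an external gradient instead of starting from a scalar loss.
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x * 3.0                    # dy/dx = 3

g = torch.tensor([0.5, 1.0])   # gradient arriving "from outside"
y.backward(g)                  # inject g at y; x.grad becomes 3 * g

# The multi-tensor form mentioned in the comment:
x.grad = None
a, b = x * 2.0, x * 4.0
torch.autograd.backward((a, b), (g, g))
# x.grad accumulates 2*g + 4*g = 6*g
```

This is what makes it possible to compose a PyTorch sub-model inside an external framework: the outer framework computes the gradient at the boundary and injects it, rather than handing PyTorch a loss.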