Using list2padded for sequences
I have referred to https://github.com/explosion/thinc/blob/master/examples/03_pos_tagger_basic_cnn.ipynb to get a better understanding of the Thinc layers. The model in the example is as follows:
```python
from thinc.api import Model, chain, strings2arrays, with_array
from thinc.api import HashEmbed, expand_window, Relu, Softmax

with Model.define_operators({">>": chain}):
    model = strings2arrays() >> with_array(
        HashEmbed(nO=width, nV=vector_width, column=0)
        >> expand_window(window_size=1)
        >> Relu(nO=width, nI=width * 3)
        >> Relu(nO=width, nI=width)
        >> Softmax(nO=nr_classes, nI=width)
    )
```
Can someone please explain what `strings2arrays` does? The documentation says it takes a sequence of sequences of strings and produces a `List[Ints2d]`.

The input `X_train` is something like `[["this", "is", "awesome"], ["thinc", "is", "cool"]]`. What does `strings2arrays` transform this example into? I am unable to wrap my head around how exactly it turns a `List[List[str]]` (a 2d structure) into a `List[Ints2d]` (technically a 3d structure, i.e. a sequence of 2d arrays).
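To make that concrete, here is a minimal sketch of the transformation, assuming Thinc v8's behaviour of hashing each token so that every sequence becomes one `(n_tokens, 1)` integer array:

```python
from thinc.api import strings2arrays

model = strings2arrays()
X = [["this", "is", "awesome"], ["thinc", "is", "cool"]]
arrays = model.predict(X)

# One 2d integer array per input sequence: each token is hashed to an
# integer ID, so a 3-token sentence becomes an array of shape (3, 1).
assert len(arrays) == 2
assert arrays[0].shape == (3, 1)
```

So the "extra" dimension is just each inner list of strings becoming its own 2d array, and `HashEmbed(..., column=0)` then reads the single column of hash IDs.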
@naveenjafer Ah, you should use the `PyTorchLSTM` layer currently. I'm nearly done fixing the "native" LSTM implementation, but the one that's there currently is a draft and isn't very reliable.

In your model there, you have the `LSTM` inside the `with_array` block, which is not correct: you need the `Padded` format as the input for the LSTM. I think the type-checking should highlight a problem with your architecture (although the errors are sometimes hard to read).

There are a couple of ways you could do the transformations. One is to do the conversion explicitly, as in the sketch below.
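A minimal sketch of that explicit approach, assuming Thinc v8's `list2padded`, `padded2list`, and `PyTorchLSTM` layers (`width` is a placeholder hyperparameter):

```python
from thinc.api import chain, list2padded, padded2list, PyTorchLSTM

width = 32  # placeholder hidden width

# Convert List[Floats2d] -> Padded for the LSTM, then back to a list.
encode = chain(
    list2padded(),
    PyTorchLSTM(nO=width, nI=width),
    padded2list(),
)
```

`list2padded` and `padded2list` are the explicit converters between the list and `Padded` sequence formats; `with_padded` (used below) wraps a layer so the same conversion happens implicitly.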
A couple of people have found the sequence formats confusing, so I'm thinking of ways to make it a bit simpler. For now, a good general strategy is to keep the data in a `List[Array]` format and use the `with_` transforms around blocks of your network. This is an easy way to get everything working, and then you can think about refactoring the transforms slightly if there's a more efficient way to do things. For instance, here's roughly how you could have your network.
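A sketch of that strategy, assuming the same Thinc v8 layers as above (the hyperparameter values are placeholders):

```python
from thinc.api import chain, strings2arrays, with_array, with_padded
from thinc.api import HashEmbed, PyTorchLSTM, Softmax

width, vector_width, nr_classes = 32, 16, 17  # placeholder hyperparameters

model = chain(
    # Sequence[Sequence[str]] -> List[Ints2d]
    strings2arrays(),
    # Runs the embedding on each array in the list.
    with_array(HashEmbed(nO=width, nV=vector_width, column=0)),
    # Converts List -> Padded for the LSTM and back again.
    with_padded(PyTorchLSTM(nO=width, nI=width)),
    # Per-token classification over the list of arrays.
    with_array(Softmax(nO=nr_classes, nI=width)),
)
```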
@naveenjafer Sorry I didn't see this before.

Yes, I did mean just the multi-head attention layer, which wouldn't have any state associated with it. The transformer would then be built out of it and several other pieces. Anyway, I think the `list2padded` issue here should be fixed, so I'll close this issue as old.