Using list2padded for sequences

See original GitHub issue

I referred to https://github.com/explosion/thinc/blob/master/examples/03_pos_tagger_basic_cnn.ipynb to get a better understanding of the Thinc layers. The model in the example is as follows:

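from thinc.api import Model, chain, strings2arrays, with_array
from thinc.api import HashEmbed, expand_window, Relu, Softmax

# Bind ">>" to chain so layers can be composed with the operator
with Model.define_operators({">>": chain}):
    model = strings2arrays() >> with_array(
        HashEmbed(nO=width, nV=vector_width, column=0)
        >> expand_window(window_size=1)
        >> Relu(nO=width, nI=width * 3)
        >> Relu(nO=width, nI=width)
        >> Softmax(nO=nr_classes, nI=width)
    )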

Can someone please explain what strings2arrays does? The documentation says that it takes a sequence of sequences of strings and produces a List[Ints2d].

The input X_train is something like [["this", "is", "awesome"], ["thinc", "is", "cool"]]. What does strings2arrays transform this example into? I can't wrap my head around what exactly strings2arrays does and how it turns the input from a List[List] (a 2D list) into a List[Ints2d] (technically a 3D list/sequence).
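
For readers with the same question, a rough illustration may help. As far as I understand it, strings2arrays hashes every string to an integer ID and returns one 2D integer array per input sequence, with one row per token and a single column (which is why HashEmbed is given column=0). The sketch below mimics that shape in plain Python. The helper names hash_token and strings_to_id_columns are hypothetical, and the hash function is a stand-in, so the actual IDs Thinc produces will differ.

```python
import hashlib
from typing import List, Sequence


def hash_token(token: str) -> int:
    """Stand-in 64-bit hash (assumption: Thinc uses its own hash function)."""
    digest = hashlib.blake2b(token.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def strings_to_id_columns(docs: Sequence[Sequence[str]]) -> List[List[List[int]]]:
    """Turn each sequence of strings into an (n_tokens, 1) nested list of IDs."""
    return [[[hash_token(tok)] for tok in doc] for doc in docs]


X_train = [["this", "is", "awesome"], ["thinc", "is", "cool"]]
arrays = strings_to_id_columns(X_train)

# One "array" per input sequence, each with one row per token and one column.
for doc, rows in zip(X_train, arrays):
    print(doc, "->", len(rows), "rows x", len(rows[0]), "column")
```

The extra dimension comes from wrapping each ID in its own row: the outer list indexes sequences, the middle one indexes tokens, and the inner one holds the single ID column. The same string ("is" here) maps to the same ID in both sequences.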

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 10 (10 by maintainers)

Top GitHub Comments

honnibal commented, Feb 4, 2020

@naveenjafer Ah you should use the PyTorchLSTM layer currently. I’m nearly done fixing the “native” LSTM implementation, but the one that’s there currently is a draft and isn’t very reliable.

In your model there, you have the LSTM inside the with_array block, which is not correct — you need the Padded format to be input for the LSTM. I think the type-checking should highlight a problem with your architecture (although the errors are sometimes hard to read).

There are a couple of ways you could do the transformations. One is:

model = (
    strings2arrays()
    >> list2padded()
    >> with_array(HashEmbed(nO=width, nV=vector_width, column=0))
    >> PyTorchLSTM(nO=100, nI=width)
    >> with_array(Softmax(nO=nr_classes, nI=width))
    >> padded2list()
)

A couple of people have found the sequence formats confusing, so I’m thinking of ways to make it a bit simpler. For now, a good general strategy is to keep the data in a List[Array] format and use the with_ transforms around blocks of your network. This is an easy way to get everything working, and then you can think about refactoring the transforms slightly if there’s a more efficient way to do things.

For instance, here’s how you could have your network:

model = (
    strings2arrays()
    >> with_array(HashEmbed(nO=width, nV=vector_width, column=0))
    >> with_padded(PyTorchLSTM(nO=100, nI=width))
    >> with_array(Softmax(nO=nr_classes, nI=width))
)
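
To make the padded round trip concrete, here is a simplified sketch in plain Python of what wrapping a layer in with_padded amounts to conceptually: pad the variable-length sequences to a common length, run the layer on the batch, then trim the padding off again. The helpers list_to_padded and padded_to_list are hypothetical stand-ins, not Thinc functions, and this is a simplification: Thinc's Padded format also carries extra bookkeeping that the sketch leaves out.

```python
from typing import List, Tuple

Row = List[float]


def list_to_padded(
    seqs: List[List[Row]], pad: float = 0.0
) -> Tuple[List[List[Row]], List[int]]:
    """Pad every sequence to the longest length and remember the true lengths."""
    lengths = [len(seq) for seq in seqs]
    max_len = max(lengths)
    width = len(seqs[0][0])
    padded = [
        seq + [[pad] * width for _ in range(max_len - len(seq))] for seq in seqs
    ]
    return padded, lengths


def padded_to_list(padded: List[List[Row]], lengths: List[int]) -> List[List[Row]]:
    """Drop the padding rows again, using the stored lengths."""
    return [seq[:n] for seq, n in zip(padded, lengths)]


# Two sequences of different length, each row being a width-2 vector.
seqs = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [[7.0, 8.0]]]
padded, lengths = list_to_padded(seqs)
restored = padded_to_list(padded, lengths)
```

Because the lengths travel with the padded batch, the layers before and after the LSTM can keep working on a plain list of arrays.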

honnibal commented, Jan 21, 2021

@naveenjafer Sorry I didn’t see this before.

Yes, I did mean just the multi-head attention layer, which wouldn't have any state associated with it. The transformer would then be built out of it and several other pieces. Anyway, I think the list2padded issue here should be fixed, so I'll close this issue as old.
