Using list2padded for sequences

I referred to https://github.com/explosion/thinc/blob/master/examples/03_pos_tagger_basic_cnn.ipynb to get a better understanding of the Thinc layers. The model in the example is as follows:

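from thinc.api import (
    HashEmbed, Model, Relu, Softmax, chain, expand_window, strings2arrays, with_array,
)

# width, vector_width and nr_classes are defined earlier in the notebook.
with Model.define_operators({">>": chain}):
    model = strings2arrays() >> with_array(
        HashEmbed(nO=width, nV=vector_width, column=0)
        >> expand_window(window_size=1)
        >> Relu(nO=width, nI=width * 3)
        >> Relu(nO=width, nI=width)
        >> Softmax(nO=nr_classes, nI=width)
    )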

Can someone please explain what strings2arrays does? The documentation says that it takes a sequence of sequences of strings and produces a List[Ints2d].

The input X_train is something like [["this", "is", "awesome"], ["thinc", "is", "cool"]]. What does strings2arrays transform this example into? I can't work out exactly what strings2arrays does, or how it turns a List[List[str]] (a 2D list) into a List[Ints2d] (technically a 3D list/sequence).
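
For reference, here is a rough pure-Python sketch of the transformation being asked about. It assumes, based on my reading of Thinc's source, that strings2arrays hashes every token to a 64-bit integer and returns one array of shape (n_tokens, 1) per sentence. The names hash_token and strings2arrays_sketch are hypothetical, nested lists stand in for arrays, and the blake2b hash is only a stand-in for the MurmurHash that Thinc uses, so the actual ID values will differ.

```python
import hashlib
from typing import List, Sequence


def hash_token(token: str) -> int:
    """Stand-in 64-bit hash (an assumption; NOT the MurmurHash Thinc uses)."""
    digest = hashlib.blake2b(token.encode("utf8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def strings2arrays_sketch(docs: Sequence[Sequence[str]]) -> List[List[List[int]]]:
    """Map each sentence to an (n_tokens, 1) table of token hashes."""
    return [[[hash_token(token)] for token in doc] for doc in docs]


X_train = [["this", "is", "awesome"], ["thinc", "is", "cool"]]
arrays = strings2arrays_sketch(X_train)

# One entry per sentence; each entry has one row per token and a single column.
shapes = [(len(a), len(a[0])) for a in arrays]
print(shapes)  # [(3, 1), (3, 1)]

# The same token always maps to the same ID, wherever it appears.
print(arrays[0][1] == arrays[1][1])  # True ("is" occurs in both sentences)
```

Under these assumptions the output is a plain Python list holding one 2D integer array per sentence, not a single 3D array, so sentences of different lengths are fine. The single column is the one that HashEmbed picks up with column=0.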

honnibal commented, Feb 4, 2020

@naveenjafer Ah you should use the PyTorchLSTM layer currently. I’m nearly done fixing the “native” LSTM implementation, but the one that’s there currently is a draft and isn’t very reliable.

In your model there, you have the LSTM inside the with_array block, which is not correct — you need the Padded format to be input for the LSTM. I think the type-checking should highlight a problem with your architecture (although the errors are sometimes hard to read).

There are a couple of ways you could do the transformations. One is:

model = (
    strings2arrays()
    >> list2padded()
    >> with_array(HashEmbed(nO=width, nV=vector_width, column=0))
    >> PyTorchLSTM(nO=100, nI=width)
    # The Softmax input width has to match the LSTM output width (nO=100).
    >> with_array(Softmax(nO=nr_classes, nI=100))
    >> padded2list()
)
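
To make the Padded format more concrete, here is a minimal pure-Python sketch of the round trip that list2padded and padded2list perform. It assumes the layout described in the Thinc docs: sequences sorted by decreasing length, a time-major data array of shape (max_len, batch, width), plus size_at_t, lengths and indices. PaddedSketch, list2padded_sketch and padded2list_sketch are hypothetical names, and nested lists stand in for arrays.

```python
from typing import List, NamedTuple

Seq = List[List[float]]  # one sequence: (n_tokens, width)


class PaddedSketch(NamedTuple):
    data: List[List[List[float]]]  # (max_len, batch, width), time-major
    size_at_t: List[int]           # number of sequences still active at step t
    lengths: List[int]             # length of each sequence, in sorted order
    indices: List[int]             # original position of each sorted sequence


def list2padded_sketch(seqs: List[Seq]) -> PaddedSketch:
    # Sort by decreasing length, remembering where each sequence came from.
    indices = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    ordered = [seqs[i] for i in indices]
    lengths = [len(s) for s in ordered]
    max_len = lengths[0] if lengths else 0
    width = len(ordered[0][0]) if max_len else 0
    pad = [0.0] * width
    # Build the time-major table, filling short sequences with zero rows.
    data = [
        [list(s[t]) if t < len(s) else list(pad) for s in ordered]
        for t in range(max_len)
    ]
    size_at_t = [sum(1 for n in lengths if n > t) for t in range(max_len)]
    return PaddedSketch(data, size_at_t, lengths, indices)


def padded2list_sketch(padded: PaddedSketch) -> List[Seq]:
    # Drop the padding and restore the original order.
    out: List[Seq] = [[] for _ in padded.indices]
    for b, original_pos in enumerate(padded.indices):
        out[original_pos] = [padded.data[t][b] for t in range(padded.lengths[b])]
    return out


seqs = [
    [[1.0, 1.0], [2.0, 2.0]],              # 2 tokens
    [[3.0, 3.0], [4.0, 4.0], [5.0, 5.0]],  # 3 tokens
    [[6.0, 6.0]],                          # 1 token
]
padded = list2padded_sketch(seqs)
print(len(padded.data), len(padded.data[0]))  # 3 3
print(padded.size_at_t)                       # [3, 2, 1]
print(padded.lengths)                         # [3, 2, 1]
print(padded.indices)                         # [1, 0, 2]
print(padded2list_sketch(padded) == seqs)     # True
```

Sorting by length means that at each time step the active sequences form a prefix of the batch, so a recurrent layer can process data[t][:size_at_t[t]] and skip the padding.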

A couple of people have found the sequence formats confusing, so I’m thinking of ways to make it a bit simpler. For now, a good general strategy is to keep the data in a List[Array] format and use the with_ transforms around blocks of your network. This is an easy way to get everything working, and then you can think about refactoring the transforms slightly if there’s a more efficient way to do things.

For instance, here’s how you could have your network:

model = (
    strings2arrays()
    >> with_array(HashEmbed(nO=width, nV=vector_width, column=0))
    >> with_padded(PyTorchLSTM(nO=100, nI=width))
    # The Softmax input width has to match the LSTM output width (nO=100).
    >> with_array(Softmax(nO=nr_classes, nI=100))
)
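
The with_ wrappers all follow the same pattern: convert the data on the way in, run the wrapped layer, and convert back on the way out. The following is a minimal pure-Python sketch of that idea for a with_array-style wrapper, covering the forward pass only. The names with_array_sketch and double are hypothetical, and the real Thinc layers also handle backpropagation and other input formats.

```python
from typing import Callable, List

Row = List[float]
Layer = Callable[[List[Row]], List[Row]]  # operates on one flat 2D table


def with_array_sketch(layer: Layer) -> Callable[[List[List[Row]]], List[List[Row]]]:
    """Wrap a layer that works on flat tables so it accepts a list of sequences."""

    def forward(seqs: List[List[Row]]) -> List[List[Row]]:
        lengths = [len(seq) for seq in seqs]
        # Concatenate all sequences into one table on the way in...
        flat = [row for seq in seqs for row in seq]
        out = layer(flat)
        # ...and cut the result back into per-sequence pieces on the way out.
        result, start = [], 0
        for n in lengths:
            result.append(out[start:start + n])
            start += n
        return result

    return forward


def double(rows: List[Row]) -> List[Row]:
    # Hypothetical stand-in for a per-token layer such as an embedding or softmax.
    return [[2.0 * x for x in row] for row in rows]


model = with_array_sketch(double)
batch = [[[1.0], [2.0]], [[3.0]]]
print(model(batch))  # [[[2.0], [4.0]], [[6.0]]]
```

A with_padded-style wrapper would have the same shape, with padding and unpadding in place of concatenating and splitting. That is why the data can stay in list format between blocks.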

honnibal commented, Jan 21, 2021

@naveenjafer Sorry I didn’t see this before.

Yes, I did mean just the multi-head attention layer, which wouldn’t have any state associated with it. The transformer would then be built out of it and several other pieces. Anyway, I think the list2padded issue here should be fixed, so I’ll close this issue as old.
