
Does it support masking?


Hello CyberZHG

I have a sequence of inputs and a sequence of outputs, where each input has an associated output (label), as in part-of-speech (POS) tagging. For example, the first three time steps of the first sample:

```
Seq_in[0][0:3]
array([[15], [28], [23]])

Seq_out[0][0:3]
array([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)
```
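Each row of `Seq_out` is a one-hot vector over 15 tags. The following minimal pure-Python sketch shows how integer tag ids map to rows like these. The helper `one_hot` is hypothetical. The tag ids 6, 7, 6 are simply read off the three rows above.

```python
def one_hot(tag_id, num_tags=15):
    """Return a list of length num_tags with 1.0 at position tag_id, 0.0 elsewhere."""
    row = [0.0] * num_tags
    row[tag_id] = 1.0
    return row

# Tag ids read off the three rows of Seq_out[0][0:3] shown above
tags = [6, 7, 6]
rows = [one_hot(t) for t in tags]
print([r.index(1.0) for r in rows])  # [6, 7, 6]
```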

I am using the following code for training:

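```python
from sklearn.model_selection import train_test_split
from keras import optimizers
from keras.models import Sequential
from keras.layers import Masking, Bidirectional, LSTM, Dropout, Dense
from keras_self_attention import seq_self_attention

X_train, X_val, Y_train, Y_val = train_test_split(Seq_in, Seq_out, test_size=0.20)

model = Sequential()
# Timesteps whose (single) input feature equals 5 are masked for downstream layers
model.add(Masking(mask_value=5, input_shape=(Seq_in.shape[1], 1)))  # time steps is 500
model.add(Bidirectional(LSTM(256, return_sequences=True)))
model.add(Dropout(0.2))
model.add(Bidirectional(LSTM(256, return_sequences=True)))
model.add(Dropout(0.2))
model.add(seq_self_attention.SeqSelfAttention())
# Dense acts on the last axis, giving a 15-way softmax (one per tag) at every timestep
model.add(Dense(15, activation='softmax'))

sgd = optimizers.SGD(lr=.1, momentum=0.9, decay=1e-3, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

model.fit(X_train, Y_train, epochs=2, validation_data=(X_val, Y_val), verbose=2)
```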
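The `decay` argument passed to `SGD` above applies time-based learning-rate decay. In the standalone Keras 2.x optimizers of that era, the rate used at each update is lr / (1 + decay * iterations), where `iterations` counts batch updates. Newer Keras releases replace `decay` with learning-rate schedules, so this behavior depends on the version. Below is a minimal sketch of that schedule using the values above. `effective_lr` is a hypothetical helper.

```python
def effective_lr(lr, decay, iterations):
    """Time-based decay: the learning rate after `iterations` batch updates."""
    return lr / (1.0 + decay * iterations)

# Values from the SGD call above: lr=0.1, decay=1e-3
for step in (0, 100, 1000, 10000):
    print(step, effective_lr(0.1, 1e-3, step))
```

With these settings, the rate has halved after 1,000 updates. After 10,000 updates it is roughly a tenth of its initial value.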

I have a few concerns. First, the implementation seems to support masking, but is what I am doing in the code a correct way to use masking, or is there another way?
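Here is how masking works in general. A Keras `Masking` layer marks a timestep as padding when all of its features equal `mask_value`, and mask-aware layers downstream are expected to ignore those timesteps. The pure-Python sketch below illustrates the idea. It assumes that an attention layer honors the mask by giving masked timesteps zero weight in its softmax. The helper names are hypothetical, and this is not the keras-self-attention code.

```python
import math

def compute_mask(sequence, mask_value):
    """True for real timesteps, False where every feature equals mask_value."""
    return [not all(f == mask_value for f in step) for step in sequence]

def masked_softmax(scores, mask):
    """Softmax over scores that gives zero weight to masked (False) positions."""
    top = max(s for s, m in zip(scores, mask) if m)  # for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

# A 5-step input with one feature per step, padded at the end with the value 5
sequence = [[15], [28], [23], [5], [5]]
mask = compute_mask(sequence, mask_value=5)
weights = masked_softmax([0.1, 0.4, 0.2, 0.9, 0.9], mask)
print(mask)     # [True, True, True, False, False]
print(weights)  # the last two weights are 0.0
```

With `mask_value=5`, any timestep whose token id is 5 is treated as padding. The value 5 should therefore be reserved for padding and never used as a real word id.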

Why do we need the `units` argument in the constructor? Doesn't the code figure it out by itself?

Following the equations posted in the README, my understanding of the process is as follows. First, each neighboring state h_t' is summed with the state of the current time step h_t, and then the tanh of each unit in each state is taken, which produces the same shape (first equation).

Second, each state h_t' is squashed to a single value (scalar) using a sigmoid function (second equation).

Third, we compute the softmax between the state of the current time step and the other states h_t'.

Finally, we multiply the softmax probability (attention weight) by each unit and take the weighted sum.

Is my understanding correct? If so, why do we need `units` in the constructor?
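The four steps described above can be sketched in pure Python. This is a hypothetical illustration, not the library's code. It assumes the additive form h_{t,t'} = tanh(x_t W_t + x_t' W_x + b_h), e_{t,t'} = sigmoid(w_a · h_{t,t'} + b_a), a_t = softmax(e_t), l_t = Σ_t' a_{t,t'} x_t'. In this form, W_t and W_x first project each d-dimensional state to `units` dimensions before the sum. Under that assumption, `units` sets the width of the intermediate h_{t,t'}. It is a free hyperparameter, like the size of a Dense layer, and cannot be inferred from the input shape.

```python
import math
import random

def matvec(x, W):
    """Row vector x (length d) times matrix W (d x units) -> list of length units."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def additive_self_attention(xs, units, seed=0):
    """Hypothetical additive self-attention over xs, a list of d-dim vectors.

    Weights are drawn at random here; a real layer would learn them.
    Returns (outputs, weights): one d-dim vector and one row of
    attention weights per timestep.
    """
    rng = random.Random(seed)
    d = len(xs[0])

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    W_t = rand_matrix(d, units)     # projects the current state x_t to `units` dims
    W_x = rand_matrix(d, units)     # projects each neighbor x_t' to `units` dims
    b_h = [0.0] * units
    w_a = rand_matrix(1, units)[0]  # maps h_{t,t'} to a single score
    b_a = 0.0

    outputs, weights = [], []
    for x_t in xs:
        q = matvec(x_t, W_t)
        scores = []
        for x_s in xs:
            k = matvec(x_s, W_x)
            # Step 1: add the projected states and apply tanh elementwise
            h = [math.tanh(q[j] + k[j] + b_h[j]) for j in range(units)]
            # Step 2: squash h_{t,t'} to one scalar with a sigmoid
            e = sum(w_a[j] * h[j] for j in range(units)) + b_a
            scores.append(1.0 / (1.0 + math.exp(-e)))
        # Step 3: softmax over all t' for this t
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        a = [v / total for v in exps]
        # Step 4: weighted sum of the original states
        outputs.append([sum(a[s] * xs[s][i] for s in range(len(xs))) for i in range(d)])
        weights.append(a)
    return outputs, weights

xs = [[0.5, -1.0], [1.0, 0.0], [0.0, 2.0]]
outputs, weights = additive_self_attention(xs, units=4)
print(len(outputs), len(outputs[0]))  # 3 2: output width is d, not units
```

Changing `units` in this sketch changes only the hidden width used to compute the scores. The shape of the output stays (timesteps, d).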

Also, there are two methods, multiplicative and additive. Where can I see the difference in terms of the equations?

Sorry for so many questions; I would appreciate your answers. Thank you!

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

CyberZHG commented, Nov 7, 2018

See UOI-1806.01264, which is also a tagging task. The attention weights are approximately equal at initialization, but they become distinctly non-uniform after several epochs.
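This sketch shows why the weights start out nearly equal. With small random initial weights, the scores are all close to one another, and a softmax over nearly equal scores is close to uniform, about 1/T each for T timesteps. The example is pure Python and assumes an illustrative score spread of 0.01 and T = 10. Both values are hypothetical. The "trained" scores are likewise a made-up case in which one neighbor dominates.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
T = 10
# Scores produced by small random weights cluster tightly around zero
initial_scores = [rng.gauss(0.0, 0.01) for _ in range(T)]
# Illustrative scores after training, where one neighbor clearly stands out
trained_scores = [0.0] * T
trained_scores[3] = 5.0

print(max(softmax(initial_scores)))  # close to 1/T = 0.1
print(max(softmax(trained_scores)))  # much larger than 1/T
```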


CyberZHG commented, Nov 6, 2018

> Also, there are two methods, multiplicative and additive. Where can I see the difference in terms of the equations?

The default option is additive. The equation for multiplicative attention is in this section.
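The two scoring functions can be compared side by side. This sketch assumes a multiplicative form e_{t,t'} = x_t^T W_a x_t' + b_a, a bilinear score with a d × d matrix that involves no `units` width. It contrasts that with an additive form e_{t,t'} = w_a · tanh(x_t W_t + x_t' W_x + b_h) + b_a. The exact formulas in the referenced section may differ, and the function names and shapes here are illustrative assumptions.

```python
import math

def additive_score(x_t, x_s, W_t, W_x, b_h, w_a, b_a):
    """e = w_a . tanh(x_t W_t + x_s W_x + b_h) + b_a; hidden width is len(w_a)."""
    units = len(w_a)
    h = [math.tanh(sum(x_t[i] * W_t[i][j] for i in range(len(x_t)))
                   + sum(x_s[i] * W_x[i][j] for i in range(len(x_s)))
                   + b_h[j])
         for j in range(units)]
    return sum(w_a[j] * h[j] for j in range(units)) + b_a

def multiplicative_score(x_t, x_s, W_a, b_a):
    """e = x_t^T W_a x_s + b_a, with W_a of shape d x d (no hidden width)."""
    d = len(x_t)
    return sum(x_t[i] * W_a[i][j] * x_s[j] for i in range(d) for j in range(d)) + b_a

x_t, x_s = [1.0, 2.0], [0.5, -1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
# With W_a = I the multiplicative score reduces to a dot product: 0.5 - 2.0 = -1.5
print(multiplicative_score(x_t, x_s, identity, 0.0))
```

The additive score passes both states through a learned hidden layer of width `units`, while the multiplicative score compares them directly through a single d × d matrix.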

