Is there any reason for `TimeDistributedDense`?
See original GitHub issue

Similar to how I removed time_distributed_softmax and just made it always apply softmax to the last (i.e. the nb_dimensions) axis, is there any reason that I don't make Dense behave the same way, so you can pass it either a (nb_samples, nb_dims) or a (nb_samples, nb_timesteps, nb_dims) tensor? It seems like this reduces complexity with no downside.
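As a rough illustration (not from the original issue; shapes are made up), this is the behaviour that current tf.keras Dense ended up with: applied to an input with more than two dimensions, it operates on the last axis, i.e. the same weights are applied at every timestep.

```python
# Minimal sketch (illustrative shapes only): a single Dense layer accepts
# both 2D and 3D inputs and applies the same kernel to the last axis.
import numpy as np
from tensorflow.keras import layers

dense = layers.Dense(8)

x_2d = np.random.rand(4, 16).astype("float32")      # (nb_samples, nb_dims)
x_3d = np.random.rand(4, 10, 16).astype("float32")  # (nb_samples, nb_timesteps, nb_dims)

print(dense(x_2d).shape)  # (4, 8)
print(dense(x_3d).shape)  # (4, 10, 8): same kernel applied per timestep
```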
Issue Analytics
- Created 8 years ago
- Comments: 11 (10 by maintainers)
Top Results From Across the Web
TimeDistributed vs. TimeDistributedDense Keras
TimeDistributed is a Keras wrapper which makes it possible to take any static (non-sequential) layer and apply it in a sequential manner. So if...

How to Use the TimeDistributed Layer in Keras
One reason for this difficulty in Keras is the use of the TimeDistributed ... TimeDistributedDense applies the same Dense (fully-connected) ...

What is the interest of TimeDistributed after an LSTM layer?
OK, let's say you have an LSTM() layer with return_sequences=True set. That means each LSTM cell in it is outputting its...

What is time distributed dense layer in Keras? - Quora
A time-distributed dense layer is used on RNNs, including LSTMs, to keep a one-to-one relation between input and output. Assume you have 60 time steps...

Keras 2 Released - Part 1 (2017) - Fast.ai forums
The TimeDistributedDense layer used in Part 1 of the class was removed, ... This causes an error when trying to import the utils.py module.
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
So, from this discussion I glean that:
model.add(LSTM(512, input_shape=(maxlen, len(chars)), return_sequences=True))
model.add(LSTM(512, return_sequences=True))  # - original
model.add(Dropout(0.2))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))
and
model.add(LSTM(512, input_shape=(maxlen, len(chars)), return_sequences=True))
model.add(LSTM(512, return_sequences=True))  # - original
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(len(chars))))
model.add(Activation('softmax'))
represent the exact same model. I also checked the summary, and the number of parameters is exactly the same, which leads me to conclude that I can probably do away with the TimeDistributed wrapper altogether without affecting the model in any way. Please correct me if this is incorrect.
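One way to check this claim (a sketch with placeholder shapes, not code from the thread) is to build both variants and compare their parameter counts and output shapes:

```python
# Sketch with hypothetical values for maxlen and the character set size;
# both variants should report identical parameter counts and output shapes.
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dropout, Dense, TimeDistributed, Activation

maxlen, n_chars = 40, 57  # made-up values, not from the original comment

def build(wrap_dense):
    model = keras.Sequential()
    model.add(LSTM(512, input_shape=(maxlen, n_chars), return_sequences=True))
    model.add(LSTM(512, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(TimeDistributed(Dense(n_chars)) if wrap_dense else Dense(n_chars))
    model.add(Activation('softmax'))
    return model

plain, wrapped = build(False), build(True)
print(plain.count_params() == wrapped.count_params())  # True
print(plain.output_shape, wrapped.output_shape)        # both (None, 40, 57)
```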
@falaktheoptimist Did you ever work this out? I'm wondering the same thing: the output dimensions, loss and accuracy are identical for me if I replace my final TimeDistributed(Dense()) layer with just a Dense layer.

EDIT: I'm wondering if https://github.com/fchollet/keras/pull/7554 is related.
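A quick way to convince yourself of this equivalence (again a sketch with arbitrary shapes, not from the thread) is to copy the weights from one variant into the other and compare predictions; because Dense and TimeDistributed(Dense) hold identically shaped weights, their outputs match exactly:

```python
# Sketch with arbitrary shapes: after an LSTM with return_sequences=True,
# Dense and TimeDistributed(Dense) with the same weights give the same output.
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed, Activation

def head(use_wrapper):
    model = keras.Sequential()
    model.add(LSTM(32, input_shape=(10, 16), return_sequences=True))
    model.add(TimeDistributed(Dense(4)) if use_wrapper else Dense(4))
    model.add(Activation('softmax'))
    return model

plain, wrapped = head(False), head(True)
wrapped.set_weights(plain.get_weights())  # weight shapes are identical, so this works

x = np.random.rand(2, 10, 16).astype("float32")
print(np.allclose(plain.predict(x), wrapped.predict(x)))  # True
```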