Support for custom Encoders
If you currently use word-level embeddings (e.g. fastText), whatlies supports embeddings for sentences by summing the individual word embeddings. While this is a reasonable default behaviour, it is also an arbitrary and inflexible choice. Ideally whatlies would support the standard encoding schemes `sum`, `average` and `max`, and otherwise accept a callable for any custom operation that users want.
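In this sense a combiner is just a function that maps a matrix of word vectors to a single sentence vector. A minimal sketch in plain numpy (independent of whatlies' actual API; the function and dictionary names below are purely illustrative):

```python
import numpy as np

def max_pool(word_vectors: np.ndarray) -> np.ndarray:
    """Custom combiner: element-wise maximum over all word embeddings.

    `word_vectors` has shape (n_words, embedding_dim); the result is a
    single sentence vector of shape (embedding_dim,).
    """
    return word_vectors.max(axis=0)

# The standard schemes mentioned above are equally small:
combiners = {
    "sum": lambda vecs: vecs.sum(axis=0),
    "average": lambda vecs: vecs.mean(axis=0),
    "max": max_pool,
}

# Three 4-dimensional "word" vectors combined into one sentence vector.
sentence = combiners["average"](np.random.rand(3, 4))
assert sentence.shape == (4,)
```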
Issue Analytics
- State:
- Created 3 years ago
- Comments: 10 (9 by maintainers)
Top GitHub Comments
Supporting string metric values as well as callables is not an issue at all and could be done easily. However, I am not in favor of adding that `**kwds` argument, mainly because we already have that in some of the language classes (for example here in `HFTransformers`) for passing additional arguments to the underlying language backend constructor. So if we want to use `__init__` for this purpose and also support callables as well as custom keyword arguments for them, we should either add a separate `__init__` argument like `combiner_kwd`, or accept only the callable itself and let users bind any extra arguments beforehand (e.g. with a `lambda` or using `functools.partial`).
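To illustrate the second option: if `__init__` only accepts a string or a one-argument callable, a user who needs extra parameters can bind them up front. The `trimmed_mean` combiner below is a hypothetical example, not part of whatlies:

```python
from functools import partial

import numpy as np

def trimmed_mean(word_vectors: np.ndarray, trim: float = 0.1) -> np.ndarray:
    """Hypothetical custom combiner with an extra keyword argument:
    average the word vectors after clipping the most extreme `trim`
    fraction of values in each dimension."""
    lo, hi = np.quantile(word_vectors, [trim, 1.0 - trim], axis=0)
    return np.clip(word_vectors, lo, hi).mean(axis=0)

# The caller binds the extra argument, so the library only ever sees a
# one-argument callable -- no **kwds plumbing needed in __init__.
combiner = partial(trimmed_mean, trim=0.2)
# ... or equivalently: combiner = lambda vecs: trimmed_mean(vecs, trim=0.2)

sentence = combiner(np.random.rand(5, 8))
assert sentence.shape == (8,)
```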
That's not different from having a default value of `None` for the relevant config value and therefore using whatever the language backend does by default in that case; so no worries there!

Actually, that's not entirely true. The `__CLS__` token, which is present in some of the transformer models, has been added and fine-tuned on downstream tasks so that it can be a good representation of the entire input sequence; however, nothing prevents you from using an alternative representation of the entire sequence based on the contextualized token embeddings, and there is also no guarantee that the `__CLS__` token would outperform all of the other representations. As I have already mentioned in #92, the `spacy` package uses the average of token embeddings and `spacy-transformers` uses the sum of token embeddings (not the `__CLS__` token). You can even go further, as I also mentioned in #92, and use the representations given by the intermediate transformer layers or even the embedding layer itself. So this usually depends on the downstream task, the data, or the analysis you want to perform (just as an example, see the results of different contextualized token-embedding combinations in the BERT paper for NER; here is a visual summary).