Preprocessor components
Currently we lowercase and perform other token processing operations within the CountVectorsFeaturizer, which only affects the features and not the tokens. This means entity extractors don't have access to the same preprocessing. It would be good to include these steps as pipeline components so that it's clear what processing happens to the tokens at which point. Two of these would be:
- LowercasePreprocessor
- AccentStripPreprocessor
(and then remove this functionality from the components in which this processing currently happens, e.g. strip_accents in the CountVectorsFeaturizer)
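As a rough illustration of the proposal, the sketch below shows how two such preprocessors could run over the tokens before any featurizer or entity extractor sees them. It is not the existing component API: the process method, the run_preprocessors helper, and the use of plain strings as tokens are all assumptions made for this example.

```python
import unicodedata
from typing import List, Sequence


class LowercasePreprocessor:
    """Hypothetical component: lowercases every token."""

    def process(self, tokens: List[str]) -> List[str]:
        return [token.lower() for token in tokens]


class AccentStripPreprocessor:
    """Hypothetical component: removes combining accent marks from every token."""

    def process(self, tokens: List[str]) -> List[str]:
        return [self._strip_accents(token) for token in tokens]

    @staticmethod
    def _strip_accents(text: str) -> str:
        # Decompose characters (e.g. "é" becomes "e" plus a combining accent),
        # then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def run_preprocessors(tokens: List[str], preprocessors: Sequence) -> List[str]:
    """Apply each preprocessor in order (assumed helper, not a library function)."""
    for preprocessor in preprocessors:
        tokens = preprocessor.process(tokens)
    return tokens


pipeline = [LowercasePreprocessor(), AccentStripPreprocessor()]
print(run_preprocessors(["Café", "in", "Zürich"], pipeline))
# prints ['cafe', 'in', 'zurich']
```

Because the tokens themselves are rewritten, any later component, whether a featurizer or an entity extractor, would work from the same normalized text.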
Issue Analytics
- State:
- Created 4 years ago
- Comments: 12 (12 by maintainers)
Top GitHub Comments
I would say it is not needed. As far as I know, we don't have a component that depends on the plain token text; all components depend on the features.
I’ll close this one then.