Preprocessor components

Currently we lowercase and perform other token processing operations within the CountVectorsFeaturizer, which affects only the features and not the tokens. This means entity extractors don’t have access to the same preprocessing. It would be good to add preprocessor components to the pipeline so that it’s clear what processing happens to the tokens at which point. Two candidates would be:

LowercasePreprocessor
AccentStripPreprocessor

(and then remove this functionality from the components in which this processing currently happens e.g. strip_accents in the CountVectorsFeaturizer)
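
The proposal can be illustrated with a minimal sketch in which preprocessing runs on the tokens themselves, so every later component sees the same text. Everything here is hypothetical: `Token`, `lowercase_preprocessor`, `accent_strip_preprocessor`, and `run_pipeline` are illustrative names, not Rasa's component API, and the accent stripping uses Unicode NFKD decomposition as one plausible approach.

```python
import unicodedata
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a pipeline token (not Rasa's Token class)."""
    text: str
    start: int


Preprocessor = Callable[[List[Token]], List[Token]]


def lowercase_preprocessor(tokens: List[Token]) -> List[Token]:
    # Lowercase the token text; offsets are left untouched.
    return [replace(t, text=t.text.lower()) for t in tokens]


def accent_strip_preprocessor(tokens: List[Token]) -> List[Token]:
    def strip(text: str) -> str:
        # Decompose characters, then drop the combining marks (accents).
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    return [replace(t, text=strip(t.text)) for t in tokens]


def run_pipeline(tokens: List[Token], steps: List[Preprocessor]) -> List[Token]:
    # Apply each preprocessing step in order, making the sequence explicit.
    for step in steps:
        tokens = step(tokens)
    return tokens


tokens = [Token("Café", 0), Token("MÜNCHEN", 5)]
processed = run_pipeline(tokens, [lowercase_preprocessor, accent_strip_preprocessor])
print([t.text for t in processed])  # ['cafe', 'munchen']
```

Because each step returns new tokens, a downstream featurizer and an entity extractor would both operate on the same normalized text, which is the point of moving this logic out of the featurizer.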

Top GitHub Comments

1 reaction

tabergma commented, Oct 23, 2019

I would say it is not needed. As far as I know, we don’t have a component that depends on the plain token text; all components depend on the features.

0 reactions

wochinge commented, Dec 19, 2019

I’ll close this one then.
