Preprocessor components

Currently, lowercasing and other token-processing operations happen inside the CountVectorsFeaturizer, so they affect only the features and not the tokens themselves. This means entity extractors don’t have access to the same preprocessing. It would be good to add preprocessor components to the pipeline so that it’s clear which processing is applied to the tokens at which point. Two candidates:

  • LowercasePreprocessor
  • AccentStripPreprocessor

(This functionality, e.g. strip_accents in the CountVectorsFeaturizer, would then be removed from the components that currently perform it.)
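To make the proposal concrete, here is a minimal standalone sketch of what such token-level preprocessors could look like. It does not use Rasa's component API: the `Token` class, the `process` method, and `run_pipeline` are hypothetical names chosen for illustration, and the accent stripping is a plain Unicode-decomposition approach rather than whatever a real component would do.

```python
import unicodedata
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical token: its text plus the offset in the original message."""
    text: str
    start: int


class LowercasePreprocessor:
    """Hypothetical component that lowercases the text of every token."""

    def process(self, tokens: List[Token]) -> List[Token]:
        return [replace(t, text=t.text.lower()) for t in tokens]


class AccentStripPreprocessor:
    """Hypothetical component that removes accents from every token."""

    @staticmethod
    def _strip(text: str) -> str:
        # Decompose characters (e.g. "é" -> "e" + combining accent),
        # then drop the combining marks.
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(c for c in decomposed if not unicodedata.combining(c))

    def process(self, tokens: List[Token]) -> List[Token]:
        return [replace(t, text=self._strip(t.text)) for t in tokens]


def run_pipeline(tokens: List[Token], preprocessors) -> List[Token]:
    """Apply each preprocessor in order, so the processing order is explicit."""
    for preprocessor in preprocessors:
        tokens = preprocessor.process(tokens)
    return tokens


tokens = [Token("Café", 0), Token("München", 5)]
processed = run_pipeline(
    tokens, [LowercasePreprocessor(), AccentStripPreprocessor()]
)
print([t.text for t in processed])  # ['cafe', 'munchen']
```

Because the preprocessors rewrite the tokens themselves and keep the original offsets, any later component, whether a featurizer or an entity extractor, would see the same normalized text.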

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (12 by maintainers)

Top GitHub Comments

1 reaction
tabergma commented, Oct 23, 2019

I would say it is not needed. As far as I know, we don’t have a component that depends on the plain token text; all components depend on the features.

0 reactions
wochinge commented, Dec 19, 2019

I’ll close this one then.
