Dealing with list of values (words)
Missing functionality
When dealing with NLP tasks I always find myself doing the same EDA on words: sentence length, word length, lemmas, character length, etc.
I don’t mind pandas-profiling not being able to do NLP tasks, but I’m frustrated at not being able to get insights on lists of words.
Proposed feature
When a column of type list of strings is found, it would be nice to automatically analyze the most common words, the mean length, the max length, the sentence length, etc.: all the boilerplate code/analysis that we always do in NLP.
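To make the request concrete, here is a minimal sketch of the kind of summary meant above, using only the Python standard library. The function name `summarize_token_lists`, its `top_n` parameter, and the toy corpus are hypothetical, chosen for illustration; this is not an existing pandas-profiling API. It also assumes "length" means characters per word and "sentence length" means tokens per row.

```python
from collections import Counter
from statistics import mean


def summarize_token_lists(rows, top_n=3):
    """Summarize a column whose cells are lists of strings (tokens).

    Hypothetical helper for illustration only; not part of pandas-profiling.
    """
    # Flatten every row into one stream of tokens for word-level statistics.
    tokens = [tok for row in rows for tok in row]
    word_lengths = [len(tok) for tok in tokens]
    # Assumption: "sentence length" is the number of tokens in each row.
    sentence_lengths = [len(row) for row in rows]
    return {
        "most_common": Counter(tokens).most_common(top_n),
        "mean_word_length": mean(word_lengths) if word_lengths else 0,
        "max_word_length": max(word_lengths, default=0),
        "mean_sentence_length": mean(sentence_lengths) if sentence_lengths else 0,
        "max_sentence_length": max(sentence_lengths, default=0),
    }


# Toy corpus: one list of tokens per document.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked", "loudly"],
    ["the", "cat", "slept"],
]
summary = summarize_token_lists(corpus, top_n=2)
print(summary)
```

A profiler could presumably compute something like this once per list-typed column, next to the statistics it already reports for plain string columns.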
Alternatives considered
I do it by hand, adding the needed boilerplate code myself, with derived columns for each variable such as var_1 / var_1_len / var_1_len_max, etc.
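For reference, the manual workaround might look roughly like the sketch below. The issue does not say what each derived column holds, so this assumes `var_1_len` is the number of tokens in a row and `var_1_len_max` is the length of the longest token in that row. The toy data is made up.

```python
import pandas as pd

# Hypothetical toy corpus: each cell of `var_1` holds a list of tokens.
df = pd.DataFrame(
    {"var_1": [["hello", "world"], ["pandas", "profiling", "rocks"], []]}
)

# Assumed meaning: number of tokens per row (the "sentence length").
df["var_1_len"] = df["var_1"].apply(len)

# Assumed meaning: length of the longest token in each row; 0 for an empty list.
df["var_1_len_max"] = df["var_1"].apply(
    lambda toks: max((len(t) for t in toks), default=0)
)

print(df)
```

The derived columns are plain integers, so they can be profiled like any other numeric column, but the boilerplate has to be repeated for every text column.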
Additional context
This is done for corpus analysis, so there are no labels, only text processing.
I don’t know if there is any objective in this direction, but I tried to ask anyway. Have a great day
Issue Analytics
- Created 2 years ago
- Comments:10
The fix seems to work. I’m creating a release (v3.1.1) and will rerun the examples. If upgrading doesn’t resolve the issue then I’ll look into it again.
That’s a bug, worked neatly before. Will look into it