Provide a way to Create Records and Groups by a Row pattern
From @ettorerizza in #1340
I have before my eyes a use case that illustrates the potential utility of this “column” variable. Take a text file containing lines (for example, those of an OCR-processed PDF) with no structure other than this: the lines we are interested in are always followed by a line starting with the word “total”.
Example:
row1
row2
INTERESTING ROW1
total 1MB
row4
row5
row6
row7
INTERESTING ROW2
total 16MB
row8
INTERESTING ROW3
total 3MB
In the real file, the interesting lines are not in capital letters. In fact, let’s say that they contain no pattern that would allow filtering them with a regular expression. They can only be found by first identifying the rows starting with “total”, then taking the row that precedes each one.
How can the interesting lines be extracted with GREL? This is, I think, a fairly common problem.
_Originally posted by @ettorerizza in https://github.com/OpenRefine/OpenRefine/issues/1340#issuecomment-365876113_
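To make the pattern in the question concrete, here is a minimal sketch in plain Python. It is not GREL and not something OpenRefine provides; the function name `lines_before_marker` and its `marker` parameter are hypothetical, chosen only for illustration. It pairs each line with the one after it and keeps the line whenever its successor starts with “total”.

```python
def lines_before_marker(lines, marker="total"):
    """Return every line that is immediately followed by a line starting with `marker`.

    Hypothetical helper for illustration only; it is not part of OpenRefine or GREL.
    """
    return [
        current
        for current, following in zip(lines, lines[1:])  # pair each line with its successor
        if following.startswith(marker)
    ]


# The sample rows from the issue description.
sample = [
    "row1", "row2", "INTERESTING ROW1", "total 1MB",
    "row4", "row5", "row6", "row7", "INTERESTING ROW2", "total 16MB",
    "row8", "INTERESTING ROW3", "total 3MB",
]

interesting = lines_before_marker(sample)
print(interesting)
# ['INTERESTING ROW1', 'INTERESTING ROW2', 'INTERESTING ROW3']
```

The point of the sketch is that the selection depends on the next row rather than on the row's own content, which is exactly what a per-cell expression cannot see without some form of row indexing.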
@antoine2711 Firstly, let me just say that I do appreciate your experiments with our lower-level functions like `row.index` and the updated `cross` to perform more data wrangling, but there’s a cognitive loss for our users. That was badly named, sorry. We want to save users from having to worry about row indexing and `cross`; both are an indirection around this issue. We already provide facets that do counting, and even have `facetCount` (but it does not work with patterns; it currently doesn’t use regex).
If you imagine the simple workflow that you performed, it would be something like this:
So we need to have a function that makes 1. easier for the user (in other tools, this is actually super easy to do). Then another function (or reuse `facetCount()`) to count the records, as previously suggested in #2237. #2298 and its research might later offer some more insight into this issue as well.
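As a rough illustration of what grouping by a row pattern and then counting could mean, here is a minimal sketch in plain Python. It is an assumption about the desired behaviour, not an existing OpenRefine feature; the name `group_into_records` is hypothetical. Each record is closed by a row starting with “total”, and the records are counted afterwards.

```python
def group_into_records(lines, marker="total"):
    """Split `lines` into records, each ending at a line that starts with `marker`.

    Hypothetical helper for illustration only; it is not part of OpenRefine.
    """
    records, current = [], []
    for line in lines:
        current.append(line)
        if line.startswith(marker):  # the marker row closes the current record
            records.append(current)
            current = []
    if current:
        records.append(current)  # keep trailing rows that have no closing marker
    return records


# The sample rows from the issue description.
sample = [
    "row1", "row2", "INTERESTING ROW1", "total 1MB",
    "row4", "row5", "row6", "row7", "INTERESTING ROW2", "total 16MB",
    "row8", "INTERESTING ROW3", "total 3MB",
]

records = group_into_records(sample)
print(len(records))               # 3
print([len(r) for r in records])  # [4, 6, 3]
```

Whether the marker row should close a record, as here, or open the next one is a design choice that the issue leaves open.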