
Provide a way to Create Records and Groups by a Row pattern


From @ettorerizza in #1340

I have before me a use case that illustrates the potential usefulness of this variable “column”. Take a .txt file containing lines (for example, those of an OCRed PDF) with no structure other than this: the lines we are interested in are always followed by a line starting with the word “total”.

Example:

row1
row2
INTERESTING ROW1
total 1MB
row4
row5
row6
row7
INTERESTING ROW2
total 16MB
row8
INTERESTING ROW3
total 3MB

In the real file, the interesting lines are not in capital letters. In fact, let’s say they contain no pattern that would allow filtering them with a regular expression. They can only be found by first identifying the rows that start with “total” and then taking the rows immediately preceding them.
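The lookup described above (find the “total” rows, then take the row just before each one) can be sketched outside OpenRefine in plain Python. This is not GREL and not an OpenRefine API; the function name `lines_before_total` and the case-sensitive `^total\b` regex are assumptions made for illustration only.

```python
import re
from typing import List

# Hypothetical helper, not part of OpenRefine: returns every line that is
# immediately followed by a line starting with the word "total".
def lines_before_total(lines: List[str], marker: str = r"^total\b") -> List[str]:
    pattern = re.compile(marker)  # \b stops "totally" from counting as "total"
    # Pair each line with its successor and keep the ones whose successor matches.
    return [prev for prev, nxt in zip(lines, lines[1:]) if pattern.match(nxt)]

sample = """row1
row2
INTERESTING ROW1
total 1MB
row4
row5
row6
row7
INTERESTING ROW2
total 16MB
row8
INTERESTING ROW3
total 3MB""".splitlines()

print(lines_before_total(sample))
# → ['INTERESTING ROW1', 'INTERESTING ROW2', 'INTERESTING ROW3']
```

Note that nothing about the interesting lines themselves is matched; only their position relative to a “total” row is used, which is exactly the constraint stated in the issue.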

How can the interesting lines be extracted with GREL? I think this is a fairly common problem.

_Originally posted by @ettorerizza in https://github.com/OpenRefine/OpenRefine/issues/1340#issuecomment-365876113_


Top GitHub Comments

1 reaction

thadguidry commented, Apr 7, 2020

@antoine2711 First, let me say that I do appreciate your experiments with our lower-level functions, like row.index and the updated cross(), to do more data wrangling, but they come at a cognitive cost for our users.

“I’m not sure I understand what a rowCount(<pattern>) wrapper would be.”

That was badly named, sorry. We want to save users from having to worry about row indexing and cross(); both are an indirection away from this issue. We already provide facets that do counting, and we even have facetCount(), but it does not work with patterns (it doesn’t currently use regex).

If you imagine the simple workflow that you performed, it would be something like this:

1. Create records from subdata in a column (a repeating pattern)
2. Count those records

So we need a function that makes step 1 easier for the user (in other tools, this is actually very easy to do), and then another function (or a reused facetCount()) to count the records, as previously suggested in #2237.
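The two-step workflow above (build records from a repeating row pattern, then count them) can be sketched in plain Python. This is only an illustration of the idea under stated assumptions: `records_by_pattern` is a hypothetical helper, not an existing OpenRefine or GREL function, and it assumes a record *ends* at a row matching the pattern. OpenRefine's own records mode works differently: a new record starts at each non-blank cell in the first column.

```python
import re
from typing import List

# Hypothetical helper (not an OpenRefine API): step 1, group rows into records,
# where each record closes with a row matching end_marker.
def records_by_pattern(rows: List[str], end_marker: str = r"^total\b") -> List[List[str]]:
    pattern = re.compile(end_marker)
    records: List[List[str]] = []
    current: List[str] = []
    for row in rows:
        current.append(row)
        if pattern.match(row):  # the marker row closes the current record
            records.append(current)
            current = []
    # Assumption: trailing rows with no closing marker are not counted as a record.
    return records

sample = ["row1", "row2", "INTERESTING ROW1", "total 1MB",
          "row4", "row5", "row6", "row7", "INTERESTING ROW2", "total 16MB",
          "row8", "INTERESTING ROW3", "total 3MB"]

records = records_by_pattern(sample)
print(len(records))                   # step 2: count the records → 3
print([r[-2] for r in records])       # row just before each "total" row
```

Splitting record construction from counting mirrors the split proposed in the comment: once records exist, counting them is trivial, so the harder part for users is step 1.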

0 reactions

thadguidry commented, May 22, 2020

#2298 and the research that followed it might offer more insight into this issue as well.