Provide a way to Create Records and Groups by a Row pattern

From @ettorerizza in #1340

I have in front of me a use case that illustrates the potential utility of this “column” variable. Take a text file containing lines (for example, those of an OCRed PDF) with no structure other than this: the lines we are interested in are always followed by a line starting with the word “total”.

Example:

row1
row2
INTERESTING ROW1
total 1MB
row4
row5
row6
row7
INTERESTING ROW2
total 16MB
row8
INTERESTING ROW3
total 3MB

In the real file, the interesting lines are not in capital letters. In fact, let’s say that they contain no pattern that would allow filtering them with a regular expression. They can only be found by first identifying the rows starting with “total”, then taking the row that precedes each of them.

How can the interesting lines be extracted with GREL? This is, I think, a fairly common problem.
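To make the selection rule concrete, here is a minimal sketch in plain Python rather than GREL. It is not an OpenRefine feature or an answer from the thread; the function name `lines_before_marker` and its `marker` parameter are made up for illustration. It pairs each line with the one that follows it and keeps the line whenever the follower starts with “total”.

```python
def lines_before_marker(lines, marker="total"):
    """Return every line that is immediately followed by a line starting with `marker`.

    Hypothetical helper for illustration only; not part of OpenRefine or GREL.
    """
    return [
        current
        for current, following in zip(lines, lines[1:])  # (line, next line) pairs
        if following.startswith(marker)
    ]


# The sample data from the example above.
sample = [
    "row1", "row2", "INTERESTING ROW1", "total 1MB",
    "row4", "row5", "row6", "row7", "INTERESTING ROW2", "total 16MB",
    "row8", "INTERESTING ROW3", "total 3MB",
]

interesting = lines_before_marker(sample)
print(interesting)
```

Note that if two “total” lines ever appeared back to back, this sketch would also return the first of them, so real data might need an extra check.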

_Originally posted by @ettorerizza in https://github.com/OpenRefine/OpenRefine/issues/1340#issuecomment-365876113_

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

thadguidry commented, Apr 7, 2020 (1 reaction)

@antoine2711 First, let me say that I do appreciate your experiments with our lower-level functions like row.index and the updated cross to perform more data wrangling, but there’s a cognitive cost for our users.

I’m not sure I understand what would be a rowCount(<pattern>) wrapper.

That was badly named, sorry. We want to save users from having to worry about row indexing and cross; both are indirect ways of getting at this issue. We already provide facets that do counting, and even have facetCount() (though it does not currently work with patterns, since it doesn’t use regex).

If you imagine the simple workflow that you performed, it would be something like this:

  1. Create records from subdata in a column (a repeating pattern)
  2. Count those records

So we need a function that makes step 1 easier for the user (in other tools, this is actually very easy to do), and then another function (or a reuse of facetCount()) to count the records, as previously suggested in #2237.
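The two-step workflow can be sketched outside OpenRefine in plain Python. This is only an illustration of the idea under stated assumptions, not the proposed function or any existing OpenRefine API: the name `group_into_records` is hypothetical, and it assumes each record ends at a row matching a regular expression (here, a row starting with “total”, as in the original example).

```python
import re


def group_into_records(rows, end_pattern):
    """Split `rows` into records, closing a record at each row that matches `end_pattern`.

    Hypothetical helper for illustration only; not part of OpenRefine or GREL.
    """
    closes_record = re.compile(end_pattern)
    records, current = [], []
    for row in rows:
        current.append(row)
        if closes_record.match(row):  # match() anchors at the start of the row
            records.append(current)
            current = []
    if current:
        records.append(current)  # trailing rows that no closing row followed
    return records


rows = [
    "row1", "row2", "INTERESTING ROW1", "total 1MB",
    "row4", "row5", "row6", "row7", "INTERESTING ROW2", "total 16MB",
    "row8", "INTERESTING ROW3", "total 3MB",
]

# Step 1: create records from the repeating pattern.
records = group_into_records(rows, r"total\b")
# Step 2: count those records.
record_count = len(records)
print(record_count)
```

Keeping the grouping separate from the counting mirrors the split suggested above: one function builds the records, and the count can then come from any existing counting mechanism.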

thadguidry commented, May 22, 2020 (0 reactions)

#2298 and its subsequent research might offer some more insight into this issue as well.
