bug: `re_extract` behavior differs between backends

The behavior of `re_extract` differs between backends with regard to what the `index` parameter means.

This method has signature re_extract(pattern, index), where pattern is a regex pattern with optional groups, and index is an index of the group to extract (returning NULL if no match or no group matches that index).

  • duckdb, postgres, clickhouse, …: re_extract(pattern, 0) returns the part matching the first group if there’s a match, and NULL otherwise.
  • sqlite, pandas, dask, pyspark: re_extract(pattern, 0) returns the whole string if there’s a match, and NULL otherwise. You need to pass 1, not 0, to extract the first group.

To put it another way, given a column with the value "row_one", column.re_extract("row_([a-z]+)", 0) returns "one" for backends in the first list, and "row_one" for backends in the second.

Given the docstring, I think the first list of backends has the intended behavior.
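
The two indexing conventions described above can be sketched with Python's `re` module. This is only an illustration, not the code of any backend: the helper names are hypothetical, the use of `re.search` is an assumption, and the pattern uses `[a-z]+` so that the group captures all of "one".

```python
import re
from typing import Optional


def extract_whole_match_at_zero(value: str, pattern: str, index: int) -> Optional[str]:
    """Hypothetical helper: index 0 is the whole match, index 1 the first group.

    This mirrors Python's own `Match.group` numbering and the behavior the
    issue reports for sqlite, pandas, dask and pyspark.
    """
    match = re.search(pattern, value)
    # Return None (standing in for NULL) if nothing matched or the index
    # does not correspond to a group in the pattern.
    if match is None or index < 0 or index > match.re.groups:
        return None
    return match.group(index)


def extract_first_group_at_zero(value: str, pattern: str, index: int) -> Optional[str]:
    """Hypothetical helper: index 0 is the first group.

    This mirrors the behavior the issue reports for duckdb, postgres and
    clickhouse: the same lookup, shifted by one.
    """
    if index < 0:
        return None
    return extract_whole_match_at_zero(value, pattern, index + 1)


if __name__ == "__main__":
    pattern = r"row_([a-z]+)"
    print(extract_first_group_at_zero("row_one", pattern, 0))
    print(extract_whole_match_at_zero("row_one", pattern, 0))
    print(extract_whole_match_at_zero("row_one", pattern, 1))
```

Under these assumptions the two helpers differ only by an offset of one, which is why the same call can silently return different values on different backends.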

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

cpcloud commented, Oct 3, 2022 (1 reaction)

Is there a reason to support arbitrary values here, or could we restrict pattern and index to literal strings/integers?

A column of strings and a column of groups to extract might be a case where you’d want this.

I think we just handle it in the backend and raise an error during compilation. We do this elsewhere in the codebase.

I’m in favor of not restricting it unless it’s breaking something to keep it that way.

cpcloud commented, Sep 30, 2022 (1 reaction)

+1 on matching the behavior of re.match.
