Splitter.onPattern drops last token when zero width matches are used

Original issue created by travis.downs on 2012-11-04 at 10:17 AM


If you split on a zero-width regex, the last element may be dropped.

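    import com.google.common.base.Splitter;
    import com.google.common.collect.Iterables;
    import java.util.Arrays;

    String input = "foo";
    String regex = "(?=o)|(?<=o)";
    Splitter splitter = Splitter.onPattern(regex);
    // String.split: prints [f, o, o]
    System.out.println(Arrays.asList(input.split(regex)));
    // Splitter: the final "o" is missing
    System.out.println(Arrays.asList(Iterables.toArray(splitter.split(input), String.class)));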

The regex is a zero-width lookaround for ‘o’, so the string is split immediately before and after every o, and each o is returned as its own item.

Note that String.split works correctly here, but Guava's Splitter drops the last o.
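
To make the separator positions concrete, here is a minimal sketch that uses only java.util.regex to list where the pattern matches in “foo”. The class and method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Guava. */
public class ZeroWidthMatchPositions {

  /** Returns the start index of every match of regex in input. */
  public static List<Integer> matchStarts(String input, String regex) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    List<Integer> starts = new ArrayList<>();
    while (matcher.find()) {
      // Every match of this pattern is empty, so start() == end().
      starts.add(matcher.start());
    }
    return starts;
  }

  public static void main(String[] args) {
    // Matches sit before the first o, between the two o's, and after the last o.
    System.out.println(matchStarts("foo", "(?=o)|(?<=o)")); // prints [1, 2, 3]
  }
}
```

The match at index 3 equals the length of the input, which is the position a splitter has to reach in order to emit the final “o”.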

Examining the code, this was probably introduced in this fix (granted, it didn’t work at all before that):

https://github.com/google/guava/issues/936

In particular, this bit of logic in Splitter.java:

    if (offset == nextStart) {
      /*
       * (comment omitted)
       */
      offset++;
      if (offset >= toSplit.length()) {
        offset = -1;
      }
      continue;
    }

neglects to do the “last element” handling that the general logic above does (setting end to toSplit.length() if no more separators are found).
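
The effect of that branch can be reproduced without Guava. The sketch below is a simplified model of the loop, not Guava's actual source: it leaves out trimming, omitEmptyStrings and limit handling, and the class name, method name and the `inclusiveEnd` flag are hypothetical. With `inclusiveEnd` set to false it uses the `>=` check quoted above. With it set to true it allows one more search at `input.length()`, which is one possible adjustment and an assumption here rather than a confirmed description of the eventual fix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified model of the splitting loop; not Guava's source. */
public class ZeroWidthSplitSketch {

  public static List<String> split(String input, String regex, boolean inclusiveEnd) {
    Matcher matcher = Pattern.compile(regex).matcher(input);
    List<String> tokens = new ArrayList<>();
    int offset = 0;     // where the next separator search begins
    int nextStart = 0;  // where the next token begins
    while (offset != -1) {
      int start = nextStart;
      int end;
      if (matcher.find(offset)) {
        end = matcher.start();
        offset = matcher.end();
      } else {
        // General "last element" handling: the token runs to the end of input.
        end = input.length();
        offset = -1;
      }
      if (offset == nextStart) {
        // Empty match at the token start: search further on, keep nextStart.
        offset++;
        boolean pastEnd =
            inclusiveEnd ? offset > input.length() : offset >= input.length();
        if (pastEnd) {
          offset = -1; // with the >= check, the pending token is never emitted
        }
        continue;
      }
      tokens.add(input.substring(start, end));
      nextStart = offset;
    }
    return tokens;
  }

  public static void main(String[] args) {
    String regex = "(?=o)|(?<=o)";
    System.out.println(split("foo", regex, false)); // prints [f, o]
    System.out.println(split("foo", regex, true));  // prints [f, o, o]
  }
}
```

In the `>=` variant the loop exits while a token starting at index 2 is still pending. Letting the search run once more at `input.length()` finds the lookbehind match there, so the token is emitted through the normal path.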

Issue Analytics

  • State: closed
  • Created 9 years ago
  • Comments: 23 (6 by maintainers)

Top GitHub Comments

1 reaction
cpovirk commented, Oct 28, 2016

Thanks. Will have a look, hopefully next week.

0 reactions
jbduncan commented, Oct 29, 2016

Oh, it looks like #2086 proposes a very similar fix to mine! I now wonder if my fix covers all the possible edge cases…
