Splitter.onPattern drops last token when zero width matches are used
Original issue created by travis.downs on 2012-11-04 at 10:17 AM
If you split on a zero-width regex, the last element may be dropped.
String input = "foo";
String regex = "(?=o)|(?<=o)";
Splitter splitter = Splitter.onPattern(regex);
System.out.println(Arrays.asList(input.split(regex)));
System.out.println(Arrays.asList(Iterables.toArray(splitter.split(input), String.class)));
This does a zero-width lookaround for ‘o’, so the string is split before and after every o, and each o is returned as its own item.
Note that String.split works correctly here, but Guava's Splitter drops the last o.
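To make the expected result concrete, here is a small sketch that uses only java.util.regex (no Guava) to list where the lookaround pattern matches in "foo" and what String.split returns for it. The class name ZeroWidthMatchDemo and the helper matchOffsets are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Guava or the JDK.
final class ZeroWidthMatchDemo {

    /** Returns the start offset of every match of regex in input. */
    static List<Integer> matchOffsets(String input, String regex) {
        List<Integer> offsets = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            offsets.add(m.start());
        }
        return offsets;
    }

    public static void main(String[] args) {
        String input = "foo";
        String regex = "(?=o)|(?<=o)";
        // Zero-width matches sit before each 'o' and after the last 'o'.
        System.out.println(matchOffsets(input, regex));         // prints [1, 2, 3]
        // String.split cuts at those offsets and drops the trailing empty string.
        System.out.println(Arrays.asList(input.split(regex)));  // prints [f, o, o]
    }
}
```

The match at offset 3 (the end of the input) is the one that matters here: the final "o" lies between the matches at offsets 2 and 3.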
Examining the code, this was probably introduced in this fix (granted, it didn’t work at all before that):
https://github.com/google/guava/issues/936
In particular, this bit of logic in Splitter.java:
if (offset == nextStart) {
  /*
   * (comment omitted)
   */
  offset++;
  if (offset >= toSplit.length()) {
    offset = -1;
  }
  continue;
}
neglects to do the “last element” handling that the general logic above does (setting end to toSplit.length() if no more separators are found).
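As an illustration of the behaviour being asked for, here is a minimal standalone sketch of a split loop that skips a zero-width match sitting at the start of the current token but still emits the tail of the input once no more separators are found. It is not Guava's implementation and not a proposed patch: the class and method names are hypothetical, it uses only java.util.regex, and it drops an empty trailing token (which matches String.split rather than Guava's default behaviour).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Guava code.
final class ZeroWidthSplitSketch {

    static List<String> splitZeroWidthAware(String input, Pattern separator) {
        List<String> tokens = new ArrayList<>();
        Matcher m = separator.matcher(input);
        int tokenStart = 0;
        while (m.find()) {
            // An empty match exactly at the token start separates nothing; skip it.
            if (m.start() == m.end() && m.start() == tokenStart) {
                continue;
            }
            tokens.add(input.substring(tokenStart, m.start()));
            tokenStart = m.end();
        }
        // "Last element" handling: no more separators, so the token runs to the end.
        // Assumption: an empty tail is dropped, as String.split does.
        if (tokenStart < input.length()) {
            tokens.add(input.substring(tokenStart));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern lookaround = Pattern.compile("(?=o)|(?<=o)");
        System.out.println(splitZeroWidthAware("foo", lookaround)); // prints [f, o, o]
    }
}
```

The point of the sketch is that the skip branch and the tail handling are independent: skipping an empty match never ends the iteration on its own.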
Top GitHub Comments
Thanks. Will have a look, hopefully next week.
Oh, it looks like #2086 proposes a very similar fix to mine! I now wonder if my fix covers all the possible edge cases…