Document Using Regex for str.split

import pandas as pd
df = pd.DataFrame({'col': ['a-b-c+e=d,f#t']*5})
df.col.str.split('+|=', expand=True)

Problem description

When two patterns separated by | are passed to the str.split() method and one of them is +, pandas raises the following error:

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
 in 
      2 import pandas as pd
      3 df = pd.DataFrame({'col': ['a-b-c+e=d,f#t']*5})
----> 4 df.col.str.split('+|=', expand=True)

~\Anaconda3\lib\site-packages\pandas\core\strings.py in split(self, pat, n, expand)
   2328     @copy(str_split)
   2329     def split(self, pat=None, n=-1, expand=False):
-> 2330         result = str_split(self._data, pat, n=n)
   2331         return self._wrap_result(result, expand=expand)
   2332 

~\Anaconda3\lib\site-packages\pandas\core\strings.py in str_split(arr, pat, n)
   1458             if n is None or n == -1:
   1459                 n = 0
-> 1460             regex = re.compile(pat)
   1461             f = lambda x: regex.split(x, maxsplit=n)
   1462     res = _na_map(f, arr)

~\Anaconda3\lib\re.py in compile(pattern, flags)
    231 def compile(pattern, flags=0):
    232     "Compile a regular expression pattern, returning a pattern object."
--> 233     return _compile(pattern, flags)
    234 
    235 def purge():

~\Anaconda3\lib\re.py in _compile(pattern, flags)
    299     if not sre_compile.isstring(pattern):
    300         raise TypeError("first argument must be string or compiled pattern")
--> 301     p = sre_compile.compile(pattern, flags)
    302     if not (flags & DEBUG):
    303         if len(_cache) >= _MAXCACHE:

~\Anaconda3\lib\sre_compile.py in compile(p, flags)
    560     if isstring(p):
    561         pattern = p
--> 562         p = sre_parse.parse(p, flags)
    563     else:
    564         pattern = None

~\Anaconda3\lib\sre_parse.py in parse(str, flags, pattern)
    853 
    854     try:
--> 855         p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
    856     except Verbose:
    857         # the VERBOSE flag was switched on inside the pattern.  to be

~\Anaconda3\lib\sre_parse.py in _parse_sub(source, state, verbose, nested)
    414     while True:
    415         itemsappend(_parse(source, state, verbose, nested + 1,
--> 416                            not nested and not items))
    417         if not sourcematch("|"):
    418             break

~\Anaconda3\lib\sre_parse.py in _parse(source, state, verbose, nested, first)
    614             if not item or (_len(item) == 1 and item[0][0] is AT):
    615                 raise source.error("nothing to repeat",
--> 616                                    source.tell() - here + len(this))
    617             if item[0][0] in _REPEATCODES:
    618                 raise source.error("multiple repeat",

error: nothing to repeat at position 0
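The last frames of the traceback are in Python's own re module, not in pandas, which suggests the error can be reproduced without a DataFrame. A minimal sketch using only the standard library (the variable name `message` is illustrative and not part of the original report):

```python
import re

# "+" is a quantifier; at the start of a pattern there is nothing for it to repeat.
try:
    re.compile('+|=')
    message = None
except re.error as exc:
    message = str(exc)

print(message)
```

If this raises the same "nothing to repeat" error, the pattern itself is invalid as a regular expression, independent of str.split.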

INSTALLED VERSIONS

commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: 3.7.1
pip: 18.1
setuptools: 40.2.0
Cython: 0.29.2
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: 0.11.0
IPython: 7.1.1
sphinx: 1.7.6
patsy: 0.5.1
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.5.5
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.5
lxml: 4.2.4
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.2.10
pymysql: None
psycopg2: 2.7.6.1 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments:5 (4 by maintainers)

Top GitHub Comments

3 reactions
WillAyd commented, Feb 13, 2019

It’s consistent with regex behavior where + is a special character. You will get the same error with * amongst others as well
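Since + is a regex metacharacter, one way around the error is to make it literal. The sketch below shows three common options with the standard re module on the sample string from the report; the variable names are illustrative, not taken from the issue:

```python
import re

text = 'a-b-c+e=d,f#t'

# Option 1: escape the literal "+" by hand.
escaped = r'\+|='
# Option 2: use a character class, where "+" has no special meaning.
char_class = r'[+=]'
# Option 3: build the pattern from literal delimiters with re.escape.
built = '|'.join(re.escape(d) for d in ['+', '='])

parts_escaped = re.split(escaped, text)
parts_class = re.split(char_class, text)
parts_built = re.split(built, text)

print(parts_escaped)
print(parts_class)
print(parts_built)
```

Any of these pattern strings can be passed to str.split in place of '+|=', for example df.col.str.split(r'\+|=', expand=True).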

0 reactions
WillAyd commented, Feb 15, 2019

@zangell44 I think it is documented in most methods but sure if you see others where it isn’t by all means include in a PR
