[BUG] unicode.split does not allow to pass None for sep

Describe the bug

I'm hitting a difference in behaviour between CPython and Cython for unicode.split: with Cython, passing sep=None explicitly raises TypeError. Details below.

To Reproduce

Code to reproduce the behaviour:

---- 8< ---- usplit.pyx

# cython: language_level=3

def mysplit(q):
    return unicode.split(q, None)

print(mysplit("hello world"))

Expected behavior

I expect it to behave the same as in Python, i.e. print ['hello', 'world']:

---- 8< ---- usplit_py.py

def mysplit(q):
    return str.split(q, None)

print(mysplit("hello world"))

$ python usplit_py.py
['hello', 'world']
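
For reference, CPython treats an explicit None separator the same as leaving the separator out. A minimal check of that, using only built-in str methods (the variable names are illustrative):

```python
q = "hello world"

# Three spellings of "split on runs of whitespace" in CPython.
by_unbound = str.split(q, None)   # the form used in usplit_py.py
by_default = q.split()            # separator omitted
by_keyword = q.split(sep=None)    # separator passed by keyword

print(by_unbound == by_default == by_keyword)
```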

However, what I get instead is an exception saying that None cannot be used for sep:

$ cythonize -i usplit.pyx 
Compiling /home/kirr/usplit.pyx because it changed.
[1/1] Cythonizing /home/kirr/usplit.pyx
running build_ext
building 'usplit' extension
creating /home/kirr/tmp3kckc5wa/home
creating /home/kirr/tmp3kckc5wa/home/kirr
x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -ffile-prefix-map=/build/python3.9-RNBry6/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -g -ffile-prefix-map=/build/python3.9-RNBry6/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/kirr/src/wendelin/venv/py3.venv/include -I/usr/include/python3.9 -c /home/kirr/usplit.c -o /home/kirr/tmp3kckc5wa/home/kirr/usplit.o
x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -g -fwrapv -O2 -Wl,-z,relro -g -fwrapv -O2 -g -ffile-prefix-map=/build/python3.9-RNBry6/python3.9-3.9.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 /home/kirr/tmp3kckc5wa/home/kirr/usplit.o -o /home/kirr/usplit.cpython-39-x86_64-linux-gnu.so
$ python -c 'import usplit'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "usplit.pyx", line 6, in init usplit
    print(mysplit("hello world"))
  File "usplit.pyx", line 4, in usplit.mysplit
    return unicode.split(q, None)
TypeError: must be str, not NoneType
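
Until the two behave the same, one possible workaround is to never pass None explicitly and to fall back to a plain method call with no separator. The sketch below is plain Python; `split_compat` is a hypothetical helper name, and whether this sidesteps the TypeError under Cython 0.29.27 is an assumption that has not been verified here.

```python
def split_compat(q, sep=None, maxsplit=-1):
    """Hypothetical helper: split q without passing sep=None explicitly."""
    if sep is None:
        # No separator argument at all, so whitespace splitting is requested
        # through the method's default rather than through an explicit None.
        return q.split(maxsplit=maxsplit)
    return q.split(sep, maxsplit)


print(split_compat("hello world"))
```

In CPython the helper returns the same result as `str.split(q, sep, maxsplit)` for both None and string separators.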

Environment (please complete the following information):

  • OS: Debian GNU/Linux 11
  • Python version: 3.9.2
  • Cython version: 0.29.27

Thanks beforehand, Kirill

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 16 (13 by maintainers)

Top GitHub Comments

navytux commented, Apr 15, 2022

Sure, here is my current list:

  • count
  • endswith
  • find
  • index
  • rfind
  • rindex
  • split (this one)
  • startswith
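
The comment does not say what the list covers. Assuming it names str methods whose optional arguments accept None in CPython (an assumption, not something stated above), the CPython side can be checked with a short script like this one:

```python
s = "hello world"

# Each call passes None explicitly for the optional arguments
# (start/end for the search methods, sep for split).
results = {
    "count": s.count("o", None, None),
    "endswith": s.endswith("world", None, None),
    "find": s.find("o", None, None),
    "index": s.index("o", None, None),
    "rfind": s.rfind("o", None, None),
    "rindex": s.rindex("o", None, None),
    "split": s.split(None),
    "startswith": s.startswith("hello", None, None),
}

for name, value in results.items():
    print(name, value)
```

Running the same calls from a compiled .pyx module would show which of them behave differently under Cython.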