IndexError: string index out of range

There seems to be a bug when running the following test code:

import contractions
test_str = "He continued his studies at Maltepe Military High School in İzmir and then at the Turkish Military Academy, graduating in 2000, after which he returned to Azerbaijan."
contractions.fix(test_str)

Here is the Traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "{...}\site-packages\contractions\__init__.py", line 239, in fix
    return ts.replace(s)
  File "{...}\site-packages\textsearch\__init__.py", line 561, in replace
    start, stop, result = handler(text, start, stop, norm)
  File "{...}\site-packages\textsearch\__init__.py", line 371, in bounds_check
    if len(text) != stop and text[stop] in self.right_bound_chars:
IndexError: string index out of range

Could it be due to the special characters in test_str, and if so, how should we deal with it?

Thanks!

Edit: Also fails for the following examples:

contractions.fix("Imishli (İmişli) is a rayon of Azerbaijan.")
contractions.fix("Fenerbahçe have several supporter organisations, including Genç Fenerbahçeliler (GFB), Kill For You (KFY), Ultras Fener, Antu/Fenerlist, EuroFeb, Group CK (Cefakâr Kanaryalar), 1907 ÜNİFEB, Vamos Bien, and SUADFEB.")

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments: 9 (3 by maintainers)

Top GitHub Comments

4 reactions
tomaarsen commented, May 31, 2021

@kootenpv

The bug on display is definitely tricky, but it boils down to the following:

>>> character = "İ"
>>> character
'İ'
>>> len(character)
1
>>> character.lower()
'i̇'
# Note: In my IDE (VSCode), this appears as roughly '.i'
>>> len(character.lower())
2

You may wonder where exactly this causes problems. First, contractions.fix gets called. This function is very simple: it just calls the replace method of a TextSearch object. For reference: https://github.com/kootenpv/textsearch/blob/92b6584513e9b8afc98268e91a8a434f3495b5f5/textsearch/__init__.py#L540-L584

In particular, line 553 converts the input text to lowercase. With the example contractions.fix("İ jan."), the lowercased _text is one character longer than text. On line 555, self.automaton.iter yields match positions computed on _text. These positions are passed to handler, which at one point is self.bounds_check, so self.bounds_check receives the original text together with offsets taken from the longer _text.

Then, in self.bounds_check, the larger offset is used to index the original, shorter text, causing the IndexError. The exception is raised on line 371 here: https://github.com/kootenpv/textsearch/blob/92b6584513e9b8afc98268e91a8a434f3495b5f5/textsearch/__init__.py#L370-L371
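
The mechanism can be reproduced without either library. The following is a minimal sketch, not the TextSearch code: right_bound_ok is a hypothetical stand-in that only mirrors the shape of the check shown in the traceback, and the set of bound characters is an assumption chosen for illustration.

```python
def right_bound_ok(text, stop, right_bound_chars=" .,"):
    # Hypothetical stand-in with the same shape as the check in the traceback;
    # the bound characters here are an assumption, not the library's values.
    return len(text) != stop and text[stop] in right_bound_chars


text = "İ jan."
lowered = text.lower()

# U+0130 lowercases to two code points, so the two strings differ in length.
print(len(text), len(lowered))

# A match on "jan." in the lowered string ends at len(lowered),
# which is already past the end of the original string.
stop = len(lowered)

try:
    right_bound_ok(text, stop)
    raised = False
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```

The guard len(text) != stop is meant to catch a match that ends at the end of the string, but it compares against the original length, so an offset from the longer lowered string slips through.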

This is more of an issue with TextSearch from https://github.com/kootenpv/textsearch.
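
Until this is fixed upstream, one possible workaround is to shield the offending characters before calling the library and restore them afterwards. This is only a sketch under stated assumptions: find_length_changing_chars and fix_with_placeholder are hypothetical helpers (not part of contractions or TextSearch), the private-use code point U+E000 is an arbitrary placeholder choice, and the approach has not been tested against the library itself. It assumes the fix callable leaves the placeholder characters in place and in order.

```python
def find_length_changing_chars(text):
    """Return (index, char) pairs whose lowercase form is not one code point."""
    return [(i, ch) for i, ch in enumerate(text) if len(ch.lower()) != 1]


def fix_with_placeholder(text, fix, placeholder="\ue000"):
    """Hypothetical wrapper: hide length-changing characters from `fix`.

    `fix` is any callable taking and returning a string, for example
    contractions.fix. The placeholder must not already occur in `text`.
    """
    offenders = [ch for _, ch in find_length_changing_chars(text)]
    if not offenders:
        return fix(text)
    if placeholder in text:
        raise ValueError("placeholder already occurs in the text")

    # Swap each offending character for the placeholder, whose lowercase
    # form has the same length, so offsets stay aligned after lowercasing.
    shielded = "".join(
        placeholder if len(ch.lower()) != 1 else ch for ch in text
    )
    fixed = fix(shielded)

    # Put the original characters back in order of appearance.
    restored = iter(offenders)
    return "".join(
        next(restored, placeholder) if ch == placeholder else ch for ch in fixed
    )


# Stand-in for contractions.fix so the sketch runs without the library.
def demo_fix(s):
    return s.replace("can't", "cannot")


print(find_length_changing_chars("İzmir"))
print(fix_with_placeholder("İzmir can't wait", demo_fix))
```

With the real library, the call would be fix_with_placeholder(test_str, contractions.fix). Because the placeholder is not a letter, it may affect how word boundaries are detected around it, so the result should be checked on real data.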

0 reactions
kootenpv commented, May 24, 2021

It doesn’t seem to work on apostrophes (possessive case of nouns). Is there any workaround for this?

What’s the example?
