[Bug] ValueError with text from a reference

Original GitHub issue: https://github.com/lipoja/URLExtract/issues/30

What a great library! I’m parsing text files converted from arXiv PDFs, and a common pattern that crashes the program is shown in the minimal example below:

from urlextract import URLExtract
extractor = URLExtract()

text = "et.al.[10]"
extractor.find_urls(text)

It fails with the following traceback:

Traceback (most recent call last):
  File "failure.py", line 5, in <module>
    extractor.find_urls(text)
  File "/home/hoppeta/.pyenv/versions/3.6.0/lib/python3.6/site-packages/urlextract/urlextract_core.py", line 756, in find_urls
    return list(urls)
  File "/home/hoppeta/.pyenv/versions/3.6.0/lib/python3.6/site-packages/urlextract/urlextract_core.py", line 739, in gen_urls
    tmp_url = self._complete_url(text, offset + tld_pos, tld)
  File "/home/hoppeta/.pyenv/versions/3.6.0/lib/python3.6/site-packages/urlextract/urlextract_core.py", line 560, in _complete_url
    if not self._is_domain_valid(complete_url, tld):
  File "/home/hoppeta/.pyenv/versions/3.6.0/lib/python3.6/site-packages/urlextract/urlextract_core.py", line 632, in _is_domain_valid
    host = url_parts.gethost()
  File "/home/hoppeta/.pyenv/versions/3.6.0/lib/python3.6/site-packages/uritools/split.py", line 157, in gethost
    raise ValueError('Invalid host %r' % host)
ValueError: Invalid host 'et.al.[10]'

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

lipoja commented, Feb 27, 2019

Thank you for using this library, and an even bigger thanks for reporting this issue! I will check it and try to fix it soon.

thoppe commented, Mar 10, 2019

Awesome! I don’t need it to parse as a URL, just have it not crash. It’s actually a marker for a reference in an academic paper so missing it is the correct action.

On Sun, Mar 10, 2019, 10:21 AM Jan Lipovský wrote:

The issue should be fixed. I’ve added both texts to the test files.

@thoppe Right now urlextract does not return the text “et.al.[10]” as a URL. If you are parsing something specific, you can use your own settings that fit your needs, for example by setting the stop characters with set_stop_chars_right and set_stop_chars_left.
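
As a rough sketch of the stop-character suggestion: the helper below extends an extractor's left and right stop characters through get_stop_chars_left, get_stop_chars_right, set_stop_chars_left and set_stop_chars_right, which are assumed here to return and accept a set of characters. add_stop_chars is a hypothetical helper name, and treating "[" and "]" as stop characters is an assumption about what suits reference markers, not something the thread confirms.

```python
def add_stop_chars(extractor, left="", right=""):
    """Extend an extractor's left/right stop characters and return the new sets."""
    # Build new sets rather than mutating the ones the extractor handed back.
    new_left = set(extractor.get_stop_chars_left()) | set(left)
    new_right = set(extractor.get_stop_chars_right()) | set(right)
    # Assumption: the setters expect a set of single characters.
    extractor.set_stop_chars_left(new_left)
    extractor.set_stop_chars_right(new_right)
    return new_left, new_right
```

With urlextract installed, this could be called as add_stop_chars(URLExtract(), left="[", right="]") before running find_urls. Whether that changes the result for a given input depends on the installed version, so it is worth checking against a few sample lines.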

