Getting wrong URL when there is dot before url

For this call: extractor.find_urls("My name is claim...https://t.co/SZlazvFzYx")

the URL extractor returns ['claim...https://t.co/SZlazvFzYx'], which includes the text that precedes the URL.

Issue Analytics

  • State: open
  • Created 5 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
lipoja commented, Jun 16, 2021

I see this library as a tool that returns as many domains as it finds, even when they are “wrong”, so some post-processing is needed.

We are trying to cover all general issues without limiting the number of returned results. Of course, the results should be correct if possible. But I would rather return a domain that contains other text (e.g. website+www.example.com) than return nothing at all. That way users can at least see what was found and tune their parser or do some filtering.

But I would really appreciate help in any form (discussion of ideas, PRs, … ).

Thank you!
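
The post-processing suggested in this comment could look roughly like the sketch below. It is only an illustration: strip_before_scheme is a hypothetical helper, not part of urlextract, and it assumes that only http, https and ftp URLs matter. It takes the list that find_urls returned for the input in this issue and drops whatever text is glued in front of the scheme.

```python
import re

# Hypothetical helper for illustration; it is not part of urlextract.
# Assumption: only http, https and ftp URLs are of interest.
# The lookbehind keeps longer scheme names such as "sftp" from being cut in half.
_SCHEME = re.compile(r"(?<![A-Za-z])(?:https?|ftp)://", re.IGNORECASE)


def strip_before_scheme(candidate):
    """Drop any text glued in front of the first recognised scheme."""
    match = _SCHEME.search(candidate)
    return candidate[match.start():] if match else candidate


# The list below is the output reported in this issue.
found = ["claim...https://t.co/SZlazvFzYx"]
cleaned = [strip_before_scheme(url) for url in found]
print(cleaned)  # ['https://t.co/SZlazvFzYx']
```

Candidates without a recognised scheme are returned unchanged, so this filter can only shorten a result, never discard one.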

1 reaction
Larrax commented, Jun 24, 2019

I have run into many incorrectly extracted URLs because of this issue. What’s more, the dot is not the only problem. It’s also the at sign, colon, plus, etc. With the following input…

Visit us @www.example.com
Visit our website:www.example.com
Visit our website-www.example.com
Visit our website*www.example.com
Visit our website+www.example.com
Visit our website...www.example.com
Nonsense URL = '.example.com'

find_urls outputs this list…

@www.example.com
website:www.example.com
website-www.example.com
website*www.example.com
website+www.example.com
website...www.example.com
.example.com

And there might be more.
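
Until the library handles these cases, the outputs listed above could be cleaned up on the caller's side. The following is a rough sketch under stated assumptions, not a fix for urlextract itself: clean_candidate is a hypothetical helper, it only recognises hosts that start with "www.", and it treats any other candidate that begins with a non-alphanumeric character as noise.

```python
import re

# Hypothetical helper for illustration; it is not part of urlextract.
# Assumption: a "www." that is not preceded by a letter or digit starts the real host.
_WWW = re.compile(r"(?<![A-Za-z0-9])www\.", re.IGNORECASE)


def clean_candidate(candidate):
    """Return the candidate trimmed to its 'www.' host, or None if it looks like noise."""
    match = _WWW.search(candidate)
    if match:
        return candidate[match.start():]
    if not candidate[:1].isalnum():
        return None  # e.g. '.example.com'
    return candidate


# The find_urls outputs listed in this comment.
raw = [
    "@www.example.com",
    "website:www.example.com",
    "website-www.example.com",
    "website*www.example.com",
    "website+www.example.com",
    "website...www.example.com",
    ".example.com",
]
cleaned = [url for url in map(clean_candidate, raw) if url is not None]
print(cleaned)  # six copies of 'www.example.com'
```

This heuristic can over-trim: website-www.example.com is itself a syntactically valid host name, so whether to cut at the hyphen depends on the data being parsed.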
