Missing URLs using find_urls
I have run into some text (email spam) from which find_urls
fails to extract all URLs. Example input:
One night's accommodation, double occupancy http://example.com/gxhcht-5kdpwgk3/
$8000 http://example.com/gxhchu-5kdpwgk4/
Value $166.00 http://example.com/gxhchv-5kdpwgk5/
https://content.idassociates.ca/images/shopico_new/spacer.png http://example.com/gxhchw-5kdpwgk6/
Like in the South - Deluxe Room http://example.com/gxhchy-5kdpwgk8/
$20308 http://example.com/gxhchz-5kdpwgk9/
Value $406.19 http://example.com/gxhci0-5kdpwgk6/
https://content.idassociates.ca/images/shopico_new/spacer.png
http://example.com/gxhci1-5kdpwgk7/
Camping de la rivire Nicolet http://example.com/gxhci2-5kdpwgk8/
Accommodations / Cottage http://example.com
There are 12 URLs, but urlextract finds only 7 of them. Found URLs:
http://example.com/gxhcht-5kdpwgk3/
http://example.com/gxhchu-5kdpwgk4/
http://example.com/gxhchv-5kdpwgk5/
https://content.idassociates.ca/images/shopico_new/spacer.png
http://example.com/gxhci1-5kdpwgk7/
http://example.com/gxhci2-5kdpwgk8/
http://example.com
The behavior is really strange. For example, if I remove the following URL from the input, all the remaining 11 URLs are found: https://content.idassociates.ca/images/shopico_new/spacer.png
EDIT: Sorry for not posting a smaller test input; every larger modification I tried made the module work properly.
Used version 0.10
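To make the counts above easy to check, here is a minimal sketch that uses only the standard library. It is not how urlextract works internally: `baseline_urls` is a hypothetical helper that treats every whitespace-delimited token starting with http:// or https:// as a URL, and `REPORTED_FOUND` is copied by hand from the list of found URLs in this report.

```python
import re
from collections import Counter

# Example input from this report.
TEXT = """\
One night's accommodation, double occupancy http://example.com/gxhcht-5kdpwgk3/
$8000 http://example.com/gxhchu-5kdpwgk4/
Value $166.00 http://example.com/gxhchv-5kdpwgk5/
https://content.idassociates.ca/images/shopico_new/spacer.png http://example.com/gxhchw-5kdpwgk6/
Like in the South - Deluxe Room http://example.com/gxhchy-5kdpwgk8/
$20308 http://example.com/gxhchz-5kdpwgk9/
Value $406.19 http://example.com/gxhci0-5kdpwgk6/
https://content.idassociates.ca/images/shopico_new/spacer.png
http://example.com/gxhci1-5kdpwgk7/
Camping de la rivire Nicolet http://example.com/gxhci2-5kdpwgk8/
Accommodations / Cottage http://example.com
"""

# URLs reported as found by urlextract 0.10 (copied from the report).
REPORTED_FOUND = [
    "http://example.com/gxhcht-5kdpwgk3/",
    "http://example.com/gxhchu-5kdpwgk4/",
    "http://example.com/gxhchv-5kdpwgk5/",
    "https://content.idassociates.ca/images/shopico_new/spacer.png",
    "http://example.com/gxhci1-5kdpwgk7/",
    "http://example.com/gxhci2-5kdpwgk8/",
    "http://example.com",
]


def baseline_urls(text):
    """Hypothetical baseline: tokens that start with http:// or https://."""
    return [tok for tok in text.split() if re.match(r"https?://", tok)]


expected = baseline_urls(TEXT)

# Multiset difference, so the duplicated spacer.png URL is counted per occurrence.
missing = Counter(expected) - Counter(REPORTED_FOUND)

print(len(expected), "URLs expected,", sum(missing.values()), "missing")
for url in sorted(missing.elements()):
    print(url)
```

To compare against the library itself rather than the copied list, `REPORTED_FOUND` can be replaced with the result of `URLExtract().find_urls(TEXT)`, assuming urlextract is installed.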
Issue Analytics
- State:
- Created 4 years ago
- Comments: 8 (4 by maintainers)
Top GitHub Comments
OK, closing issue. @Larrax can reopen it if some related bug is found.
Thanks @impredicative for testing! I should not do late night releases …
@impredicative Thanks! Forgotten print is removed in 0.12.1.