Unnecessary double encoding of url
The URL http://test.com/& is not handled correctly as of version 1.10.2 (or maybe even 1.10.1).
Jsoup.connect("http://test.com/" + URLEncoder.encode("&", "UTF-8")).get();
The URL passed to Jsoup is http://test.com/%26, because the & has already been URL-encoded. This works in 1.9.2: the encodeUrl(String url) method in the HttpConnection class only replaces spaces, and this URL contains none, so it is left unchanged. In 1.10.2, encodeUrl() encodes the same URL again, which produces http://test.com/%2526 (the percent sign of the already-encoded URL is unnecessarily encoded a second time).
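To make the %26 versus %2526 difference concrete, here is a minimal sketch that uses only java.net.URLEncoder. It does not call jsoup; the class name DoubleEncodingDemo and the helper encode are made up for illustration. It only shows what encoding an already-encoded value does to the percent sign.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of jsoup.
class DoubleEncodingDemo {

    // Percent-encodes a value as UTF-8, the same call used in the report above.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String once = encode("&");    // "&" becomes "%26"
        String twice = encode(once);  // "%" becomes "%25", so "%26" becomes "%2526"

        System.out.println("http://test.com/" + once);   // prints http://test.com/%26
        System.out.println("http://test.com/" + twice);  // prints http://test.com/%2526
    }
}
```

A server that decodes http://test.com/%2526 once sees the literal text %26 rather than &, which is why the second encoding changes the meaning of the request.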
A workaround is to downgrade to 1.9.2, where the encodeUrl method was implemented differently (see below).
1.10.2:
    private static String encodeUrl(String url) {
        try {
            URL u = new URL(url);
            return encodeUrl(u).toExternalForm();
        } catch (Exception e) {
            return url;
        }
    }
1.9.2:
    private static String encodeUrl(String url) {
        if(url == null)
            return null;
        return url.replaceAll(" ", "%20");
    }
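The two strategies can be compared outside jsoup. The sketch below is an illustration under stated assumptions, not jsoup source: spacesOnly mirrors the 1.9.2 method quoted above, while rebuildFromParts is a hypothetical re-encoding step built on the multi-argument java.net.URI constructor, which always quotes the percent character. The issue does not show what encodeUrl(URL) actually does in 1.10.2, so this is only one way such a double encoding can arise.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical comparison class, not part of jsoup.
class EncodeUrlComparison {

    // Same behaviour as the 1.9.2 method quoted above: only spaces are replaced.
    static String spacesOnly(String url) {
        if (url == null)
            return null;
        return url.replace(" ", "%20");
    }

    // Assumed re-encoding step (NOT taken from jsoup): rebuilds the URL from its parts.
    // The multi-argument URI constructor always quotes '%', so existing escapes
    // such as %26 are encoded a second time.
    static String rebuildFromParts(String url) {
        try {
            URI parsed = new URI(url); // single-argument form keeps existing escapes
            URI rebuilt = new URI(parsed.getScheme(), parsed.getAuthority(),
                    parsed.getRawPath(), parsed.getRawQuery(), parsed.getRawFragment());
            return rebuilt.toString();
        } catch (URISyntaxException e) {
            return url; // fall back to the input, as the quoted 1.10.2 method does
        }
    }

    public static void main(String[] args) {
        String url = "http://test.com/%26";
        System.out.println(spacesOnly(url));       // prints http://test.com/%26
        System.out.println(rebuildFromParts(url)); // prints http://test.com/%2526
    }
}
```

The space-only replacement cannot damage an already-encoded URL, but it also leaves other illegal characters untouched; a rebuild from parts handles more characters at the cost of re-encoding existing escapes unless they are decoded first.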

jsoup 1.10.3 is out now: https://jsoup.org/news/release-1.10.3
Sorry about the issues here. This is fixed in 1.10.3 (upcoming). I fixed it back in 56a728d482ce0c88cd9513eca1e7c7585f0718f9
Will close when 1.10.3 is released