Support `file:/...` URLs for local testing
I’m writing unit tests for an application that uses JSoup, and I would like to write them against local files mirroring a website rather than against the website itself.
As a workaround, I could write my tests in terms of Files rather than URLs, but this would require substantial refactoring. Could we please add support for protocols like file:/...?
Trace:
URL: file:/Users/user/Desktop/src/jsoupcrawler/target/test-classes/google/index.html
java.net.MalformedURLException: Only http & https protocols supported
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:417)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:410)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:164)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:153)
at orion.core.data.JSoupCrawler.recursiveCrawl(JSoupCrawler.java:66)
at orion.core.data.JSoupCrawler.recursiveCrawl(JSoupCrawler.java:19)
at orion.core.data.JSoupCrawlerTest.testRecursiveCrawl(JSoupCrawlerTest.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
Issue Analytics
- State:
- Created 10 years ago
- Reactions:1
- Comments:8 (2 by maintainers)
Ah, could we make this line more general:
https://github.com/jhy/jsoup/blob/master/src/main/java/org/jsoup/Jsoup.java#L72
Instead of necessarily grabbing an HTTP connection, could we grab a connection based on the URL’s stated protocol, e.g. file:/..., ftp://..., etc.?

This is a REAL BAD IDEA from a security POV and I suggest you do NOT do this at all.
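
For anyone who needs this in tests today, here is a minimal sketch of the protocol-based dispatch suggested above, done in the caller's own code rather than inside jsoup. It relies only on java.net.URL, which already handles file: URLs, and it takes the security objection into account by rejecting any scheme that is not on an explicit allow-list. The class name LocalUrlFetcher, the method name fetch, and the choice of allowed schemes are assumptions made for illustration; none of them are part of jsoup.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for tests; not part of jsoup.
final class LocalUrlFetcher {

    // Schemes the caller explicitly trusts. This list is an assumption;
    // production code would likely drop "file" and keep only http/https.
    private static final Set<String> ALLOWED =
            new HashSet<>(Arrays.asList("http", "https", "file"));

    private LocalUrlFetcher() {
    }

    // Reads the whole resource behind the URL and decodes it as UTF-8.
    static String fetch(String spec) throws IOException {
        URL url = new URL(spec);
        String protocol = url.getProtocol().toLowerCase(Locale.ROOT);
        if (!ALLOWED.contains(protocol)) {
            throw new MalformedURLException("Protocol not allowed: " + protocol);
        }
        // URL.openStream() picks the stream handler registered for the scheme.
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

The returned HTML string can then be handed to Jsoup.parse(String html, String baseUri), so the crawler keeps working in terms of URLs while the tests point at local files. Keeping the allow-list in test-only code is one way to avoid exposing file: access to untrusted input.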