[FEATURE] Allow for custom HttpClient implementations

Is your feature request related to a problem? Please describe.
I’m always frustrated when skrape{it} opens an OkHttp3 client (via KoHttp) even though I already use ktor clients for other purposes in the rest of my code.

Describe the solution you’d like
Provide an interface (similar to the existing it.skrape.core.fetcher.Fetcher) that users implement in order to plug in their own HTTP client.

Describe alternatives you’ve considered
Fetching the HTML manually and loading it into skrape{it} as a raw String. This works, but feels like a dirty workaround.

Additional context
When using the web-scraping functionality of skrape{it} in a bigger project, I already use a (custom-configured) ktor client with a connection pool and other fancy stuff. It feels wrong to fire up OkHttp3 only to fetch two website HTML documents.

There already exists a Fetcher interface, and I believe that changing the signature from fun fetch(): Result to fun fetch(request: Request): Result would already be enough to allow for custom client implementations. This may also require decoupling configuration-specific values such as SSL verification and timeouts into a second Configuration interface, because most clients need that kind of information only once, at creation, and not on every single request.
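The proposed signature change and the split-out configuration could look roughly like the sketch below. Request, Result, FetcherConfiguration and InMemoryFetcher are hypothetical stand-ins written for illustration, not the actual skrape{it} types; only the fetch(request: Request): Result shape comes from the proposal itself.

```kotlin
// Hypothetical stand-ins for illustration; not the actual skrape{it} classes.
data class Request(val url: String, val headers: Map<String, String> = emptyMap())
data class Result(val statusCode: Int, val body: String)

// Settings a client typically needs only once, when it is created.
data class FetcherConfiguration(
    val timeoutMillis: Long = 5_000,
    val sslRelaxed: Boolean = false,
)

// Proposed shape: the request is passed per call instead of being fixed up front.
interface Fetcher {
    fun fetch(request: Request): Result
}

// Toy adapter that serves canned pages. A real adapter would hold an existing
// client (ktor, OkHttp, ...) built once from the configuration and delegate to it.
class InMemoryFetcher(
    val config: FetcherConfiguration,
    private val pages: Map<String, String>,
) : Fetcher {
    override fun fetch(request: Request): Result {
        val body = pages[request.url]
        return if (body != null) Result(200, body) else Result(404, "")
    }
}
```

In this sketch the configuration is consumed by the adapter's constructor, so per-request calls carry only what varies between requests.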

Lastly, the user has to be able to “override” the client engine with any custom-built adapter in the same skrape {} block where things like url and mode are currently configured. It is unclear how this mechanism will (or won’t) supersede the mode setting.
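One possible answer to the open question about mode, sketched under the assumption that an explicitly set adapter simply wins over whatever mode would select. Everything here (ScraperConfig, the Mode values, the fetcher property, the string-returning placeholders) is hypothetical and only mirrors the vocabulary of this issue; it is not how skrape{it} is implemented.

```kotlin
// Hypothetical sketch; names only mirror the vocabulary used in this issue.
fun interface Fetcher {
    fun fetch(url: String): String
}

enum class Mode { SOURCE, DOM }

class ScraperConfig {
    var url: String = "http://localhost"
    var mode: Mode = Mode.SOURCE
    var fetcher: Fetcher? = null // optional user-supplied client adapter

    // Assumption: an explicit fetcher supersedes whatever mode would select.
    fun resolveFetcher(): Fetcher = fetcher ?: when (mode) {
        Mode.SOURCE -> Fetcher { "source:$it" } // placeholder for the OkHttp-backed client
        Mode.DOM -> Fetcher { "dom:$it" }       // placeholder for the HtmlUnit-backed client
    }
}

// Minimal DSL entry point: configure, resolve the client, fetch.
fun skrape(init: ScraperConfig.() -> Unit): String {
    val config = ScraperConfig().apply(init)
    return config.resolveFetcher().fetch(config.url)
}
```

With this rule, mode keeps working for existing users, and setting fetcher in the same block silently takes precedence.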

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

christian-draeger commented, Apr 19, 2020

Another solution I could imagine, and one I think would be nice, is to pull the clients completely out of the skrape{it} core and leave only an interface that people can implement, or we could deliver the implementations we already have (OkHttp and HtmlUnit) as optional dependencies. The longer I think about it, the more this becomes my preferred way: now that the jsoup connection is nicely separated, pulling out the non-native Kotlin parts (which are basically only the HTTP client implementations) should make it possible for skrape{it} to become a multiplatform library someday, which would be really great.

christian-draeger commented, Apr 19, 2020

I think we could/should just kill the mode setting in that case, because the only thing it does is switch between the OkHttp client and the HtmlUnit client. Another feasible option seems to be a third mode option called CUSTOM or something similar. The best usability would be to let users pass the client implementation they want (either their own or a pre-configured one like the HttpFetcher or the BrowserFetcher). I would still like to ship the HttpFetcher and BrowserFetcher for people who don’t need to implement the HTTP client themselves, and to keep usage as easy and smooth as possible. I think the JS execution support that comes with the BrowserFetcher (HtmlUnit) in particular makes skrape{it} unique, but it may need a more fitting name.
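A minimal sketch of the mode-free variant described above, assuming the client is passed directly and a shipped implementation serves as the default. The HttpFetcher and BrowserFetcher objects below are string-returning placeholders that borrow the names from this comment; they are not the real implementations, and the skrape function signature is invented for illustration.

```kotlin
// Hypothetical sketch; the objects below are placeholders, not the real fetchers.
fun interface Fetcher {
    fun fetch(url: String): String
}

// Stand-ins for the implementations that would still ship with the library.
object HttpFetcher : Fetcher {
    override fun fetch(url: String): String = "http:$url"
}

object BrowserFetcher : Fetcher {
    override fun fetch(url: String): String = "browser:$url"
}

// No mode setting: callers pass the client they want, and the default keeps
// the simple case as short as before.
fun <T> skrape(fetcher: Fetcher = HttpFetcher, url: String, extract: (String) -> T): T =
    extract(fetcher.fetch(url))
```

A caller who does not care about the client writes skrape(url = "...") { ... }, while a custom adapter or the BrowserFetcher is passed as the first argument.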

Because this is a really fundamental decision regarding the design and usage of the library, I added this issue to milestone 1.0.0. Shipping the final 1.0.0 version without this feature could force breaking changes later, which we should avoid after the first final release.

Once this issue and both issues regarding the matchers (which are basically just about replacing strikt in our src/main packages) are done (I will do it as soon as possible), we are good to go for the first final 1.0.0 release 💯 This makes me really happy 😃
