Default Accept header
New Feature Proposal
Description
Add a default Accept: text/html header to the HttpRequester created by Configuration.Default.WithDefaultLoader.
Background
I’m very new to AngleSharp, so be kind if this is common sense and understood by everyone else in the community. I came across it this morning after concluding that the regex I had used until now was too brittle. I tried examples found online, such as:
var context = BrowsingContext.New(Configuration.Default.WithDefaultLoader());
var document = context.OpenAsync(url).Result;
var divs = document.QuerySelectorAll("div");
foreach (var div in divs)
{
    // do something
}
To my surprise, this code found no divs at all. It turned out that all the HTML content was wrapped in another HTML document and a pre tag, which I traced to a commit related to #331. (And sure enough, execution was hitting a breakpoint in HtmlDocument.LoadTextAsync.)
Fiddler showed that the response was indeed being sent with Content-Type: text/plain;charset=UTF-8 instead of the expected Content-Type: text/html;charset=UTF-8. Adding code adapted from #367:
var requester = new HttpRequester();
requester.Headers["Accept"] = "text/html";
var context = BrowsingContext.New(Configuration.Default.WithDefaultLoader(requesters: new[] { requester }));
…solved the problem. I’m not sure what web server is being used on the other side (I’m trying to load this url), but I suppose loading a web page and doing some manipulation is a common scenario.
Because at least some web servers will return a text (rather than HTML) response if given no Accept header, I would like to propose either adding a default Accept header or mentioning this in the docs / quick guides to save newcomers an hour of figuring out what is happening and why.
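One way to check whether a given server behaves like this is to compare the Content-Type it returns with and without an Accept header. The following is only a minimal sketch: it uses plain HttpClient rather than AngleSharp, and the URL is a placeholder assumption, not the page from this report.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class AcceptHeaderCheck
{
    static async Task Main(string[] args)
    {
        // Placeholder URL (an assumption); pass the page you are loading as the first argument.
        var url = args.Length > 0 ? args[0] : "https://example.com/";

        using var client = new HttpClient();

        // First request: no Accept header at all.
        using var bareRequest = new HttpRequestMessage(HttpMethod.Get, url);
        using var bareResponse = await client.SendAsync(bareRequest);
        Console.WriteLine($"Without Accept header: {bareResponse.Content.Headers.ContentType}");

        // Second request: explicitly ask for HTML.
        using var htmlRequest = new HttpRequestMessage(HttpMethod.Get, url);
        htmlRequest.Headers.Accept.ParseAdd("text/html");
        using var htmlResponse = await client.SendAsync(htmlRequest);
        Console.WriteLine($"With Accept: text/html: {htmlResponse.Content.Headers.ContentType}");
    }
}
```

If the two printed values differ (for example text/plain versus text/html), the server is negotiating on the Accept header, and setting it on the requester as shown above should be enough.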
Issue Analytics
- Created 3 years ago
- Comments: 5 (2 by maintainers)
Top GitHub Comments
@FlorianRappl thank you for looking into this and incredibly quick turnaround! Should have a few minutes this evening to test the change, will report back.
Have just checked out the devel branch and can confirm it works as expected. Was a joy to see the new user-agent thrown in the mix too. Thanks for making this happen and fingers crossed for v0.14 going into prod soon!
EDIT: Wait a sec, v0.14 is already in prod. That makes my day even more betterer 😉