Default Accepts header

New Feature Proposal

Description

Add a default Accept: text/html header to the HttpRequester created by Configuration.Default.WithDefaultLoader.

Background

I’m very new to AngleSharp, so be kind if this is common sense and understood by everyone else in the community. I came across it this morning after concluding that the regex I had used until now was too brittle. I tried to use examples found online, such as:

var context = BrowsingContext.New(Configuration.Default.WithDefaultLoader());
var document = context.OpenAsync(url).Result;
var divs = document.QuerySelectorAll("div");
foreach (var div in divs)
{
    // do something
}

To my surprise, this code would find no divs at all. It turned out that all the HTML content was wrapped in another HTML document and a pre tag, which I traced to a commit related to #331. (And sure enough, execution was hitting a breakpoint in HtmlDocument.LoadTextAsync.)

Fiddler showed that the response was indeed being sent with Content-Type: text/plain;charset=UTF-8 instead of the expected Content-Type: text/html;charset=UTF-8. Adding code adapted from #367:

var requester = new HttpRequester();
requester.Headers["Accept"] = "text/html";
var context = BrowsingContext.New(Configuration.Default.WithDefaultLoader(requesters: new[] { requester }));

…solved the problem. I'm not sure what web server is being used on the other side (I was trying to load this url), but I suppose loading a web page and doing some manipulation might be a common scenario.

Because at least some web servers will return a text (rather than HTML) response if given no Accept header, I would like to propose either adding a default Accept header or mentioning this in the docs / quick guides, to save newcomers an hour of figuring out what is happening and why.
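
For readers who want to check whether they are hitting the same behaviour, the workaround above could be combined with a quick look at the content type AngleSharp reports for the loaded document. This is only a rough sketch, not the maintainers' fix. It assumes AngleSharp 0.14 or later, where the requester class is DefaultHttpRequester (in AngleSharp.Io) and is registered through Configuration.Default.With(...); the class name AcceptHeaderCheck and the fallback URL are hypothetical placeholders.

```csharp
using System;
using System.Threading.Tasks;
using AngleSharp;
using AngleSharp.Io;

public static class AcceptHeaderCheck
{
    public static async Task Main(string[] args)
    {
        // Hypothetical placeholder URL; pass the page you are loading as the first argument.
        var url = args.Length > 0 ? args[0] : "https://example.com/";

        // Assumption: AngleSharp 0.14+ (DefaultHttpRequester replaces the older HttpRequester).
        var requester = new DefaultHttpRequester();
        requester.Headers["Accept"] = "text/html";

        // Register the customised requester, then enable the default loader.
        var config = Configuration.Default.With(requester).WithDefaultLoader();
        var context = BrowsingContext.New(config);

        // Await the load instead of blocking on .Result.
        var document = await context.OpenAsync(url);

        // If the server answered with text/plain, the page ends up wrapped in a pre tag
        // and the div count below will not match the real page.
        Console.WriteLine($"Content type: {document.ContentType}");
        Console.WriteLine($"div count: {document.QuerySelectorAll("div").Length}");
    }
}
```

Commenting out the Accept line and running again against the same URL should show whether the server changes its Content-Type when the header is missing.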

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
davidcie commented, Mar 31, 2020

@FlorianRappl thank you for looking into this and for the incredibly quick turnaround! I should have a few minutes this evening to test the change and will report back.

0 reactions
davidcie commented, Mar 31, 2020

Have just checked out the devel branch and can confirm it works as expected. Was a joy to see the new user-agent thrown in the mix too. Thanks for making this happen and fingers crossed for v0.14 going into prod soon!

EDIT: Wait a sec, v0.14 is already in prod. That makes my day even more betterer 😉
