Bug Report

Prerequisites

  • [Y] Can you reproduce the problem in a MWE?
  • [Y] Are you running the latest version of AngleSharp?
  • [?] Did you check the FAQs to see if that helps you?
  • [Y] Are you reporting to the correct repository? (There are multiple AngleSharp libraries, e.g., AngleSharp.Css for CSS support.)
  • [Y] Did you perform a search in the issues?

For more information, see the CONTRIBUTING guide.

Description

I’m seeing the same issue as #416: the downloaded page is gzip-encoded and AngleSharp does not decompress it.

Steps to Reproduce

Does not work:

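using AngleSharp;

// The default loader fetches the page with AngleSharp's built-in requester.
var config = Configuration.Default.WithLocaleBasedEncoding().WithDefaultLoader();
var address = "https://www.powerball.com";
var context = BrowsingContext.New(config);
// The resulting document contains the still-compressed response body.
var document = await context.OpenAsync(address);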

Expected behavior:

The document should contain the decompressed plain-text content.

Actual behavior:

The document contains “garbled”, still-gzipped content.

Environment details:

Windows 11 x64, .NET 7.0.302, AngleSharp 1.0.3

Possible Solution

Works:

using System.Net;
using System.Net.Http;
using AngleSharp;

// Let HttpClient decompress gzip and deflate responses transparently.
var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
var http = new HttpClient(handler);
var body = await http.GetStringAsync("https://www.powerball.com");

// Parse the already-decompressed markup; no loader is needed here.
var config = Configuration.Default;
var context = BrowsingContext.New(config);
var document = await context.OpenAsync(req => req.Content(body));

I am working around this by using HttpClient. Is this the recommended approach? I searched for “org:AngleSharp gzip” and didn’t find any recommendations or FAQ guidance. I assumed decompression would happen automatically out of the box, so maybe I’m missing something.

Issue Analytics

  • State: closed
  • Created 3 months ago
  • Comments: 5 (4 by maintainers)

Top GitHub Comments

1 reaction
FlorianRappl commented, Jun 24, 2023

By the way, I could finally reproduce this. The server sometimes returns Brotli (“br”) compressed responses, even though the Accept-Encoding header tells it that only “deflate” and “gzip” are supported.

In such a case we’ll now throw an exception. The response would have been gibberish anyway, and this way one can react. It should be a rare case though; this is certainly a problem on the web server.
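
To illustrate the kind of check described above, here is a minimal sketch of dispatching on the Content-Encoding value and failing loudly on anything unexpected. It is not AngleSharp's actual implementation: `ContentDecoder` and `Decode` are hypothetical names, only a single encoding token is handled (no stacked values such as "gzip, br"), and `ZLibStream` assumes .NET 6 or later.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ContentDecoder
{
    // Hypothetical helper, not part of AngleSharp: decodes a response body
    // according to a single Content-Encoding header value.
    public static byte[] Decode(byte[] body, string contentEncoding)
    {
        var encoding = (contentEncoding ?? string.Empty).Trim().ToLowerInvariant();

        // A missing header or "identity" means the body was sent uncompressed.
        if (encoding.Length == 0 || encoding == "identity")
        {
            return body;
        }

        using var input = new MemoryStream(body);
        using Stream decoder = encoding switch
        {
            "gzip" => (Stream)new GZipStream(input, CompressionMode.Decompress),
            // HTTP "deflate" is zlib-wrapped data; ZLibStream needs .NET 6+.
            "deflate" => new ZLibStream(input, CompressionMode.Decompress),
            "br" => new BrotliStream(input, CompressionMode.Decompress),
            // Anything else is rejected instead of being passed on as gibberish.
            _ => throw new NotSupportedException(
                $"Unsupported Content-Encoding: {contentEncoding}")
        };

        using var output = new MemoryStream();
        decoder.CopyTo(output);
        return output.ToArray();
    }
}
```

Calling code can catch the `NotSupportedException` and fall back to another client or retry, which is the "react" step mentioned in the comment.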

1 reaction
FlorianRappl commented, Jun 14, 2023

> Working around this by using HTTP Client. Is this the recommended approach?

Yes, the requester that ships with AngleSharp does not use HttpClient and should only be used in simple cases. If you rely heavily on IO, then you should use AngleSharp.Io.

I’ll see whether this is a general problem with the requester (not being able to process gzip) or something specific to the page. If it’s a general problem, then we need to drop “gzip” from the accepted encodings.
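
For readers who want to try the AngleSharp.Io route suggested above, the following is a rough sketch rather than an officially documented recipe. It assumes the AngleSharp.Io NuGet package is installed and that its `WithRequesters` extension accepts an `HttpClientHandler` (the package README shows this pattern for proxies); whether this alone resolves the issue for a given site has not been verified here.

```csharp
using System.Net;
using System.Net.Http;
using AngleSharp;
using AngleSharp.Io; // from the AngleSharp.Io NuGet package

// Assumption: WithRequesters has an overload taking an HttpClientHandler;
// check this against the AngleSharp.Io version you have installed.
var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip
        | DecompressionMethods.Deflate
        | DecompressionMethods.Brotli // available on .NET Core 3.0 and later
};

var config = Configuration.Default
    .WithRequesters(handler)  // HttpClient-based requesters from AngleSharp.Io
    .WithDefaultLoader();     // loader from AngleSharp itself

var context = BrowsingContext.New(config);
var document = await context.OpenAsync("https://www.powerball.com");
```

Compared with fetching the body manually through HttpClient, this keeps the request inside the browsing context, so the document retains its URL and relative links resolve against it.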
