Add an overload to ParseDocument(Stream stream, Encoding encoding) to specify an encoding
Is there a way to load an HtmlDocument from a byte array or a stream while specifying which character encoding to use? We currently load the HTML document this way:
Encoding encoding = ...
using (MemoryStream ms = new MemoryStream(bytes))
{
    this.document = parser.ParseDocument(ms); // I cannot specify the given encoding
}
If I understand correctly, AngleSharp tries to detect the proper encoding. However, the encoding can be specified in various ways (HTTP headers, byte order mark, meta content-type, and meta charset), so the parser cannot know about at least the HTTP header case.
We are trying to move away from HtmlAgilityPack, whose equivalent class has an overload that takes the encoding to use. In our program, the byte array can be read from the web or loaded from a local DB. In most cases we already know the encoding, and we also want to let users view HTML content with a different encoding.
Does AngleSharp already provide a way to do it?
If not, can I request an implementation?
Can you add an overload:
HtmlParser.ParseDocument(Stream stream, Encoding encoding)
to specify an encoding?
If the current method tries to detect the encoding, this would also improve performance when the encoding is already known.
Thank you
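For reference, one possible workaround until such an overload exists is to decode the bytes with the known encoding and hand the resulting string to the parser. This is only a sketch: HtmlLoader and ParseWithEncoding are hypothetical names, not part of AngleSharp, and it assumes the ParseDocument(string) overload of HtmlParser. It also assumes that stripping a leading byte order mark before decoding is the desired behaviour.

```csharp
using System.Text;
using AngleSharp.Html.Dom;
using AngleSharp.Html.Parser;

// Hypothetical helper, not part of AngleSharp.
public static class HtmlLoader
{
    public static IHtmlDocument ParseWithEncoding(HtmlParser parser, byte[] bytes, Encoding encoding)
    {
        // Encoding.GetString does not strip a byte order mark, so skip
        // the encoding's preamble manually when the data starts with it.
        byte[] preamble = encoding.GetPreamble();
        int offset = StartsWith(bytes, preamble) ? preamble.Length : 0;

        // Decode with the caller's encoding; the parser then only sees characters.
        string html = encoding.GetString(bytes, offset, bytes.Length - offset);
        return parser.ParseDocument(html);
    }

    // Returns true when 'data' begins with the non-empty byte sequence 'prefix'.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (prefix.Length == 0 || data.Length < prefix.Length)
        {
            return false;
        }
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i])
            {
                return false;
            }
        }
        return true;
    }
}

// Usage (the encoding here is only an example):
// var document = HtmlLoader.ParseWithEncoding(new HtmlParser(), bytes, Encoding.UTF8);
```

This decodes the whole buffer up front, so it does not give the performance benefit of a streaming overload, and it needs the full content in memory.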
Issue Analytics
- Created 2 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
Hey, that was a prompt and detailed answer! A lot to study for me… thank you!
Landed in devel.