Response is returned with a "byte order mark" (BOM) and some XML serializers fail to parse it

We used SoapCore to convert some of our legacy WCF (HTTP/SOAP) projects to .NET Core. After some regression tests, we found that the SOAP responses from our .NET Core services failed during serialization with this error: content-is-not-allowed-in-prolog-when-parsing-perfectly-valid-xm

After some digging and inspecting the hidden characters, we confirmed the cause: the response is encoded with a leading BOM (byte order mark), and that is what breaks parsing.
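One way to confirm this without a hex viewer is to check the first three bytes of the raw response for the UTF-8 BOM sequence EF BB BF. The following is a minimal sketch; `BomProbe` and `StartsWithUtf8Bom` are hypothetical names used only for illustration and are not part of SoapCore.

```csharp
public static class BomProbe
{
    // Hypothetical helper: true when the payload begins with the UTF-8 BOM bytes EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] payload)
    {
        return payload != null
            && payload.Length >= 3
            && payload[0] == 0xEF
            && payload[1] == 0xBB
            && payload[2] == 0xBF;
    }
}
```

Calling `BomProbe.StartsWithUtf8Bom(bytes)` on the raw response body (before it is decoded into a string) tells you whether the three BOM bytes precede the XML declaration.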

We have worked around this issue by swapping a MemoryStream into the response (“hijacking” it). After the SoapCore middleware has produced the response, we write it out again with our own encoding. This is the middleware we used:

using System.IO;
using System.Text;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;

public static class BomKillerMiddleware
{
    /// <summary>
    /// Please read this page: https://www.freecodecamp.org/news/a-quick-tale-about-feff-the-invisible-character-cd25cd4630e7/
    /// SoapCore adds a hidden character to the response: FEFF, the invisible byte order mark, which ruins XML parsing.
    /// It causes several errors, and the only way we found to solve it is to buffer the response and rewrite it to the
    /// original stream with UTF-8 encoding. This omits the hidden character from the response.
    /// </summary>
    public static IApplicationBuilder UseBomKiller(this IApplicationBuilder app)
    {
        return app.Use(async (context, _next) =>
        {
            var originalResponseStream = context.Response.Body;
            using (var inMemoryResponse = new MemoryStream())
            {
                // Capture whatever the rest of the pipeline (SoapCore) writes.
                context.Response.Body = inMemoryResponse;
                try
                {
                    await _next.Invoke();
                }
                finally
                {
                    // Always restore the real stream, even if the pipeline throws.
                    context.Response.Body = originalResponseStream;
                }

                inMemoryResponse.Seek(0, SeekOrigin.Begin);

                using (var streamReader = new StreamReader(inMemoryResponse))
                {
                    // StreamReader drops a leading BOM while decoding.
                    var bodyAsText = await streamReader.ReadToEndAsync();

                    // Re-encode as UTF-8; WriteAsync does not emit a BOM.
                    await context.Response.WriteAsync(bodyAsText, Encoding.UTF8);
                }
            }
        });
    }
}
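The reason the rewrite in the middleware drops the BOM can be checked in isolation: `StreamReader` consumes a leading BOM while decoding, and `Encoding.GetBytes` encodes only the characters it is given, without the encoding's preamble. The sketch below reproduces just that decode-then-encode step on a byte array; `BomRoundTrip` and `StripBom` are hypothetical names, and the sketch assumes the payload is UTF-8.

```csharp
using System.IO;
using System.Text;

public static class BomRoundTrip
{
    // Hypothetical helper mirroring the middleware's decode-then-encode step.
    public static byte[] StripBom(byte[] responseBytes)
    {
        using (var buffer = new MemoryStream(responseBytes))
        using (var reader = new StreamReader(buffer))
        {
            // StreamReader detects a leading BOM by default and leaves it out of the text.
            string bodyAsText = reader.ReadToEnd();

            // GetBytes encodes only the characters; it does not prepend the preamble.
            return Encoding.UTF8.GetBytes(bodyAsText);
        }
    }
}
```

A payload that has no BOM passes through this round trip unchanged, so the step should be safe to apply to every response, at the cost of buffering the whole body in memory.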

Issue Analytics

Created 4 years ago
Reactions: 4
Comments: 7

Top GitHub Comments

13 reactions

RafVandesande commented, Mar 11, 2020

For my case I found a better solution by passing in a binding with a specific TextEncoding:

new BasicHttpBinding
{
   TextEncoding = new UTF8Encoding(false)
}

The boolean passed to the UTF8Encoding’s constructor indicates that it should not write a BOM. See this article: UTF8Encoding constructor

SoapCore uses this as a WriteEncoding: SoapEndpointExtensions.cs
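The effect of that constructor argument can be seen outside SoapCore by writing the same text through a `StreamWriter` with each encoding and comparing the bytes. This is a minimal sketch using only base class library types; `PreambleDemo` and `WriteWith` are hypothetical names for illustration.

```csharp
using System.IO;
using System.Text;

public static class PreambleDemo
{
    // Hypothetical helper: writes text through a StreamWriter and returns the raw bytes.
    public static byte[] WriteWith(Encoding encoding, string text)
    {
        using (var stream = new MemoryStream())
        {
            using (var writer = new StreamWriter(stream, encoding))
            {
                writer.Write(text);
            } // Disposing the writer flushes it (and closes the stream).

            // ToArray still works on a closed MemoryStream.
            return stream.ToArray();
        }
    }
}
```

With `new UTF8Encoding(true)` the returned bytes start with EF BB BF, because the writer emits the encoding's preamble at the start of the stream. With `new UTF8Encoding(false)` the preamble is empty and the bytes start directly with the text.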

2 reactions

l3ender commented, Dec 29, 2021

It can also be solved by specifying the encoding on the soap options object:

var encoder = new SoapEncoderOptions();
encoder.WriteEncoding = new UTF8Encoding(false);

app.UseEndpoints(endpoints => {
    endpoints.UseSoapEndpoint<IPingService>("/Service.svc", encoder, SoapSerializer.DataContractSerializer);
});
