Large parts cannot be written on .NET Core due to OutOfMemoryException

Description

I was using ClosedXML to create large Excel files (100k rows with 63 columns) and ran into an OutOfMemoryException. I found some examples of OpenXML using the SAX approach, so I tried switching to OpenXML, but it didn’t help. I then removed all my own code (reading from the DB, etc.) to see whether that would help, but I still get an OutOfMemoryException.

My code is based on http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/

Information

.NET Target: .NET Core 3.1
DocumentFormat.OpenXml Version: 2.11.3

EDIT: The same code works fine on .NET Framework 4.7.2 with the same DocumentFormat.OpenXml version.

Repro

This is the simple code I am using at the moment. I am testing with RAM limited to 500 MB (for testing purposes). I don’t think this code should consume that much RAM.

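using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

using (SpreadsheetDocument document = SpreadsheetDocument.Create(filePath, SpreadsheetDocumentType.Workbook))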
{
    document.AddWorkbookPart();

    WorksheetPart wsp = document.WorkbookPart.AddNewPart<WorksheetPart>();

    using (OpenXmlWriter writer = OpenXmlWriter.Create(wsp))
    {
        List<OpenXmlAttribute> oxa;
        writer.WriteStartElement(new Worksheet());
        writer.WriteStartElement(new SheetData());

        for (int i = 0; i < 100000; i++)
        {
            // Excel row indices are 1-based, so loop index i is written as r = i + 1.
            oxa = new List<OpenXmlAttribute>();
            oxa.Add(new OpenXmlAttribute("r", null, (i + 1).ToString()));
            writer.WriteStartElement(new Row(), oxa);

            for (int j = 0; j < 40; j++)
            {
                // One string cell per column, each containing "test".
                oxa = new List<OpenXmlAttribute>();
                oxa.Add(new OpenXmlAttribute("t", null, "str"));
                writer.WriteStartElement(new Cell(), oxa);
                writer.WriteElement(new CellValue("test"));
                writer.WriteEndElement(); // end of Cell
            }

            writer.WriteEndElement(); // end of Row
        }

        writer.WriteEndElement(); // end of SheetData
        writer.WriteEndElement(); // end of Worksheet
    }

    using (OpenXmlWriter writer = OpenXmlWriter.Create(document.WorkbookPart))
    {
        writer.WriteStartElement(new Workbook());
        writer.WriteStartElement(new Sheets());

        writer.WriteElement(new Sheet() { Id = document.WorkbookPart.GetIdOfPart(wsp), SheetId = 1, Name = "Test" });

        writer.WriteEndElement(); // end of Sheets
        writer.WriteEndElement(); // end of Workbook
    }
}

Observed

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.IO.MemoryStream.set_Capacity(Int32 value)
   at System.IO.MemoryStream.EnsureCapacity(Int32 value)
   at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
   at System.Xml.XmlUtf8RawTextWriter.FlushBuffer()
   at System.Xml.XmlUtf8RawTextWriter.RawText(Char* pSrcBegin, Char* pSrcEnd)
   at System.Xml.XmlUtf8RawTextWriter.RawText(String s)
   at System.Xml.XmlUtf8RawTextWriter.WriteEndElement(String prefix, String localName, String ns)
   at System.Xml.XmlWellFormedWriter.WriteEndElement()
   at DocumentFormat.OpenXml.OpenXmlPartWriter.WriteEndElement()

Expected

An Excel file at filePath with 100k rows and 40 columns, with the string “test” in every cell.

Top GitHub Comments

blakepell commented, Jun 1, 2021

Here’s a more complete working sample for .NET Core that should get people started on the workaround. I just wrote out a 500,000-row dataset and the memory footprint stayed pretty low. A few places use extension methods (like SafeLeft); you can remove those and substitute what you need (GetCell in particular isn’t very clean; keep in mind this is a proof of concept). What you’ll really be interested in is the order of operations in the static ToFile method, which follows the outline in the bulleted list in my last comment.

The biggest limitation I can see is that you can only write one large sheet to the document (after that, ReadWrite access is required; I could never get a second write-only stream to work).

https://gist.github.com/blakepell/8fe938624f1dad8c28ff93a334687d77
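The thread only says that the workaround relies on a write-only stream. A likely reason, stated here as an assumption rather than something confirmed in the comments: on .NET Core, System.IO.Packaging stores the package in a System.IO.Compression.ZipArchive, opened in Update mode for FileAccess.ReadWrite and in Create mode for FileAccess.Write, and an entry opened in Update mode is held in a MemoryStream until the archive is closed (consistent with the MemoryStream.set_Capacity frame in the stack trace). The sketch below uses only the BCL; the ZipModeDemo class and its BytesWrittenBeforeEntryClosed method are hypothetical names that measure how many bytes reach the output while a large entry is still open.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical demo class (not part of Open XML SDK or System.IO.Packaging).
// Assumption: Packaging's ReadWrite access maps to ZipArchiveMode.Update and
// Write access maps to ZipArchiveMode.Create.
public static class ZipModeDemo
{
    // Writes payloadSize random bytes into one entry and reports how many bytes
    // had reached the underlying output stream before the entry was closed.
    public static long BytesWrittenBeforeEntryClosed(ZipArchiveMode mode, int payloadSize)
    {
        var output = new MemoryStream();

        // Random bytes barely compress, so compressed size stays close to payloadSize.
        var payload = new byte[payloadSize];
        new Random(42).NextBytes(payload);

        long observed;
        using (var archive = new ZipArchive(output, mode, leaveOpen: true))
        {
            ZipArchiveEntry entry = archive.CreateEntry("sheet1.xml");
            using (Stream entryStream = entry.Open())
            {
                entryStream.Write(payload, 0, payload.Length);

                // Create mode streams compressed data straight to 'output';
                // Update mode keeps the entry in memory until the archive is disposed.
                observed = output.Length;
            }
        }
        return observed;
    }
}
```

With a 4,000,000-byte payload, the Create-mode call is expected to report most of the payload already written, while the Update-mode call should report 0. For the repro, the corresponding (untested) change would be to open the output with `Package.Open(stream, FileMode.Create, FileAccess.Write)` and pass that package to `SpreadsheetDocument.Create(package, SpreadsheetDocumentType.Workbook)`, assuming that overload is available in your SDK version; per the limitation above, a second large write-only part may still fail.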

twsouthwick commented, Jan 10, 2023

I’d like to get this fixed (or at least worked around) for v3.0. I’ve created a set of abstractions that allow more control over things, and I think we could automate some of the workarounds here (at least in an opt-in way). For the abstractions, see #1295.

My thoughts would be to model what @M4urici0GM did, but in a more transparent way. Of course, it would be better to have this fixed in the underlying package model, but that hasn’t gone anywhere in too many years.

My initial thoughts to implementing this would be:

(1) Provide an abstraction of IPackage that intercepts calls to GetStream and writes the streams to some temporary location
(2) On save, first save the package as normal
(3) Then reopen the package in write-only mode (this should allow replacing things without the memory explosion)
(4) Write the streams from the temporary location
(5) Close the package again and reopen it with the original mode/access

The abstractions I have should allow building this, except we’d need a way to “Reload” the underlying package. Building off of the abstractions, I’m thinking of enabling the following:

public interface IPackageFeature
{
  IPackage Package { get; }

+ bool CanReload { get; }

+ void Reload();
}

This could automatically be supported for files opened with paths or streams, but if a package is given, then it would not be supported (since we didn’t manage the package) without additional information from a user.
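Steps (1), (3) and (4) in the comment above amount to spilling part contents to temporary files and then streaming them into a package opened write-only. The following is a minimal BCL-only sketch of that idea, not the proposed SDK implementation: `SpillingPartStore`, `OpenPartForWrite` and `WriteTo` are hypothetical names (not the IPackage abstractions from #1295), and it writes a fresh zip archive in Create mode instead of reopening an existing System.IO.Packaging package, so steps (2) and (5) are left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper (not an Open XML SDK type): part contents are spilled to
// temporary files instead of memory, then copied into a write-only zip archive.
public sealed class SpillingPartStore : IDisposable
{
    // Maps part name (zip entry name) to the temp file holding its content.
    private readonly Dictionary<string, string> _tempFiles = new Dictionary<string, string>();

    // Step (1): hand out a stream backed by a temp file rather than a MemoryStream.
    public Stream OpenPartForWrite(string partName)
    {
        if (!_tempFiles.TryGetValue(partName, out string path))
        {
            path = Path.GetTempFileName();
            _tempFiles[partName] = path;
        }
        // FileMode.Create truncates, so reopening a part replaces its content.
        return new FileStream(path, FileMode.Create, FileAccess.Write);
    }

    // Steps (3)-(4): copy every spilled part into an archive opened in Create
    // (write-only) mode, so each entry is streamed to 'output' rather than buffered.
    public void WriteTo(Stream output)
    {
        using (var archive = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (KeyValuePair<string, string> part in _tempFiles)
            {
                ZipArchiveEntry entry = archive.CreateEntry(part.Key);
                using (Stream source = File.OpenRead(part.Value))
                using (Stream target = entry.Open())
                {
                    source.CopyTo(target);
                }
            }
        }
    }

    // Delete the temporary files once the archive has been written.
    public void Dispose()
    {
        foreach (string path in _tempFiles.Values)
        {
            File.Delete(path);
        }
        _tempFiles.Clear();
    }
}
```

A caller would write each large part through `OpenPartForWrite`, call `WriteTo` with the destination stream, and then dispose the store to remove the temporary files. The design trades extra disk I/O for memory: peak memory in this sketch no longer grows with the size of a part, only with the copy buffer.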