Large parts cannot be written on .NET Core due to OutOfMemoryException
Description
I was using ClosedXML to create large Excel files (100k rows with 63 columns) and ran into an OutOfMemoryException. I found some examples of OpenXML using SAX, so I tried switching to OpenXML, but it didn’t help. I then removed all of my own code (reading from the DB, etc.) to see whether that would work, but I still get the OutOfMemoryException.
My code is based on http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/
Information
- .NET Target: .NET Core 3.1
- DocumentFormat.OpenXml Version: 2.11.3
EDIT: The same code works fine on .NET Framework 4.7.2 with the same DocumentFormat.OpenXml version.
Repro
This is the simple code I am using at the moment. I am testing with RAM limited to 500 MB (for testing purposes). I don’t think this code should consume that much RAM.
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// filePath is the output .xlsx path, defined elsewhere.
using (SpreadsheetDocument document = SpreadsheetDocument.Create(filePath, SpreadsheetDocumentType.Workbook))
{
    document.AddWorkbookPart();
    WorksheetPart wsp = document.WorkbookPart.AddNewPart<WorksheetPart>();

    // Stream the worksheet part SAX-style instead of building a DOM.
    using (OpenXmlWriter writer = OpenXmlWriter.Create(wsp))
    {
        List<OpenXmlAttribute> oxa;
        writer.WriteStartElement(new Worksheet());
        writer.WriteStartElement(new SheetData());
        for (int i = 0; i < 100000; i++)
        {
            oxa = new List<OpenXmlAttribute>();
            // Row references in SpreadsheetML are 1-based, so use i + 1.
            oxa.Add(new OpenXmlAttribute("r", null, (i + 1).ToString()));
            writer.WriteStartElement(new Row(), oxa);
            for (int j = 0; j < 40; j++)
            {
                oxa = new List<OpenXmlAttribute>();
                oxa.Add(new OpenXmlAttribute("t", null, "str"));
                writer.WriteStartElement(new Cell(), oxa);
                writer.WriteElement(new CellValue("test"));
                writer.WriteEndElement(); // end of cell
            }
            writer.WriteEndElement(); // end of row
        }
        writer.WriteEndElement(); // end of sheetdata
        writer.WriteEndElement(); // end of worksheet
    }

    // Write the workbook part that references the single sheet.
    using (OpenXmlWriter writer = OpenXmlWriter.Create(document.WorkbookPart))
    {
        writer.WriteStartElement(new Workbook());
        writer.WriteStartElement(new Sheets());
        writer.WriteElement(new Sheet() { Id = document.WorkbookPart.GetIdOfPart(wsp), SheetId = 1, Name = "Test" });
        writer.WriteEndElement(); // end of sheets
        writer.WriteEndElement(); // end of workbook
    }
}
Observed
System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.IO.MemoryStream.set_Capacity(Int32 value)
at System.IO.MemoryStream.EnsureCapacity(Int32 value)
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at System.Xml.XmlUtf8RawTextWriter.FlushBuffer()
at System.Xml.XmlUtf8RawTextWriter.RawText(Char* pSrcBegin, Char* pSrcEnd)
at System.Xml.XmlUtf8RawTextWriter.RawText(String s)
at System.Xml.XmlUtf8RawTextWriter.WriteEndElement(String prefix, String localName, String ns)
at System.Xml.XmlWellFormedWriter.WriteEndElement()
at DocumentFormat.OpenXml.OpenXmlPartWriter.WriteEndElement()
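The trace shows the part being written into a growing MemoryStream. A rough back-of-the-envelope sketch of why that could matter under a 500 MB limit follows. Everything in it is an assumption made for illustration: the `BufferEstimate` class and its method names are hypothetical, the serialized element shapes (including the `x:` prefix) are guesses at what the writer emits, and the growth model is a simplified "double the capacity, with old and new buffers both alive during the copy" approximation rather than the exact runtime behavior.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of DocumentFormat.OpenXml.
public static class BufferEstimate
{
    // Estimates the uncompressed UTF-8 size of the sheet XML,
    // assuming each cell and row serializes roughly like the literals below.
    public static long EstimateSheetBytes(int rows, int cols)
    {
        long cellBytes = Encoding.UTF8.GetByteCount("<x:c t=\"str\"><x:v>test</x:v></x:c>");
        long rowOverhead = Encoding.UTF8.GetByteCount("<x:row r=\"100000\"></x:row>");
        return rows * (rowOverhead + cols * cellBytes);
    }

    // Simplified model of a buffer that doubles its capacity when full.
    // While growing, the old and the new array exist at the same time,
    // so the transient peak is old capacity + new capacity.
    public static long PeakDuringGrowth(long dataBytes, long initialCapacity = 256)
    {
        long capacity = initialCapacity;
        long peak = capacity;
        while (capacity < dataBytes)
        {
            long next = capacity * 2;
            peak = Math.Max(peak, capacity + next);
            capacity = next;
        }
        return peak;
    }
}
```

Under these assumptions the repro produces on the order of 139 MB of XML, and the transient peak of a single doubling buffer comes out at roughly 400 MB, which is already close to the 500 MB limit before counting anything else in the process.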
Expected
Excel file in filePath with 100k rows and 40 columns with string “test” in all cells.
Issue Analytics
- State:
- Created 3 years ago
- Reactions: 2
- Comments: 18 (4 by maintainers)
Top GitHub Comments
Here’s more of a working sample for .NET Core that should get people started on the workaround. I just wrote out a 500,000-row dataset and the memory footprint stayed pretty low. A few extension methods are used in places (like SafeLeft); you can remove those and put in what you need (GetCell in particular isn’t super clean; keep in mind this is a proof of concept). What you’ll really be interested in is the order of operations in the ToFile static method, which follows the outline in the bulleted list in my last comment.
The biggest limitation I can see is that you can only write one large sheet to the document (after that, ReadWrite is required; I could never get a second write-only stream to work).
https://gist.github.com/blakepell/8fe938624f1dad8c28ff93a334687d77
I’d like to get this fixed (or at least provide a workaround) for v3.0. I’ve created a set of abstractions that allow for more control over things, and I think we could automate some of the workarounds here (at least in an opt-in way). For the abstractions, see: #1295.
My thoughts would be to model what @M4urici0GM did, but in a more transparent way. Of course, it would be better to have this fixed in the underlying package model, but that hasn’t gone anywhere in too many years.
My initial thoughts on implementing this would be:
(1) Provide an abstraction of IPackage that would intercept calls to GetStream and write them to some temporary location
(2) On save, first save the package as normal
(3) Then reopen the package in write-only mode (this should allow replacing things without the explosion of memory)
(4) Write the streams from the temporary location
(5) Close the package again and reopen it with the original mode/access
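The spool-then-stream ordering in steps (1), (3) and (4) can be sketched roughly as follows. This is not the proposed IPackage abstraction and does not use the SDK at all: it works directly on System.IO.Compression, relying on the documented behavior that ZipArchiveMode.Create writes entry data to the underlying stream as it becomes available. The class and method names (`StreamingPartSketch`, `SpoolRowsToTempFile`, `CopyIntoWriteOnlyArchive`) are hypothetical, and the result is a zip containing a single worksheet part, not a complete workbook.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

// Hypothetical illustration only; names are not from the SDK or the issue.
public static class StreamingPartSketch
{
    // Step (1): write the large part to a temporary file instead of memory.
    public static string SpoolRowsToTempFile(int rows, int cols)
    {
        const string ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        string tempPath = Path.GetTempFileName();
        var settings = new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
        using (XmlWriter xml = XmlWriter.Create(tempPath, settings))
        {
            xml.WriteStartElement("worksheet", ns);
            xml.WriteStartElement("sheetData", ns);
            for (int r = 1; r <= rows; r++) // row references are 1-based
            {
                xml.WriteStartElement("row", ns);
                xml.WriteAttributeString("r", r.ToString());
                for (int c = 0; c < cols; c++)
                {
                    xml.WriteStartElement("c", ns);
                    xml.WriteAttributeString("t", "str");
                    xml.WriteElementString("v", ns, "test");
                    xml.WriteEndElement(); // c
                }
                xml.WriteEndElement(); // row
            }
            xml.WriteEndElement(); // sheetData
            xml.WriteEndElement(); // worksheet
        }
        return tempPath;
    }

    // Steps (3)-(4): open the archive write-only and copy the spooled part in.
    // In Create mode the entry stream is forward-only, so the data is copied
    // through in chunks rather than held in one large buffer.
    public static void CopyIntoWriteOnlyArchive(string archivePath, string entryName, string tempPath)
    {
        using (var file = new FileStream(archivePath, FileMode.Create, FileAccess.Write))
        using (var zip = new ZipArchive(file, ZipArchiveMode.Create))
        {
            ZipArchiveEntry entry = zip.CreateEntry(entryName, CompressionLevel.Fastest);
            using (Stream target = entry.Open())
            using (Stream source = File.OpenRead(tempPath))
            {
                source.CopyTo(target);
            }
        }
    }
}
```

A call such as `CopyIntoWriteOnlyArchive(path, "xl/worksheets/sheet1.xml", SpoolRowsToTempFile(100000, 40))` would exercise the same data volume as the repro. The remaining parts of a real package (content types, relationships, workbook) are deliberately left out here.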
The abstractions I have should allow building this, except we’d need a way to “Reload” the underlying package. Building off of the abstractions, I’m thinking of enabling the following:
This could automatically be supported for files opened with paths or streams, but if a package is given, then it would not be supported (since we didn’t manage the package) without additional information from a user.