OutOfMemoryException when parsing Excel document / endless while-loop


I’m trying to parse an Excel document (xls) using the OpenMcdf.Extensions package.

We call the AsOLEProperties extension method, and execution eventually gets stuck in this while loop (https://github.com/ironfede/openmcdf/blob/master/sources/OpenMcdf/CompoundFile.cs#L1493-L1514):

while (true)
{
    if (nextSecID == Sector.ENDOFCHAIN)
        break;

    Sector ms = new Sector(Sector.MINISECTOR_SIZE, sourceStream);
    byte[] temp = new byte[Sector.MINISECTOR_SIZE];

    ms.Id = nextSecID;
    ms.Type = SectorType.Mini;

    miniStreamView.Seek(nextSecID * Sector.MINISECTOR_SIZE, SeekOrigin.Begin);
    miniStreamView.Read(ms.GetData(), 0, Sector.MINISECTOR_SIZE);

    result.Add(ms);

    miniFATView.Seek(nextSecID * 4, SeekOrigin.Begin);
    nextSecID = miniFATReader.ReadInt32();
}

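When the loop is entered, nextSecID is 27. At the end of an iteration, nextSecID is set to 0, and it then stays 0 because the same data is read on every following iteration. Since each iteration adds another Sector to result, the loop never reaches ENDOFCHAIN and eventually ends in an OutOfMemoryException.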

Is 0 even a valid value for nextSecID?

Any idea what to do about this?

The document in question can be opened fine in Excel. Unfortunately, we got it from a customer of ours, so we can’t share it.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

1 reaction
henning-krause commented, Oct 28, 2018

@ironfede Of course, this doesn’t fix the underlying problem, but at least the load operation fails fast and does not cause an OutOfMemoryException. So it’s a quick fix.

That being said, I don’t even think that this PR fixed all problems - only the simple ones. In multiple places, the following check is performed:

if (next != nextSecID)
    nextSecID = next;
else
    throw new CFCorruptedFileException("Cyclic sector chain found. File is corrupted");

This certainly helps when you have a corrupt file where one sector points to itself. But what if you have something like a cyclic chain?

1 -> 2 -> 3 -> 1

This wouldn’t be caught, and we would be in the same mess as before. I would propose changing that check so that we keep track of all processed sector ids and throw the CFCorruptedFileException as soon as we hit an id we have already seen.
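The proposal above could be sketched roughly as follows. This is a minimal, self-contained illustration, not OpenMcdf's actual code: `SectorChainWalker`, `Walk`, and `CorruptedChainException` are hypothetical names, the FAT is modelled as a plain `int[]` where `fat[i]` is the id of the sector following sector `i`, and the end-of-chain marker is assumed to be -2 (0xFFFFFFFE read as a signed 32-bit integer).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for OpenMcdf's CFCorruptedFileException.
public class CorruptedChainException : Exception
{
    public CorruptedChainException(string message) : base(message) { }
}

public static class SectorChainWalker
{
    // Assumption: end-of-chain marker 0xFFFFFFFE read as Int32.
    public const int EndOfChain = -2;

    // Follows the chain starting at startId and returns the visited ids in order.
    // fat[i] holds the id of the sector that follows sector i.
    public static List<int> Walk(int[] fat, int startId)
    {
        var chain = new List<int>();
        var visited = new HashSet<int>();
        int next = startId;

        while (next != EndOfChain)
        {
            // An id outside the table cannot be followed any further.
            if (next < 0 || next >= fat.Length)
                throw new CorruptedChainException($"Sector id {next} is out of range.");

            // HashSet<int>.Add returns false if the id was already present,
            // which catches self-references and longer cycles alike.
            if (!visited.Add(next))
                throw new CorruptedChainException($"Cyclic sector chain found at sector {next}.");

            chain.Add(next);
            next = fat[next];
        }

        return chain;
    }
}
```

With this check, `SectorChainWalker.Walk(new[] { 0, 2, 3, 1 }, 1)` throws on the 1 -> 2 -> 3 -> 1 chain instead of looping, and a table in which sector 27 points to 0 and sector 0 points to itself fails on the second visit to 0. The set costs one entry per sector in the chain; an alternative would be to cap the number of iterations at the total sector count.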

As for the source file: It was created by one of our customers. I’ll try to strip out all the content and to reproduce the error. If it still happens, I might be able to share it with you.

0 reactions
Numpsy commented, Jan 22, 2019

Yes, there are potentially issues in multiple places (there is a file attached to #40 that makes it blow up in GetDifatSectorChain).

