Pickle Crashes for Large Data

Describe the bug, what’s wrong, and what you expected.

Pickling a large mesh crashes.

Steps to reproduce the bug.

import pickle

import numpy as np
import pyvista as pv
from vtkmodules.vtkIOLegacy import vtkDataSetWriter


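def pickle_vtk(mesh, filename):
    """Serialize a mesh with VTK's legacy writer, then pickle the result."""
    # Write the mesh to an in-memory string instead of a file.
    writer = vtkDataSetWriter()
    writer.SetInputDataObject(mesh)
    writer.SetWriteToOutputString(True)
    writer.SetFileTypeToBinary()
    writer.Write()
    to_serialize = writer.GetOutputString()

    # Pickle the serialized string to disk.
    with open(filename, 'wb') as handle:
        pickle.dump(to_serialize, handle, protocol=pickle.HIGHEST_PROTOCOL)

    return filename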


if __name__ == '__main__':
    dims = (2154, 1500, 1167)

    # Reuse dims so the grid and the scalar array always agree in size.
    volume = pv.UniformGrid(
        dims=dims,
        spacing=(1, 1, 1),
        origin=(0, 0, 0),
    )

    # One uint8 value per grid point.
    volume.point_data['scalars'] = np.zeros(
        shape=(dims[0] * dims[1] * dims[2],), dtype=np.uint8
    )

    fname = 'filename.vtkpickle'
    pickle_vtk(volume, fname)  # Crashes

However, I have no problem pickling the NumPy array by itself to a file (it is about 3.7 GB).
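
For reference, the size quoted above can be checked with a few lines of arithmetic. This is only a sketch using the dimensions and dtype from the snippet. The comparison against the signed 32-bit limit is an added observation, and whether that limit has anything to do with the crash is not established in the thread.

```python
import math

dims = (2154, 1500, 1167)  # grid dimensions from the report
bytes_per_point = 1        # np.uint8 stores one byte per value

n_points = math.prod(dims)
n_bytes = n_points * bytes_per_point

print(f"points: {n_points:,}")
print(f"size:   {n_bytes / 1e9:.2f} GB ({n_bytes / 2**30:.2f} GiB)")

# Observation only: the byte count is larger than a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1
print(f"exceeds signed 32-bit range: {n_bytes > INT32_MAX}")
```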

System Information

--------------------------------------------------------------------------------
  Date: Sun Sep 04 11:59:36 2022 Pacific Daylight Time

                OS : Windows
            CPU(s) : 96
           Machine : AMD64
      Architecture : 64bit
               RAM : 190.7 GiB
       Environment : Python
        GPU Vendor : NVIDIA Corporation
      GPU Renderer : Quadro RTX 8000/PCIe/SSE2
       GPU Version : 4.5.0 NVIDIA 516.40

  Python 3.8.10 (tags/v3.8.10:3d8993a, May  3 2021, 11:48:03) [MSC v.1928 64
  bit (AMD64)]

           pyvista : 0.37.dev0
               vtk : 9.1.0
             numpy : 1.23.2
           imageio : 2.21.2
           appdirs : 1.4.4
            scooby : 0.5.12
        matplotlib : 3.5.3
         pyvistaqt : 0.9.0
           IPython : 8.4.0
              tqdm : 4.64.0
            meshio : 5.3.4
--------------------------------------------------------------------------------

Issue Analytics

  • State: closed
  • Created a year ago
  • Reactions:1
  • Comments:19 (19 by maintainers)

Top GitHub Comments

2 reactions
adeak commented, Sep 6, 2022

To focus on the issue at hand (as in pyvista issue), I’d like to recap:

  1. the snippet in the original comment here crashes for @adam-grant-hendry on Windows
  2. it runs fine for @whophil on a system with much less memory (“runs fine” meaning writer.Write() returns successfully).
  3. since all of this is VTK, this is a potential VTK issue, and independent from pickling.

If we all agree, my recommendation would be to reduce the example: remove pyvista (by using a native VTK grid) and remove pickling (which is a red herring). If the code still crashes for Adam and still doesn’t crash for Phil, compare the two setups (mainly OS and VTK version) and open an issue with VTK. They will be able to tell whether what we see is expected (due to some memory management quirk) or a bug.

This will also affect PyVista through our pickling mechanism, but as long as we rely on these VTK writers we’ll have to wait for such potential bugs to be fixed upstream.
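
A reduced example along those lines might look like the sketch below. It is not code from the thread: the function names `point_count` and `write_grid_to_string` are hypothetical, and the choice of `vtkImageData` with a `vtkUnsignedCharArray` is an assumption about what "a native VTK grid" would mean for this uniform grid. The writer calls are the same ones used in the original snippet, and neither pyvista nor pickle is involved.

```python
import math


def point_count(dims):
    """Number of grid points for the given (nx, ny, nz) dimensions."""
    return math.prod(dims)


def write_grid_to_string(dims):
    """Write a zero-filled uint8 vtkImageData to an in-memory string.

    Returns (status, length): the return value of Write() and the
    output string length reported by the writer.
    """
    # Imported lazily so point_count() works without VTK installed.
    from vtkmodules.vtkCommonCore import vtkUnsignedCharArray
    from vtkmodules.vtkCommonDataModel import vtkImageData
    from vtkmodules.vtkIOLegacy import vtkDataSetWriter

    # Native VTK counterpart of the uniform grid in the report (assumption).
    grid = vtkImageData()
    grid.SetDimensions(*dims)
    grid.SetSpacing(1, 1, 1)
    grid.SetOrigin(0, 0, 0)

    # One unsigned char per grid point, all zeros.
    scalars = vtkUnsignedCharArray()
    scalars.SetName("scalars")
    scalars.SetNumberOfComponents(1)
    scalars.SetNumberOfTuples(point_count(dims))
    scalars.Fill(0)
    grid.GetPointData().SetScalars(scalars)

    # Same writer configuration as the original snippet, minus pickling.
    writer = vtkDataSetWriter()
    writer.SetInputDataObject(grid)
    writer.SetWriteToOutputString(True)
    writer.SetFileTypeToBinary()
    status = writer.Write()
    return status, writer.GetOutputStringLength()
```

Calling `write_grid_to_string((10, 10, 10))` first is a cheap way to confirm the script itself works. The dimensions from the report, `(2154, 1500, 1167)`, need several gigabytes of memory.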

2 reactions
whophil commented, Sep 5, 2022

I’m struggling to find what’s going on in VTK, for what it’s worth. vtkDataSetWriter only seems to have a WriteData() method, but no Write()

vtkDataSetWriter.Write() is inherited from vtkWriter (see https://vtk.org/doc/nightly/html/classvtkDataSetWriter-members.html)
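
The point about inherited methods can be illustrated in plain Python. In the sketch below, `Writer` and `DataSetWriter` are hypothetical stand-ins that mirror the relationship described above in simplified form; they are not the real VTK classes. The helper `defining_class` is likewise made up for illustration.

```python
def defining_class(cls, method_name):
    """Return the first class in cls's MRO whose own namespace defines method_name."""
    for klass in cls.__mro__:
        if method_name in vars(klass):
            return klass
    return None


# Hypothetical stand-ins, not the real VTK classes.
class Writer:
    def Write(self):
        return 1


class DataSetWriter(Writer):
    def WriteData(self):
        pass


print(defining_class(DataSetWriter, "Write").__name__)      # Writer
print(defining_class(DataSetWriter, "WriteData").__name__)  # DataSetWriter
print(hasattr(DataSetWriter, "Write"))                      # True
```

A method that is missing from a class's own member list can therefore still be callable on its instances. Whether the same helper gives useful results on VTK's wrapped classes depends on how the Python wrappers expose their method tables, which is not checked here.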
