
Other datatype for LabelMap than float32

See original GitHub issue

🚀 Feature

I noticed that a LabelMap and an IntensityImage are both saved as float32 tensors, which means that the LabelMap uses much more memory than needed. The cause is this piece of code in io.py, which casts all images to float32:

def _read_sitk(path: TypePath) -> Tuple[torch.Tensor, np.ndarray]:
    if Path(path).is_dir():  # assume DICOM
        image = _read_dicom(path)
    else:
        image = sitk.ReadImage(str(path))
    data, affine = sitk_to_nib(image, keepdim=True)
    if data.dtype != np.float32:
        data = data.astype(np.float32)
    tensor = torch.from_numpy(data)
    return tensor, affine

Is there a reason for this .astype(np.float32)?

This can be made much more memory-friendly by removing this cast and storing segmentations in memory as uint8, for example. I also expect spatial augmentations that require resampling to be much faster when they work with uint8 instead of float32.

Motivation

  • Better use of memory
  • Faster augmentations which require resampling
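To quantify the memory point: a label volume stored as float32 takes exactly four times the memory of the same volume stored as uint8. A quick illustrative check (the volume shape here is arbitrary, chosen to match the benchmark later in the thread):

```python
import numpy as np

shape = (256, 256, 180)  # a plausible segmentation volume
as_float32 = np.zeros(shape, dtype=np.float32)
as_uint8 = np.zeros(shape, dtype=np.uint8)

print(as_float32.nbytes // 2**20, "MiB")     # 45 MiB
print(as_uint8.nbytes // 2**20, "MiB")       # 11 MiB
print(as_float32.nbytes // as_uint8.nbytes)  # 4
```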

Pitch

No cast to float32 for all tensors, allowing different dtypes

Could these two lines be removed? All tests still pass when I comment them out. Maybe only cast bool to np.uint8 because SimpleITK does not support bool?

if data.dtype != np.float32:
    data = data.astype(np.float32)
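A minimal sketch of a dtype-preserving conversion, along the lines the pitch suggests. This is a hypothetical helper, not the library's actual code: only bool is cast (to uint8), since SimpleITK cannot write boolean images, and every other dtype is kept as read from disk:

```python
import numpy as np
import torch

def to_tensor_keep_dtype(data: np.ndarray) -> torch.Tensor:
    """Convert an array to a tensor without forcing float32.

    Only bool is cast (to uint8); all other dtypes pass through,
    so a uint8 label map stays uint8 in memory.
    """
    if data.dtype == bool:
        data = data.astype(np.uint8)
    return torch.from_numpy(data)

labels = np.zeros((4, 4), dtype=np.uint8)
tensor = to_tensor_keep_dtype(labels)
print(tensor.dtype)  # torch.uint8
```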

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 2
  • Comments: 11 (11 by maintainers)

Top GitHub Comments

1 reaction
fepegar commented, Dec 10, 2020

@romainVala I think the proposal is not really forcing a specific type, but stopping forcing everything to be float32. So your partial volume maps (which maybe shouldn’t be instantiated as a label map, as they don’t contain categorical labels) would still be processed fine.

1 reaction
fepegar commented, Dec 10, 2020

I just tried this code

import numpy as np
import SimpleITK as sitk

array = 256 * np.random.rand(256, 256, 180)

im_float = sitk.GetImageFromArray(array.astype(np.float32))
im_char = sitk.GetImageFromArray(array.astype(np.uint8))

transform = sitk.Euler3DTransform()
transform.SetRotation(10, 20, 30)

And then these:

In [2]: %timeit sitk.Resample(im_float, transform)                                                                                                      
20.2 ms ± 484 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: %timeit sitk.Resample(im_char, transform)                                                                                                       
15.2 ms ± 158 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So you’re right, it’s faster in uint8. I did this because some transforms required float, so I just converted everything to float. Another reason is that with a consistent data type, everything works smoothly with a data loader:

import torch
import numpy as np

class Dataset:
    def __len__(self):
        return 10

    def __getitem__(self, i):
        x = 10 * np.random.rand(10)
        if i % 2:
            x = x.astype(np.uint8)
        return x

loader = torch.utils.data.DataLoader(Dataset(), batch_size=5)
next(iter(loader))
[...]

~/miniconda3/envs/episurg/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     54             storage = elem.storage()._new_shared(numel)
     55             out = elem.new(storage)
---> 56         return torch.stack(batch, 0, out=out)
     57     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     58             and elem_type.__name__ != 'string_':

RuntimeError: Expected object of scalar type Double but got scalar type Byte for sequence element 1 in sequence argument at position #1 'tensors'
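One way around this collate error, if dtypes were allowed to vary (a sketch, not something TorchIO provides): pass a custom `collate_fn` to the DataLoader that casts every sample to a common dtype before stacking, so mixed-dtype datasets still batch:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader

class Dataset:
    def __len__(self):
        return 10

    def __getitem__(self, i):
        x = 10 * np.random.rand(10)
        if i % 2:  # odd indices come back as uint8
            x = x.astype(np.uint8)
        return x

def collate_as_float(batch):
    # Cast each sample to float32 before stacking, so mixed
    # dtypes no longer break the default collate function.
    return torch.stack([torch.as_tensor(x, dtype=torch.float32) for x in batch])

loader = DataLoader(Dataset(), batch_size=5, collate_fn=collate_as_float)
batch = next(iter(loader))
print(batch.dtype, batch.shape)  # torch.float32 torch.Size([5, 10])
```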

The tests probably pass because they typically use images that are created in float 32 (and obviously because they’re not complete enough).

I agree that saving in float by default is not good. There should be at least a kwarg for the dtype.

So what do you think? I suppose there could be a Cast transform used before a data loader (and by transforms that need float), but this would be quite backwards-incompatible. Still, if it makes the library much faster, it might be a good thing to do.
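Such a Cast transform might look roughly like this. The name and interface are hypothetical, sketched here only to make the idea concrete; TorchIO's actual transforms operate on Subject objects rather than raw tensors:

```python
import torch

class Cast:
    """Hypothetical transform that casts a tensor to a target dtype."""

    def __init__(self, dtype: torch.dtype = torch.float32):
        self.dtype = dtype

    def __call__(self, tensor: torch.Tensor) -> torch.Tensor:
        # A no-op when the tensor already has the target dtype.
        return tensor.to(self.dtype)

cast = Cast(torch.float32)
labels = torch.zeros(4, 4, dtype=torch.uint8)
print(cast(labels).dtype)  # torch.float32
```

Placed just before the data loader, it would restore a uniform dtype for batching while letting everything upstream keep the compact on-disk dtype.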

