Wrong Segmentation Slices Position

I’m writing a script to attach AI-generated lesion segmentations to existing DICOM files (this is my first time diving this deep into DICOM, so I might have missed something).

I have found your repository and used hd.seg.Segmentation, but unfortunately the order of the segmentation slices was incorrect.

First I tried a dummy segmentation (a simple 3D box over an MRI). After encountering the issue I tried another approach: I took an existing DICOM study (a CT scan) that contains both a scan and a segmentation, converted the segmentation to an np.array, and used it as the mask input, with the images from the scan folder as source_images.

From the original segmentation file (9 slices between -224.5 mm and -192.5 mm, 4 mm thickness):

(0020,0032) DS [-189.136\-320.136\-224.5] # 24, 3 ImagePositionPatient
(0020,0032) DS [-189.136\-320.136\-220.5] # 24, 3 ImagePositionPatient
:
(0020,0032) DS [-189.136\-320.136\-192.5] # 24, 3 ImagePositionPatient

Versions:
Python 3.8.10
numpy 1.21.2
pydicom 2.2.1
highdicom 0.10.0

Code snippets:

:

# Collect the scan images
image_datasets = [dcmread(str(f)) for f in image_files]

# Creating a segmentation mask from the existing segmentation. shape = (108, 512, 512)
mask = np.zeros(
    shape=(
        len(image_datasets),
        image_datasets[0].Rows,
        image_datasets[0].Columns
    ),
    dtype=bool
)
mask[86:95, :, :] = np.array(hd.seg.segread(
    '/home/ben/.../original_seg.dcm'
    ).pixel_array, dtype=bool)

# Printing to check if the order is correct
sl_num = -568.5
for slice in mask:
    print(sl_num, np.unique(slice))
    sl_num += 4

Output:

-568.5 [False]
-564.5 [False]
-560.5 [False]
:
-236.5 [False]
-232.5 [False]
-228.5 [False]
-224.5 [False  True]
-220.5 [False  True]
-216.5 [False  True]
-212.5 [False  True]
-208.5 [False  True]
-204.5 [False  True]
-200.5 [False  True]
-196.5 [False  True]
-192.5 [False  True]
-188.5 [False]
-184.5 [False]
-180.5 [False]
:
-148.5 [False]
-144.5 [False]
-140.5 [False]
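
The index range 86:95 used above follows from the slice positions, and the arithmetic can be checked with a minimal sketch. It only uses the numbers quoted in this post (first slice at -568.5 mm, 4 mm spacing, 108 slices); the helper name position_to_index is hypothetical. Note that it assumes the mask planes are sorted by ascending position, which is the very assumption this issue puts in question.

```python
# Values taken from the post: first slice at -568.5 mm, 4 mm spacing, 108 slices.
FIRST_POSITION_MM = -568.5
SPACING_MM = 4.0
NUM_SLICES = 108


def position_to_index(z_mm: float) -> int:
    """Map a slice position (mm) to its index, assuming ascending order."""
    index = (z_mm - FIRST_POSITION_MM) / SPACING_MM
    if abs(index - round(index)) > 1e-6:
        raise ValueError(f"{z_mm} mm does not fall on the slice grid")
    return int(round(index))


start = position_to_index(-224.5)      # first segmented slice
stop = position_to_index(-192.5) + 1   # one past the last segmented slice
last_position = FIRST_POSITION_MM + (NUM_SLICES - 1) * SPACING_MM

print(start, stop, last_position)  # 86 95 -140.5
```

So mask[86:95] covers the nine segmented slices only if image_datasets is in the same ascending order.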

Creating the segmentation:

# Get meta-data information from the existing series
series_instance_uid = image_datasets[0].SeriesInstanceUID
series_number = image_datasets[0].SeriesNumber
sop_instance_uid = image_datasets[0].SOPInstanceUID
instance_number = image_datasets[0].InstanceNumber
MANUFACTURER = 'Ben' # image_datasets[0].Manufacturer
MANUFACTURER_MODEL_NAME = "Prost" # image_datasets[0].ManufacturersModelName
SOFTWARE_VERSIONS = 'v1.0'
DEVICE_SERIAL_NUMBER = '0' # image_datasets[0].DeviceSerialNumber

# Describe the algorithm that created the segmentation family:
# http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7162.html
algorithm_identification = hd.AlgorithmIdentificationSequence(
    name=MANUFACTURER_MODEL_NAME,
    version=SOFTWARE_VERSIONS,
    family=codes.cid7162.ArtificialIntelligence
)

# Describe the segment:
# https://highdicom.readthedocs.io/en/latest/package.html#highdicom.seg.SegmentDescription
# segmented_property_category:
# http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7150.html
# segmented_property_type:
# http://dicom.nema.org/medical/dicom/current/output/chtml/part16/sect_CID_7160.html
description_segment = hd.seg.SegmentDescription(
    segment_number=1,
    segment_label='Lesions',
    segmented_property_category=codes.cid7150.AnatomicalStructure,
    segmented_property_type=codes.cid7160.Prostate,
    algorithm_type=hd.seg.SegmentAlgorithmTypeValues.AUTOMATIC,
    algorithm_identification=algorithm_identification,
    tracking_uid=hd.UID(),
    tracking_id='Lesion Segmentation of a Prostate MR Image',
    # anatomic_regions=Code("41216001", "SCT", "Prostate"), # BA - error, seems like the others are enough
)

# Create the Segmentation instance
# https://highdicom.readthedocs.io/en/latest/package.html#highdicom.seg.Segmentation
seg_dataset = hd.seg.Segmentation(
    source_images=image_datasets,
    pixel_array=mask,
    segmentation_type=hd.seg.SegmentationTypeValues.BINARY, # FRACTIONAL,
    segment_descriptions=[description_segment],
    series_instance_uid=series_instance_uid, # hd.UID(),
    series_number=series_number,
    sop_instance_uid=sop_instance_uid, # hd.UID(),
    instance_number=instance_number,
    manufacturer=MANUFACTURER,
    manufacturer_model_name=MANUFACTURER_MODEL_NAME,
    software_versions=SOFTWARE_VERSIONS,
    device_serial_number=DEVICE_SERIAL_NUMBER,
    omit_empty_frames=True,
    # content_creator_name=manufacturer,
)


# Compare generated and original segmentations
print('seg:')
for i, frame in enumerate(seg_dataset.PerFrameFunctionalGroupsSequence):
    # z component of this frame's ImagePositionPatient
    z = float(frame.PlanePositionSequence[0].ImagePositionPatient[2])
    print(z, np.unique(seg_dataset.pixel_array[i]))

ref_path = '/home/ben/.../original_seg.dcm'
ref_seg = hd.seg.segread(ref_path)
print('\noriginal:')
for i, frame in enumerate(ref_seg.PerFrameFunctionalGroupsSequence):
    z = float(frame.PlanePositionSequence[0].ImagePositionPatient[2])
    print(z, np.unique(ref_seg.pixel_array[i]))

And the outputs:

seg:
-564.5 [0 1]
-552.5 [0 1]
-520.5 [0 1]
-488.5 [0 1]
-404.5 [0 1]
-328.5 [0 1]
-324.5 [0 1]
-244.5 [0 1]
-200.5 [0 1]

original:
-224.5 [0 1]
-220.5 [0 1]
-216.5 [0 1]
-212.5 [0 1]
-208.5 [0 1]
-204.5 [0 1]
-200.5 [0 1]
-196.5 [0 1]
-192.5 [0 1]

As seen, the newly generated segmentation has the wrong PlanePositionSequence values. The same happens when I use omit_empty_frames=False.

I’m still reading through the library source code. I found another bug and told @hackermd about it, but it was not related to this one.

I had a thought about adding a function that reorders by original position, to make sure the segmentation is mapped correctly in case the original scan files are not in order.
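
A rough sketch of what such a check could look like, in plain Python. This is not highdicom code: the function names and the sample positions are hypothetical, and it assumes each source image has already been reduced to a single position along the slice axis.

```python
from typing import List, Sequence


def sort_permutation(positions: Sequence[float]) -> List[int]:
    """Return the indices that put `positions` in ascending order."""
    return sorted(range(len(positions)), key=lambda i: positions[i])


def is_ascending(positions: Sequence[float]) -> bool:
    """True if every position is strictly greater than the previous one."""
    return all(a < b for a, b in zip(positions, positions[1:]))


# Hypothetical z positions in the order the files happened to be read.
read_order = [-560.5, -568.5, -564.5, -556.5]
perm = sort_permutation(read_order)
reordered = [read_order[i] for i in perm]

print(is_ascending(read_order))  # False
print(perm)                      # [1, 2, 0, 3]
print(reordered)                 # [-568.5, -564.5, -560.5, -556.5]
```

The same permutation would have to be applied to the list of source datasets before building the mask, so that plane i of the mask still corresponds to dataset i.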

Any other thoughts or comments?

Many thanks to all of you anyway, it’s a really impressive library and I’m glad you’ve released it exactly when I started working on this project 😃

Issue Analytics

State:
Created 2 years ago
Comments:10 (3 by maintainers)

Top GitHub Comments

benarnon8 commented, Oct 6, 2021

So I managed to solve it. The fix is not ideal or foolproof, but it handles the source of the issue.

To collect the source files I used glob, which returns them in whatever order the filesystem lists them, and that order looks arbitrary (source).

It was solved using sorted and reverse: image_files = sorted(series_dir.glob("*"), reverse=True)
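
One reason sorting by file name is not foolproof: sorted compares names lexicographically, so the result only matches the slice order if the names happen to sort that way. A small sketch with made-up file names (slice_1.dcm and so on are assumptions for illustration, not the names in my series):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    series_dir = Path(tmp)
    # Hypothetical file names without zero padding.
    for n in (1, 2, 10, 11):
        (series_dir / f"slice_{n}.dcm").touch()
    names = [p.name for p in sorted(series_dir.glob("*"))]

print(names)
# ['slice_1.dcm', 'slice_10.dcm', 'slice_11.dcm', 'slice_2.dcm']
```

With names like these, slice 10 lands before slice 2, so sorting on the positions stored in the files is the safer option.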

When time allows I’ll try to implement using the order_datasets function, but for now sorting the files is good enough. And in case I find other bugs I’ll create another issue or pull request.

You can close the issue or implement the ordering verification function for the next version first.

I thank you all again, I really appreciate the time and effort!

CPBridge commented, Oct 6, 2021

Hi @benarnon8, thanks for using the library and for taking the time to write a nice bug report. I am open minded to there being a bug in the library but I want to eliminate a few other possibilities first. Unfortunately the order of frames within the segmentation images is much more complicated than you might initially think. This is because the standard also supports things like slide microscopy images, which are drawn from tiles of a 2D space.

I suspect your issue is due to a mismatch between the order of the planes in the input CT series and the order of the frames returned by pydicom’s .pixel_array. Unfortunately there’s no reason at all to assume that these two would match. Instead of @hackermd’s approach, can you please try the following two steps (at the same time), and let me know whether this fixes your issue?

Order the input frames. Your input image frames may be out of order (depending on how you read in the images). Could you please order them spatially using this function:

from typing import List
import pydicom
import numpy as np

def order_datasets(datasets: List[pydicom.Dataset]) -> List[pydicom.Dataset]:
    def calc_slice_distance(dataset: pydicom.Dataset) -> float:
        orientation = np.array(dataset.ImageOrientationPatient, dtype=float)
        position = np.array(dataset.ImagePositionPatient, dtype=float)
        normal = np.cross(orientation[0:3], orientation[3:6])
        return float(np.dot(normal, position))

    return sorted(datasets, key=calc_slice_distance)
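
To illustrate why the function projects the position onto the slice normal instead of just sorting on the z coordinate, here is a plain-Python restatement of the same calculation on made-up values. The coronal orientation and the three positions are hypothetical, and the helpers cross, dot and slice_distance are written out only so the sketch runs without numpy or pydicom.

```python
from typing import Sequence, Tuple


def cross(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float, float]:
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def slice_distance(orientation: Sequence[float], position: Sequence[float]) -> float:
    """Position of a slice along the normal of the image plane."""
    normal = cross(orientation[0:3], orientation[3:6])
    return dot(normal, position)


# Hypothetical coronal series: rows run along x, columns along -z,
# so the slices are stacked along y and all share the same z value.
coronal = [1.0, 0.0, 0.0, 0.0, 0.0, -1.0]
positions = [(-100.0, 30.0, 50.0), (-100.0, 10.0, 50.0), (-100.0, 20.0, 50.0)]

ordered = sorted(positions, key=lambda p: slice_distance(coronal, p))
print([p[1] for p in ordered])  # [10.0, 20.0, 30.0]
```

Sorting these slices on z alone would leave them in their original order, because z is identical for all three.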

Use highdicom tools to read segmentation frames. Make sure the way you are reading frames from the segmentation image matches your list of input files. Due to the way order is defined within the segmentation standard, it’s nearly always a bad idea to use pydicom’s .pixel_array property to access the pixel data of a segmentation. Please use the highdicom method get_pixels_by_source_instance like this to have highdicom figure this out for you and ensure that the two lists match:

seg = hd.seg.segread('/home/ben/.../original_seg.dcm')
source_sop_instance_uids = [im.SOPInstanceUID for im in image_datasets]
mask = seg.get_pixels_by_source_instance(
    source_sop_instance_uids,
    assert_missing_frames_are_empty=True,  # there are empty frames in the output
    relabel=True                           # remove the segment dimension (there is only one segment)
)

This should return the full array of the mask, including all the empty frames (so you can remove the mask[86:95, :, :] line). I know that the documentation is lacking on this point currently and will hopefully improve it soon!

If you follow these two steps, the order of the frames in the mask and the source datasets should match and I think everything should work correctly. We’ll find out!

I realise that this may not be possible, but would you be able to provide me with the set of files you are using, assuming that they are anonymised? That would help debugging immensely. Also, do you know what software created the segmentation you are reading?