wsiprocess package

Subpackages

Submodules

wsiprocess.annotation module

Annotation object.

The Annotation object holds optional metadata for a slide. It can handle ASAP or WSIViewer style annotations. By adding an annotationparser, you can process annotation data from other annotation tools.

Example

Loading annotation data:

    import wsiprocess as wp
    annotation = wp.annotation("path_to_annotation_file.xml")

Loading annotation data from image:

    import wsiprocess as wp
    annotation = wp.annotation("")

class wsiprocess.annotation.Annotation(path=False, is_image=False, slide=False)

Bases: object

add_class(classes)
base_mask(cls, mask_height, mask_width)

Masks have the same size as the slide.

Masks are canvases of 0s.

Parameters:
  • cls (str) – Class name for each mask.

  • mask_height (int) – The height of base masks.

  • mask_width (int) – The width of base masks.

base_masks(size, wsi_height, wsi_width)

Make base masks.

Parameters:
  • size (int) – The long side of masks.

  • wsi_height (int) – The height of wsi.

  • wsi_width (int) – The width of wsi.

check_classes(annotation_class, rule_class)
check_memory_consumption(wsi_height, wsi_width)
dot_to_bbox(width=30, height=False)

Translate dot annotations to bounding boxes.

If len(self.mask_coords[cls][idx]) is 1, the annotation is a dot, and the dot becomes the midpoint of the bounding box.

Parameters:
  • width (int) – Width of the translated bounding box.

  • height (int) – Height of the translated bounding box. If not set, height is equal to width.
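The dot-to-box translation described above can be sketched as follows. This is a minimal illustration rather than the library's implementation: the function name dot_to_bbox_sketch, the two-corner return format and the integer halving are all assumptions made for the example.

```python
def dot_to_bbox_sketch(dot, width=30, height=False):
    """Return [[left, top], [right, bottom]] centred on a dot annotation.

    `dot` is an (x, y) pair. If `height` is not set, it falls back to
    `width`, mirroring the documented default.
    """
    if not height:
        height = width
    x, y = dot
    half_w, half_h = width // 2, height // 2  # assumed rounding
    return [[x - half_w, y - half_h], [x + half_w, y + half_h]]


# A dot at (100, 200) becomes a 30 x 30 box around it.
print(dot_to_bbox_sketch((100, 200), width=30))
```

The dot stays at the midpoint of the resulting box, which is the property the description relies on.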

exclude_coords(rule)

Exclude coordinates according to the rule.

Parameters:

rule (wsiprocess.rule.Rule) – Rule object.

exclude_masks(rule)

Exclude areas from the base mask according to the rule.

Parameters:

rule (wsiprocess.rule.Rule) – Rule object.

export_mask(save_to, cls)

Export one binary mask image.

Export a mask image with binary values of 0 or 1.

Parameters:
  • save_to (str) – Parent directory to save the mask image.

  • cls (str) – Class name for each mask.

export_masks(save_to)

Export binary mask images.

Export the mask images for later computation such as segmentation. Exported masks contain binary values of 0 or 1.

Parameters:

save_to (str) – Parent directory to save the mask images.

export_thumb_mask(cls, save_to='.', size=512)

Export a thumbnail of one of the masks.

Export one thumbnail of one of the masks as a preliminary check.

Parameters:
  • cls (str) – Class name for each mask.

  • save_to (str, optional) – Parent directory to save the thumbnails.

  • size (int, optional) – Length of the long side of thumbnail.

export_thumb_masks(save_to='.', size=512)

Export thumbnail of masks.

Export thumbnails of the masks as a preliminary check.

Parameters:
  • save_to (str) – Parent directory to save the thumbnails.

  • size (int) – Length of the long side of thumbnail.

fix_mask_size()
foreground_mask(slide, size=5000, wsi_height=False, wsi_width=False, fn='otsu', min_=30, max_=190)

Make foreground mask.

Make a simple foreground mask with Otsu thresholding.

Parameters:
  • slide (wsiprocess.slide.Slide) – Slide object.

  • size (int or function, optional) – Size of the foreground mask used when calculating with Otsu thresholding.

  • wsi_width (int or bool) – Width of the wsi.

  • wsi_height (int or bool) – Height of the wsi.

  • fn (str or function, optional) – Binarization method. By default, Otsu thresholding is used.

  • min_ (int, optional) – Used if fn is "minmax". The Annotation object defines foreground as the pixels with values between min_ and max_.

  • max_ (int, optional) – Used if fn is "minmax". The Annotation object defines foreground as the pixels with values between min_ and max_.
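The "minmax" binarization described above can be sketched in a few lines. This is an illustration under assumptions, not the library's code: it takes a grayscale image as nested lists, treats both bounds as inclusive, and writes 1 for foreground. The helper name minmax_foreground is hypothetical.

```python
def minmax_foreground(gray, min_=30, max_=190):
    """Return a 0/1 mask with 1 where min_ <= pixel <= max_ (bounds assumed inclusive)."""
    return [[1 if min_ <= px <= max_ else 0 for px in row] for row in gray]


# Very dark (0) and very bright (191, 255) pixels are treated as background.
gray = [[0, 30, 120],
        [190, 191, 255]]
mask = minmax_foreground(gray)
print(mask)
```

Unlike Otsu thresholding, which picks a threshold from the image histogram, this variant relies entirely on the two user-supplied bounds.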

from_image(mask, cls)

Load mask data from an image.

Parameters:
  • mask (numpy.ndarray) – 2D mask image with background as 0, and foreground as 255.

  • cls (str) – Name of the class of the mask image.

get_patch_mask(cls, x, y, w, h)
get_scale(size, wsi_height, wsi_width)
include_masks(rule)

Merge masks following the rule.

Parameters:

rule (wsiprocess.rule.Rule) – Rule object.

main_mask(cls, scale)
main_masks(size, wsi_height, wsi_width)

Main masks

Draw border lines according to the rule and fill the inside with 255.

make_masks(slide, rule=False, foreground_fn='otsu', size=5000, min_=30, max_=190)

Make masks from the slide and rule.

Masks are for each class and foreground area.

Parameters:
  • slide (wsiprocess.slide.Slide) – Slide object

  • rule (wsiprocess.rule.Rule, optional) – Rule object

  • foreground_fn (str or callable, optional) – This can be {otsu, minmax} or a user specified function.

  • size (int, optional) – Size of foreground mask.

  • min_ (int, optional) – Used if foreground_fn is "minmax". The Annotation object defines foreground as the pixels with values between min_ and max_.

  • max_ (int, optional) – Used if foreground_fn is "minmax". The Annotation object defines foreground as the pixels with values between min_ and max_.

merge_include_coords(rule)

Merge coordinates according to the rule.

Parameters:

rule (wsiprocess.rule.Rule) – Rule object.

read_annotation(annotation_type=False)

Parse the annotation data.

Parameters:

annotation_type (str) – If provided, automatic type detection is skipped.

resize_mask(wsi_height, wsi_width, cls)

Resize a mask to the same size as the slide.

resize_masks(wsi_height, wsi_width)

Resize the masks to the same size as the slide.

Parameters:

  • wsi_height (int) – The height of wsi.

  • wsi_width (int) – The width of wsi.

set_scale(size, wsi_height, wsi_width)

wsiprocess.cli module

class wsiprocess.cli.Args(command)

Bases: object

add_annotation_args(parser, slide_is_sparse=False)
add_binarization_method(parser)
add_on_annotation(parser, slide_is_sparse=False)
add_on_foreground(parser, slide_is_sparse=False)
build_args(command)

Base Parser

fillattr(key: str)
fillattrs(keys)
set_base_parser()
set_classification_args()

Arguments for classification

set_common_args(parser)

Common Arguments

set_detection_args()

Arguments for detection tasks.

set_evaluation_args()
set_method_args()
set_segmentation_args()

Arguments for segmentation tasks.

set_wsi_arg(parser)
wsiprocess.cli.main(command=None)
wsiprocess.cli.process_annotation(args, slide, rule)

wsiprocess.converter module

Convert wsiprocess style annotation data to COCO or VOC style.

class wsiprocess.converter.Converter(root, save_to, ratio_arg)

Bases: object

Converter Class Args:

Attributes:

to_coco()
to_voc()
to_yolo()

wsiprocess.error module

Custom errors for wsiprocess.

exception wsiprocess.error.AnnotationLabelError(message)

Bases: WsiProcessError

Error of annotations

Parameters:

message (str) – Message to show in the stdout.

exception wsiprocess.error.MissCombinationError(message)

Bases: WsiProcessError

Error of the combination of the method and the annotation file.

Parameters:

message (str) – Message to show in the stdout.

exception wsiprocess.error.OnParamError(message)

Bases: WsiProcessError

Error of on_annotation.

on_annotation must be greater than 0 and at most 1.

Parameters:

message (str) – Message to show in the stdout.
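The range rule above, together with the documented error hierarchy, can be sketched as follows. The two exception classes here are local stand-ins that only mirror the documented names and base classes, and check_on_annotation is a hypothetical helper, not part of wsiprocess.

```python
class WsiProcessError(Exception):
    """Local stand-in for the documented root error class."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


class OnParamError(WsiProcessError):
    """Local stand-in for the documented on_annotation error."""


def check_on_annotation(on_annotation):
    """Accept values in the half-open interval (0, 1], otherwise raise."""
    if not 0 < on_annotation <= 1:
        raise OnParamError(f"on_annotation must be in (0, 1], got {on_annotation}")
    return on_annotation


try:
    check_on_annotation(0)
except WsiProcessError as err:  # catching the root class also catches subclasses
    print("rejected:", err.message)
```

Because every custom error derives from the root class, callers can catch WsiProcessError once instead of listing each subclass.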

exception wsiprocess.error.PatchSizeTooSmallError(message)

Bases: WsiProcessError

Error of the size of patches.

This should be warning?

Parameters:

message (str) – Message to show in the stdout.

exception wsiprocess.error.SizeError(message)

Bases: WsiProcessError

Error of sizes.

The slide must be larger than the patches, and the patch size must be larger than the overlap size.

Parameters:

message (str) – Message to show in the stdout.

exception wsiprocess.error.SlideLoadError(message)

Bases: WsiProcessError

Error on loading slides.

Parameters:

message (str) – Message to show in the stdout.

exception wsiprocess.error.WsiProcessError(message)

Bases: Exception

Root error class.

Parameters:

message (str) – Message to show in the stdout.

wsiprocess.patcher module

Patcher object to extract patches from whole slide images.

class wsiprocess.patcher.Patcher(slide, method, annotation=False, save_to='.', patch_width=256, patch_height=256, overlap_width=0, overlap_height=0, offset_x=0, offset_y=0, on_foreground=0.5, on_annotation=0.5, ext='jpg', magnification=False, start_sample=False, finished_sample=False, no_patches=False, crop_bbox=False, verbose=False, dryrun=False)

Bases: object

Patcher object.

Parameters:
  • slide (wsiprocess.slide.Slide) – Slide object.

  • method (str) – Method name to run. One of {"evaluation", "classification", "detection", "segmentation"}. Characters are converted to lowercase.

  • annotation (wsiprocess.annotation.Annotation, optional) – Annotation object.

  • save_to (str, optional) – The root of the output directory.

  • patch_width (int, optional) – The width of the output patches.

  • patch_height (int, optional) – The height of the output patches.

  • overlap_width (int, optional) – The width of the overlap areas of patches.

  • overlap_height (int, optional) – The height of the overlap areas of patches.

  • offset_x (int, optional) – The offset pixels along the x-axis.

  • offset_y (int, optional) – The offset pixels along the y-axis.

  • on_foreground (float, optional) – Ratio of overlap area between patches and foreground area.

  • on_annotation (float or dict) – Ratio of overlap area between patches and annotation. A dict is also accepted, e.g. {"label": value}.

  • magnification (int, optional) – Magnification of output patches.

  • ext (str, optional) – Extension of extracted patches.

  • start_sample (bool, optional) – Whether to save sample patches on Patcher starting.

  • finished_sample (bool, optional) – Whether to save sample patches on Patcher finished its work.

  • extract_patches (bool, optional) – This is deprecated because unless “no_patches” is set, Patcher extracts patches.

  • no_patches (bool, optional) – If set, Patcher runs without extracting patches or saving them to disk.

  • verbose (bool, optional) – If set, a progress bar appears when patching.

  • dryrun (bool, optional) – Only run patching for first 100 patches.

slide

Slide object.

Type:

wsiprocess.slide.Slide

wsi_width

Width of the slide.

Type:

int

wsi_height

Height of the slide.

Type:

int

filepath

Path to the whole slide image.

Type:

str

filestem

Stem of the file name.

Type:

str

method

Method name to run. One of {"evaluation", "classification", "detection", "segmentation"}.

Type:

str

annotation

Annotation object.

Type:

wsiprocess.annotation.Annotation

masks

Masks to show the location of classes.

Type:

dict

classes

Classes to extract.

Type:

list

save_to

The root of the output directory.

Type:

str

p_width

The width of the cropped patches. If magnification is not set, it is the same as the width of the output patches.

Type:

int

p_height

The height of the cropped patches. If magnification is not set, it is the same as the height of the output patches.

Type:

int

p_area

The area of single patch.

Type:

int

o_width

The width of the overlap areas of patches.

Type:

int

o_height

The height of the overlap areas of patches.

Type:

int

offset_x

The offset pixels along the x-axis.

Type:

int

offset_y

The offset pixels along the y-axis.

Type:

int

on_foreground

Ratio of overlap area between patches and foreground area.

Type:

float

on_annotation

Ratio of overlap area between patches and annotation. A dict is also accepted, e.g. {"label": value}.

Type:

float or dict

magnification

Magnification of output patches.

Type:

int

p_scale

Ratio of p_width or p_height to output patch size.

Type:

float

ext

Extension of extracted patches.

Type:

str

start_sample

Whether to save sample patches on Patcher start.

Type:

bool

finished_sample

Whether to save sample patches on Patcher finish.

Type:

bool

extract_patches

Whether to save patches when Patcher runs.

Type:

bool

no_patches

Whether to run without saving patches.

Type:

bool

verbose

If set, a progress bar appears when patching.

Type:

bool, optional

dryrun

Only run patching for first 100 patches.

Type:

bool, optional

x_lefttop

Offsets of patches to the x-axis direction except for the right edge.

Type:

list

y_lefttop

Offsets of patches to the y-axis direction except for the bottom edge.

Type:

list

iterator

Offset coordinates of patches.

Type:

list

last_x

X-axis offset of the right edge patch.

Type:

int

last_y

Y-axis offset of the bottom edge patch.

Type:

int
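One way to picture how x_lefttop, y_lefttop, last_x, last_y and iterator relate is the sketch below. It assumes the stride between patches is the patch size minus the overlap and that the edge patch is aligned flush with the slide border; the helper axis_offsets is hypothetical and the actual Patcher may compute these values differently.

```python
from itertools import product


def axis_offsets(wsi_len, patch_len, overlap_len=0, offset=0):
    """Return (regular offsets, edge offset) along one axis.

    The regular offsets step by (patch_len - overlap_len); the edge offset
    places one last patch flush with the slide border.
    """
    stride = patch_len - overlap_len
    if stride <= 0:
        raise ValueError("patch size must be larger than the overlap size")
    starts = list(range(offset, wsi_len - patch_len, stride))
    last = wsi_len - patch_len
    return starts, last


# Toy slide of 1000 x 600 pixels with 256 x 256 patches and no overlap.
x_lefttop, last_x = axis_offsets(1000, 256)
y_lefttop, last_y = axis_offsets(600, 256)
iterator = list(product(x_lefttop, y_lefttop))  # left-top corners of regular patches
print(x_lefttop, last_x)
print(y_lefttop, last_y)
print(len(iterator))
```

Keeping the edge offsets separate from the regular ones avoids emitting a patch that would run past the slide boundary.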

result

Temporary storage for the computed result of patches.

Type:

dict

annotation_cover_patch(coords, x, y)

Check if the annotation is covering the whole patch.

Parameters:
  • coords (np.array) – Coordinates of annotations.

  • x (int) – X coordinate of left top corner of the patch.

  • y (int) – Y coordinate of left top corner of the patch.

Returns:

List of np.int64s which are the indices of bounding boxes on the patch.

Return type:

idx_of_bb_on_patch (list)

corner_on_patch(coords, x, y)

Check if at least one of the corners is on the patch.

Parameters:
  • coords (np.array) – Coordinates of annotations.

  • x (int) – X coordinate of left top corner of the patch.

  • y (int) – Y coordinate of left top corner of the patch.

Returns:

List of np.int64s which are the indices of bounding boxes on the patch.

Return type:

idx_of_bb_on_patch (list)

crop_patch(x, y, w=False, h=False)
find_bbs(x, y, cls)

Find bounding boxes which are on the patch.

A bounding box with at least one of its corners on the patch is considered to be on the patch. For example:

annotation.mask_coords["benign"][0] = [small_x, small_y, large_x, large_y] = [bbleft, bbtop, bbright, bbbottom]

Parameters:
  • x (int) – X-axis offset of patch.

  • y (int) – Y-axis offset of patch.

  • cls (str) – Class of the patch or the bounding box or the segmented area.
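The corner test that find_bbs relies on can be sketched as below. This is a plain-Python illustration, not the library's NumPy-based code: the function name corner_on_patch_sketch, the explicit patch size arguments and the inclusive comparisons are assumptions.

```python
def corner_on_patch_sketch(boxes, x, y, p_width, p_height):
    """Return indices of boxes that have at least one corner inside the patch.

    Each box is [left, top, right, bottom]; (x, y) is the left-top corner
    of the patch.
    """
    hits = []
    for idx, (left, top, right, bottom) in enumerate(boxes):
        corners = [(left, top), (right, top), (left, bottom), (right, bottom)]
        if any(x <= cx <= x + p_width and y <= cy <= y + p_height
               for cx, cy in corners):
            hits.append(idx)
    return hits


boxes = [
    [10, 10, 50, 50],      # fully inside the patch
    [300, 300, 400, 400],  # fully outside the patch
    [200, 200, 300, 300],  # only the left-top corner is inside
]
print(corner_on_patch_sketch(boxes, x=0, y=0, p_width=256, p_height=256))
```

A corner test alone misses boxes that cross the patch without placing a corner inside it, or that cover the patch completely, which is presumably why side_on_patch and annotation_cover_patch are documented as separate checks.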

find_masks(x, y, cls)

Get the masked area corresponding to the given patch area.

Parameters:
  • x (int) – X-axis offset of a patch.

  • y (int) – Y-axis offset of a patch.

  • cls (str) – Class of the patch or the bounding box or the segmented area.

Returns:

List containing a dict of coords and its class. This coords is a path to the png image.

Return type:

masks (list)

get_iterator(dryrun=False)
get_mini_patch_parallel(classes=False)
get_patch(x, y, classes=False)

Extract a single patch.

Parameters:
  • x (int) – X-axis offset of a patch.

  • y (int) – Y-axis offset of a patch.

  • classes (list) – When the method is classification, the patch is extracted multiple times if it lies on the border of two or more classes. To prevent the patcher from extracting a single patch for multiple classes, setting on_annotation=1.0 should work.

get_patch_parallel(classes=False, max_workers=-1)

Run get_patch() in parallel.

Parameters:
  • classes (list) – Classes to extract.

  • max_workers (int) – Workers to run. -1 runs with cores*5 threads.
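The worker rule above (-1 meaning cores*5 threads) can be illustrated with the standard library's thread pool. This is a sketch under assumptions: run_in_threads and fake_get_patch are hypothetical names, and the real method may schedule work differently.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(func, coords, max_workers=-1):
    """Call func(x, y) for every (x, y) in coords using a thread pool."""
    if max_workers == -1:
        # Documented default: five threads per CPU core.
        max_workers = (os.cpu_count() or 1) * 5
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the order of the inputs.
        return list(executor.map(lambda xy: func(*xy), coords))


def fake_get_patch(x, y):
    """Stand-in for real patch extraction."""
    return f"patch at ({x}, {y})"


results = run_in_threads(fake_get_patch, [(0, 0), (256, 0), (0, 256)])
print(results)
```

Threads suit this kind of workload when most of the time is spent reading image data and writing files rather than in pure Python computation.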

get_random_sample(phase, sample_count=1)

Get random patch to check if the patcher can work properly.

Parameters:
  • phase (str) – When to check. One of {start, finish}

  • sample_count (int) – Number of patches to extract.

patch_on_annotation(cls, x, y)

Check if the patch is on the annotation area of a class.

Parameters:
  • cls (str) – Class of the patch or the bounding box or the segmented area.

  • x (int) – X-axis offset of a patch.

  • y (int) – Y-axis offset of a patch.

Returns:

Whether the patch is on the annotation.

Return type:

(bool)

patch_on_foreground(x, y)

Check if the patch is on the foreground area.

Parameters:
  • x (int) – X-axis offset of a patch.

  • y (int) – Y-axis offset of a patch.

Returns:

Whether the patch is on the foreground area.

Return type:

(bool)
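The foreground check can be read as comparing a coverage ratio with on_foreground. The sketch below assumes a 0/1 mask stored as nested lists and an inclusive comparison; patch_on_foreground_sketch is a hypothetical name and the library's own mask handling may differ.

```python
def patch_on_foreground_sketch(mask, x, y, p_width, p_height, on_foreground=0.5):
    """Return True if the foreground share of the patch is at least on_foreground.

    `mask` is a 2D list of 0/1 values indexed as mask[row][column].
    """
    region = [row[x:x + p_width] for row in mask[y:y + p_height]]
    covered = sum(sum(row) for row in region)
    p_area = p_width * p_height
    return covered / p_area >= on_foreground


mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(patch_on_foreground_sketch(mask, x=2, y=0, p_width=2, p_height=2))
print(patch_on_foreground_sketch(mask, x=0, y=2, p_width=2, p_height=2))
```

The same ratio idea applies to patch_on_annotation, with the class mask taking the place of the foreground mask and on_annotation as the threshold.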

remove_dup_in_results()

Remove duplicate results in self.result["results"].

save_patch(patch, save_as)
save_patch_result(x, y, cls)

Save the extracted patch data to result

Parameters:
  • x (int) – X-axis offset of patch.

  • y (int) – Y-axis offset of patch.

  • cls (str) – Class of the patch or the bounding box or the segmented area.

save_results()

Save the extraction results.

Saves some metadata along with the patch results.

set_magnification(slide, magnification)
side_on_patch(coords, x, y)

Check if at least one of the sides is on the patch.

Parameters:
  • coords (np.array) – Coordinates of annotations.

  • x (int) – X coordinate of left top corner of the patch.

  • y (int) – Y coordinate of left top corner of the patch.

Returns:

List of np.int64s which are the indices of bounding boxes on the patch.

Return type:

idx_of_bb_on_patch (list)

to_bb(coord)

Convert coordinates to VOC coordinates.

Parameters:

coord (list) –

List of coordinates stored as below:

[[xOfOneCorner, yOfOneCorner],
 [xOfApex,      yOfApex]]

Returns:

List of coordinates stored as below:

[[xmin, ymin],
 [xmin, ymax],
 [xmax, ymax],
 [xmax, ymin]]

Return type:

outer_coord (list)
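The conversion from two opposite corners to the four-corner list shown above can be sketched as follows. The function name to_bb_sketch is hypothetical, and using min and max to normalise the corner order is an assumption about how unordered input is handled.

```python
def to_bb_sketch(coord):
    """Turn two opposite corners into four corners in the documented order."""
    (x1, y1), (x2, y2) = coord
    xmin, xmax = min(x1, x2), max(x1, x2)
    ymin, ymax = min(y1, y2), max(y1, y2)
    return [[xmin, ymin], [xmin, ymax], [xmax, ymax], [xmax, ymin]]


# The input corners may be given in either order.
print(to_bb_sketch([[40, 10], [5, 30]]))
```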

wsiprocess.rule module

Object to define rules for extracting patches. The rule should be a JSON file or a dict. The content of rule.json looks like the example below.

Example

The JSON data below defines the following:

  • extract the patches of benign and malignant

  • benign includes stroma but excludes malignant or uncertain

  • malignant means malignant itself but excludes benign

{
    "benign" : {
        "includes" : [
            "stroma"
        ],
        "excludes" : [
            "malignant",
            "uncertain"
        ]
    },
    "malignant": {
        "includes" : [
        ],
        "excludes" :[
            "benign"
        ]
    }
}

class wsiprocess.rule.Rule(rule)

Bases: object

Base class for rule.

Parameters:

rule (str or dict) – Path to the rule.json or the rule dict.

classes

List of the classes, e.g. ["benign", "malignant"].

Type:

list

assert_incl_excl(incl_excl: dict)

Assert the rule has assumed keys.

Supposed shape is like below

{"includes": ["stroma"], "excludes": ["malignant", "uncertain"]}

load_rule(rule: dict)

Read the rule file.

Parse the rule file and save as the classes.
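Parsing and checking a rule can be sketched with the standard json module. This is an illustration under assumptions: load_rule_sketch accepts JSON text or a dict (the documented class takes a file path or a dict), and it only rejects keys other than "includes" and "excludes", which may be stricter or looser than the real assertion.

```python
import json

RULE_JSON = """
{
    "benign": {"includes": ["stroma"], "excludes": ["malignant", "uncertain"]},
    "malignant": {"includes": [], "excludes": ["benign"]}
}
"""


def load_rule_sketch(rule):
    """Return the list of classes defined by a rule given as JSON text or dict."""
    if isinstance(rule, str):
        rule = json.loads(rule)
    for cls, incl_excl in rule.items():
        unknown = set(incl_excl) - {"includes", "excludes"}
        if unknown:
            raise ValueError(f"unexpected keys for {cls!r}: {sorted(unknown)}")
    return list(rule)


classes = load_rule_sketch(RULE_JSON)
print(classes)
```

The top-level keys become the classes to extract, while "includes" and "excludes" describe how other annotation labels are merged into or removed from each class.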

wsiprocess.slide module

Slide object to pass to the annotation object and the patcher object. A slide is a whole slide image scanned with a whole slide scanner. You can also create a pyramidal TIFF file manually, which can be handled in the same way as scanned digital data, except for the magnification.

class wsiprocess.slide.Slide(path, backend='openslide')

Bases: object

Slide object.

Parameters:
  • path (str) – Path to the whole slide image file.

  • backend (str) – openslide or pyvips.

path

Path to the whole slide image file.

Type:

str

slide

pyvips Image object.

Type:

pyvips.Image

wsi_width

Width of slide.

Type:

int

wsi_height

Height of slide.

Type:

int

crop(x, y, w, h)
export_thumbnail(save_as='./thumb.png', size=500)

Export thumbnail image.

Parameters:
  • save_as (str) – Path to save as the thumbnail image.

  • size (int, optional) – Size of the exported thumbnail.

get_thumbnail(size=500)

Get thumbnail image.

Parameters:

size (int or tuple, optional) – Size of the exported thumbnail.

load_slide()
set_properties()

Read the properties and set as attributes of slide obj.

magnification

Objective power of slide obj.

Type:

int

wsiprocess.utils module

wsiprocess.utils.show_bounding_box(patch_path, result_path, save_as)
wsiprocess.utils.show_mask_on_patch(patch_path, mask_path, save_as)

wsiprocess.verify module

The verification script runs before the patcher works. The Verify class verifies the output directory, annotation files, rule files, etc. It is mainly used by the CLI.

class wsiprocess.verify.Verify(save_to, filestem, method, start_sample, finished_sample, no_patches, crop_bbox)

Bases: object

Verification class.

Parameters:
  • save_to (str) – The root of the output directory.

  • filestem (str) – The name of the output directory.

  • method (str) – Method name to run. One of {"none", "classification", "detection", "segmentation"}.

  • start_sample (bool) – Whether to save sample patches on Patcher start.

  • finished_sample (bool) – Whether to save sample patches on Patcher finish.

  • extract_patches (bool) – [Deleted]Whether to save patches when Patcher runs.

  • no_patches (bool) – Whether to run without saving patches.

save_to

The root of the output directory.

Type:

str

filestem

The name of the output directory.

Type:

str

method

Method name to run. One of {"none", "classification", "detection", "segmentation"}.

Type:

str

start_sample

Whether to save sample patches on Patcher start.

Type:

bool

finished_sample

Whether to save sample patches on Patcher finish.

Type:

bool

extract_patches

[Deleted]Whether to save patches when Patcher runs.

Type:

bool

no_patches

Whether to run without saving patches.

Type:

bool

magnification(slide, magnification)

Check if the slide has data for the magnification the user specified.

Parameters:
  • slide (wp.slide.Slide) – Slide object to check.

  • magnification (int) – Target magnification value which has to be smaller than the magnification of the slide.

static make_dir(path)

Make output directory.

make_dirs()

Ensure the output directories exist for each task.

on_params(on_annotation, on_foreground)

Verify the ratio of on_annotation.

Parameters:
  • on_annotation (float) – Overlap ratio of patches and annotations.

  • on_foreground (float) – Overlap ratio of patches and foreground area

Raises:

wsiprocess.error.SizeError – If the sizes are invalid.

sizes(wsi_width, wsi_height, offset_x, offset_y, patch_width, patch_height, overlap_width, overlap_height, dot_bbox_width=False, dot_bbox_height=False)

Verify the sizes of the slide, the patch and the overlap area.

Raises:

wsiprocess.error.SizeError – If the sizes are invalid.
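The size relationships being verified can be sketched as two comparisons. This is a simplified illustration: check_sizes is a hypothetical helper, it ignores the offset and dot_bbox arguments, it raises ValueError as a stand-in for the documented SizeError, and the exact boundary conditions are assumptions.

```python
def check_sizes(wsi_width, wsi_height, patch_width, patch_height,
                overlap_width=0, overlap_height=0):
    """Raise ValueError unless slide >= patch and patch > overlap on both axes."""
    if patch_width > wsi_width or patch_height > wsi_height:
        raise ValueError("the patch must fit inside the slide")
    if overlap_width >= patch_width or overlap_height >= patch_height:
        raise ValueError("the overlap must be smaller than the patch")
    return True


print(check_sizes(1000, 600, 256, 256, overlap_width=64, overlap_height=64))
```

Running such checks before patching starts means an invalid configuration fails immediately instead of partway through a long extraction.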

Module contents