Subcellular Protein Localization

class CellAligner_Cell(coords, boundary_coords=None, intensities=None, nucleus=None, metric='euclidean')

A cell representation for CellAligner analysis.

This class encapsulates cell morphology and intensity information needed for Gromov-Wasserstein mapping and distance computations between cells.

Parameters
  • coords (numpy.ndarray) – Array of shape (N, 2) containing (x, y) coordinates for each pixel in the cell.

  • boundary_coords (numpy.ndarray or None) – Array of shape (M, 2) containing (x, y) coordinates sampled from the cell boundary. Default is None.

  • intensities (dict or None) – Dictionary mapping channel names (str) to intensity arrays (numpy.ndarray) of length N, where N is the number of cell pixels. Default is None.

  • nucleus (numpy.ndarray or None) – Array of length N indicating nuclear identity (0 or 1) for each cell pixel. Default is None.

  • metric (str or None) – Distance metric for computing coordinate distance matrices. Options are ‘euclidean’, ‘geodesic’, or None. Default is ‘euclidean’.
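The expected input layout can be illustrated with a small NumPy sketch. The toy mask, the channel name "protein", and the choice of nuclear pixels are hypothetical values made up for illustration. The distance matrix at the end shows the kind of pairwise matrix that metric='euclidean' refers to, not necessarily how the class computes it internally.

```python
import numpy as np

# Hypothetical toy data: a 4x5 binary mask marking the pixels of one cell.
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 1:4] = True

# coords: (N, 2) array of (x, y) positions, one row per cell pixel.
rows, cols = np.nonzero(mask)
coords = np.column_stack([cols, rows]).astype(float)

# intensities: channel name -> length-N array aligned with coords.
rng = np.random.default_rng(0)
intensities = {"protein": rng.random(len(coords))}

# nucleus: length-N 0/1 indicator aligned with coords (arbitrary choice here).
nucleus = (coords[:, 0] == 2).astype(int)

# Pairwise Euclidean distances between all cell pixels, shape (N, N).
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
```

Arrays shaped like these correspond to the coords, intensities, and nucleus arguments in the signature above.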

process_image(image, channels, cell_mask_image, nucleus_mask_image=None, ds_factor=None, ds_target_size=None, filter_border_cells=True, n_boundary_points=100, save_path=None, return_objects=True)

Create CellAligner_Cell objects from segmented microscopy images.

Processes a multi-channel microscopy image with cell and nuclear segmentation masks to create a list of CellAligner_Cell objects suitable for CellAligner analysis.

Parameters
  • image (numpy.ndarray) – 3D array of shape (H, W, C) representing the multi-channel image.

  • channels (list[str]) – List of channel names corresponding to the last dimension of the image.

  • cell_mask_image (numpy.ndarray) – 2D array of shape (H, W) with integer labels for each cell (0 for background).

  • nucleus_mask_image (numpy.ndarray or None) – 2D array of shape (H, W) with integer labels for nuclei (0 for background). Default is None.

  • ds_factor (int or None) – Downsampling factor. If provided, downsample by this factor. Default is None.

  • ds_target_size (int or None) – Target number of pixels per cell after downsampling. If provided, downsample to achieve this pixel count. Default is None.

  • filter_border_cells (bool) – If True, exclude cells touching the image border. Default is True.

  • n_boundary_points (int or None) – Number of points to sample from the cell boundary. If None, boundary sampling is skipped. Default is 100.

  • save_path (str or None) – Directory path to save the processed cell objects as pickle files. If None, objects are not saved. Default is None.

  • return_objects (bool) – If True, return the list of CellAligner_Cell objects. Default is True.

Returns

If return_objects is True, returns a list of CellAligner_Cell objects; otherwise returns None.

Return type

list[CellAligner_Cell] or None
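The label-mask convention and the effect of filter_border_cells can be sketched in plain NumPy. The helper name cells_from_label_mask is hypothetical and this is not the library's implementation. It only shows one way to turn an integer label mask into per-cell (x, y) coordinate arrays while dropping cells that touch the image border.

```python
import numpy as np

def cells_from_label_mask(cell_mask, filter_border_cells=True):
    """Return {label: (N, 2) array of (x, y) pixel coordinates} per cell."""
    labels = np.unique(cell_mask)
    labels = labels[labels != 0]  # label 0 is background
    if filter_border_cells:
        # Collect every label that appears on the outermost rows or columns.
        border = np.concatenate([cell_mask[0, :], cell_mask[-1, :],
                                 cell_mask[:, 0], cell_mask[:, -1]])
        labels = labels[~np.isin(labels, border)]
    cells = {}
    for lab in labels:
        rows, cols = np.nonzero(cell_mask == lab)
        cells[int(lab)] = np.column_stack([cols, rows])
    return cells

# Hypothetical 6x6 mask with one border cell and one interior cell.
cell_mask = np.zeros((6, 6), dtype=int)
cell_mask[0:2, 0:2] = 1   # touches the image border
cell_mask[2:4, 2:5] = 2   # interior cell
cells = cells_from_label_mask(cell_mask)
all_cells = cells_from_label_mask(cell_mask, filter_border_cells=False)
```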

gw_pairwise_parallel(cell_objects, points='boundary', num_processes=4, chunksize=20, n_approx_anchors=None, initial_anchor=0)

Compute pairwise Gromov-Wasserstein distances (optionally in parallel).

Calculates the Gromov-Wasserstein distance matrix for a collection of cells using either exact computation or a triangle inequality approximation with anchors.

Parameters
  • cell_objects (list) – list of CellAligner_Cell objects or file paths

  • points (str) – ‘boundary’ or ‘full’ (default: ‘boundary’)

  • num_processes (int) – number of parallel processes to use

  • chunksize (int) – chunk size for parallel imap

  • n_approx_anchors (int or None) – number of anchors for approximation (None = exact)

  • initial_anchor (int) – initial anchor index for approximation

Returns

symmetric GW distance matrix of shape (N, N)

Return type

numpy.ndarray
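The documentation does not spell out how the anchor-based approximation is constructed. As a generic illustration only, the sketch below shows how the triangle inequality bounds an unknown distance d(i, j) once distances to a few anchors are known: |d(i, a) - d(j, a)| <= d(i, j) <= d(i, a) + d(j, a). Euclidean distances between random points stand in for Gromov-Wasserstein distances, and the function name anchor_bounds is hypothetical.

```python
import numpy as np

def anchor_bounds(dist_to_anchors):
    """Bound all pairwise distances from distances to K anchors.

    dist_to_anchors: (N, K) array, entry [i, k] is d(item i, anchor k).
    Returns (lower, upper), each of shape (N, N).
    """
    d = np.asarray(dist_to_anchors, dtype=float)
    # Tightest lower bound over anchors: max_k |d(i, k) - d(j, k)|.
    lower = np.abs(d[:, None, :] - d[None, :, :]).max(axis=-1)
    # Tightest upper bound over anchors: min_k d(i, k) + d(j, k).
    upper = (d[:, None, :] + d[None, :, :]).min(axis=-1)
    return lower, upper

# Stand-in metric: Euclidean distances between random 2D points.
rng = np.random.default_rng(0)
pts = rng.random((8, 2))
true = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

anchors = [0, 3]  # arbitrary anchor indices for the illustration
lower, upper = anchor_bounds(true[:, anchors])
```

Only N x K exact distances are needed to produce these bounds, which is the saving that motivates anchor-based schemes in general.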

map_cell_to_cell(cell_object_from, cell_object_to, channels, compartment_specific=True, method='fused', fused_channel='protein', fused_cost=10, fused_param=0.1, unbalanced_param=70, nuclear_fraction=0.2)

Map protein distributions from one cell onto another via Fused Gromov-Wasserstein.

Parameters
  • cell_object_from (CellAligner_Cell) – source CellAligner_Cell

  • cell_object_to (CellAligner_Cell) – target CellAligner_Cell

  • channels (list[str]) – list of channel names to map

  • compartment_specific (bool) – whether to use compartment-specific mapping

  • method (str) – ‘fused’ or ‘fused_unbalanced’

  • fused_channel (str) – channel name to use for fused morphology cost

  • fused_cost (float) – fused channel cost multiplier

  • fused_param (float) – alpha parameter for fused GW

  • unbalanced_param (float) – regularization for unbalanced GW

  • nuclear_fraction (float) – nuclear fraction for compartment scaling

Returns

mapped distributions with shape (len(channels), n_target_pixels)

Return type

numpy.ndarray
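To make the return shape concrete, the sketch below pushes per-pixel source values through a coupling matrix to obtain an array of shape (len(channels), n_target_pixels). This is a generic way to apply a transport plan and is an assumption, not a description of what map_cell_to_cell does internally. The function name push_forward and the toy coupling are hypothetical.

```python
import numpy as np

def push_forward(coupling, source_values):
    """Move per-pixel source mass onto target pixels through a coupling.

    coupling: (n_source, n_target) nonnegative matrix.
    source_values: (n_channels, n_source) array.
    Returns an array of shape (n_channels, n_target).
    """
    coupling = np.asarray(coupling, dtype=float)
    row_mass = coupling.sum(axis=1, keepdims=True)
    # Each source pixel splits its mass across targets in proportion to its row.
    cond = np.divide(coupling, row_mass,
                     out=np.zeros_like(coupling), where=row_mass > 0)
    return np.asarray(source_values, dtype=float) @ cond

# Toy coupling between 4 source pixels and 3 target pixels.
coupling = np.array([[0.25, 0.0,   0.0],
                     [0.0,  0.125, 0.125],
                     [0.0,  0.0,   0.25],
                     [0.25, 0.0,   0.0]])
source = np.array([[0.1, 0.4, 0.3, 0.2]])  # one channel, four source pixels
mapped = push_forward(coupling, source)
```

Because every source row carries positive mass in this example, the total intensity of each channel is preserved after mapping.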

map_to_anchor_cell(cell_objects, channels, target_cell_ind, compartment_specific=True, method='fused', fused_channel='protein', fused_cost=10, fused_param=0.1, unbalanced_param=70, nuclear_fraction=0.2)

Map protein distributions from all cells to a single target cell.

Parameters
  • cell_objects (list) – list of CellAligner_Cell objects or file paths

  • channels (list[str]) – channel names to map

  • target_cell_ind (int) – index of the target cell

  • compartment_specific (bool) – whether to use compartment-specific mapping

  • method (str) – mapping method (‘fused’ or ‘fused_unbalanced’)

  • fused_channel (str) – channel used for fused cost

  • fused_cost (float) – cost multiplier for fused channel

  • fused_param (float) – alpha parameter for fused GW

  • unbalanced_param (float) – regularization for unbalanced GW

  • nuclear_fraction (float) – probabilistic fraction considered nuclear for compartment-specific mapping (should roughly correspond to fraction of nuclear pixels)

  • parallel – whether to run in parallel

gw_mapped_ot_pairwise_parallel(cell_object, mapped_cell_dists, num_processes=4, chunksize=20, index_pairs=None, n_approx_anchors=None, initial_anchor=0)

Compute pairwise OT distances between mapped protein distributions.

Calculates pairwise optimal transport distances between protein distributions after they have been mapped to a common cell morphology.

Parameters
  • cell_object (CellAligner_Cell or str) – target CellAligner_Cell or path to pickled object

  • mapped_cell_dists (numpy.ndarray) – array of mapped distributions with shape (len(channels), N, n_target_pixels)

  • num_processes (int) – number of processes for parallel execution

  • chunksize (int) – chunk size for parallel imap

  • index_pairs (iterable[tuple[int, int]] or None) – optional iterable of (i, j) index pairs to compute

  • n_approx_anchors (int or None) – number of anchors for triangle inequality approximation

  • initial_anchor (int) – initial anchor index

Returns

If index_pairs is None, an array of shape (len(channels), N, N); otherwise an array of shape (len(channels), len(index_pairs)).

Return type

numpy.ndarray
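The two return shapes can be illustrated with a small sketch. The distance used here is total variation, a deliberately simple placeholder and not optimal transport. The function names pairwise_distances and stand_in_distance are hypothetical. Only the shape logic around index_pairs is the point.

```python
from itertools import combinations
import numpy as np

def stand_in_distance(p, q):
    # Placeholder distance (total variation), NOT an optimal transport cost.
    return 0.5 * np.abs(p - q).sum()

def pairwise_distances(mapped, index_pairs=None):
    """mapped: (n_channels, n_cells, n_target_pixels) array."""
    n_channels, n_cells, _ = mapped.shape
    if index_pairs is None:
        # Full symmetric matrix per channel: (n_channels, n_cells, n_cells).
        out = np.zeros((n_channels, n_cells, n_cells))
        for c in range(n_channels):
            for i, j in combinations(range(n_cells), 2):
                d = stand_in_distance(mapped[c, i], mapped[c, j])
                out[c, i, j] = out[c, j, i] = d
        return out
    # Only the requested pairs: (n_channels, len(index_pairs)).
    pairs = list(index_pairs)
    out = np.zeros((n_channels, len(pairs)))
    for c in range(n_channels):
        for k, (i, j) in enumerate(pairs):
            out[c, k] = stand_in_distance(mapped[c, i], mapped[c, j])
    return out

rng = np.random.default_rng(0)
mapped = rng.random((2, 4, 5))
mapped /= mapped.sum(axis=-1, keepdims=True)  # normalize each distribution

full = pairwise_distances(mapped)
subset = pairwise_distances(mapped, index_pairs=[(0, 1), (2, 3)])
```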

find_centroid(distance_matrix)

Return the index of the centroid point (minimizes sum of distances).

Parameters

distance_matrix (numpy.ndarray) – square symmetric pairwise distance matrix

Returns

index of centroid point

Return type

int
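The description above corresponds to taking the row with the smallest total distance. A minimal sketch, using a hypothetical function name centroid_index rather than the library's own code:

```python
import numpy as np

def centroid_index(distance_matrix):
    """Index of the point whose summed distance to all others is smallest."""
    d = np.asarray(distance_matrix, dtype=float)
    return int(np.argmin(d.sum(axis=1)))

# Toy symmetric distance matrix; row sums are 5, 3, and 6.
D = np.array([[0, 1, 4],
              [1, 0, 2],
              [4, 2, 0]])
idx = centroid_index(D)
```

An index chosen this way is a natural candidate for target_cell_ind in map_to_anchor_cell, since it picks the cell closest on average to all others.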

plot_cell_image(cellaligner_cell, channels, make_square=True, ax=None, mask_alpha=0.2)

Plot a cell image with the specified channels as an RGB composite.

Creates a visualization of cell image data by combining multiple channels into an RGB representation with an optional transparent mask overlay.

Parameters
  • cellaligner_cell (CellAligner_Cell) – The cell object to plot.

  • channels (list[str]) – Channel names to combine into the RGB composite.

  • make_square (bool) – If True, make the plotted image square. Default is True.

  • ax (matplotlib axes or None) – Axes to draw on. Default is None.

  • mask_alpha (float) – Transparency of the mask overlay. Default is 0.2.
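The idea of combining channels into an RGB composite can be sketched without any plotting library. The helper rgb_composite below is hypothetical and is not the function's actual implementation. It assumes per-pixel (x, y) coordinates and per-channel intensity arrays like those held by a cell object, scales each channel by its maximum, and assigns up to three channels to red, green, and blue.

```python
import numpy as np

def rgb_composite(coords, intensities, channels, shape):
    """Rasterize up to three channels into an (H, W, 3) image in [0, 1]."""
    img = np.zeros(tuple(shape) + (3,))
    xs = coords[:, 0].astype(int)
    ys = coords[:, 1].astype(int)
    for k, name in enumerate(channels[:3]):
        vals = np.asarray(intensities[name], dtype=float)
        peak = vals.max()
        # Scale each channel by its own maximum; leave all-zero channels as is.
        img[ys, xs, k] = vals / peak if peak > 0 else vals
    return img

# Hypothetical three-pixel cell with two channels.
coords = np.array([[1, 1], [2, 1], [1, 2]])
intensities = {"protein": np.array([2.0, 4.0, 1.0]),
               "dna": np.array([0.0, 3.0, 3.0])}
img = rgb_composite(coords, intensities, ["protein", "dna"], shape=(4, 4))
```

An array like img can be displayed with any image viewer that accepts (H, W, 3) float data.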