Subcellular Protein Localization
- class CellAligner_Cell(coords, boundary_coords=None, intensities=None, nucleus=None, metric='euclidean')
A cell representation for CellAligner analysis.
This class encapsulates cell morphology and intensity information needed for Gromov-Wasserstein mapping and distance computations between cells.
- Parameters
coords (numpy.ndarray) – Array of shape (N, 2) containing (x, y) coordinates for each pixel in the cell.
boundary_coords (numpy.ndarray or None) – Array of shape (M, 2) containing (x, y) coordinates sampled from the cell boundary. Default is None.
intensities (dict or None) – Dictionary mapping channel names (str) to intensity arrays (numpy.ndarray) of length N, where N is the number of cell pixels. Default is None.
nucleus (numpy.ndarray or None) – Array of length N indicating nuclear identity (0 or 1) for each cell pixel. Default is None.
metric (str or None) – Distance metric for computing coordinate distance matrices. Options are ‘euclidean’, ‘geodesic’, or None. Default is ‘euclidean’.
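As an illustration of the expected input shapes, the following minimal sketch builds `coords`, `nucleus`, and `intensities` arrays from toy binary masks using NumPy alone. The masks, the `"protein"` channel name, and the random image are made up for the example, and the explicit Euclidean distance matrix is only an assumption about what `metric='euclidean'` corresponds to.

```python
import numpy as np

# Toy 5x5 binary masks standing in for one segmented cell and its nucleus.
cell_mask = np.zeros((5, 5), dtype=bool)
cell_mask[1:4, 1:4] = True            # 3x3 cell
nucleus_mask = np.zeros((5, 5), dtype=bool)
nucleus_mask[2, 2] = True             # single nuclear pixel

# np.nonzero returns (row, col) indices; store them as (x, y) = (col, row).
rows, cols = np.nonzero(cell_mask)
coords = np.column_stack([cols, rows]).astype(float)    # shape (N, 2)

# Per-pixel nuclear indicator (0 or 1), aligned with coords.
nucleus = nucleus_mask[rows, cols].astype(int)           # length N

# One intensity array of length N per channel name (hypothetical channel).
rng = np.random.default_rng(0)
image = rng.random((5, 5))
intensities = {"protein": image[rows, cols]}

# Pairwise Euclidean distances between cell pixels, shape (N, N).
diff = coords[:, None, :] - coords[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
```

Arrays shaped like these match the documented `coords`, `intensities`, and `nucleus` arguments of the constructor.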
- process_image(image, channels, cell_mask_image, nucleus_mask_image=None, ds_factor=None, ds_target_size=None, filter_border_cells=True, n_boundary_points=100, save_path=None, return_objects=True)
Create CellAligner_Cell objects from segmented microscopy images.
Processes a multi-channel microscopy image with cell and nuclear segmentation masks to create a list of CellAligner_Cell objects suitable for CellAligner analysis.
- Parameters
image (numpy.ndarray) – 3D array of shape (H, W, C) representing the multi-channel image.
channels (list[str]) – List of channel names corresponding to the last dimension of the image.
cell_mask_image (numpy.ndarray) – 2D array of shape (H, W) with integer labels for each cell (0 for background).
nucleus_mask_image (numpy.ndarray or None) – 2D array of shape (H, W) with integer labels for nuclei (0 for background). Default is None.
ds_factor (int or None) – Downsampling factor. If provided, downsample by this factor. Default is None.
ds_target_size (int or None) – Target number of pixels per cell after downsampling. If provided, downsample to achieve this pixel count. Default is None.
filter_border_cells (bool) – If True, exclude cells touching the image border. Default is True.
n_boundary_points (int or None) – Number of points to sample from the cell boundary. If None, boundary sampling is skipped. Default is 100.
save_path (str or None) – Directory path to save the processed cell objects as pickle files. If None, objects are not saved. Default is None.
return_objects (bool) – If True, return the list of CellAligner_Cell objects. Default is True.
- Returns
If return_objects is True, returns a list of CellAligner_Cell objects; otherwise returns None.
- Return type
list[CellAligner_Cell] or None
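The per-cell extraction described above can be sketched with NumPy alone. The helper `extract_cells` below is hypothetical and is not the library's implementation. It only shows one plausible way to split a labeled mask into per-cell pixel coordinates and channel intensities while skipping cells that touch the image border. Downsampling, boundary sampling, and nucleus handling are left out.

```python
import numpy as np

def extract_cells(image, channels, cell_mask_image, filter_border_cells=True):
    """Hypothetical helper: per-cell coordinates and intensities from a label mask."""
    h, w = cell_mask_image.shape
    cells = {}
    for label in np.unique(cell_mask_image):
        if label == 0:                      # 0 marks background
            continue
        rows, cols = np.nonzero(cell_mask_image == label)
        touches_border = (
            rows.min() == 0 or cols.min() == 0
            or rows.max() == h - 1 or cols.max() == w - 1
        )
        if filter_border_cells and touches_border:
            continue
        cells[int(label)] = {
            "coords": np.column_stack([cols, rows]),   # (x, y) per pixel
            "intensities": {
                name: image[rows, cols, c] for c, name in enumerate(channels)
            },
        }
    return cells

# Toy input: cell 1 touches the border, cell 2 lies in the interior.
mask = np.zeros((6, 6), dtype=int)
mask[0:2, 0:2] = 1
mask[2:5, 2:5] = 2
img = np.ones((6, 6, 2))
cells = extract_cells(img, ["dna", "protein"], mask)
all_cells = extract_cells(img, ["dna", "protein"], mask, filter_border_cells=False)
```

With the default `filter_border_cells=True`, only the interior cell (label 2) is kept.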
- gw_pairwise_parallel(cell_objects, points='boundary', num_processes=4, chunksize=20, n_approx_anchors=None, initial_anchor=0)
Compute pairwise Gromov-Wasserstein distances (optionally in parallel).
Calculates the Gromov-Wasserstein distance matrix for a collection of cells using either exact computation or a triangle-inequality approximation with anchors.
- Parameters
cell_objects (list) – list of CellAligner_Cell objects or file paths
points (str) – ‘boundary’ or ‘full’ (default: ‘boundary’)
num_processes (int) – number of parallel processes to use
chunksize (int) – chunk size for parallel imap
n_approx_anchors (int or None) – number of anchors for approximation (None = exact)
initial_anchor (int) – initial anchor index for approximation
- Returns
symmetric GW distance matrix of shape (N, N)
- Return type
numpy.ndarray
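The source does not spell out how the anchor approximation works, so the following is only a sketch of the general idea, under the assumption that exact distances are computed from a few anchor cells to all cells and the triangle inequality is then used to bound the remaining pairs. The function `anchor_bounds` is hypothetical.

```python
import numpy as np

def anchor_bounds(dist_to_anchors):
    """Bound all pairwise distances from distances to k anchors.

    dist_to_anchors: array of shape (k, N); row a holds d(anchor_a, j) for all j.
    Returns (lower, upper), each of shape (N, N).
    """
    d = np.asarray(dist_to_anchors, dtype=float)
    # Triangle inequality: |d(i,a) - d(a,j)| <= d(i,j) <= d(i,a) + d(a,j).
    upper = (d[:, :, None] + d[:, None, :]).min(axis=0)
    lower = np.abs(d[:, :, None] - d[:, None, :]).max(axis=0)
    return lower, upper

# Toy metric: points on a line, with points 0 and 3 used as anchors.
x = np.array([0.0, 1.0, 3.0, 6.0])
true = np.abs(x[:, None] - x[None, :])
lower, upper = anchor_bounds(true[[0, 3]])
```

More anchors tighten both bounds at the cost of more exact distance computations, which is the trade-off a parameter such as `n_approx_anchors` would control.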
- map_cell_to_cell(cell_object_from, cell_object_to, channels, compartment_specific=True, method='fused', fused_channel='protein', fused_cost=10, fused_param=0.1, unbalanced_param=70, nuclear_fraction=0.2)
Map protein distributions from one cell onto another via Fused Gromov-Wasserstein.
- Parameters
cell_object_from (CellAligner_Cell) – source CellAligner_Cell
cell_object_to (CellAligner_Cell) – target CellAligner_Cell
compartment_specific (bool) – whether to use compartment-specific mapping
method (str) – ‘fused’ or ‘fused_unbalanced’
fused_channel (str) – channel name to use for fused morphology cost
fused_cost (float) – fused channel cost multiplier
fused_param (float) – alpha parameter for fused GW
unbalanced_param (float) – regularization for unbalanced GW
nuclear_fraction (float) – nuclear fraction for compartment scaling
- Returns
mapped distributions with shape (len(channels), n_target_pixels)
- Return type
numpy.ndarray
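To make the output shape concrete, here is a minimal sketch of how a transport plan between source and target pixels can carry per-channel distributions onto the target cell. It assumes a coupling matrix has already been computed by some Gromov-Wasserstein solver. The function `push_forward` is hypothetical and may differ from what the library does internally.

```python
import numpy as np

def push_forward(source_dists, coupling):
    """Map distributions on source pixels onto target pixels.

    source_dists: (n_channels, n_source) array, one distribution per channel.
    coupling:     (n_source, n_target) transport plan.
    Returns an array of shape (n_channels, n_target).
    """
    coupling = np.asarray(coupling, dtype=float)
    row_mass = coupling.sum(axis=1, keepdims=True)
    # Normalize each row so a source pixel splits its mass across target pixels;
    # source pixels with no transported mass contribute nothing.
    cond = np.divide(coupling, row_mass,
                     out=np.zeros_like(coupling), where=row_mass > 0)
    return np.asarray(source_dists, dtype=float) @ cond

# Two source pixels, three target pixels, two channels.
coupling = np.array([[0.25, 0.25, 0.0],
                     [0.0,  0.0,  0.5]])
source = np.array([[0.4, 0.6],
                   [1.0, 0.0]])
mapped = push_forward(source, coupling)
```

Each row of `mapped` keeps the total mass of the corresponding source distribution, so the mapped channels remain comparable on the target cell.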
- map_to_anchor_cell(cell_objects, channels, target_cell_ind, compartment_specific=True, method='fused', fused_channel='protein', fused_cost=10, fused_param=0.1, unbalanced_param=70, nuclear_fraction=0.2)
Map protein distributions from all cells to a single target cell.
- Parameters
cell_objects (list) – list of CellAligner_Cell objects or file paths
target_cell_ind (int) – index of the target cell
compartment_specific (bool) – whether to use compartment-specific mapping
method (str) – mapping method (‘fused’ or ‘fused_unbalanced’)
fused_channel (str) – channel used for fused cost
fused_cost (float) – cost multiplier for fused channel
fused_param (float) – alpha parameter for fused GW
unbalanced_param (float) – regularization for unbalanced GW
nuclear_fraction (float) – probabilistic fraction considered nuclear for compartment-specific mapping (should roughly correspond to fraction of nuclear pixels)
parallel – whether to run in parallel
- gw_mapped_ot_pairwise_parallel(cell_object, mapped_cell_dists, num_processes=4, chunksize=20, index_pairs=None, n_approx_anchors=None, initial_anchor=0)
Compute pairwise OT distances between mapped protein distributions.
Calculates pairwise optimal transport distances between protein distributions after they have been mapped to a common cell morphology.
- Parameters
cell_object (CellAligner_Cell or str) – target CellAligner_Cell or path to pickled object
mapped_cell_dists (numpy.ndarray) – array of mapped distributions with shape (len(channels), N, n_target_pixels)
num_processes (int) – number of processes for parallel execution
chunksize (int) – chunk size for parallel imap
index_pairs (iterable[tuple[int, int]] or None) – optional iterable of (i, j) index pairs to compute
n_approx_anchors (int or None) – number of anchors for triangle inequality approximation
initial_anchor (int) – initial anchor index
- Returns
if index_pairs is None, returns an array of shape (len(channels), N, N); otherwise an array of shape (len(channels), len(index_pairs))
- Return type
numpy.ndarray
- find_centroid(distance_matrix)
Return the index of the centroid point (minimizes sum of distances).
- Parameters
distance_matrix (numpy.ndarray) – square symmetric pairwise distance matrix
- Returns
index of centroid point
- Return type
int
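The centroid criterion stated above (the point minimizing the sum of distances to all others) can be written in one line of NumPy. The function name `centroid_index` is hypothetical and is used here to avoid implying that this is the library's own code.

```python
import numpy as np

def centroid_index(distance_matrix):
    """Index of the row with the smallest total distance to all other points."""
    d = np.asarray(distance_matrix, dtype=float)
    return int(np.argmin(d.sum(axis=1)))

# Five points on a line; the middle point (index 2) has the smallest distance sum.
x = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
d = np.abs(x[:, None] - x[None, :])
idx = centroid_index(d)
```

When several points tie for the minimum, `np.argmin` returns the first of them.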
- plot_cell_image(cellaligner_cell, channels, make_square=True, ax=None, mask_alpha=0.2)
Plot a cell image with the specified channels as an RGB composite.
Creates a visualization of cell image data by combining multiple channels into an RGB representation with an optional transparent mask overlay.
- Parameters
cellaligner_cell (CellAligner_Cell) – Cell object to plot.
channels (list[str]) – Names of the channels to combine into the RGB composite.
make_square (bool) – Whether to make the plotted image square. Default is True.
ax – Axes to draw on. Default is None.
mask_alpha (float) – Opacity of the mask overlay. Default is 0.2.