{ "cells": [ { "cell_type": "markdown", "id": "dca49492", "metadata": {}, "source": [ "# Tutorial 6: Morphology-Aware Analysis of Subcellular Protein Localization (CellAligner)" ] }, { "cell_type": "markdown", "id": "c27d59b5", "metadata": {}, "source": [ "To demonstrate the functionality of CellAligner-OT, this tutorial analyzes 2D immunofluorescence images from the Human Protein Atlas. We will use a small subset of 373 cells from 70 images, available for download [here](https://www.dropbox.com/scl/fi/63tquyl5b6psiczrgihdn/hpa_images_metadata.zip?rlkey=7iz9cl5u35bvfupip6f0iicf3&st=ocpnazb7&dl=0)." ] }, { "cell_type": "code", "execution_count": null, "id": "be4cb3a2", "metadata": {}, "outputs": [], "source": [ "import os\n", "from multiprocessing import cpu_count\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from tqdm import tqdm\n", "import skimage as ski\n", "from cajal.subcellular import *\n", "\n", "# change this to the path where the data is located\n", "data_path = '/workspaces/CellAligner/hpa_images_metadata/'\n", "\n", "# load image metadata\n", "image_metadata = pd.read_csv(os.path.join(data_path, 'image_metadata.csv'), index_col=0)" ] }, { "cell_type": "markdown", "id": "7d7166a0", "metadata": {}, "source": [ "We begin by processing the cell images: sampling points from the cell boundary for morphological analysis and extracting the information needed for localization analysis. We assume that cell segmentation has already been performed for each image. Nuclear segmentation is optional, but can substantially improve localization analysis. 
" ] }, { "cell_type": "code", "execution_count": 3, "id": "e03a6861", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 60/60 [06:32<00:00, 6.54s/it]\n" ] } ], "source": [ "# create list to store cell objects\n", "cell_objects = []\n", "cell_metadata = pd.DataFrame(columns=image_metadata.columns)\n", "for i in tqdm(range(image_metadata.shape[0])):\n", "    im_path = os.path.join(data_path, 'images', image_metadata.iloc[i]['image_file'])\n", "    # load image\n", "    im = ski.io.imread(im_path)\n", "    channels = ['microtubules', 'protein', 'DNA'] # names of channels in image\n", "    # load cell and nuclear segmentation masks\n", "    im_cell_mask = ski.io.imread(im_path.replace('blue_red_green.jpg','predictedmask.png'))\n", "    im_nuc_mask = ski.io.imread(im_path.replace('blue_red_green.jpg','predictednucmask.png'))\n", "    # create cell objects from image\n", "    image_cell_objects = process_image(im, channels, im_cell_mask, im_nuc_mask, ds_target_size=1000)\n", "    cell_objects.extend(image_cell_objects)\n", "    # repeat this image's metadata row once per cell found in the image\n", "    n_image_cells = len(image_cell_objects)\n", "    cell_metadata = pd.concat([cell_metadata, image_metadata.iloc[i:i+1].reset_index(drop=True).loc[np.repeat(0, n_image_cells)]], ignore_index=True)" ] }, { "cell_type": "markdown", "id": "363d4b4a", "metadata": {}, "source": [ "The resulting `CellAligner_Cell` objects can be kept in memory for faster analysis or written to disk when memory is limited. All functions that take `CellAligner_Cell` objects as input also accept paths to saved `CellAligner_Cell` objects.\n", "\n", "To quantify morphological variation across cells, we compute the Gromov-Wasserstein distance between each pair of cells. The distance is based on points sampled from each cell's boundary, using intracellular Euclidean distances." 
] }, { "cell_type": "code", "execution_count": 4, "id": "d451e68e", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 69378/69378 [31:22<00:00, 36.86it/s] \n" ] } ], "source": [ "gw_dmat = gw_pairwise_parallel(cell_objects, num_processes=cpu_count(), chunksize=20) " ] }, { "cell_type": "markdown", "id": "7c526dd3", "metadata": {}, "source": [ "We can then cluster cells based on their pairwise Gromov-Wasserstein morphology distances to identify groups with similar morphologies. For visualization, we can use UMAP to embed the morphology space in two dimensions." ] }, { "cell_type": "code", "execution_count": 6, "id": "a7317fd0", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.12/site-packages/umap/umap_.py:1865: UserWarning:\n", "\n", "using precomputed metric; inverse_transform will be unavailable\n", "\n", "/opt/conda/lib/python3.12/site-packages/umap/umap_.py:1952: UserWarning:\n", "\n", "n_jobs value 1 overridden to 1 by setting random_state. Use no seed for parallelism.\n", "\n" ] }, { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "