{ "cells": [ { "cell_type": "markdown", "id": "dca49492", "metadata": {}, "source": [ "# Tutorial 6: Quantifying variation in subcellular protein localization (CellAligner)" ] }, { "cell_type": "markdown", "id": "c27d59b5", "metadata": {}, "source": [ "To demonstrate the functionality of GW-OT, we will perform an analysis on immunoflourescence data from the Human Protein Atlas. We will working with a small subset of 373 cells from 70 images, which can be downloaded from this [link](https://www.dropbox.com/scl/fi/63tquyl5b6psiczrgihdn/hpa_images_metadata.zip?rlkey=7iz9cl5u35bvfupip6f0iicf3&st=ocpnazb7&dl=0).\n", "\n", "First, we will process the cell images, sample points from the cell boundary for morphological analysis and storing the subcellular protein information for localization analysis. We assume that cell segmentation has been performed on each image. Nuclear segmentation is optional, but can improve compartmental specificity in the localization analysis. \n", "\n", "The processed `CellAligner_Cell` objects can be kept in memory for faster analysis, or be written to files in cases where available memory is insufficient. All functions that take `CellAligner_Cell` objects as input, can also take in the paths to the saved `CellAligner_Cell` objects." ] }, { "cell_type": "code", "execution_count": 1, "id": "be4cb3a2", "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from tqdm import tqdm\n", "import skimage as ski\n", "from cajal.subcellular import *" ] }, { "cell_type": "code", "execution_count": 2, "id": "c1ce056f", "metadata": {}, "outputs": [], "source": [ "# change to path to where data is located\n", "data_path = '/path/to/data/'\n", "\n", "# load image metadata\n", "image_metadata = pd.read_csv(os.path.join(data_path, 'image_metadata.csv'), index_col=0)" ] }, { "cell_type": "code", "execution_count": 3, "id": "e03a6861", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 60/60 [05:51<00:00, 5.86s/it]\n" ] } ], "source": [ "# create list to store cell objects\n", "cell_objects = []\n", "cell_metadata = pd.DataFrame(columns=image_metadata.columns)\n", "for i in tqdm(range(image_metadata.shape[0])):\n", " im_path = os.path.join(data_path, 'images', image_metadata.iloc[i]['image_file'])\n", " # load image\n", " im = ski.io.imread(im_path)\n", " channels = ['microtubules', 'protein', 'DNA'] # names of channels in image\n", " # load cell and nuclear segmentation masks\n", " im_cell_mask = ski.io.imread(im_path.replace('blue_red_green.jpg','predictedmask.png'))\n", " im_nuc_mask = ski.io.imread(im_path.replace('blue_red_green.jpg','predictednucmask.png'))\n", " # create cell objects from image\n", " image_cell_objects = process_image(im, channels, im_cell_mask, im_nuc_mask, ds_target_size=1000)\n", " cell_objects.extend(image_cell_objects)\n", " # save metadata for each cell\n", " n_image_cells = len(image_cell_objects)\n", " cell_metadata = pd.concat([cell_metadata, image_metadata.iloc[i:i+1].reset_index(drop=True).loc[np.repeat(0, n_image_cells)]], ignore_index=True)" ] }, { "cell_type": "markdown", "id": "363d4b4a", "metadata": {}, "source": [ "To capture the morphological variation between cells, we compute the Gromov-Wasserstein distance between each pair of cells, using points sampled from each cell boundary." ] }, { "cell_type": "code", "execution_count": null, "id": "d451e68e", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 69378/69378 [30:15<00:00, 38.22it/s] \n" ] } ], "source": [ "gw_dmat = gw_pairwise_parallel(cell_objects, num_processes=cpu_count(), chunksize=20) " ] }, { "cell_type": "markdown", "id": "7c526dd3", "metadata": {}, "source": [ "We can the cluster the cells based on the computed Gromov-Wasserstein morphology space to identify groups of cells that display similar morphologies. We can also use UMAP to embed the morphology space into 2 dimensions for visualization." ] }, { "cell_type": "code", "execution_count": 7, "id": "a7317fd0", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/conda/lib/python3.10/site-packages/umap/umap_.py:1780: UserWarning:\n", "\n", "using precomputed metric; inverse_transform will be unavailable\n", "\n", "/opt/conda/lib/python3.10/site-packages/plotly/express/_core.py:1992: FutureWarning:\n", "\n", "When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.\n", "\n" ] }, { "data": { "text/html": [ " \n", " " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "