Tutorial 2: Genetic Determinants of Neuronal Morphology

We will illustrate the utility of the Laplacian score in identifying genes that contribute to neuronal plasticity in C. elegans. This example uses a dataset of 799 3D neuronal reconstructions of the C. elegans DVB neuron across various mutant and control strains during days 1 to 5 of adulthood. The dataset can be downloaded from the following folder. In this tutorial we assume that the SWC files are located in the folder CAJAL/data_worm/swc. The DVB neuron is an excitatory GABAergic motor interneuron located in the dorso-rectal ganglion of the worm, and is known to undergo post-developmental neurite outgrowth in males. This outgrowth alters the neuron’s morphology and synaptic connectivity, contributing to changes in the spicule protraction step of male mating behavior. More information about this dataset can be found at:

To begin our analysis, we compute the Gromov-Wasserstein distance between each pair of cells. To save time, we sample only 50 points per cell. This computation typically requires 20-30 minutes to complete on a standard desktop computer. Sampling more points per cell would give more accurate results, at the cost of a longer computing time.

[2]:
import cajal.sample_swc
import cajal.swc
import cajal.run_gw

cajal.sample_swc.compute_icdm_all_geodesic(
    infolder="CAJAL/data_worm/swc/",
    out_csv="CAJAL/data_worm/c_elegans_icdm.csv",
    preprocess=cajal.swc.preprocessor_geo(
        structure_ids="keep_all_types"),
    n_sample=50,
    num_processes=8)  # num_processes can be set to the number of cores on your machine

cajal.run_gw.compute_gw_distance_matrix(
    "CAJAL/data_worm/c_elegans_icdm.csv",
    "CAJAL/data_worm/c_elegans_gw_dist.csv",
    num_processes=8)
100%|████████████████████████████████████████████████████████████████████████████████▉| 798/799 [00:31<00:00, 25.03it/s]
100%|█████████████████████████████████████████████████████████████████████████| 318801/318801 [02:09<00:00, 2460.13it/s]
[2]:
(array([[0.        , 5.11936112, 3.68814622, ..., 4.50253393, 3.74849009,
         2.41605872],
        [5.11936112, 0.        , 4.16953034, ..., 5.1718854 , 3.86677755,
         3.06617002],
        [3.68814622, 4.16953034, 0.        , ..., 5.37682889, 3.85930797,
         2.66918667],
        ...,
        [4.50253393, 5.1718854 , 5.37682889, ..., 0.        , 3.52210097,
         4.01724968],
        [3.74849009, 3.86677755, 3.85930797, ..., 3.52210097, 0.        ,
         3.07758098],
        [2.41605872, 3.06617002, 2.66918667, ..., 4.01724968, 3.07758098,
         0.        ]]),
 None)
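
The second progress bar above counts 318801 items, which matches the number of unordered pairs among 799 cells. The sketch below checks that arithmetic and shows the properties one would expect of a pairwise distance matrix (square, symmetric, non-negative, zero diagonal). The helper `looks_like_distance_matrix` is a hypothetical name introduced here for illustration and is not part of CAJAL; the toy matrix reuses the first few values from the output shown above.

```python
import math

# Number of unordered pairs among the 799 cells in the dataset
n_cells = 799
n_pairs = math.comb(n_cells, 2)
print(n_pairs)  # 318801


def looks_like_distance_matrix(mat, tol=1e-9):
    """Return True if mat is square, symmetric, non-negative, with a zero diagonal.

    Hypothetical helper for illustration only; not part of CAJAL.
    """
    n = len(mat)
    if any(len(row) != n for row in mat):
        return False
    for i in range(n):
        if abs(mat[i][i]) > tol:
            return False
        for j in range(n):
            if mat[i][j] < 0 or abs(mat[i][j] - mat[j][i]) > tol:
                return False
    return True


# Toy 3x3 matrix built from the top-left corner of the output above
toy = [
    [0.0, 5.11936112, 3.68814622],
    [5.11936112, 0.0, 4.16953034],
    [3.68814622, 4.16953034, 0.0],
]
print(looks_like_distance_matrix(toy))
```

The same kind of check could be applied to the full matrix after loading it, before passing it to a method that expects a precomputed metric.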

We can generate a UMAP plot that visualizes the cell morphology space, with each point colored by the age of the worm in days. The metadata for each neuron in this example is provided in the file CAJAL/data/c_elegans_features.csv, which can be found in the GitHub repository of CAJAL. This metadata includes the age of the worm in days and the genotype for each gene (0: wild-type; 1: mutant).

[1]:
import plotly.io as pio
pio.renderers.default = 'iframe'

import cajal.utilities
import umap
import pandas
import plotly.express

# Read GW distance matrix
cells, gw_dist_dict = cajal.utilities.read_gw_dists("CAJAL/data_worm/c_elegans_gw_dist.csv", header=True)
gw_dist = cajal.utilities.dist_mat_of_dict(gw_dist_dict, cells)

# Compute UMAP representation
reducer = umap.UMAP(metric="precomputed", random_state=1)
embedding = reducer.fit_transform(gw_dist)

# Load metadata from the local CSV file
metadata = pandas.read_csv("CAJAL/data_worm/c_elegans_features.csv", index_col = "cell_name")

# Visualize UMAP
plotly.express.scatter(x=embedding[:,0],
                       y=embedding[:,1],
                       template="simple_white",
                       hover_name=[m + ".swc" for m in cells],
                       color = [str(m) for m in metadata["day"]])
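
The scatter plot above passes `metadata["day"]` in the row order of the CSV file, while the embedding follows the order of `cells`. The colors are correct only if those two orders agree. Below is a minimal sketch of making the alignment explicit, using only the standard library and a small made-up metadata table. The cell names and day values are hypothetical, and it is an assumption that the `cell_name` column uses the same identifiers as `cells`.

```python
import csv
import io

# Hypothetical metadata, deliberately listed in a different order from `cells`
metadata_csv = io.StringIO(
    "cell_name,day\n"
    "cell_b,3\n"
    "cell_a,1\n"
    "cell_c,5\n"
)

# Order of the rows and columns of the distance matrix (hypothetical names)
cells = ["cell_a", "cell_b", "cell_c"]

# Map each cell name to its day label
day_by_cell = {row["cell_name"]: row["day"] for row in csv.DictReader(metadata_csv)}

# Fail loudly if any cell in the distance matrix has no metadata
missing = [c for c in cells if c not in day_by_cell]
if missing:
    raise KeyError(f"no metadata for: {missing}")

# Day labels reordered to match the embedding rows
colors = [day_by_cell[c] for c in cells]
print(colors)  # ['1', '3', '5']
```

With the pandas DataFrame used in this tutorial, `metadata.loc[cells, "day"]` would perform the same reordering, under the same assumption about matching identifiers.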