Tutorial 2: Genetic Determinants of Neuronal Morphology
We will illustrate the utility of the Laplacian score in identifying genes that contribute to neuronal plasticity in C. elegans. This example uses a dataset of 799 3D reconstructions of the C. elegans DVB neuron from various mutant and control strains, taken during days 1 to 5 of adulthood. The dataset can be downloaded from the following file. In this tutorial we assume that the SWC files are located in the folder CAJAL/data_worm/swc. The DVB neuron is an excitatory GABAergic motor interneuron located in the dorso-rectal ganglion of the worm, and is known to undergo post-developmental neurite outgrowth in males. This outgrowth alters the neuron's morphology and synaptic connectivity, contributing to changes in the spicule protraction step of male mating behavior. More information about this dataset can be found at:
Hart, M. P. & Hobert, O. Neurexin controls plasticity of a mature, sexually dimorphic neuron. Nature 553, 165-170, (2018).
Govek, K. W. et al. CAJAL enables analysis and integration of single-cell morphological data using metric geometry. Nature Communications 14, 3672, (2023).
To begin our analysis, we compute the Gromov-Wasserstein distance between each pair of cells. To save time, we sample only 50 points per cell. This computation typically takes 20-30 minutes on a standard desktop computer. Sampling more points would give better results, but would also increase the computing time.
[1]:
import cajal.sample_swc
import cajal.swc
import cajal.run_gw
cajal.sample_swc.compute_icdm_all_geodesic(
    infolder="CAJAL/data_worm/swc/",
    out_csv="CAJAL/data_worm/c_elegans_icdm.csv",
    preprocess=cajal.swc.preprocessor_geo(
        structure_ids="keep_all_types"),
    n_sample=50,
    num_processes=8)  # num_processes can be set to the number of cores on your machine
cajal.run_gw.compute_gw_distance_matrix(
    "CAJAL/data_worm/c_elegans_icdm.csv",
    "CAJAL/data_worm/c_elegans_gw_dist.csv",
    num_processes=8)
100%|████████████████████████████████████████████████████████████████████████████████▉| 798/799 [00:14<00:00, 55.44it/s]
[1]:
(array([[ 0. , 4.14913195, 6.31674974, ..., 7.00963386,
5.775626 , 7.50550224],
[ 4.14913195, 0. , 2.48756867, ..., 7.15243735,
8.49970259, 3.69792914],
[ 6.31674974, 2.48756867, 0. , ..., 4.60601077,
8.27791246, 3.44626558],
...,
[ 7.00963386, 7.15243735, 4.60601077, ..., 0. ,
4.28019819, 7.54258831],
[ 5.775626 , 8.49970259, 8.27791246, ..., 4.28019819,
0. , 11.23488354],
[ 7.50550224, 3.69792914, 3.44626558, ..., 7.54258831,
11.23488354, 0. ]]),
None)
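To make the role of n_sample more concrete, the following is a minimal sketch (not CAJAL's implementation) of an intracell distance matrix. It subsamples points from a synthetic 3D point cloud standing in for a neuron and computes their pairwise Euclidean distances. The helper name toy_icdm and the random point cloud are hypothetical; the call above uses geodesic distances along the neuron instead of Euclidean ones.

```python
import numpy as np

def toy_icdm(points, n_sample, seed=0):
    """Pairwise Euclidean distances between n_sample randomly chosen points.

    Hypothetical helper for illustration only; it is not part of CAJAL.
    """
    rng = np.random.default_rng(seed)
    # Pick n_sample distinct points from the cloud
    idx = rng.choice(len(points), size=n_sample, replace=False)
    sampled = points[idx]
    # Broadcast to get all pairwise coordinate differences, then take norms
    diff = sampled[:, None, :] - sampled[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Synthetic stand-in for the 3D coordinates of one reconstructed neuron
cloud = np.random.default_rng(1).normal(size=(500, 3))
icdm = toy_icdm(cloud, n_sample=50)

# Number of distinct pairwise distances stored per cell: n(n-1)/2
n_pairs = 50 * 49 // 2
print(icdm.shape, n_pairs)
```

With 50 points each cell is summarized by 1225 pairwise distances. Because this number grows quadratically with the number of sampled points, larger samples increase the cost of every pairwise comparison between cells.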
We can generate a UMAP plot that visualizes the cell morphology space, with each point colored by the age of the worm in days. The metadata for each neuron in this example is provided in the file CAJAL/data/c_elegans_features.csv, which can be found in the GitHub repository of CAJAL. This metadata includes information such as the age of the worm in days and the genotype of each gene (0: wild-type; 1: mutant).
[1]:
import plotly.io as pio
# Choose the adequate plotly renderer for visualizing plotly graphs in your system
pio.renderers.default = 'notebook_connected'
#pio.renderers.default = 'iframe'
import cajal.utilities
import umap
import pandas
import plotly.express
# Read GW distance matrix
cells, gw_dist_dict = cajal.utilities.read_gw_dists("CAJAL/data_worm/c_elegans_gw_dist.csv", header=True)
gw_dist = cajal.utilities.dist_mat_of_dict(gw_dist_dict, cells)
# Compute UMAP representation
reducer = umap.UMAP(metric="precomputed", random_state=1)
embedding = reducer.fit_transform(gw_dist)
# Load metadata
metadata = pandas.read_csv("CAJAL/data_worm/c_elegans_features.csv", index_col = "cell_name")
# Visualize UMAP
plotly.express.scatter(x=embedding[:,0],
                       y=embedding[:,1],
                       template="simple_white",
                       hover_name=[m + ".swc" for m in cells],
                       color=[str(m) for m in metadata["day"]])
/opt/conda/lib/python3.10/site-packages/umap/umap_.py:1780: UserWarning:
using precomputed metric; inverse_transform will be unavailable
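The coloring in the plot above relies on the rows of the metadata table being in the same order as the list cells. A minimal sketch of making that alignment explicit with pandas is shown below; the cell names and day values are toy data, not taken from the dataset.

```python
import pandas as pd

# Toy stand-ins: the order of `cells` differs from the order of the table rows
cells = ["cell_b", "cell_a", "cell_c"]
metadata = pd.DataFrame(
    {"day": [1, 3, 5]},
    index=pd.Index(["cell_a", "cell_b", "cell_c"], name="cell_name"),
)

# Fail early if any cell in the distance matrix has no metadata row
missing = [c for c in cells if c not in metadata.index]
assert not missing, f"cells without metadata: {missing}"

# Reorder the metadata rows to follow the order of `cells`
aligned = metadata.loc[cells]
colors = [str(d) for d in aligned["day"]]
print(colors)
```

After this step, the i-th color is guaranteed to belong to the i-th row of the embedding.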
Unsurprisingly, the age of the worm plays a significant role in shaping the morphology of its neurons. This is evident in the UMAP representation above, where neurons from worms of different ages cluster in distinct regions. To quantify this association, we can use the Laplacian score:
[4]:
import cajal.laplacian_score
import numpy
from scipy.spatial.distance import squareform
laplacian = pandas.DataFrame(
    cajal.laplacian_score.laplacian_scores(
        numpy.array(metadata["day"]).reshape(799, 1),
        gw_dist,
        numpy.median(squareform(gw_dist)),
        permutations=5000,
        covariates=None,
        return_random_laplacians=False)[0])
print(laplacian)
feature_laplacians laplacian_p_values laplacian_q_values
0 0.95148 0.0002 0.0002
The very small p-value (0.0002) suggests a strong association between the age of the worm and the morphology of the DVB neuron.
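For intuition about what such a score measures, here is a minimal sketch of a Laplacian score on an epsilon-neighborhood graph, together with a permutation p-value. It follows the standard formulation of the Laplacian score for feature selection. The function names, the toy one-dimensional data, and details such as the p-value formula are assumptions and may differ from what cajal.laplacian_score.laplacian_scores does internally.

```python
import numpy as np

def laplacian_score(feature, dist, epsilon):
    """Laplacian score of one feature on the graph {d(i, j) < epsilon}."""
    adj = (dist < epsilon).astype(float)
    np.fill_diagonal(adj, 0.0)                  # no self-loops
    deg = adj.sum(axis=1)                       # node degrees
    f = feature - (feature @ deg) / deg.sum()   # degree-weighted centering
    f_D_f = f @ (deg * f)
    f_A_f = f @ adj @ f
    return (f_D_f - f_A_f) / f_D_f              # f^T L f / f^T D f, L = D - A

def permutation_p_value(feature, dist, epsilon, n_perm=999, seed=0):
    """Fraction of shuffled features scoring at least as low as the observed one."""
    rng = np.random.default_rng(seed)
    observed = laplacian_score(feature, dist, epsilon)
    null = np.array([laplacian_score(rng.permutation(feature), dist, epsilon)
                     for _ in range(n_perm)])
    return (1 + np.sum(null <= observed)) / (1 + n_perm)

# Toy data: two well-separated groups of 10 points on a line
x = np.concatenate([np.arange(10.0), 100.0 + np.arange(10.0)])
dist = np.abs(x[:, None] - x[None, :])

clustered = np.repeat([0.0, 1.0], 10)    # feature follows the two groups
alternating = np.tile([0.0, 1.0], 10)    # feature ignores the groups

score_clustered = laplacian_score(clustered, dist, epsilon=50.0)
score_alternating = laplacian_score(alternating, dist, epsilon=50.0)
p_clustered = permutation_p_value(clustered, dist, epsilon=50.0)
print(score_clustered, score_alternating, p_clustered)
```

A lower score means that neighboring cells tend to share similar feature values. In this toy example the feature that follows the two groups scores close to 0, while the alternating feature scores above 1.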
Next, we aim to identify mutations that affect the morphology of the DVB neuron, again using the Laplacian score. However, worms of a given genotype are not equally represented across ages in the dataset, so the analysis must account for the uneven age distribution of each genotype. As an example, we will investigate the impact of deleterious mutations in the unc-25 gene. Let us first look at their distribution in the cell morphology space:
[5]:
plotly.express.scatter(x=embedding[:,0],
                       y=embedding[:,1],
                       template="simple_white",
                       hover_name=[m + ".swc" for m in cells],
                       color=[str(m) for m in metadata["unc-25"]])
The UMAP representation reveals that cells with a deleterious mutation in unc-25 exhibit similar morphology, a finding supported by the small p-value of the Laplacian score of unc-25 in the cell morphology space:
[6]:
laplacian = pandas.DataFrame(
    cajal.laplacian_score.laplacian_scores(
        numpy.array(metadata["unc-25"]).reshape(799, 1),
        gw_dist,
        numpy.median(squareform(gw_dist)),
        permutations=5000,
        covariates=None,
        return_random_laplacians=False)[0])
print(laplacian)
feature_laplacians laplacian_p_values laplacian_q_values
0 0.995076 0.0018 0.0018
However, all of the samples with a mutation in unc-25 were obtained from worms aged 1 or 3 days:
[7]:
metadata.loc[metadata["unc-25"]==1,"day"].value_counts()
[7]:
day
1 18
3 6
Name: count, dtype: int64
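To see how a genotype is distributed across ages relative to the other worms, a cross-tabulation can be more informative than a single count. The sketch below uses pandas.crosstab on a small toy table; the values are made up for illustration and are not from the dataset.

```python
import pandas as pd

# Toy table with the same column names as the metadata used in this tutorial
toy = pd.DataFrame({
    "unc-25": [1, 1, 1, 0, 0, 0, 0, 0],
    "day":    [1, 1, 3, 1, 2, 3, 4, 5],
})

# Rows: genotype (0: wild-type, 1: mutant); columns: age in days
table = pd.crosstab(toy["unc-25"], toy["day"])
print(table)
```

With the real metadata, the equivalent call would be pandas.crosstab(metadata["unc-25"], metadata["day"]).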
This leads to the question: is the similar morphology of neurons with a deleterious mutation in unc-25 due to the mutation itself or to the similar age of the worms? To address this, we can compute the Laplacian score while treating the age of the worm as a covariate:
[8]:
laplacian = pandas.DataFrame(
    cajal.laplacian_score.laplacian_scores(
        numpy.array(metadata.iloc[:, 0:11]),
        gw_dist,
        numpy.median(squareform(gw_dist)),
        permutations=5000,
        covariates=numpy.array(metadata["day"]),
        return_random_laplacians=False)[0])
laplacian.index = metadata.columns.values.tolist()[0:11]
print(laplacian)
feature_laplacians laplacian_p_values laplacian_q_values beta_0 \
nrx-1 0.996719 0.006399 0.011731 0.956887
mir-1 1.000039 0.111378 0.136128 1.026848
unc-49 0.996520 0.003599 0.007918 1.028211
nlg-1 0.994465 0.000600 0.003299 0.963889
unc-25 0.995076 0.000800 0.002933 0.939057
unc-97 0.961623 0.000200 0.002200 1.018694
lim-6 1.000994 0.297341 0.327075 1.014122
lat-2 0.994034 0.001000 0.002749 1.005568
ptp-3 0.999564 0.075985 0.104479 0.996278
sup-17 0.997842 0.015797 0.024824 1.000428
pkd-2 1.001726 0.557489 0.557489 1.000327
beta_1 beta_1_p_value regression_coefficients_fstat_p_values \
nrx-1 0.044518 0.001075 0.002150
mir-1 -0.025304 0.967517 0.064965
unc-49 -0.026688 0.972844 0.054311
nlg-1 0.037528 0.003827 0.007655
unc-25 0.062309 0.000005 0.000009
unc-97 -0.017181 0.885955 0.228091
lim-6 -0.012625 0.813719 0.372562
lat-2 -0.004112 0.613235 0.773530
ptp-3 0.005176 0.356134 0.712268
sup-17 0.001059 0.470654 0.941307
pkd-2 0.001098 0.468699 0.937398
laplacian_p_values_post_regression laplacian_q_values_post_regression
nrx-1 0.043991 0.060488
mir-1 0.030594 0.048076
unc-49 0.000800 0.003519
nlg-1 0.003999 0.010998
unc-25 0.021796 0.039959
unc-97 0.000200 0.002200
lim-6 0.159568 0.175525
lat-2 0.000800 0.003519
ptp-3 0.095181 0.116332
sup-17 0.016597 0.036513
pkd-2 0.577684 0.577684
Upon examining the table, we note that the q-value of unc-25 increases from 0.003 to 0.04 after adjusting for the covariate. Consistent with this, the low p-value of the F-statistic for unc-25 (0.000009) indicates that the covariate affects its Laplacian score.
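As a small worked example of reading this table, the sketch below selects the genes whose covariate-adjusted q-value falls below a threshold. The q-values are copied from the last column of the table above; the 0.05 cutoff is an assumption chosen for illustration, not a recommendation from the tutorial.

```python
import pandas as pd

# Post-regression q-values copied from the table above
q_post = pd.Series({
    "nrx-1": 0.060488, "mir-1": 0.048076, "unc-49": 0.003519,
    "nlg-1": 0.010998, "unc-25": 0.039959, "unc-97": 0.002200,
    "lim-6": 0.175525, "lat-2": 0.003519, "ptp-3": 0.116332,
    "sup-17": 0.036513, "pkd-2": 0.577684,
}, name="laplacian_q_values_post_regression")

alpha = 0.05  # assumed significance threshold
significant = q_post[q_post < alpha].sort_values()
print(significant)
```

With the laplacian DataFrame computed above, the same selection could be written as laplacian[laplacian["laplacian_q_values_post_regression"] < alpha]. Under this cutoff, unc-25 remains associated with morphology after adjusting for age, although less strongly than before adjustment.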