Tutorial 2: Genetic Determinants of Neuronal Morphology

We will illustrate the utility of the Laplacian score in identifying genes that contribute to neuronal plasticity in C. elegans. This example uses a dataset of 799 3D neuronal reconstructions of the C. elegans DVB neuron across various mutant and control strains during days 1 to 5 of adulthood. The dataset can be downloaded from the following file. In this tutorial we assume that the SWC files are located in the folder CAJAL/data_worm/swc. The DVB neuron is an excitatory GABAergic motor interneuron located in the dorso-rectal ganglion of the worm, and is known to undergo post-developmental neurite outgrowth in males. This outgrowth alters the neuron’s morphology and synaptic connectivity, contributing to changes in the spicule protraction step of male mating behavior. More information about this dataset can be found at:

To begin our analysis, we calculate the Gromov-Wasserstein distance between each pair of cells. To keep the running time short, we sample only 50 points per cell; even so, the computation typically takes 20-30 minutes on a standard desktop computer. Sampling more points would improve the results, at the cost of a longer computing time.

[1]:
import cajal.sample_swc
import cajal.swc
import cajal.run_gw

cajal.sample_swc.compute_icdm_all_geodesic(
    infolder="CAJAL/data_worm/swc/",
    out_csv="CAJAL/data_worm/c_elegans_icdm.csv",
    preprocess=cajal.swc.preprocessor_geo(
        structure_ids="keep_all_types"),
    n_sample=50,
    num_processes=8)  # num_processes can be set to the number of cores on your machine

cajal.run_gw.compute_gw_distance_matrix(
    "CAJAL/data_worm/c_elegans_icdm.csv",
    "CAJAL/data_worm/c_elegans_gw_dist.csv",
    num_processes=8)
100%|████████████████████████████████████████████████████████████████████████████████▉| 798/799 [00:14<00:00, 55.44it/s]
[1]:
(array([[ 0.        ,  4.14913195,  6.31674974, ...,  7.00963386,
          5.775626  ,  7.50550224],
        [ 4.14913195,  0.        ,  2.48756867, ...,  7.15243735,
          8.49970259,  3.69792914],
        [ 6.31674974,  2.48756867,  0.        , ...,  4.60601077,
          8.27791246,  3.44626558],
        ...,
        [ 7.00963386,  7.15243735,  4.60601077, ...,  0.        ,
          4.28019819,  7.54258831],
        [ 5.775626  ,  8.49970259,  8.27791246, ...,  4.28019819,
          0.        , 11.23488354],
        [ 7.50550224,  3.69792914,  3.44626558, ...,  7.54258831,
         11.23488354,  0.        ]]),
 None)
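
To get a feel for why this step dominates the running time, the short calculation below counts the pairwise comparisons needed for 799 cells and the size of the coupling matrix that each Gromov-Wasserstein problem optimizes over. It is only a back-of-the-envelope sketch: the variable names are illustrative, and it does not model the actual running time of CAJAL.

```python
import math

n_cells = 799    # neurons in the dataset
n_points = 50    # points sampled per neuron (n_sample in the cell above)

# Each unordered pair of cells requires one Gromov-Wasserstein computation.
n_pairs = math.comb(n_cells, 2)

# Each computation optimizes a coupling between the two sets of sampled points.
coupling_entries = n_points * n_points
coupling_entries_100 = 100 * 100  # if 100 points per cell were sampled instead

print(n_pairs)               # 318801
print(coupling_entries)      # 2500
print(coupling_entries_100)  # 10000
```

Doubling the number of sampled points quadruples the size of every coupling matrix, which is one reason a finer sampling is noticeably slower.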

We can generate a UMAP plot that visualizes the cell morphology space, with each point colored according to the age of the corresponding worm in days. The metadata for each neuron in this example is provided in the file CAJAL/data/c_elegans_features.csv, which can be found in the GitHub repository of CAJAL. Note that the code below reads this file from CAJAL/data_worm/, so copy it there or adjust the path. The metadata includes the age of the worm in days and the genotype at each gene (0: wild-type; 1: mutant).

[ ]:
import plotly.io as pio

# Choose the adequate plotly renderer for visualizing plotly graphs in your system
#pio.renderers.default = 'notebook_connected'
pio.renderers.default = 'iframe'

import cajal.utilities
import umap
import pandas
import plotly.express

# Read GW distance matrix
cells, gw_dist_dict = cajal.utilities.read_gw_dists("CAJAL/data_worm/c_elegans_gw_dist.csv", header=True)
gw_dist = cajal.utilities.dist_mat_of_dict(gw_dist_dict, cells)

# Compute UMAP representation
reducer = umap.UMAP(metric="precomputed", random_state=1)
embedding = reducer.fit_transform(gw_dist)

# Load metadata
metadata = pandas.read_csv("CAJAL/data_worm/c_elegans_features.csv", index_col = "cell_name")

# Visualize UMAP
plotly.express.scatter(x=embedding[:,0],
                       y=embedding[:,1],
                       template="simple_white",
                       hover_name=[m + ".swc" for m in cells],
                       color = [str(m) for m in metadata["day"]])
/opt/conda/lib/python3.10/site-packages/umap/umap_.py:1780: UserWarning:

using precomputed metric; inverse_transform will be unavailable
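
The scatter plot above colors the points by metadata["day"], which is only meaningful if the rows of the metadata table are in the same order as the cells in the distance matrix. The sketch below shows one way to make that lookup explicit. It uses a plain dictionary with made-up cell names and ages (all hypothetical), and the helper name is illustrative. With pandas, indexing the table by label, for example metadata.loc[cells, "day"], performs the same kind of name-based lookup, assuming the index values match the names in cells.

```python
def values_in_cell_order(cell_names, metadata_by_name, field):
    """Return metadata values listed in the same order as `cell_names`."""
    missing = [name for name in cell_names if name not in metadata_by_name]
    if missing:
        raise KeyError(f"no metadata for cells: {missing}")
    return [metadata_by_name[name][field] for name in cell_names]


# Hypothetical cell names and ages, for illustration only.
cells_demo = ["cell_b", "cell_a", "cell_c"]
metadata_demo = {
    "cell_a": {"day": 1},
    "cell_b": {"day": 3},
    "cell_c": {"day": 5},
}

days = values_in_cell_order(cells_demo, metadata_demo, "day")
print(days)  # [3, 1, 5]
```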

Unsurprisingly, the age of the worm plays a significant role in shaping the morphology of its neurons. This is evident in the UMAP representation above, where neurons from worms of different ages cluster in distinct regions. To quantify this association, we can use the Laplacian score:

[6]:
import cajal.laplacian_score
import numpy
from scipy.spatial.distance import squareform

laplacian = pandas.DataFrame(
    cajal.laplacian_score.laplacian_scores(
        numpy.array(metadata["day"]).reshape(-1, 1),  # one column per feature
        gw_dist,
        numpy.median(squareform(gw_dist)),  # epsilon: median pairwise GW distance
        permutations=5000,
        covariates=None,
        return_random_laplacians=False)[0])

print(laplacian)
   feature_laplacians  laplacian_p_values  laplacian_q_values
0             0.95148              0.0002              0.0002

The very small p-value indicates a strong association between the age of the worm and the morphology of the DVB neuron.
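
For intuition about what the Laplacian score measures, the following is a minimal, self-contained sketch of the classical Laplacian score on an unweighted epsilon-neighborhood graph, together with a permutation test. It runs on a toy one-dimensional dataset rather than on the worm data, and all names in it are illustrative. It is not CAJAL's implementation, which may differ in details such as how the threshold or the p-value is handled. As in the cell above, epsilon is set to the median pairwise distance.

```python
import random
import statistics


def laplacian_score(feature, dist, epsilon):
    """Laplacian score of `feature` on the graph linking cells closer than `epsilon`."""
    n = len(feature)
    # Unweighted epsilon-neighborhood graph (the strict inequality is a choice of this sketch).
    adj = [[1.0 if i != j and dist[i][j] < epsilon else 0.0 for j in range(n)]
           for i in range(n)]
    deg = [sum(row) for row in adj]
    # Center the feature using the degree-weighted mean.
    mean = sum(d * f for d, f in zip(deg, feature)) / sum(deg)
    centered = [f - mean for f in feature]
    # Numerator: squared feature differences across edges (each edge counted once).
    num = 0.5 * sum(adj[i][j] * (centered[i] - centered[j]) ** 2
                    for i in range(n) for j in range(n))
    # Denominator: degree-weighted variance (zero only for a constant feature).
    den = sum(d * c * c for d, c in zip(deg, centered))
    return num / den


def permutation_p_value(feature, dist, epsilon, n_permutations, seed=0):
    """Fraction of shuffled features scoring at least as low as the observed one."""
    rng = random.Random(seed)
    observed = laplacian_score(feature, dist, epsilon)
    shuffled = list(feature)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if laplacian_score(shuffled, dist, epsilon) <= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Toy data: two well-separated groups of six "cells" on a line.
positions = list(range(6)) + list(range(20, 26))
dist = [[abs(a - b) for b in positions] for a in positions]
n = len(positions)

# Median of the off-diagonal distances (upper triangle), analogous to
# numpy.median(squareform(gw_dist)) in the tutorial.
epsilon = statistics.median(dist[i][j] for i in range(n) for j in range(i + 1, n))

clustered = [0] * 6 + [1] * 6   # feature follows the two groups
alternating = [0, 1] * 6        # feature ignores the groups

clustered_score = laplacian_score(clustered, dist, epsilon)
alternating_score = laplacian_score(alternating, dist, epsilon)
p_value = permutation_p_value(clustered, dist, epsilon, n_permutations=999)

print(clustered_score < alternating_score)  # True
print(p_value)
```

A low score means that cells that are close in the morphology space tend to share similar feature values. With the convention used in this sketch, the smallest p-value attainable with N permutations is 1/(N+1), which for 5000 permutations is about 0.0002, consistent with the value reported above for the age of the worm.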

Moving forward, our goal is to identify mutations that impact the morphology of the DVB neuron. To achieve this, we will rely on the Laplacian score once again. However, worms with a given genotype are not evenly represented across ages in the dataset, so we will need to account for the distribution of ages within each genotype. As an example, we will investigate the impact of deleterious mutations in the unc-25 gene. Let us first look at their distribution in the cell morphology space:

[7]:
plotly.express.scatter(x=embedding[:,0],
                       y=embedding[:,1],
                       template="simple_white",
                       hover_name=[m + ".swc" for m in cells],
                       color = [str(m) for m in metadata["unc-25"]])

The UMAP representation reveals that cells with a deleterious mutation in unc-25 exhibit similar morphology, a finding supported by the small p-value of the Laplacian score of unc-25 in the cell morphology space:

[8]:
laplacian = pandas.DataFrame(
    cajal.laplacian_score.laplacian_scores(
        numpy.array(metadata["unc-25"]).reshape(-1, 1),  # one column per feature
        gw_dist,
        numpy.median(squareform(gw_dist)),  # epsilon: median pairwise GW distance
        permutations=5000,
        covariates=None,
        return_random_laplacians=False)[0])

print(laplacian)
   feature_laplacians  laplacian_p_values  laplacian_q_values
0            0.995076              0.0022              0.0022

However, all of the samples with a mutation in unc-25 were obtained from worms aged 1 or 3 days, most of them from day 1:

[9]:
metadata.loc[metadata["unc-25"]==1,"day"].value_counts()
[9]:
day
1    18
3     6
Name: count, dtype: int64

This leads to the question: is the similar morphology of neurons with a deleterious mutation in unc-25 attributable to the mutation itself or to the similar age of the worms? To address this, we can compute the Laplacian score while treating the age of the worm as a covariate:

[10]:
laplacian = pandas.DataFrame(
    cajal.laplacian_score.laplacian_scores(
        numpy.array(metadata.iloc[:, 0:11]),  # genotype columns, one per gene
        gw_dist,
        numpy.median(squareform(gw_dist)),
        permutations=5000,
        covariates=numpy.array(metadata["day"]),  # adjust for the age of the worm
        return_random_laplacians=False)[0])
laplacian.index = metadata.columns.values.tolist()[0:11]

print(laplacian)
        feature_laplacians  laplacian_p_values  laplacian_q_values    beta_0  \
nrx-1             0.996719            0.005399            0.009898  0.983381
mir-1             1.000039            0.120976            0.147859  0.981426
unc-49            0.996520            0.004799            0.010558  0.998679
nlg-1             0.994465            0.001200            0.005279  0.969925
unc-25            0.995076            0.001600            0.004399  0.937768
unc-97            0.961623            0.000200            0.002200  0.994308
lim-6             1.000994            0.310538            0.341592  1.002060
lat-2             0.994034            0.001200            0.005279  1.016689
ptp-3             0.999564            0.081584            0.112178  1.001231
sup-17            0.997842            0.014997            0.023567  1.025323
pkd-2             1.001726            0.550890            0.550890  1.004567

          beta_1  beta_1_p_value  regression_coefficients_fstat_p_values  \
nrx-1   0.018048        0.102606                                0.205211
mir-1   0.020022        0.080986                                0.161973
unc-49  0.002780        0.423838                                0.847675
nlg-1   0.031515        0.016015                                0.032029
unc-25  0.063590        0.000006                                0.000012
unc-97  0.007133        0.315329                                0.630659
lim-6  -0.000624        0.516852                                0.966297
lat-2  -0.015206        0.850024                                0.299952
ptp-3   0.000222        0.493974                                0.987948
sup-17 -0.023833        0.949655                                0.100690
pkd-2  -0.003103        0.588585                                0.822831

        laplacian_p_values_post_regression  laplacian_q_values_post_regression
nrx-1                             0.009598                            0.017596
mir-1                             0.320936                            0.353029
unc-49                            0.005999                            0.013197
nlg-1                             0.003999                            0.014664
unc-25                            0.023795                            0.037393
unc-97                            0.000200                            0.002200
lim-6                             0.301740                            0.368793
lat-2                             0.000400                            0.002200
ptp-3                             0.082184                            0.113002
sup-17                            0.005799                            0.015947
pkd-2                             0.491902                            0.491902
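
As an aid to reading the q-value columns, the check below recomputes laplacian_q_values from the printed laplacian_p_values. The printed numbers are consistent with q = p * n / rank, where n is the number of genes and tied p-values share their average rank, with no further monotonicity adjustment (so nrx-1 ends up with a slightly smaller q-value than unc-49 despite its larger p-value). This is an observation about the numbers in the table, not a statement about how CAJAL computes them internally; the helper name is illustrative.

```python
# Values copied from the laplacian_p_values and laplacian_q_values columns above.
p_values = {
    "nrx-1": 0.005399, "mir-1": 0.120976, "unc-49": 0.004799, "nlg-1": 0.001200,
    "unc-25": 0.001600, "unc-97": 0.000200, "lim-6": 0.310538, "lat-2": 0.001200,
    "ptp-3": 0.081584, "sup-17": 0.014997, "pkd-2": 0.550890,
}
printed_q = {
    "nrx-1": 0.009898, "mir-1": 0.147859, "unc-49": 0.010558, "nlg-1": 0.005279,
    "unc-25": 0.004399, "unc-97": 0.002200, "lim-6": 0.341592, "lat-2": 0.005279,
    "ptp-3": 0.112178, "sup-17": 0.023567, "pkd-2": 0.550890,
}


def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1   # rank of the first occurrence
        count = ordered.count(v)       # number of tied entries
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]


genes = list(p_values)
ranks = average_ranks([p_values[g] for g in genes])
n_genes = len(genes)
recomputed_q = {g: p_values[g] * n_genes / r for g, r in zip(genes, ranks)}

# Largest discrepancy with the printed table (rounding of the printed p-values).
max_diff = max(abs(recomputed_q[g] - printed_q[g]) for g in genes)
print(max_diff < 1e-5)  # True
```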

Upon examining the table, we note that the q-value of unc-25 increases from about 0.004 to about 0.04 after adjusting for the covariate. Consistent with this, the low p-value of the F-statistic for unc-25 (1.2e-05) indicates that the covariate has an effect on its Laplacian score.
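
One way to picture a covariate adjustment of this kind is sketched below. It illustrates the general idea and is not CAJAL's code: it assumes that the same permutations are used to compute null scores for the feature and for the covariate, that the null feature scores are regressed on the null covariate scores (giving an intercept and a slope analogous to beta_0 and beta_1), and that the observed score is compared with the null distribution after subtracting the fitted covariate contribution. The numbers are toy values chosen to make the arithmetic easy to follow; they are not Laplacian scores from the worm data, and the function names are illustrative.

```python
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for y ~ intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def lower_tail_p(observed, null_values):
    """Permutation p-value: how often a null value is at least as low as observed."""
    hits = sum(v <= observed for v in null_values)
    return (hits + 1) / (len(null_values) + 1)


# Toy null scores obtained from the same five permutations (hypothetical values).
null_cov = [1.0, 2.0, 3.0, 4.0, 5.0]
null_feature = [4.0, 2.0, 7.0, 4.0, 8.0]
# Toy observed scores for the covariate and the feature.
obs_cov, obs_feature = 0.0, 0.0

# Without adjustment: compare the observed feature score with its null scores.
p_raw = lower_tail_p(obs_feature, null_feature)

# With adjustment: remove the part of the score explained by the covariate.
intercept, slope = fit_line(null_cov, null_feature)
null_resid = [f - (intercept + slope * c) for f, c in zip(null_feature, null_cov)]
obs_resid = obs_feature - (intercept + slope * obs_cov)
p_adjusted = lower_tail_p(obs_resid, null_resid)

print(intercept, slope)   # 2.0 1.0
print(p_raw, p_adjusted)  # 0.16666666666666666 0.5
```

In this toy example the observed feature score looks unusually low on its own (p = 1/6, the smallest value possible with five permutations), but much less so once the contribution of the covariate is removed (p = 1/2). This mirrors, in miniature, the weaker significance of unc-25 after adjusting for age.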