Tutorial 1: Predicting the Molecular Type of Neurons

To demonstrate some of the main functionalities of CAJAL, we perform a basic analysis of a set of neuron morphological reconstructions obtained from the Allen Brain Atlas. To facilitate the analysis, we provide a compressed *.tar.gz file containing the *.SWC files of the 509 neurons used in this example, which can be downloaded directly from this link. Throughout this tutorial we assume that the SWC files are located in the folder /home/jovyan/swc. More information about this dataset can be found at:

For this analysis, we focus on the morphology of the dendrites and exclude the axons of the neurons. To achieve this, we set structure_ids = [1,3,4], which tells CAJAL to sample points only from the soma, the basal dendrites, and the apical dendrites (SWC structure identifiers 1, 3, and 4, respectively; axons carry identifier 2 and are therefore skipped). We sample 100 points from each neuron and compute the Euclidean distance between each pair of points in that neuron using the following code:

[2]:
import cajal.sample_swc
import cajal.swc

cajal.sample_swc.compute_icdm_all_euclidean(
    infolder="/home/jovyan/swc",  # folder containing the *.SWC files
    out_csv="/home/jovyan/swc_bdad_100pts_euclidean_icdm.csv",  # output CSV of pairwise distances
    preprocess=cajal.swc.preprocessor_eu(
        structure_ids=[1,3,4],  # soma, basal dendrites, apical dendrites
        soma_component_only=False),
    n_sample=100,  # number of points sampled per neuron
    num_processes=8)  # num_processes can be set to the number of cores on your machine
100%|████████████████████████████████████████████████████████████████████████████████▊| 508/509 [06:02<00:00,  1.40it/s]
[2]:
[]
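
To make the two ideas in this step concrete (filtering by structure identifier and computing pairwise Euclidean distances), here is a minimal pure-Python sketch. It is not CAJAL's implementation: it uses the nodes of a toy, made-up SWC snippet directly rather than resampling points along the dendrites, and the names `SWC_TEXT`, `read_nodes`, and `pairwise_euclidean` are hypothetical. It only assumes the standard SWC column order (id, type, x, y, z, radius, parent).

```python
import itertools
import math

# Hypothetical miniature SWC content (not taken from the tutorial dataset).
# Columns: id, type, x, y, z, radius, parent.
SWC_TEXT = """\
# toy reconstruction for illustration only
1 1 0.0 0.0 0.0 5.0 -1
2 3 3.0 4.0 0.0 1.0 1
3 2 0.0 0.0 9.0 0.5 1
4 4 0.0 6.0 8.0 1.0 1
"""


def read_nodes(swc_text, structure_ids):
    """Return (x, y, z) for every node whose type is in structure_ids."""
    keep = set(structure_ids)
    nodes = []
    for line in swc_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and SWC comments
        fields = line.split()
        if int(fields[1]) in keep:
            nodes.append(tuple(float(v) for v in fields[2:5]))
    return nodes


def pairwise_euclidean(points):
    """Return the symmetric matrix of Euclidean distances between points."""
    n = len(points)
    dmat = [[0.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        dmat[i][j] = dmat[j][i] = math.dist(points[i], points[j])
    return dmat


# Keep soma (1), basal dendrite (3), and apical dendrite (4); drop the axon node (2).
points = read_nodes(SWC_TEXT, structure_ids=[1, 3, 4])
icdm = pairwise_euclidean(points)
```

With 100 sampled points per neuron, a matrix of this kind has 100 × 99 / 2 = 4950 distinct off-diagonal entries.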

Once the sampling is complete, we compute the Gromov-Wasserstein (GW) distance between each pair of neurons and write the resulting distance matrix to a file:

[3]:
import cajal.run_gw

cajal.run_gw.compute_gw_distance_matrix(
    "/home/jovyan/swc_bdad_100pts_euclidean_icdm.csv",  # input: distances written in the sampling step
    "/home/jovyan/swc_bdad_100pts_euclidean_GW_dmat.csv",  # output: GW distance between each pair of neurons
    num_processes=8)
100%|██████████████████████████████████████████████████████████████████████████| 129286/129286 [03:52<00:00, 556.95it/s]
[3]:
(array([[  0.        ,  76.53525355,  48.81215985, ...,  36.25765651,
          39.63267218, 107.27192268],
        [ 76.53525355,   0.        ,  90.55259238, ...,  69.27173625,
          82.74822498,  50.54451328],
        [ 48.81215985,  90.55259238,   0.        , ...,  26.48503494,
          16.99102489, 129.81156708],
        ...,
        [ 36.25765651,  69.27173625,  26.48503494, ...,   0.        ,
          21.15960915, 107.41792624],
        [ 39.63267218,  82.74822498,  16.99102489, ...,  21.15960915,
           0.        , 121.93211717],
        [107.27192268,  50.54451328, 129.81156708, ..., 107.41792624,
         121.93211717,   0.        ]]),
 None)
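
The total shown in the progress bar above, 129286, matches the number of unordered pairs among 509 neurons. The short sketch below reproduces that count; the helper name `n_unordered_pairs` is hypothetical and is not part of CAJAL.

```python
import math


def n_unordered_pairs(n_cells):
    """Number of distinct cell pairs, i.e. entries above the matrix diagonal."""
    return math.comb(n_cells, 2)


print(n_unordered_pairs(509))  # 129286
```

Because the pair count grows quadratically with the number of cells, doubling the dataset roughly quadruples the number of GW computations.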

We can visualize the resulting space of cell morphologies using UMAP:

[1]:
import plotly.io as pio
pio.renderers.default = 'iframe'

import cajal.utilities
import umap
import plotly.express

# Read GW distance matrix
cells, gw_dist_dict = cajal.utilities.read_gw_dists("/home/jovyan/swc_bdad_100pts_euclidean_GW_dmat.csv", header=True)
gw_dist = cajal.utilities.dist_mat_of_dict(gw_dist_dict, cells)

# Compute UMAP representation
reducer = umap.UMAP(metric="precomputed", random_state=1)
embedding = reducer.fit_transform(gw_dist)

# Visualize UMAP
plotly.express.scatter(x=embedding[:,0],
                       y=embedding[:,1],
                       template="simple_white",
                       hover_name=[m + ".swc" for m in cells])
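
UMAP with metric="precomputed" expects a square, symmetric distance matrix, which is why the pairwise distances read from the file are converted into a matrix before fitting. The sketch below illustrates that conversion in plain Python. It is not CAJAL's code: it assumes, for illustration, that the distances are stored in a dictionary keyed by pairs of cell names, and the function name `dist_matrix_from_pairs`, the cell names, and the values are all made up.

```python
def dist_matrix_from_pairs(pair_dists, names):
    """Build a symmetric distance matrix whose rows follow the order of names."""
    index = {name: k for k, name in enumerate(names)}
    n = len(names)
    dmat = [[0.0] * n for _ in range(n)]  # zero diagonal: each cell is at distance 0 from itself
    for (a, b), d in pair_dists.items():
        i, j = index[a], index[b]
        dmat[i][j] = d
        dmat[j][i] = d  # mirror the entry so the matrix is symmetric
    return dmat


# Hypothetical cell names and distances, for illustration only.
names = ["cell_a", "cell_b", "cell_c"]
pair_dists = {
    ("cell_a", "cell_b"): 76.5,
    ("cell_a", "cell_c"): 48.8,
    ("cell_b", "cell_c"): 90.6,
}
square = dist_matrix_from_pairs(pair_dists, names)
```

Row and column k of the resulting matrix both correspond to names[k], so the same ordering can be reused for hover labels in the scatter plot.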