Sampling from SWC Files

get_sample_pts_euclidean(forest: list[cajal.swc.NeuronTree], step_size: float) → list[numpy.ndarray[Any, numpy.dtype[numpy.float64]]]

Sample points uniformly throughout the forest, starting at the roots, at the given step size.

Returns

a list of (x,y,z) coordinate triples, represented as numpy floating point arrays of shape (3,). The list length depends (inversely) on the value of step_size.

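Parameters
  • forest (list[cajal.swc.NeuronTree]) – The forest to sample from.

  • step_size (float) – The spacing between consecutive sample points, starting from the roots. Smaller values yield more points.

Return type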

list[numpy.ndarray[Any, numpy.dtype[numpy.float64]]]
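The following is a minimal sketch of fixed-step sampling, not cajal's implementation. It assumes that sampling "uniformly at the given step size" means keeping every point whose path length from its root is an integer multiple of step_size (the convention documented for get_sample_pts_geodesic). It also uses a hypothetical SWC-like input, a coordinate array plus a list of parent indices, instead of cajal.swc.NeuronTree. The name sample_pts_euclidean is illustrative only.

```python
import math

import numpy as np


def sample_pts_euclidean(coords, parents, step_size):
    """Hypothetical sketch: sample points at multiples of step_size from the roots.

    coords:  (n, 3) array-like of node coordinates (assumed input format).
    parents: parents[i] is the index of node i's parent, or -1 for a root.
             Parents are assumed to precede their children, as in most SWC files.
    """
    coords = np.asarray(coords, dtype=np.float64)
    dist_from_root = np.zeros(len(coords))  # path length from each node to its root
    samples = []
    for i, p in enumerate(parents):
        if p < 0:
            samples.append(coords[i].copy())  # every root is sampled
            continue
        seg = coords[i] - coords[p]
        length = float(np.linalg.norm(seg))
        start = dist_from_root[p]
        end = start + length
        dist_from_root[i] = end
        # Emit every multiple of step_size in the half-open interval (start, end].
        # A zero-length segment emits nothing, so no division by zero occurs.
        k = math.floor(start / step_size) + 1
        while k * step_size <= end:
            frac = (k * step_size - start) / length  # fraction of the way from parent to child
            samples.append(coords[p] + frac * seg)
            k += 1
    return samples
```

For example, a root at the origin with children at (3, 0, 0) and (0, 2, 0) and step_size=1.0 yields six points: the root, three points on the first branch and two on the second. Halving step_size roughly doubles the count, which is the inverse dependence noted above.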

icdm_euclidean(forest: list[cajal.swc.NeuronTree], num_samples: int) → ndarray[Any, dtype[float64]]

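Compute the (Euclidean) intracell distance matrix for the forest, using n = num_samples sample points.

Parameters
  • forest (list[cajal.swc.NeuronTree]) – The forest to sample from.

  • num_samples (int) – The number n of points to sample.

Returns

A condensed (vector-form) distance matrix of length n*(n-1)/2.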

Return type

ndarray[Any, dtype[float64]]
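A condensed matrix stores only the n*(n-1)/2 entries strictly above the diagonal, in row-major order. The sketch below assumes the sample points are already stacked into an (n, 3) array and computes that vector with NumPy alone; condensed_euclidean is a hypothetical helper, not part of cajal, and its output layout matches what scipy.spatial.distance.pdist returns.

```python
import numpy as np


def condensed_euclidean(points: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances in condensed (vector-form) order.

    Entry k corresponds to the pair (i, j), i < j, in row-major order of the
    strict upper triangle, the same layout scipy.spatial.distance.pdist uses.
    """
    points = np.asarray(points, dtype=np.float64)
    # Row and column indices of the strict upper triangle, row by row.
    i, j = np.triu_indices(len(points), k=1)
    return np.linalg.norm(points[i] - points[j], axis=1)
```

scipy.spatial.distance.squareform converts such a vector back into the full symmetric n × n matrix when one is needed.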

geodesic_distance(wt1: Union[WeightedTreeRoot, WeightedTreeChild], h1: float, wt2: Union[WeightedTreeRoot, WeightedTreeChild], h2: float) → float

Return the geodesic distance between p1=(wt1,h1) and p2=(wt2,h2).

Here, p1 is a point in a weighted tree which lies at height h1 above wt1. Similarly, p2 is a point in a weighted tree which lies at height h2 above wt2.

Parameters
  • wt1 (Union[WeightedTreeRoot, WeightedTreeChild]) – A node in a weighted tree.

  • h1 (float) – Represents a point p1 which lies h1 above wt1 in the tree, along the line segment connecting wt1 to its parent. h1 is assumed to be less than the distance between wt1 and wt1.parent; or if wt1 is a root node, h1 is assumed to be zero.

  • wt2 (Union[WeightedTreeRoot, WeightedTreeChild]) – A node in a weighted tree.

  • h2 (float) – Represents a point p2 which lies h2 above wt2 in the tree, along the line segment connecting wt2 to its parent. Similar assumptions as for h1.

Return type

float
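One way to compute this distance is sketched below under the assumption of a simplified node type; the hypothetical Node class stands in for WeightedTreeRoot and WeightedTreeChild and is not cajal's API. Each point's distance from the root is the depth of its node minus its height. If one node lies on the other's path to the root (or they are the same node), the answer is the difference of those distances; otherwise the path runs through the lowest common ancestor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity
class Node:
    """Hypothetical weighted-tree node (a stand-in, not cajal's classes)."""
    parent: Optional["Node"] = None
    dist: float = 0.0  # length of the edge to the parent; 0.0 for a root


def path_to_root(node: Node) -> list[Node]:
    """Return [node, node.parent, ..., root]."""
    path = []
    current: Optional[Node] = node
    while current is not None:
        path.append(current)
        current = current.parent
    return path


def depth(node: Node) -> float:
    """Geodesic distance from node to the root of its tree."""
    return sum(n.dist for n in path_to_root(node))


def geodesic_distance(wt1: Node, h1: float, wt2: Node, h2: float) -> float:
    d1 = depth(wt1) - h1  # distance from p1 to the root
    d2 = depth(wt2) - h2  # distance from p2 to the root
    path1, path2 = path_to_root(wt1), path_to_root(wt2)
    if wt1 in path2 or wt2 in path1:
        # Same node, or one node is an ancestor of the other: both points lie
        # on a single root path, so the distance is the difference in depth.
        return abs(d1 - d2)
    # Otherwise the path between p1 and p2 passes through the lowest common ancestor.
    ancestors1 = {id(n) for n in path1}
    lca = next((n for n in path2 if id(n) in ancestors1), None)
    if lca is None:
        raise ValueError("the two points lie in different trees")
    return d1 + d2 - 2 * depth(lca)
```

For a root with child a (edge length 2) and grandchildren b and c (edge lengths 3 and 1), geodesic_distance(b, 1.0, c, 0.5) is 2.5: p1 lies 2 below a and p2 lies 0.5 below a.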

get_sample_pts_geodesic(tree: NeuronTree, num_sample_pts: int) → list[tuple[Union[cajal.weighted_tree.WeightedTreeRoot, cajal.weighted_tree.WeightedTreeChild], float]]

Sample points uniformly throughout the body of tree, starting at the root, returning a list of length num_sample_pts.

“Sample points uniformly” means that there is some scalar step_size such that a point p on a line segment of tree is in the returned list if and only if its geodesic distance from the root is an integer multiple of step_size.

Returns

a list of pairs (wt, h), where wt is a node of tree, and h is a floating point real number representing a point p which lies a distance of h above wt on the line segment between wt and its parent. If wt is a child node, h is guaranteed to be less than the distance between wt and its parent. If wt is a root, h is guaranteed to be zero.

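Parameters
  • tree (NeuronTree) – The neuron to sample from.

  • num_sample_pts (int) – The number of points to sample, i.e., the length of the returned list.

Return type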

list[tuple[Union[cajal.weighted_tree.WeightedTreeRoot, cajal.weighted_tree.WeightedTreeChild], float]]
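The sketch below enumerates such (wt, h) pairs for a fixed step_size, assuming a tree given as hypothetical parent and edge_len dictionaries rather than a NeuronTree. It does not reproduce how get_sample_pts_geodesic chooses step_size so that exactly num_sample_pts points result; that search is left out. A point at distance t from the root lies on the edge from wt to its parent when depth(parent) < t ≤ depth(wt), and then h = depth(wt) - t.

```python
import math
from typing import Optional


def sample_pts_geodesic(
    parent: dict[str, Optional[str]],  # hypothetical: node -> parent (None for the root)
    edge_len: dict[str, float],  # hypothetical: node -> length of the edge to its parent
    step_size: float,
) -> list[tuple[str, float]]:
    def depth(node: str) -> float:
        """Geodesic distance from node to the root."""
        total = 0.0
        while parent[node] is not None:
            total += edge_len[node]
            node = parent[node]
        return total

    samples = []
    for node, par in parent.items():
        if par is None:
            samples.append((node, 0.0))  # the root is sampled with h = 0
            continue
        top, bottom = depth(par), depth(node)  # this edge covers distances (top, bottom]
        k = math.floor(top / step_size) + 1
        while k * step_size <= bottom:
            # h = how far above `node` the sample lies; 0 <= h < edge_len[node]
            samples.append((node, bottom - k * step_size))
            k += 1
    return samples
```

With edges r→a (length 2), a→b (length 3) and a→c (length 1) and step_size=1.0, the sketch returns 7 pairs: the root plus one point per unit of the total length 6.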

icdm_geodesic(tree: NeuronTree, num_samples: int) → ndarray[Any, dtype[float64]]

Compute the intracell distance matrix for tree using the geodesic metric.

Sample num_samples many points uniformly throughout the body of tree, compute the pairwise geodesic distance between all sampled points, and return the matrix of distances.

Returns

A numpy array, a “condensed distance matrix” in the sense of scipy.spatial.distance.squareform(), i.e., an array of shape (num_samples * (num_samples - 1) / 2,). It contains the entries of the intracell geodesic distance matrix for tree that lie strictly above the diagonal.

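Parameters
  • tree (NeuronTree) – The neuron whose intracell distance matrix is computed.

  • num_samples (int) – The number of points to sample from tree.

Return type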

ndarray[Any, dtype[float64]]
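The condensed layout does not depend on the metric. This sketch fills it from any pairwise distance function, for instance a geodesic distance on (wt, h) pairs such as geodesic_distance() above, and includes the standard formula mapping a pair (i, j) to its position in the vector. Both helpers, condensed_from_metric and condensed_index, are hypothetical and not part of cajal.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

import numpy as np

P = TypeVar("P")


def condensed_from_metric(points: Sequence[P], metric: Callable[[P, P], float]) -> np.ndarray:
    """Distances for all pairs i < j, in the row-major order used by squareform."""
    # combinations() yields (0,1), (0,2), ..., (1,2), ...: the strict upper triangle row by row.
    return np.array([metric(p, q) for p, q in combinations(points, 2)], dtype=np.float64)


def condensed_index(i: int, j: int, n: int) -> int:
    """Position of the pair (i, j), i != j, in a condensed vector for n points."""
    if i > j:
        i, j = j, i
    # Rows 0..i-1 contribute (n-1) + (n-2) + ... + (n-i) = n*i - i*(i+1)/2 entries.
    return n * i - i * (i + 1) // 2 + (j - i - 1)
```

With the points 0, 1, 3, 6 on a line and the metric abs(p - q), condensed_from_metric returns [1, 3, 6, 2, 5, 3], and condensed_index(2, 3, 4) is 5.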

compute_icdm_all_euclidean(infolder: str, out_csv: str, n_sample: int, preprocess: typing.Callable[[list[cajal.swc.NeuronTree]], typing.Union[cajal.utilities.Err[cajal.utilities.T], list[cajal.swc.NeuronTree]]] = <function <lambda>>, num_processes: int = 8, name_validate: typing.Callable[[str], bool] = <function default_name_validate>) → list[tuple[str, cajal.utilities.Err[T]]]

Compute the intracell Euclidean distance matrices for all swc cells in infolder.

For each *.swc file in infolder, read the file into memory as an SWCForest, forest. Apply the preprocessing function preprocess to forest; it returns either an error message (because the file is, for whatever reason, unsuitable for processing or sampling) or a potentially modified SWCForest, processed_forest. Sample n_sample evenly spaced points from the neuron and compute its Euclidean intracell distance matrix. Write the intracell distance matrices of all cells that pass the preprocessing test to the csv file at path out_csv.

Parameters
  • infolder (str) – Directory of input *.swc files.

  • out_csv (str) – Output file to write to.

  • n_sample (int) – How many points to sample from each cell.

  • preprocess (Callable[[list[cajal.swc.NeuronTree]], Union[Err[T], list[cajal.swc.NeuronTree]]]) –

    preprocess is expected to be roughly of the following form:

    1. Apply such-and-such tests of data quality and integrity to the SWCForest. (For example, check that the forest has only a single connected component, that it has only a single soma node, that it has at least one soma node, that it contains nodes from the axon, that it does not have any elements whose structure_id is 0 (for ‘undefined’), etc.)

    2. If any of the tests are failed, return an instance of utilities.Err with a message explaining why the *.swc file was ineligible for sampling.

    3. If all tests are passed, apply a transformation to forest and return the modified new_forest. (For example, filter out all axon nodes to focus on the dendrites, or filter out all undefined nodes, or filter out all components which have fewer than 10% of the nodes in the largest component.)

    If preprocess(forest) returns an instance of the utilities.Err class, this file is not sampled from, and its name is added to a list together with the error returned by preprocess. If preprocess(forest) returns an SWCForest, this is what will be sampled. By default, no preprocessing is performed, and the neuron is processed as-is.

  • num_processes (int) – The intracell distance matrices are computed in parallel; num_processes is the number of processes to run simultaneously. It is recommended to set this equal to the number of cores on your machine.

  • name_validate (Callable[[str], bool]) – A boolean test on file names. A file in infolder is read only if name_validate returns True (or a truthy value) on its name.

Returns

List of pairs (cell_name, error), where cell_name is the name of a cell for which sampling failed, and error is a wrapper around a message indicating why that neuron was not sampled from.

Return type

list[tuple[str, cajal.utilities.Err[T]]]
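A preprocess function following the three steps described for the preprocess parameter might look like the sketch below. To stay self-contained it uses a hypothetical stand-in Err class in place of cajal.utilities.Err, and a simplified forest in which each connected component is just a list of SWC structure ids (0 for undefined, 1 for soma); real code would operate on cajal.swc.NeuronTree objects instead.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Err:
    """Hypothetical stand-in for cajal.utilities.Err: wraps an error message."""
    message: str


# Hypothetical simplified forest: one list of SWC structure ids per component.
Forest = list[list[int]]

UNDEFINED, SOMA = 0, 1  # standard SWC structure ids


def preprocess(forest: Forest) -> Union[Err, Forest]:
    # Steps 1-2: run data-quality tests, returning an Err on the first failure.
    if not forest:
        return Err("forest is empty")
    ids = [s for component in forest for s in component]
    if UNDEFINED in ids:
        return Err("forest contains nodes with structure_id 0 (undefined)")
    if SOMA not in ids:
        return Err("forest has no soma node")
    # Step 3: transform. Drop components with fewer than 10% of the nodes
    # of the largest component (integer arithmetic avoids rounding issues).
    largest = max(len(c) for c in forest)
    return [c for c in forest if 10 * len(c) >= largest]
```

For a forest with a 20-node component containing the soma and a stray 1-node component, the sketch keeps only the large component; a forest containing any structure_id 0 node is rejected with an Err.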

compute_icdm_all_geodesic(infolder: str, out_csv: str, n_sample: int, num_processes: int = 8, preprocess: typing.Callable[[list[cajal.swc.NeuronTree]], typing.Union[cajal.utilities.Err[cajal.utilities.T], cajal.swc.NeuronTree]] = <function <lambda>>) → list[tuple[str, cajal.utilities.Err[T]]]

Compute the intracell geodesic distance matrices for all swc cells in infolder.

This function is substantially the same as cajal.sample_swc.compute_icdm_all_euclidean(), and the user should consult the documentation for that function. Note, however, that preprocess has a different type signature: it is expected to return a NeuronTree rather than an SWCForest, because there is no meaningful notion of geodesic distance between points in two different components of a graph.

The default preprocessing is to take the largest component.

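Parameters
  • infolder (str) – Directory of input *.swc files.

  • out_csv (str) – Output file to write to.

  • n_sample (int) – How many points to sample from each cell.

  • num_processes (int) – The number of processes to run simultaneously.

  • preprocess (Callable[[list[cajal.swc.NeuronTree]], Union[Err[T], cajal.swc.NeuronTree]]) – As for compute_icdm_all_euclidean(), except that it must return either an Err or a single NeuronTree.

Return type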

list[tuple[str, cajal.utilities.Err[T]]]
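Because the geodesic variant needs a single tree, its preprocess returns one component rather than a forest. The sketch below mimics the documented default (take the largest component), again using a hypothetical stand-in Err class and components represented as plain lists of structure ids. Measuring "largest" by node count is an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Err:
    """Hypothetical stand-in for cajal.utilities.Err: wraps an error message."""
    message: str


# Hypothetical simplified component: the SWC structure ids of its nodes.
Component = list[int]


def largest_component(forest: list[Component]) -> Union[Err, Component]:
    """Keep only the component with the most nodes (ties go to the first one)."""
    if not forest:
        return Err("empty forest: no component to keep")
    return max(forest, key=len)
```

Returning a single component, rather than filtering the forest, is what lets every pair of sampled points have a well-defined geodesic distance.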