Processing SWC Files
CAJAL supports neuronal tracing data in the SWC format, as specified here: http://www.neuronland.org/NLMorphologyConverter/MorphologyFormats/SWC/Spec.html
The sample_swc.py file contains functions to help the user sample points from an *.swc file.
- class NeuronNode(sample_number: int, structure_id: int, coord_triple: tuple[float, float, float], radius: float, parent_sample_number: int)
A NeuronNode represents the contents of a single line in an *.swc file.
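As a rough illustration of how one line of an SWC file maps onto these fields, here is a minimal sketch. The `Node` tuple and the `parse_swc_line` helper are hypothetical stand-ins written for this example; they are not part of CAJAL, and the library's own parser may behave differently.

```python
from typing import NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in mirroring the documented fields of NeuronNode."""
    sample_number: int
    structure_id: int
    coord_triple: tuple[float, float, float]
    radius: float
    parent_sample_number: int


def parse_swc_line(line: str) -> Node:
    """Parse one data line of an SWC file (seven whitespace-separated fields)."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    return Node(
        sample_number=int(fields[0]),
        structure_id=int(fields[1]),
        coord_triple=(float(fields[2]), float(fields[3]), float(fields[4])),
        radius=float(fields[5]),
        parent_sample_number=int(fields[6]),
    )


# Sample 2 is a basal dendrite node (structure id 3) whose parent is sample 1.
node = parse_swc_line("2 3 1.0 2.5 -0.5 0.25 1")
```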
- class NeuronTree(root: NeuronNode, child_subgraphs: list[cajal.swc.NeuronTree])
A NeuronTree represents one connected component of the graph coded in an *.swc file.
- Parameters
root (NeuronNode) –
child_subgraphs (list[cajal.swc.NeuronTree]) –
- class cajal.swc.SWCForest
A swc.SWCForest is a list of swc.NeuronTree objects. It is intended to be used to represent a list of all connected components from an SWC file. An SWCForest represents all contents of one SWC file.
- read_swc(file_path: str) tuple[swc.SWCForest, dict[int, cajal.swc.NeuronTree]]
Construct the graph (forest) associated to an SWC file. The forest is sorted by the number of nodes of the components.
An exception is raised if any line has fewer than seven whitespace separated strings.
- Parameters
file_path (str) – A path to an *.swc file.
- Returns
(forest, lookup_table), where lookup_table maps sample numbers for nodes to their positions in the forest.
- Return type
tuple[swc.SWCForest, dict[int, cajal.swc.NeuronTree]]
One can alternately represent an SWC file in a simple list format, rather than using a nested class structure. The class structure may be more elegant, but we have encountered a number of SWCs so far where the depth of the graph associated to an SWC exceeds the default stack limit of Python, and so recursive algorithms on graphs are prone to stack overflow errors. A list is more amenable to iterative algorithms. In particular, if the user wants to serialize the data of an SWC graph (for example, to pass it between two processes or threads) they should recast it as a list so that the Python serialization function does not cause a stack overflow.
- linearize(forest: swc.SWCForest) list[cajal.swc.NeuronNode]
Linearize the SWCForest into a list of NeuronNodes where the sample number of each node is just its index in the list plus 1.
- Parameters
forest (swc.SWCForest) – An SWCForest to be linearized.
- Returns
A list linear of NeuronNodes. The list linear represents a directed graph which is isomorphic to forest; under this graph isomorphism, the xyz coordinates, radius, and structure identifier will be preserved, but the fields parent_sample_number and sample_number will not be. Instead, we will have linear[k].sample_number==k+1 for each index k. (This index shift is clearly error-prone with Python’s zero-indexing of lists, but it seems common in SWC files.)
- Return type
list[cajal.swc.NeuronNode]
In addition to having “standardized” indices, this is a breadth-first linearization algorithm. It is guaranteed that:
1. The graph is topologically sorted, in that parent nodes come before child nodes.
2. Each component is contained in a contiguous region of the list, whose first element is the root by (1).
3. Within each component, the nodes are organized by level, so that the first element is the root, indices 2..n are the nodes at depth 1, indices n+1..m are the nodes at depth 2, and so on.
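The guarantees listed above can be illustrated with a minimal breadth-first sketch. The `Node` and `Tree` classes and the `linearize_sketch` function are hypothetical stand-ins that only mirror the documented fields of NeuronNode and NeuronTree; this is not CAJAL's implementation. An explicit queue is used instead of recursion, so deep trees do not strain the Python call stack.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in mirroring the documented fields of NeuronNode."""
    sample_number: int
    structure_id: int
    coord_triple: tuple[float, float, float]
    radius: float
    parent_sample_number: int


@dataclass
class Tree:
    """Hypothetical stand-in mirroring the documented fields of NeuronTree."""
    root: Node
    child_subgraphs: list["Tree"] = field(default_factory=list)


def linearize_sketch(forest: list[Tree]) -> list[Node]:
    """Breadth-first linearization with sample numbers renumbered to index + 1."""
    linear: list[Node] = []
    for component in forest:
        # Each queue entry is (subtree, new sample number of its parent).
        queue = deque([(component, -1)])
        while queue:
            tree, new_parent = queue.popleft()
            new_id = len(linear) + 1
            linear.append(
                tree.root._replace(
                    sample_number=new_id, parent_sample_number=new_parent
                )
            )
            for child in tree.child_subgraphs:
                queue.append((child, new_id))
    return linear


def _node(sample: int, parent: int, structure_id: int = 3) -> Node:
    return Node(sample, structure_id, (0.0, 0.0, 0.0), 1.0, parent)


# First component: root 10 with children 20 and 30; 20 has a child 40.
# Second component: an isolated root 7.
forest = [
    Tree(_node(10, -1, 1), [Tree(_node(20, 10), [Tree(_node(40, 20))]),
                            Tree(_node(30, 10))]),
    Tree(_node(7, -1)),
]
linear = linearize_sketch(forest)
```

In this example the original sample numbers 10, 20, 30, 40, 7 are replaced by 1 through 5, and the node originally numbered 40 ends up after both depth-1 nodes.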
- forest_from_linear(ell: list[cajal.swc.NeuronNode]) swc.SWCForest
Convert a list of swc.NeuronNode’s to a graph.
- Parameters
ell (list[cajal.swc.NeuronNode]) – A list of swc.NeuronNode’s where ell[i].sample_number == i+1 for all i. It is assumed that ell is topologically sorted, i.e., that parents are listed before their children, and that roots are marked by -1.
- Returns
An swc.SWCForest containing the contents of the graph.
- Return type
swc.SWCForest
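The two assumptions on the input list (standardized sample numbers and topological order) are what make a single forward pass sufficient. The following is a minimal sketch of that idea; `Node`, `Tree` and `forest_from_linear_sketch` are hypothetical stand-ins for illustration, not the library's code.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in mirroring the documented fields of NeuronNode."""
    sample_number: int
    structure_id: int
    coord_triple: tuple[float, float, float]
    radius: float
    parent_sample_number: int


@dataclass
class Tree:
    """Hypothetical stand-in mirroring the documented fields of NeuronTree."""
    root: Node
    child_subgraphs: list["Tree"] = field(default_factory=list)


def forest_from_linear_sketch(ell: list[Node]) -> list[Tree]:
    """Rebuild the nested forest in one pass over a topologically sorted list."""
    trees: list[Tree] = []   # trees[i] is the subtree rooted at ell[i]
    forest: list[Tree] = []  # the roots, in order of appearance
    for node in ell:
        tree = Tree(node)
        trees.append(tree)
        if node.parent_sample_number == -1:
            forest.append(tree)
        else:
            # The parent was already seen, and sits at index sample_number - 1.
            trees[node.parent_sample_number - 1].child_subgraphs.append(tree)
    return forest


origin = (0.0, 0.0, 0.0)
ell = [
    Node(1, 1, origin, 1.0, -1),
    Node(2, 3, origin, 1.0, 1),
    Node(3, 3, origin, 1.0, 1),
    Node(4, 3, origin, 1.0, 2),
    Node(5, 3, origin, 1.0, -1),
]
forest = forest_from_linear_sketch(ell)
```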
- write_swc(outfile: str, forest: swc.SWCForest) None
Write forest to outfile. Overwrite whatever is in outfile.
This function does not respect the sample numbers and parent sample numbers in forest. They will be renumbered so that the indices are contiguous and start at 1.
- Parameters
outfile (str) – An absolute path to the output file.
forest (swc.SWCForest) –
- Return type
None
If the user is batch-processing all *.swc files in a given directory, it is appropriate to include a filtering function so that the user does not accidentally crash the program by trying to read a non-SWC file into memory. Such extraneous files could include backup text files automatically generated by a text editor or by the operating system, hidden files, log files, or lists of cell indices. Therefore the user has the option to supply a “name validation” function which returns either True or False for each file name in the directory; only the files whose names pass the name validation test will be sampled from. The default name validation function is this one:
- default_name_validate(filename: str) bool
If the file name starts with a period ‘.’, the standard hidden-file marker on Linux, return False. Otherwise, return True if and only if the file ends in “.swc” (case-insensitive).
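A user-written validator can follow the same pattern. The sketch below restates the documented rule in code; `name_validate_sketch` is a hypothetical name, and the library's own function may be written differently.

```python
def name_validate_sketch(filename: str) -> bool:
    """Reject hidden files; otherwise accept names ending in ".swc" in any case."""
    if filename.startswith("."):
        return False
    return filename.lower().endswith(".swc")


examples = ["cell_01.swc", "cell_02.SWC", ".cell_03.swc", "cell_04.swc~", "log.txt"]
accepted = [name for name in examples if name_validate_sketch(name)]
```

Editor backup files such as "cell_04.swc~" are rejected here only because they no longer end in ".swc"; other backup naming schemes may need an extra check.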
The user should be warned that passing information between distinct Python processes is costly, and the following function is not recommended if the user wants to employ multiprocessing, as any child process which takes cells from this iterator as input will incur high overhead by serializing and copying the data between processes. For multiprocessing/parallelization it is better to give each process its own list of file names to operate on, and let them read the files independently.
- cell_iterator(infolder: str, name_validate: ~typing.Callable[[str], bool] = <function default_name_validate>) Iterator[tuple[str, swc.SWCForest]]
Construct an iterator over all SWCs in a directory (all files ending in *.swc or *.SWC).
- Parameters
- Returns
An iterator over pairs (name, forest), where “name” is the file root (everything before the period in the file name) and “forest” is the forest contained in the SWC file.
- Return type
Iterator[tuple[str, swc.SWCForest]]
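The multiprocessing advice given before this function can be sketched as follows: only file paths and a small summary cross the process boundary, while each worker opens its files itself. The functions `count_nodes` and `count_nodes_parallel` are hypothetical helpers for illustration (they count data lines rather than build a forest), and the sketch assumes that lines starting with "#" are comments, as in the SWC spec.

```python
from concurrent.futures import ProcessPoolExecutor


def count_nodes(path: str) -> tuple[str, int]:
    """Worker: read the file inside the child process, return a small summary."""
    count = 0
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return path, count


def count_nodes_parallel(paths: list[str], processes: int) -> dict[str, int]:
    """Send only path strings to the workers; receive only (path, count) pairs."""
    with ProcessPoolExecutor(max_workers=processes) as pool:
        return dict(pool.map(count_nodes, paths))
```

On platforms that start worker processes by spawning a fresh interpreter, `count_nodes_parallel` should be called from under an `if __name__ == "__main__":` guard.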
The function cajal.swc.filter_forest is very useful for sampling from fragments of a neuron.
- keep_only_eu(structure_ids: Container[int]) Callable[[swc.SWCForest], swc.SWCForest]
Given structure_ids, a container (list, set, tuple, etc.) of integers, return a filtering function which accepts an swc.SWCForest and returns the subforest containing only the node types in structure_ids.
Example: keep_only_eu([1,3,4])(forest) is the subforest of forest containing only the soma, the basal dendrites and the apical dendrites, but not the axon.
The intended use is to generate a preprocessing function for swc.read_preprocess_save, swc.batch_filter_and_preprocess, or sample_swc.compute_and_save_intracell_all_euclidean; see the documentation for those functions for more information.
- preprocessor_geo(structure_ids: Union[Container[int], Literal['keep_all_types']]) Callable[[swc.SWCForest], NeuronTree]
This preprocessor strips the tree down to only the components listed in structure_ids and also trims the tree down to a single connected component. This is similar to swc.keep_only_eu(), and the user should consult the documentation for that function. Observe, however, that the type signature is different: the returned callable produces a single NeuronTree rather than an swc.SWCForest. The callable returned by this function is suitable as a preprocessing function for sample_swc.read_preprocess_compute_geodesic() or sample_swc.compute_and_save_intracell_all_geodesic().
- preprocessor_eu(structure_ids: Union[Container[int], Literal['keep_all_types']], soma_component_only: bool) Callable[[swc.SWCForest], Union[Err[str], swc.SWCForest]]
- Parameters
structure_ids (Union[Container[int], Literal['keep_all_types']]) – Either a collection of integers corresponding to structure ids in the SWC spec, or the literal string ‘keep_all_types’.
soma_component_only (bool) – Indicate whether to sample from the whole SWC file, or only from the connected component containing the soma. Whether this flag is appropriate depends on the technology used to construct the SWC files. Some technologies generate SWC files in which there are many unrelated connected components which are “noise” contributed by other overlapping neurons. In other technologies, all components are significant and the authors of the SWC file were simply unable to determine exactly where the branch should be connected to the main tree. In order to get sensible results from the data, the user should visually inspect neurons with multiple connected components using a tool such as Vaa3D https://github.com/Vaa3D/release/releases/tag/v1.1.2 to determine whether the extra components should be regarded as signal or noise.
- Returns
A preprocessing function which accepts as argument an SWCForest forest and returns a filtered forest containing only the nodes listed in structure_ids. If soma_component_only is True, only nodes from the component containing the soma will be returned; otherwise nodes will be drawn from across the whole forest. If soma_component_only is True and there is not a unique connected component whose root is a soma node, the function will return an error.
- Return type
Callable[[swc.SWCForest], Union[Err[str], swc.SWCForest]]
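The error condition described under Returns can be sketched as below. This covers only the choice of the soma component, not the filtering by structure_ids. `Node`, `Tree`, `Err` and `soma_component` are hypothetical stand-ins for illustration; in particular, the real error wrapper in CAJAL may have a different shape.

```python
from dataclasses import dataclass, field
from typing import NamedTuple, Union


class Node(NamedTuple):
    """Hypothetical stand-in mirroring the documented fields of NeuronNode."""
    sample_number: int
    structure_id: int
    coord_triple: tuple[float, float, float]
    radius: float
    parent_sample_number: int


@dataclass
class Tree:
    """Hypothetical stand-in mirroring the documented fields of NeuronTree."""
    root: Node
    child_subgraphs: list["Tree"] = field(default_factory=list)


@dataclass
class Err:
    """Hypothetical error wrapper: just a message."""
    message: str


SOMA = 1  # structure identifier of the soma in the SWC spec


def soma_component(forest: list[Tree]) -> Union[Err, Tree]:
    """Return the unique component rooted at a soma node, or an error."""
    candidates = [tree for tree in forest if tree.root.structure_id == SOMA]
    if len(candidates) != 1:
        return Err(
            "expected exactly one component rooted at a soma node, "
            f"found {len(candidates)}"
        )
    return candidates[0]


origin = (0.0, 0.0, 0.0)
soma_tree = Tree(Node(1, 1, origin, 1.0, -1))
stray_fragment = Tree(Node(2, 3, origin, 1.0, -1))
chosen = soma_component([soma_tree, stray_fragment])
rejected = soma_component([stray_fragment])
```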
- total_length(tree: NeuronTree) float
Return the sum of lengths of all edges in the graph.
- Parameters
tree (NeuronTree) –
- Return type
float
- weighted_depth(tree: NeuronTree) float
Return the weighted depth (or weighted height) of the tree, i.e., the maximal geodesic distance from the root to any other point.
- Parameters
tree (NeuronTree) –
- Return type
float
- discrete_depth(tree: NeuronTree) int
- Returns
The height of the tree in the unweighted or discrete sense, i.e. the longest path from the root to any leaf measured in the number of edges.
- Parameters
tree (NeuronTree) –
- Return type
int
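To make the three quantities above concrete (total length, weighted depth, discrete depth), here is a minimal sketch that computes them in one pass. It assumes a single non-empty tree given in the linearized form described earlier, where the node at index k has sample number k+1 and parents precede children. `Node` and `tree_statistics` are hypothetical names, not CAJAL functions.

```python
import math
from typing import NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in mirroring the documented fields of NeuronNode."""
    sample_number: int
    structure_id: int
    coord_triple: tuple[float, float, float]
    radius: float
    parent_sample_number: int


def tree_statistics(linear: list[Node]) -> tuple[float, float, int]:
    """Return (total length, weighted depth, discrete depth) of one tree."""
    dist_from_root = [0.0] * len(linear)  # geodesic distance root -> node
    edges_from_root = [0] * len(linear)   # number of edges root -> node
    total = 0.0
    for k, node in enumerate(linear):
        if node.parent_sample_number == -1:
            continue  # the root has no incoming edge
        p = node.parent_sample_number - 1
        edge = math.dist(node.coord_triple, linear[p].coord_triple)
        total += edge
        dist_from_root[k] = dist_from_root[p] + edge
        edges_from_root[k] = edges_from_root[p] + 1
    return total, max(dist_from_root), max(edges_from_root)


linear = [
    Node(1, 1, (0.0, 0.0, 0.0), 1.0, -1),
    Node(2, 3, (3.0, 4.0, 0.0), 1.0, 1),   # edge of length 5 from the root
    Node(3, 3, (0.0, 0.0, 2.0), 1.0, 1),   # edge of length 2 from the root
    Node(4, 3, (3.0, 4.0, 12.0), 1.0, 2),  # edge of length 12 from node 2
]
total, weighted, discrete = tree_statistics(linear)
```

Here the edge lengths are 5, 2 and 12, so the total length is 19, the farthest node is at geodesic distance 5 + 12 = 17 from the root, and the longest root-to-leaf path has 2 edges.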
- node_type_counts_tree(tree: NeuronTree) dict[int, int]
- Returns
A dictionary whose keys are all structure_id’s in tree and whose values are the multiplicities with which that node type occurs.
- Parameters
tree (NeuronTree) –
- Return type
dict[int, int]
- node_type_counts_forest(forest: swc.SWCForest) dict[int, int]
- Returns
A dictionary whose keys are all structure_id’s in forest and whose values are the multiplicities with which that node type occurs.
- Parameters
forest (swc.SWCForest) –
- Return type
dict[int, int]
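Counting node types is a plain traversal, and a sketch of it also shows how to walk the nested structure with an explicit stack rather than recursion, which sidesteps the recursion-depth problem mentioned earlier. `Node`, `Tree` and `node_type_counts_sketch` are hypothetical stand-ins, not the library's code.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import NamedTuple


class Node(NamedTuple):
    """Hypothetical stand-in mirroring the documented fields of NeuronNode."""
    sample_number: int
    structure_id: int
    coord_triple: tuple[float, float, float]
    radius: float
    parent_sample_number: int


@dataclass
class Tree:
    """Hypothetical stand-in mirroring the documented fields of NeuronTree."""
    root: Node
    child_subgraphs: list["Tree"] = field(default_factory=list)


def node_type_counts_sketch(forest: list[Tree]) -> dict[int, int]:
    """Count how often each structure_id occurs, without recursion."""
    counts: Counter = Counter()
    stack = list(forest)
    while stack:
        tree = stack.pop()
        counts[tree.root.structure_id] += 1
        stack.extend(tree.child_subgraphs)
    return dict(counts)


origin = (0.0, 0.0, 0.0)
forest = [
    Tree(Node(1, 1, origin, 1.0, -1), [
        Tree(Node(2, 3, origin, 1.0, 1), [Tree(Node(3, 3, origin, 1.0, 2))]),
        Tree(Node(4, 2, origin, 1.0, 1)),
    ]),
    Tree(Node(5, 2, origin, 1.0, -1)),
]
counts = node_type_counts_sketch(forest)
```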
- num_nodes(tree: NeuronTree) int
- Returns
The number of nodes in tree.
- Parameters
tree (NeuronTree) –
- Return type
int
- read_preprocess_save(infile_name: str, outfile_name: str, preprocess: Callable[[swc.SWCForest], Union[Err[T], swc.SWCForest, NeuronTree]]) Union[Err[T], Literal['success']]
Read the *.swc file infile_name from disk as an SWCForest. Apply the function preprocess to the forest. If preprocessing returns an error, return that error. Otherwise, write the preprocessed SWC to outfile_name and return the string “success”.
This function exists mostly for convenience, as it can be called in parallel on several files at once without requiring a large amount of data to be communicated between processes.
- Parameters
infile_name (str) –
outfile_name (str) –
preprocess (Callable[[swc.SWCForest], Union[Err[T], swc.SWCForest, NeuronTree]]) –
- Return type
Union[Err[T], Literal['success']]
- get_filenames(infolder: str, name_validate: ~typing.Callable[[str], bool] = <function default_name_validate>) tuple[list[str], list[str]]
Get a list of all files in infolder. Filter the list by name_validate.
See swc.default_name_validate() for an example of a name validation function.
- Returns
A pair of lists (cell_names, file_paths), where file_paths are the paths to cells we want to sample from, and cell_names[i] is the substring of file_paths[i] containing only the file name, minus the extension; i.e., if file_paths[i] is “/home/jovyan/files/abc.swc” then cell_names[i] is “abc”.
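The relationship between cell names and file paths can be sketched as follows. `get_filenames_sketch` is a hypothetical re-implementation for illustration only; sorting the names is a choice made here for reproducibility and is not something the documentation promises.

```python
import os
from typing import Callable


def get_filenames_sketch(
    infolder: str, name_validate: Callable[[str], bool]
) -> tuple[list[str], list[str]]:
    """Return (cell_names, file_paths) for the validated files in infolder."""
    # Sorting is an assumption of this sketch, made for deterministic output.
    file_names = sorted(
        name for name in os.listdir(infolder) if name_validate(name)
    )
    cell_names = [os.path.splitext(name)[0] for name in file_names]
    file_paths = [os.path.join(infolder, name) for name in file_names]
    return cell_names, file_paths


def swc_only(name: str) -> bool:
    """Hypothetical validator: visible files ending in ".swc", any case."""
    return not name.startswith(".") and name.lower().endswith(".swc")
```

Calling `get_filenames_sketch(folder, swc_only)` on a folder containing "abc.swc" and "notes.txt" would yield the cell name "abc" paired with the full path to "abc.swc".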
- batch_filter_and_preprocess(infolder: str, outfolder: str, preprocess: ~typing.Callable[[swc.SWCForest], ~typing.Union[~cajal.utilities.Err[~cajal.utilities.T], swc.SWCForest, ~cajal.swc.NeuronTree]], parallel_processes: int, err_log: ~typing.Optional[str], suffix: ~typing.Optional[str] = None, name_validate: ~typing.Callable[[str], bool] = <function default_name_validate>) None
Get the set of files in infolder. Filter down to the filenames which pass the test name_validate, which is responsible for filtering out any non-swc files. For the files in this filtered list, read them into memory as swc.SWCForest’s. Apply the function preprocess to each forest. preprocess may return an error (essentially just a message contained in an error wrapper) or a modified/transformed SWCForest, i.e., certain nodes have been filtered out, or certain components of the graph deleted. If preprocess returns an error, write the error to the given log file err_log together with the name of the cell that caused the error. Otherwise, if preprocess returns an SWCForest, write this SWCForest into the folder outfolder with filename == cellname + suffix + ‘.swc’.
- Parameters
infolder (str) – Folder containing SWC files to process.
outfolder (str) – Folder where the results of the filtering will be written.
err_log (Optional[str]) – A file name for a (currently nonexistent) *.csv file. The list of all cells rejected by preprocess will be written to this file, together with an explanation of why each cell could not be processed.
preprocess (Callable[[swc.SWCForest], Union[Err[T], swc.SWCForest, NeuronTree]]) – A function to filter out bad SWC forests or transform them into a more manageable form.
parallel_processes (int) – Run this many Python processes in parallel.
suffix (Optional[str]) – If a file in infolder has the name “abc.swc” then the corresponding file written to outfolder will have the name “abc” + suffix + “.swc”.
name_validate (Callable[[str], bool]) – A function which identifies the files in infolder which are *.swc files. The default argument, swc.default_name_validate(), checks to see whether the filename has file extension “.swc”, case insensitive, and discards files starting with ‘.’, the marker for hidden files on Linux. The user may need to write their own function to ensure that various kinds of backup/autosave files and metadata files are not read into memory.
- Return type
None
For computing geodesic distances, it is more convenient to have a data structure with the weights precomputed and attached to the edges, so we introduce an alternate representation for a neuron where coordinates are forgotten and only the weighted tree structure remains. These objects can be smaller than the original NeuronTrees.
- class WeightedTreeRoot(subtrees: 'list[WeightedTreeChild]')
- Parameters
subtrees (list[cajal.weighted_tree.WeightedTreeChild]) –
- class WeightedTreeChild(subtrees: 'list[WeightedTreeChild]', depth: 'int', unique_id: 'int', parent: 'WeightedTree', dist: 'float')
- Parameters
subtrees (list[cajal.weighted_tree.WeightedTreeChild]) –
depth (int) –
unique_id (int) –
parent (sample_swc.WeightedTree) –
dist (float) –
A cajal.weighted_tree.WeightedTree is either a cajal.weighted_tree.WeightedTreeRoot or a cajal.weighted_tree.WeightedTreeChild.
- WeightedTree_of(tree: NeuronTree) WeightedTreeRoot
Convert a NeuronTree to a WeightedTree. A node in a WeightedTree does not contain a coordinate triple, a radius, a structure_id, or a parent sample number.
Instead, it contains a direct pointer to its parent, a list of its children, and (if it is a child node) the weight of the edge between the child and its parent.
In forming the WeightedTree, any node with both a parent and exactly one child is eliminated, and the parent and the child are joined directly by a single edge whose weight is the sum of the two original edge weights. This reduces the number of nodes without affecting the geodesic distances between points in the graph.
- Parameters
tree (NeuronTree) – A NeuronTree to be converted into a WeightedTree.
- Returns
The WeightedTree corresponding to the original NeuronTree.
- Return type
WeightedTreeRoot
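The node-elimination step described above can be sketched on a flat representation. The function `contract_chains` and its input format (a parent-index list and an edge-weight list, with parents preceding children) are assumptions made for this illustration; the library works on its own tree classes.

```python
def contract_chains(
    parents: list[int], weights: list[float]
) -> list[tuple[int, int, float]]:
    """Merge away every node that has a parent and exactly one child.

    parents[k] is the index of the parent of node k, or -1 for the root;
    weights[k] is the length of the edge from k to its parent (unused for
    the root). Returns the edges (node, kept_ancestor, weight) of the
    contracted tree, labelled with the original indices.
    """
    n = len(parents)
    child_count = [0] * n
    for p in parents:
        if p != -1:
            child_count[p] += 1
    # A node survives if it is the root or does not have exactly one child.
    keep = [parents[k] == -1 or child_count[k] != 1 for k in range(n)]

    anchor = [0] * n            # nearest surviving node at or above k
    dist_to_anchor = [0.0] * n  # path length from k up to anchor[k]
    edges = []
    for k in range(n):
        p = parents[k]
        if p == -1:
            anchor[k] = k
            continue
        up = anchor[p]
        d = dist_to_anchor[p] + weights[k]
        if keep[k]:
            edges.append((k, up, d))
            anchor[k] = k
        else:
            # k is absorbed: its edge weight is carried down to its child.
            anchor[k] = up
            dist_to_anchor[k] = d
    return edges


# Root 0 -> node 1 (weight 1) -> node 2 (weight 2) -> leaves 3 and 4.
parents = [-1, 0, 1, 2, 2]
weights = [0.0, 1.0, 2.0, 3.0, 4.0]
edges = contract_chains(parents, weights)
```

Node 1 has a parent and a single child, so it disappears and node 2 is joined to the root by one edge of weight 1 + 2 = 3; the distance from the root to either leaf is unchanged.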