Processing SWC Files

CAJAL supports neuronal tracing data in the SWC format, as specified here: http://www.neuronland.org/NLMorphologyConverter/MorphologyFormats/SWC/Spec.html

The sample_swc.py file contains functions to help the user sample points from an *.swc file.

class NeuronNode(sample_number: int, structure_id: int, coord_triple: tuple[float, float, float], radius: float, parent_sample_number: int)

A NeuronNode represents the contents of a single line in an *.swc file.

class NeuronTree(root: NeuronNode, child_subgraphs: list[cajal.swc.NeuronTree])

A NeuronTree represents one connected component of the graph coded in an *.swc file.

class cajal.swc.SWCForest

An swc.SWCForest is a list of swc.NeuronTree objects, one for each connected component of the graph encoded in an SWC file, so that a single SWCForest represents the entire contents of one SWC file.

read_swc(file_path: str) → tuple[swc.SWCForest, dict[int, cajal.swc.NeuronTree]]

Construct the graph (forest) associated to an SWC file. The forest is sorted by the number of nodes of the components.

An exception is raised if any line has fewer than seven whitespace-separated strings.

Parameters

file_path (str) – A path to an *.swc file.

Returns

(forest, lookup_table), where lookup_table maps sample numbers for nodes to their positions in the forest.

Return type

tuple[swc.SWCForest, dict[int, cajal.swc.NeuronTree]]
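The seven-field requirement can be illustrated with a minimal line parser. This is only a sketch and not CAJAL's implementation: the helper parse_swc_line is hypothetical, it returns a plain tuple rather than a NeuronNode, and the choice of ValueError is an assumption (the documentation above only states that an exception is raised).

```python
def parse_swc_line(line: str) -> tuple[int, int, tuple[float, float, float], float, int]:
    """Split one SWC data line into its seven standard fields (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 7:
        # The exception type is an assumption; the text only says "an exception".
        raise ValueError(f"expected at least 7 fields, got {len(fields)}: {line!r}")
    sample_number = int(fields[0])
    structure_id = int(fields[1])
    coord_triple = (float(fields[2]), float(fields[3]), float(fields[4]))
    radius = float(fields[5])
    parent_sample_number = int(fields[6])
    return sample_number, structure_id, coord_triple, radius, parent_sample_number


root = parse_swc_line("1 1 0.0 0.0 0.0 5.5 -1")
child = parse_swc_line("2 3 1.0 2.0 2.0 0.8 1")
```

The field order matches the constructor arguments of NeuronNode listed above.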

One can alternatively represent an SWC file as a simple list rather than a nested class structure. The class structure may be more elegant, but we have encountered a number of SWC files in which the depth of the associated graph exceeds Python's default recursion limit, so recursive algorithms on these graphs are prone to stack overflow errors. A list is more amenable to iterative algorithms. In particular, if the user wants to serialize the data of an SWC graph (for example, to pass it between two processes or threads), they should first recast it as a list so that Python's serialization function does not cause a stack overflow.
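The difference between recursive and iterative traversal can be seen in a small sketch. The Node class below is a hypothetical stand-in, not cajal.swc.NeuronTree; the example builds an unbranched chain deeper than the interpreter's recursion limit and counts its nodes in both styles.

```python
import sys


class Node:
    """Hypothetical stand-in for a nested tree node (not cajal.swc.NeuronTree)."""

    def __init__(self) -> None:
        self.children: list["Node"] = []


def count_recursive(node: Node) -> int:
    """Count nodes recursively; the call stack grows with the depth of the tree."""
    return 1 + sum(count_recursive(c) for c in node.children)


def count_iterative(node: Node) -> int:
    """Count nodes with an explicit stack, so the depth of the tree does not matter."""
    total, stack = 0, [node]
    while stack:
        current = stack.pop()
        total += 1
        stack.extend(current.children)
    return total


# Build an unbranched chain several times deeper than the recursion limit.
depth = sys.getrecursionlimit() * 3
root = Node()
tip = root
for _ in range(depth - 1):
    child = Node()
    tip.children.append(child)
    tip = child

try:
    count_recursive(root)
    recursion_failed = False
except RecursionError:
    recursion_failed = True

iterative_count = count_iterative(root)
```

On such a chain the recursive version raises RecursionError, while the iterative version visits every node.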

linearize(forest: swc.SWCForest) → list[cajal.swc.NeuronNode]

Linearize the SWCForest into a list of NeuronNodes where the sample number of each node is just its index in the list plus 1.

Parameters

forest (swc.SWCForest) – An SWCForest to be linearized.

Returns

A list linear of NeuronNodes. The list linear represents a directed graph which is isomorphic to forest; under this graph isomorphism, the xyz coordinates, radius, and structure identifier will be preserved, but the fields parent_sample_number and sample_number will not be. Instead, we will have linear[k].sample_number==k+1 for each index k. (This index shift is clearly error-prone with Python’s zero-indexing of lists, but it seems common in SWC files.)

Return type

list[cajal.swc.NeuronNode]

In addition to producing “standardized” indices, linearize is a breadth-first linearization algorithm. It is guaranteed that:

  1. The graph is topologically sorted in that parent nodes come before child nodes.

  2. Each component is contained in a contiguous region of the list, whose first element is of course the root by (1.)

  3. Within each component, the nodes are organized by level, so that the first element is the root, the next block of elements contains the nodes at depth 1, the block after that contains the nodes at depth 2, and so on.
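The three guarantees above can be checked on a toy example. The following is a minimal sketch of a breadth-first linearization, not the CAJAL implementation: the Tree class and linearize_sketch are hypothetical, and a single label stands in for the coordinates, radius, and structure identifier that are preserved.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Tree:
    """Hypothetical stand-in for a tree node; `label` stands for the preserved data."""
    label: str
    children: list["Tree"] = field(default_factory=list)


def linearize_sketch(forest: list[Tree]) -> list[tuple[int, str, int]]:
    """Return (sample_number, label, parent_sample_number) rows in breadth-first order."""
    rows: list[tuple[int, str, int]] = []
    for root in forest:
        # Each component is emitted in full before the next one starts.
        queue = deque([(root, -1)])
        while queue:
            node, parent_number = queue.popleft()
            number = len(rows) + 1  # sample number is list index + 1
            rows.append((number, node.label, parent_number))
            queue.extend((child, number) for child in node.children)
    return rows


forest = [
    Tree("a", [Tree("b", [Tree("d")]), Tree("c")]),
    Tree("e", [Tree("f")]),
]
rows = linearize_sketch(forest)
```

Here the first component occupies the first four rows (root "a", then "b" and "c" at depth 1, then "d" at depth 2), and the second component follows contiguously.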

forest_from_linear(ell: list[cajal.swc.NeuronNode]) → swc.SWCForest

Convert a list of swc.NeuronNode’s to a graph.

Parameters

ell (list[cajal.swc.NeuronNode]) – A list of swc.NeuronNode’s where ell[i].sample_number == i+1 for all i. It is assumed that ell is topologically sorted, i.e., that parents are listed before their children, and that roots are marked by -1.

Returns

An swc.SWCForest containing the contents of the graph.

Return type

swc.SWCForest
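The reverse direction relies on the topological ordering: when a node is reached, its parent has already been built. A minimal sketch under that assumption follows; the Tree class and forest_from_parents are hypothetical and keep only the sample numbers, not the full NeuronNode data.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    """Hypothetical stand-in for a tree node (not cajal.swc.NeuronTree)."""
    sample_number: int
    children: list["Tree"] = field(default_factory=list)


def forest_from_parents(parents: list[int]) -> list[Tree]:
    """Rebuild a forest from parent sample numbers.

    parents[i] is the parent sample number of the node with sample number
    i + 1, or -1 for a root. Parents must appear before their children.
    """
    nodes: list[Tree] = []
    roots: list[Tree] = []
    for index, parent_number in enumerate(parents):
        node = Tree(sample_number=index + 1)
        nodes.append(node)
        if parent_number == -1:
            roots.append(node)
        else:
            # Topological order guarantees the parent already exists.
            nodes[parent_number - 1].children.append(node)
    return roots


forest = forest_from_parents([-1, 1, 1, 2, -1, 5])
```

A single pass suffices, and no recursion is needed regardless of how deep the trees are.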

write_swc(outfile: str, forest: swc.SWCForest) → None

Write forest to outfile. Overwrite whatever is in outfile.

This function does not respect the sample numbers and parent sample numbers in forest. They will be renumbered so that the indices are contiguous and start at 1.

Parameters
  • outfile (str) – An absolute path to the output file.

  • forest (swc.SWCForest) –

Return type

None

If the user is batch-processing all *.swc files in a given directory, it is appropriate to include a filtering function so that the user does not accidentally crash the program by trying to read a non-SWC file into memory. Such extraneous files could include backup text files automatically generated by a text editor or by the operating system, hidden files, log files, or lists of cell indices. Therefore the user has the option to supply a “name validation” function which returns either True or False for each file name in the directory. Only the file names which pass the name validation test will be sampled from. The default name validation function is this one:

default_name_validate(filename: str) → bool

If the file name starts with a period ‘.’, the standard hidden-file marker on Linux, return False. Otherwise, return True if and only if the file name ends in “.swc” (case-insensitive).

Parameters

filename (str) –

Return type

bool
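The rule described above is short enough to restate as code. The function name_validate_sketch below is a re-implementation written from that description, not the library's source, and no_backups is a hypothetical example of a stricter user-supplied validator (the "backup" naming convention is an assumption made for illustration).

```python
def name_validate_sketch(filename: str) -> bool:
    """Reject hidden files, then accept only names ending in .swc (any case)."""
    if filename.startswith("."):
        return False
    return filename.lower().endswith(".swc")


def no_backups(filename: str) -> bool:
    """Hypothetical custom validator that additionally rejects 'backup' copies."""
    return name_validate_sketch(filename) and "backup" not in filename.lower()
```

Any function with the signature Callable[[str], bool] can be passed wherever a name_validate argument is accepted.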

The user should be warned that passing information between distinct Python processes is costly, and the following function is not recommended if the user wants to employ multiprocessing, as any child process which takes cells from this iterator as input will incur high overhead by serializing and copying the data between processes. For multiprocessing/parallelization it is better to give each process its own list of file names to operate on, and let them read the files independently.

cell_iterator(infolder: str, name_validate: Callable[[str], bool] = default_name_validate) → Iterator[tuple[str, swc.SWCForest]]

Construct an iterator over all SWCs in a directory (all files ending in *.swc or *.SWC).

Parameters
  • infolder (str) – A path to a folder containing SWC files.

  • name_validate (Callable[[str], bool]) –

Returns

An iterator over pairs (name, forest), where “name” is the file root (everything before the period in the file name) and “forest” is the forest contained in the SWC file.

Return type

Iterator[tuple[str, swc.SWCForest]]
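For the multiprocessing advice given before cell_iterator, one way to give each process its own list of file names is to deal the paths out in advance. The helper split_into_batches below is hypothetical and only shows the partitioning step; how the batches are handed to worker processes is left to the user.

```python
def split_into_batches(file_paths: list[str], num_workers: int) -> list[list[str]]:
    """Deal file paths round-robin into one batch per worker (hypothetical helper)."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [file_paths[i::num_workers] for i in range(num_workers)]


paths = ["a.swc", "b.swc", "c.swc", "d.swc", "e.swc"]
batches = split_into_batches(paths, 2)
```

With this arrangement only short strings cross the process boundary, and each worker reads and parses its own files.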

The following function is very useful for sampling from fragments of a neuron.

keep_only_eu(structure_ids: Container[int]) → Callable[[swc.SWCForest], swc.SWCForest]

Given structure_ids, a (list, set, tuple, etc.) of integers, return a filtering function which accepts an swc.SWCForest and returns the subforest containing only the node types in structure_ids.

Example: keep_only_eu([1,3,4])(forest) is the subforest of forest containing only the soma, the basal dendrites and the apical dendrites, but not the axon.

The intended use is to generate a preprocessing function for swc.read_preprocess_save, swc.batch_filter_and_preprocess, or sample_swc.compute_and_save_intracell_all_euclidean; see the documentation for those functions for more information.

Parameters

structure_ids (Container[int]) – A container of integers representing types of neuron nodes.

Returns

A filtering function taking as an argument an SWCForest forest and returning the subforest of forest containing only the node types in structure_ids.

Return type

Callable[[swc.SWCForest], swc.SWCForest]
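The "function returning a filtering function" pattern can be sketched on a flat list of rows. This is not CAJAL's implementation: keep_only_sketch is hypothetical, rows carry only (sample_number, structure_id, parent_sample_number), and the treatment of a kept node whose parent was removed (it becomes a root) is an assumption made for this illustration.

```python
from typing import Callable, Container

# A node is (sample_number, structure_id, parent_sample_number); coordinates omitted.
Row = tuple[int, int, int]


def keep_only_sketch(structure_ids: Container[int]) -> Callable[[list[Row]], list[Row]]:
    """Return a filter that keeps only rows whose structure_id is in structure_ids."""

    def filter_rows(rows: list[Row]) -> list[Row]:
        kept = {number for number, structure_id, _ in rows if structure_id in structure_ids}
        result = []
        for number, structure_id, parent in rows:
            if number not in kept:
                continue
            # Assumption: a kept node whose parent was removed becomes a root (-1).
            result.append((number, structure_id, parent if parent in kept else -1))
        return result

    return filter_rows


rows = [(1, 1, -1), (2, 2, 1), (3, 3, 1), (4, 2, 2), (5, 4, 3)]
no_axon = keep_only_sketch([1, 3, 4])
filtered = no_axon(rows)
axon_only = keep_only_sketch({2})(rows)
```

Because the structure identifiers are captured in a closure, the returned function takes a single argument and can be passed directly as a preprocessing step.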

preprocessor_geo(structure_ids: Union[Container[int], Literal['keep_all_types']]) → Callable[[swc.SWCForest], NeuronTree]

This preprocessor strips the tree down to only the components listed in structure_ids and also trims the tree down to a single connected component. This is similar to swc.keep_only_eu() and the user should consult the documentation for that function. Observe that the type signature is also different. The callable returned by this function is suitable as a preprocessing function for sample_swc.read_preprocess_compute_geodesic() or sample_swc.compute_and_save_intracell_all_geodesic().

Parameters

structure_ids (Union[Container[int], Literal['keep_all_types']]) –

Return type

Callable[[swc.SWCForest], NeuronTree]

preprocessor_eu(structure_ids: Union[Container[int], Literal['keep_all_types']], soma_component_only: bool) → Callable[[swc.SWCForest], Union[Err[str], swc.SWCForest]]
Parameters
  • structure_ids (Union[Container[int], Literal['keep_all_types']]) – Either a collection of integers corresponding to structure ids in the SWC spec, or the literal string ‘keep_all_types’.

  • soma_component_only (bool) – Indicate whether to sample from the whole SWC file, or only from the connected component containing the soma. Whether this flag is appropriate depends on the technology used to construct the SWC files. Some technologies generate SWC files in which there are many unrelated connected components which are “noise” contributed by other overlapping neurons. In other technologies, all components are significant and the authors of the SWC file were simply unable to determine exactly where the branch should be connected to the main tree. In order to get sensible results from the data, the user should visually inspect neurons with multiple connected components using a tool such as Vaa3D https://github.com/Vaa3D/release/releases/tag/v1.1.2 to determine whether the extra components should be regarded as signal or noise.

Returns

A preprocessing function which accepts as argument an SWCForest forest and returns a filtered forest containing only the nodes listed in structure_ids. If soma_component_only is True, only nodes from the component containing the soma will be returned; otherwise nodes will be drawn from across the whole forest. If soma_component_only is True and there is not a unique connected component whose root is a soma node, the function will return an error.

Return type

Callable[[swc.SWCForest], Union[Err[str], swc.SWCForest]]
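The error case for soma_component_only can be illustrated with a small sketch of a function that returns either a value or an error wrapper. Everything here is hypothetical: Err is a stand-in dataclass rather than cajal.utilities.Err, and each component is summarized by the structure identifier of its root plus a name.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Err:
    """Hypothetical stand-in for an error wrapper carrying a message."""
    message: str


# A component is summarized here by the structure_id of its root plus a name.
Component = tuple[int, str]
SOMA = 1  # structure identifier for the soma in the SWC spec


def soma_component(components: list[Component]) -> Union[Err, Component]:
    """Return the unique soma-rooted component, or an Err if there is not exactly one."""
    soma_rooted = [c for c in components if c[0] == SOMA]
    if len(soma_rooted) != 1:
        return Err(f"expected exactly one soma-rooted component, found {len(soma_rooted)}")
    return soma_rooted[0]


ok = soma_component([(1, "main"), (3, "fragment")])
bad = soma_component([(3, "fragment"), (3, "other")])
```

A caller would distinguish the two outcomes with a check such as isinstance(result, Err) before using the result.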

total_length(tree: NeuronTree) → float

Return the sum of lengths of all edges in the graph.

Parameters

tree (NeuronTree) –

Return type

float

weighted_depth(tree: NeuronTree) → float

Return the weighted depth (or weighted height) of the tree, i.e., the maximal geodesic distance from the root to any other point.

Parameters

tree (NeuronTree) –

Return type

float
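Both total_length and weighted_depth can be illustrated on a flat, topologically sorted list. The sketch below is not the library code: the row layout and function names are hypothetical, parents are given as 0-based list indices rather than SWC sample numbers, and it assumes that the length of an edge is the Euclidean distance between the coordinates of its two endpoints.

```python
import math

# Each row is (xyz coordinates, parent index), with parent index -1 for the root.
# Rows are topologically sorted, so a parent always precedes its children.
Row = tuple[tuple[float, float, float], int]


def edge_lengths(rows: list[Row]) -> list[float]:
    """Euclidean length of the edge from each node to its parent (0.0 for the root)."""
    return [
        0.0 if parent == -1 else math.dist(xyz, rows[parent][0])
        for xyz, parent in rows
    ]


def total_length_sketch(rows: list[Row]) -> float:
    """Sum of the lengths of all edges."""
    return sum(edge_lengths(rows))


def weighted_depth_sketch(rows: list[Row]) -> float:
    """Largest root-to-node distance measured along the edges of the tree."""
    lengths = edge_lengths(rows)
    dist_from_root: list[float] = []
    for (_, parent), length in zip(rows, lengths):
        dist_from_root.append(0.0 if parent == -1 else dist_from_root[parent] + length)
    return max(dist_from_root)


rows = [
    ((0.0, 0.0, 0.0), -1),  # root
    ((3.0, 4.0, 0.0), 0),   # edge of length 5
    ((3.0, 4.0, 2.0), 1),   # edge of length 2
    ((0.0, 0.0, 1.0), 0),   # edge of length 1
]
```

For this toy tree the edges sum to 8, while the farthest node from the root along the tree lies at distance 5 + 2 = 7.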

discrete_depth(tree: NeuronTree) → int
Returns

The height of the tree in the unweighted or discrete sense, i.e. the longest path from the root to any leaf measured in the number of edges.

Parameters

tree (NeuronTree) –

Return type

int

node_type_counts_tree(tree: NeuronTree) → dict[int, int]
Returns

A dictionary whose keys are all structure_id’s in tree and whose values are the multiplicities with which that node type occurs.

Parameters

tree (NeuronTree) –

Return type

dict[int, int]

node_type_counts_forest(forest: swc.SWCForest) → dict[int, int]
Returns

A dictionary whose keys are all structure_id’s in forest and whose values are the multiplicities with which that node type occurs.

Parameters

forest (swc.SWCForest) –

Return type

dict[int, int]

num_nodes(tree: NeuronTree) → int
Returns

The number of nodes in tree.

Parameters

tree (NeuronTree) –

Return type

int

read_preprocess_save(infile_name: str, outfile_name: str, preprocess: Callable[[swc.SWCForest], Union[Err[T], swc.SWCForest, NeuronTree]]) → Union[Err[T], Literal['success']]

Read the *.swc file infile_name from disk as an SWCForest. Apply the function preprocess to the forest. If preprocessing returns an error, return that error. Otherwise, write the preprocessed SWC to outfile_name and return the string “success”.

This function exists mostly for convenience, as it can be called in parallel on several files at once without requiring a large amount of data to be communicated between processes.

Return type

Union[Err[T], Literal['success']]

get_filenames(infolder: str, name_validate: Callable[[str], bool] = default_name_validate) → tuple[list[str], list[str]]

Get a list of all files in infolder and filter the list by name_validate.

See swc.default_name_validate() for an example of a name validation function.

Returns

A pair of lists (cell_names, file_paths), where file_paths are the paths to cells we want to sample from, and cell_names[i] is the substring of file_paths[i] containing only the file name, minus the extension; i.e., if file_paths[i] is “/home/jovyan/files/abc.swc” then cell_names[i] is “abc”.
Return type

tuple[list[str], list[str]]
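The relationship between a file path and its cell name can be reproduced with the standard library. The helper cell_name_of below is hypothetical and is not necessarily how CAJAL derives the name; it simply strips the directory and the final extension.

```python
import os


def cell_name_of(file_path: str) -> str:
    """File name without its directory and final extension (hypothetical helper)."""
    return os.path.splitext(os.path.basename(file_path))[0]


name = cell_name_of("/home/jovyan/files/abc.swc")
```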

batch_filter_and_preprocess(infolder: str, outfolder: str, preprocess: Callable[[swc.SWCForest], Union[Err[T], swc.SWCForest, NeuronTree]], parallel_processes: int, err_log: Optional[str], suffix: Optional[str] = None, name_validate: Callable[[str], bool] = default_name_validate) → None

Get the set of files in infolder. Filter down to the file names which pass the test name_validate, which is responsible for filtering out any non-SWC files. Read each file in this filtered list into memory as an swc.SWCForest, and apply the function preprocess to each forest. preprocess may return an error (essentially just a message contained in an error wrapper) or a modified/transformed SWCForest, i.e., one in which certain nodes have been filtered out or certain components of the graph deleted. If preprocess returns an error, write the error to the given log file err_log together with the name of the cell that caused the error. Otherwise, if preprocess returns an SWCForest, write this SWCForest into the folder outfolder with filename == cellname + suffix + ‘.swc’.

Parameters
  • infolder (str) – Folder containing SWC files to process.

  • outfolder (str) – Folder where the results of the filtering will be written.

  • err_log (Optional[str]) – A file name for a (currently nonexistent) *.csv file. This file will be written to with a list of all the cells which were rejected by preprocess together with an explanation of why these cells could not be processed.

  • preprocess (Callable[[swc.SWCForest], Union[Err[T], swc.SWCForest, NeuronTree]]) – A function to filter out bad SWC forests or transform them into a more manageable form.

  • parallel_processes (int) – Run this many Python processes in parallel.

  • suffix (Optional[str]) – If a file in infolder has the name “abc.swc” then the corresponding file written to outfolder will have the name “abc” + suffix + “.swc”.

  • name_validate (Callable[[str], bool]) – A function which identifies the files in infolder which are *.swc files. The default argument, swc.default_name_validate(), checks whether the file name has the extension “.swc” (case-insensitive) and discards files starting with ‘.’, the marker for hidden files on Linux. The user may need to write their own function to ensure that various kinds of backup/autosave files and metadata files are not read into memory.

Return type

None

For computing geodesic distances, it is more convenient to have a data structure with the weights precomputed and attached to the edges, so we introduce an alternate representation for a neuron where coordinates are forgotten and only the weighted tree structure remains. These objects can be smaller than the original NeuronTrees.

class WeightedTreeRoot(subtrees: 'list[WeightedTreeChild]')
Parameters

subtrees (list[cajal.weighted_tree.WeightedTreeChild]) –

class WeightedTreeChild(subtrees: 'list[WeightedTreeChild]', depth: 'int', unique_id: 'int', parent: 'WeightedTree', dist: 'float')

A cajal.weighted_tree.WeightedTree is either a cajal.weighted_tree.WeightedTreeRoot or a cajal.weighted_tree.WeightedTreeChild.

WeightedTree_of(tree: NeuronTree) → WeightedTreeRoot

Convert a NeuronTree to a WeightedTree. A node in a WeightedTree does not contain a coordinate triple, a radius, a structure_id, or a parent sample number.

Instead, it contains a direct pointer to its parent, a list of its children, and (if it is a child node) the weight of the edge between the child and its parent.

In forming the WeightedTree, any node with both a parent and exactly one child is eliminated, and the parent and the child are joined directly by a single edge whose weight is the sum of the two original edge weights. This reduces the number of nodes without affecting the geodesic distances between points in the graph.

Parameters

tree (NeuronTree) – A NeuronTree to be converted into a WeightedTree.

Returns

The WeightedTree corresponding to the original NeuronTree.

Return type

WeightedTreeRoot
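The node-elimination step described for WeightedTree_of can be sketched on a flat parent array. This is an illustration under stated assumptions, not CAJAL's implementation: compress_chain_nodes is hypothetical, nodes are identified by 0-based indices, and the result is a dictionary rather than WeightedTreeRoot and WeightedTreeChild objects.

```python
def compress_chain_nodes(
    parents: list[int], weights: list[float]
) -> dict[int, tuple[int, float]]:
    """Drop every non-root node that has exactly one child, merging its two edges.

    parents[i] is the index of node i's parent (-1 for the root);
    weights[i] is the length of the edge from node i to its parent.
    Returns {kept_node: (kept_ancestor, merged_weight)} for every kept non-root node.
    """
    child_count = [0] * len(parents)
    for parent in parents:
        if parent != -1:
            child_count[parent] += 1

    def is_kept(i: int) -> bool:
        # The root is always kept; other nodes are kept unless they have one child.
        return parents[i] == -1 or child_count[i] != 1

    compressed: dict[int, tuple[int, float]] = {}
    for node in range(len(parents)):
        if parents[node] == -1 or not is_kept(node):
            continue
        # Walk up through eliminated nodes, accumulating edge weights.
        ancestor, weight = parents[node], weights[node]
        while not is_kept(ancestor):
            weight += weights[ancestor]
            ancestor = parents[ancestor]
        compressed[node] = (ancestor, weight)
    return compressed


# Root 0 -> 1 -> 2, where node 2 branches into leaves 3 and 4.
parents = [-1, 0, 1, 2, 2]
weights = [0.0, 1.5, 2.5, 1.0, 3.0]
compressed = compress_chain_nodes(parents, weights)
```

In this example node 1 is eliminated and node 2 is attached directly to the root with weight 1.5 + 2.5 = 4.0, so the distance from the root to each leaf is unchanged.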