Second Lower Bound and Quantized Gromov-Wasserstein
- slb_parallel_memory(cell_dms: Collection[ndarray[Any, dtype[float64]]], cell_distributions: Optional[Iterable[ndarray[Any, dtype[float64]]]], num_processes: int, chunksize: int = 20) ndarray[Any, dtype[float64]]
Compute the SLB distance in parallel between all cells in cell_dms.
- Parameters
cell_dms (Collection[ndarray[Any, dtype[float64]]]) – A collection of distance matrices.
cell_distributions (Optional[Iterable[ndarray[Any, dtype[float64]]]]) – A collection of distributions on the cells. It should have the same length as cell_dms.
num_processes (int) – How many Python processes to run in parallel.
chunksize (int) – How many SLB distances each Python process computes at a time.
- Returns
A square matrix giving the pairwise SLB distances between the cells in cell_dms.
- Return type
ndarray[Any, dtype[float64]]
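To make the input and output shapes concrete, here is a minimal pure-Python sketch of a pairwise SLB computation. It assumes the usual reading of the second lower bound as a one-dimensional Wasserstein distance between the distributions of pairwise distances of two cells. The exponent 2, the factor 1/2, and the helper names (distance_distribution, wasserstein_1d, slb_sketch, pairwise_slb) are assumptions made for illustration. They are not CAJAL's implementation, and plain lists stand in for the ndarray arguments.

```python
from itertools import product


def distance_distribution(dm, p):
    """Sorted (distance, mass) pairs: the pushforward of p x p under dm."""
    n = len(dm)
    return sorted((dm[i][j], p[i] * p[j]) for i, j in product(range(n), repeat=2))


def wasserstein_1d(a, b, power=2, tol=1e-12):
    """Wasserstein distance on the real line between two sorted weighted
    samples, computed by matching mass in increasing order."""
    i = j = 0
    rem_a, rem_b = a[0][1], b[0][1]
    total = 0.0
    while i < len(a) and j < len(b):
        moved = min(rem_a, rem_b)  # mass transported at this step
        total += moved * abs(a[i][0] - b[j][0]) ** power
        rem_a -= moved
        rem_b -= moved
        if rem_a <= tol:  # current atom of a is used up
            i += 1
            rem_a = a[i][1] if i < len(a) else 0.0
        if rem_b <= tol:  # current atom of b is used up
            j += 1
            rem_b = b[j][1] if j < len(b) else 0.0
    return total ** (1.0 / power)


def slb_sketch(dm_x, p_x, dm_y, p_y):
    # Assumption: the factor 1/2 follows one common convention and may
    # differ from the normalization CAJAL uses.
    return 0.5 * wasserstein_1d(
        distance_distribution(dm_x, p_x), distance_distribution(dm_y, p_y)
    )


def pairwise_slb(cell_dms, cell_distributions):
    """Symmetric matrix of sketch SLB values, one row and column per cell."""
    n = len(cell_dms)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = slb_sketch(
                cell_dms[i], cell_distributions[i],
                cell_dms[j], cell_distributions[j],
            )
            out[i][j] = out[j][i] = d
    return out


# Two toy "cells" with two points each, both with the uniform distribution.
cell_a = [[0.0, 1.0], [1.0, 0.0]]
cell_b = [[0.0, 3.0], [3.0, 0.0]]
uniform = [0.5, 0.5]
slb_matrix = pairwise_slb([cell_a, cell_b], [uniform, uniform])
```

The result has one row and one column per cell, a zero diagonal, and is symmetric, which matches the square matrix described under Returns.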
- slb_parallel(intracell_csv_loc: str, out_csv: str, num_processes: int, chunksize: int = 20) None
Compute the SLB distance in parallel between all cells in the CSV file intracell_csv_loc.
The files are expected to be formatted according to the format in cajal.run_gw.icdm_csv_validate().
- class quantized_icdm(cell_dm: ndarray[Any, dtype[float64]], p: ndarray[Any, dtype[float64]], num_clusters: int, clusters: Optional[ndarray[Any, dtype[int64]]] = None)
A “quantized” intracell distance matrix.
A metric measure space equipped with a given clustering. It contains additional data which allows for the rapid computation of pairwise GW distances across many cells. Users should only need to understand how to use the constructor. Constructing a large number of these objects will result in high memory usage.
- Parameters
cell_dm (ndarray[Any, dtype[float64]]) – An intracell distance matrix in squareform.
p (ndarray[Any, dtype[float64]]) – A probability distribution on the points of the metric space.
num_clusters (int) – How many clusters to subdivide the cell into. More clusters give greater accuracy at the cost of longer computation.
clusters (Optional[ndarray[Any, dtype[int64]]]) – Labels for a clustering of the points in the cell. If no clustering is supplied, one will be derived by hierarchical clustering until num_clusters clusters are formed. If a clustering is supplied, then num_clusters is ignored.
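For readers who want to supply their own clusters argument, the following sketch shows what "hierarchical clustering until num_clusters clusters are formed" can look like. It is a naive agglomerative procedure in pure Python. The average-linkage rule, the 1-based labels, and the function name agglomerate are assumptions made for illustration, since this section does not say which linkage or label convention the constructor uses. In practice the labels would be passed as an int64 ndarray.

```python
def agglomerate(dm, num_clusters):
    """Merge the two closest clusters (average linkage) until only
    num_clusters remain; return one integer label per point."""
    clusters = [[i] for i in range(len(dm))]  # start with singletons
    while len(clusters) > num_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average distance between members of the two clusters.
                total = sum(dm[i][j] for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * len(dm)
    for label, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = label
    return labels


# Four points on a line at positions 0, 1, 10, 11, as a square distance matrix.
positions = [0.0, 1.0, 10.0, 11.0]
toy_dm = [[abs(x - y) for y in positions] for x in positions]
toy_labels = agglomerate(toy_dm, num_clusters=2)
```

The two nearby pairs end up in separate clusters, so toy_labels assigns one label to the first two points and another to the last two.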
- quantized_gw_parallel(intracell_csv_loc: str, num_processes: int, num_clusters: int, out_csv: str, chunksize: int = 20, verbose: bool = False, write_blocksize: int = 100) None
Compute the quantized Gromov-Wasserstein distance in parallel between all cells in a family of cells.
Read icdms from file, quantize them, compute pairwise qGW distances between icdms, and write the result to file.
- Parameters
intracell_csv_loc (str) – Path to a CSV file containing the cells to process.
num_processes (int) – Number of Python processes to run in parallel.
num_clusters (int) – Each cell will be partitioned into num_clusters many clusters.
out_csv (str) – File path where a CSV file containing the quantized GW distances will be written.
chunksize (int) – How many q-GW distances should be computed at a time by each parallel process.
verbose (bool) –
write_blocksize (int) –
- Return type
None
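The chunksize parameter is easier to reason about with the size of the workload in mind: n cells give n(n-1)/2 unordered pairs, and each pair is one distance to compute. The sketch below shows one way such a workload could be split into batches of at most chunksize pairs. The helper name chunked_pairs is hypothetical, and how CAJAL actually schedules work across processes is not described in this section.

```python
from itertools import combinations, islice


def chunked_pairs(num_cells, chunksize):
    """Yield lists of at most `chunksize` index pairs (i, j) with i < j."""
    pairs = combinations(range(num_cells), 2)
    while True:
        chunk = list(islice(pairs, chunksize))
        if not chunk:
            return
        yield chunk


# 5 cells give 10 pairs; with chunksize=4 they split into batches of 4, 4 and 2.
chunks = list(chunked_pairs(num_cells=5, chunksize=4))
```

A larger chunksize means fewer, larger batches per process. A smaller one spreads the pairs more evenly when some distances take longer than others.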