Second Lower Bound and Quantized Gromov-Wasserstein

slb_parallel_memory(cell_dms: Collection[ndarray[Any, dtype[float64]]], cell_distributions: Optional[Iterable[ndarray[Any, dtype[float64]]]], num_processes: int, chunksize: int = 20) → ndarray[Any, dtype[float64]]

Compute the SLB distance in parallel between all cells in cell_dms.

Parameters
  • cell_dms (Collection[ndarray[Any, dtype[float64]]]) – A collection of distance matrices.

  • cell_distributions (Optional[Iterable[ndarray[Any, dtype[float64]]]]) – A collection of probability distributions, one per cell. It should have the same length as cell_dms.

  • num_processes (int) – How many Python processes to run in parallel.

  • chunksize (int) – How many SLB distances each Python process computes at a time.

Returns

A square matrix of pairwise SLB distances between the cells in cell_dms.

Return type

ndarray[Any, dtype[float64]]
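
The inputs described above can be prepared along the following lines. This is a minimal sketch, not the library's own recipe: it assumes the function is importable from cajal.qgw (the module path is not stated in this section), that each distance matrix is a full square array, and that uniform distributions are an acceptable choice. The random point clouds and the helper names pairwise_distance_matrix, uniform_distribution and run_slb are hypothetical.

```python
import numpy as np


def pairwise_distance_matrix(points: np.ndarray) -> np.ndarray:
    """Return the square Euclidean distance matrix of an (n, d) point array."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def uniform_distribution(n: int) -> np.ndarray:
    """Return the uniform probability distribution on n points."""
    return np.full(n, 1.0 / n)


# Hypothetical data: three "cells", each sampled at 50 random 3D points.
rng = np.random.default_rng(0)
cell_dms = [pairwise_distance_matrix(rng.random((50, 3))) for _ in range(3)]
# One distribution per cell, in the same order as cell_dms.
cell_distributions = [uniform_distribution(dm.shape[0]) for dm in cell_dms]


def run_slb(num_processes: int = 2) -> np.ndarray:
    """Call slb_parallel_memory; the import path cajal.qgw is an assumption."""
    from cajal.qgw import slb_parallel_memory

    return slb_parallel_memory(
        cell_dms, cell_distributions, num_processes=num_processes, chunksize=20
    )
```

Because the function starts several Python processes, it is safest to call run_slb() from a script under an `if __name__ == "__main__":` guard. For the three cells above, the documented return value would be a 3 × 3 matrix.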

slb_parallel(intracell_csv_loc: str, out_csv: str, num_processes: int, chunksize: int = 20) → None

Compute the SLB distance in parallel between all cells in the CSV file intracell_csv_loc.

The file is expected to follow the format described in cajal.run_gw.icdm_csv_validate().

Parameters
  • intracell_csv_loc (str) – Path to the CSV file containing the intracell distance matrices of the cells.

  • out_csv (str) – Path where the CSV file of SLB distances will be written.

  • num_processes (int) – How many Python processes to run in parallel.

  • chunksize (int) – How many SLB distances each Python process computes at a time.

Return type

None

class quantized_icdm(cell_dm: ndarray[Any, dtype[float64]], p: ndarray[Any, dtype[float64]], num_clusters: int, clusters: Optional[ndarray[Any, dtype[int64]]] = None)

A “quantized” intracell distance matrix.

A metric measure space equipped with a given clustering. It contains additional data that allows pairwise GW distances to be computed rapidly across many cells. Users should only need to understand how to use the constructor. Constructing a large number of these objects uses a lot of memory.

Parameters
  • cell_dm (ndarray[Any, dtype[float64]]) – An intracell distance matrix in squareform.

  • p (ndarray[Any, dtype[float64]]) – A probability distribution on the points of the metric space.

  • num_clusters (int) – How many clusters to subdivide the cell into; the more clusters, the more accuracy, but the longer the computation.

  • clusters (Optional[ndarray[Any, dtype[int64]]]) – Labels for a clustering of the points in the cell. If no clustering is supplied, one will be derived by hierarchical clustering until num_clusters clusters are formed. If a clustering is supplied, then num_clusters is ignored.
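
The constructor arguments can be assembled as in the sketch below. It assumes that "squareform" means the full symmetric n × n matrix (as in SciPy's terminology) rather than the condensed vector of upper-triangle entries, that the class is importable from cajal.qgw, and that a uniform distribution is a reasonable p. The helpers vector_to_square and quantize are hypothetical names introduced for illustration.

```python
import numpy as np


def vector_to_square(vec: np.ndarray, n: int) -> np.ndarray:
    """Expand a condensed distance vector (upper triangle, row by row)
    into the full symmetric n x n matrix with a zero diagonal."""
    expected = n * (n - 1) // 2
    if len(vec) != expected:
        raise ValueError(f"expected {expected} entries, got {len(vec)}")
    dm = np.zeros((n, n), dtype=np.float64)
    dm[np.triu_indices(n, k=1)] = vec
    return dm + dm.T


def quantize(cell_dm: np.ndarray, num_clusters: int = 25):
    """Build a quantized_icdm with a uniform distribution on the points.

    The import path cajal.qgw is an assumption. No clustering is passed,
    so one is derived by hierarchical clustering as documented above.
    """
    from cajal.qgw import quantized_icdm

    n = cell_dm.shape[0]
    p = np.full(n, 1.0 / n)
    return quantized_icdm(cell_dm, p, num_clusters)


# Three points with pairwise distances d(0,1)=1, d(0,2)=2, d(1,2)=3.
small_dm = vector_to_square(np.array([1.0, 2.0, 3.0]), 3)
```

Given the memory note above, it may be preferable to build these objects only for the cells that are about to be compared, rather than holding one for every cell at once.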

quantized_gw_parallel(intracell_csv_loc: str, num_processes: int, num_clusters: int, out_csv: str, chunksize: int = 20, verbose: bool = False, write_blocksize: int = 100) → None

Compute the quantized Gromov-Wasserstein distance in parallel between all cells in a family of cells.

Read icdms from file, quantize them, compute pairwise qGW distances between icdms, and write the result to file.

Parameters
  • intracell_csv_loc (str) – Path to a CSV file containing the cells to process.

  • num_processes (int) – Number of Python processes to run in parallel.

  • num_clusters (int) – Number of clusters into which each cell is partitioned.

  • out_csv (str) – Path where the CSV file of quantized GW distances will be written.

  • chunksize (int) – How many qGW distances each parallel process computes at a time.

  • verbose (bool) –

  • write_blocksize (int) –

Return type

None
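
A call to this function might look like the sketch below. Both file paths are hypothetical placeholders, the values of num_processes and num_clusters are arbitrary, and the import path cajal.qgw is an assumption; the remaining values repeat the defaults from the signature above. The name run_quantized_gw is introduced here for illustration only.

```python
# Keyword arguments for quantized_gw_parallel. The two file paths are
# hypothetical placeholders; chunksize, verbose and write_blocksize repeat
# the documented defaults; num_processes and num_clusters are arbitrary.
QGW_KWARGS = {
    "intracell_csv_loc": "icdm.csv",
    "num_processes": 4,
    "num_clusters": 25,
    "out_csv": "qgw_distances.csv",
    "chunksize": 20,
    "verbose": False,
    "write_blocksize": 100,
}


def run_quantized_gw(**overrides) -> None:
    """Run quantized_gw_parallel; the import path cajal.qgw is an assumption."""
    from cajal.qgw import quantized_gw_parallel

    # Later keys win, so any override replaces the value in QGW_KWARGS.
    quantized_gw_parallel(**{**QGW_KWARGS, **overrides})
```

As with the other parallel functions, calling run_quantized_gw() under an `if __name__ == "__main__":` guard avoids problems on platforms that start worker processes by re-importing the script.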