ovrlpy.Ovrlp

class ovrlpy.Ovrlp(transcripts, /, KDE_bandwidth=2.5, min_distance=8, n_components=30, *, gene_key='gene', coordinate_keys=('x', 'y', 'z'), n_workers=1, dtype=np.float32, patch_length=500, umap_kwargs=UMAP_2D_PARAMS, cumap_kwargs=UMAP_RGB_PARAMS, random_state=None)

Main analysis class for spatial overlap analysis.

Parameters:
  • transcripts (polars.DataFrame | pandas.DataFrame) – Transcript information containing coordinates and gene name/id.

  • KDE_bandwidth (float, optional) – The bandwidth of the KDE.

  • min_distance (float, optional) – Minimum distance between pseudocells (local maxima) used for cell typing.

  • n_components (int, optional) – Number of components for PCA.

  • gene_key (str) – Name of the gene column.

  • coordinate_keys (Iterable[str]) – Names of the coordinate columns.

  • n_workers (int) – Number of threads used in parallel processing.

  • dtype (numpy.typing.DTypeLike) – Datatype used for KDE calculations.

  • patch_length (int) – Upper bound for the size of each patch (only relevant for processing).

  • umap_kwargs (dict, optional) – Keyword arguments for the 2D UMAP embedding.

  • cumap_kwargs (dict, optional) – Keyword arguments for the 3D (RGB) UMAP embedding.

  • random_state (int | RandomState | None) – Random state used to seed UMAP and PCA.
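To make KDE_bandwidth concrete: a kernel density estimate spreads each transcript over neighboring pixels, and the bandwidth sets how far. The following is a minimal, stdlib-only sketch assuming a 2D Gaussian kernel whose standard deviation equals KDE_bandwidth in pixel units, normalized so each transcript contributes a total mass of about one. The helper gaussian_kde_grid is hypothetical and not part of ovrlpy; the library's own KDE (computed patch-wise with the configured dtype) may differ in kernel, normalization and dimensionality.

```python
import math


def gaussian_kde_grid(points, width, height, bandwidth=2.5):
    """Evaluate a 2D Gaussian KDE of (x, y) pixel positions on a grid.

    Hypothetical helper: each point adds a Gaussian with standard deviation
    `bandwidth` (in pixels), normalized so its total mass is 1.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth**2)
    grid = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for row in range(height):
            for col in range(width):
                d2 = (col - px) ** 2 + (row - py) ** 2
                grid[row][col] += norm * math.exp(-d2 / (2.0 * bandwidth**2))
    return grid


# One gene with two nearby transcripts on a 21 x 21 pixel grid.
density = gaussian_kde_grid([(10, 10), (11, 10)], width=21, height=21)
# The pixel values sum to about 2 (one unit of mass per transcript);
# a larger bandwidth spreads that mass wider and lowers the peak.
total = sum(map(sum, density))
```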

transcripts

Transcript information containing coordinates and gene name/id.

Type:

polars.DataFrame

KDE_bandwidth

The bandwidth of the KDE.

Type:

float

min_distance

Minimum distance between pseudocells (local maxima).

Type:

int

pseudocells

Gene expression matrix of the pseudocells.

Type:

AnnData

signatures

A dataframe of celltypes x gene signatures used to annotate the UMAP.

Type:

polars.DataFrame

celltype_centers

The center of gravity of each celltype in the 2D embedding, used for UMAP annotation.

Type:

ndarray
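As an illustration of celltype_centers, the sketch below assumes each center is the unweighted mean of the 2D embedding coordinates of the pseudocells assigned to that cell type. The helper and the example labels are hypothetical, and ovrlpy stores the result as an ndarray rather than a dict.

```python
from collections import defaultdict


def celltype_centers(embedding, labels):
    """Mean 2D embedding position per cell type label (hypothetical helper)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_x, sum_y, count]
    for (ex, ey), label in zip(embedding, labels):
        acc = sums[label]
        acc[0] += ex
        acc[1] += ey
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


# Four pseudocells in a 2D UMAP, two per (made-up) cell type.
embedding = [(0.0, 0.0), (2.0, 2.0), (10.0, 10.0), (12.0, 8.0)]
labels = ["T cell", "T cell", "Fibroblast", "Fibroblast"]
centers = celltype_centers(embedding, labels)
# centers == {"T cell": (1.0, 1.0), "Fibroblast": (11.0, 9.0)}
```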

celltype_assignments

The assignments of the cell types.

Type:

ndarray

pca

The PCA object used for the 2D embedding and for calculating the VSI score.

Type:

PCA

umap_2d

The UMAP object used for the 2D embedding.

Type:

umap.UMAP

pca_rgb

The PCA object used for the 3D RGB embedding.

Type:

PCA

umap_rgb

The UMAP object used for the 3D RGB embedding.

Type:

umap.UMAP

genes

A list of genes to utilize in the model.

Type:

list

gridsize

The size of a pixel, in units of the transcript coordinates (usually µm).

Type:

float

origin

The origin of the grid, i.e. the offset by which the coordinates have been shifted to minimize the analysis area.

Type:

tuple[float, float]

integrity_map

The vertical signal integrity (VSI) map of the tissue.

Type:

ndarray

signal_map

A pixel map of overall signal strength in the tissue, used to mask out low-signal regions.

Type:

ndarray

dtype

Datatype used for KDE calculations.

Type:

numpy.typing.DTypeLike

patch_length

Upper bound for the size of each patch (only relevant for processing).

Type:

int

n_workers

Number of threads used in parallel processing.

Type:

int

process_coordinates(gridsize=1, **kwargs)

Process the coordinates of the transcripts dataframe.

Parameters:
  • gridsize (float, optional) – The size of each pixel in the grid. Measured in units of the transcript locations (usually µm).

  • kwargs – Other keyword arguments are passed to ovrlpy.process_coordinates().
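The roles of gridsize and the origin attribute can be sketched as follows, assuming the origin is the per-axis minimum of the coordinates and that each transcript falls into pixel floor((coordinate - origin) / gridsize). to_pixels is a hypothetical helper for illustration, not ovrlpy's process_coordinates.

```python
import math


def to_pixels(coords, gridsize=1.0):
    """Shift (x, y) coordinates to a zero origin and bin them into pixels.

    Hypothetical helper: the origin is assumed to be the per-axis minimum,
    so the grid starts exactly where the data starts.
    """
    origin = (min(x for x, _ in coords), min(y for _, y in coords))
    pixels = [
        (
            math.floor((x - origin[0]) / gridsize),
            math.floor((y - origin[1]) / gridsize),
        )
        for x, y in coords
    ]
    return origin, pixels


# Three transcripts (in µm) binned into 2 µm pixels.
origin, pixels = to_pixels([(105.2, 40.0), (107.9, 43.5), (110.0, 41.1)], gridsize=2.0)
# origin == (105.2, 40.0); pixels == [(0, 0), (1, 1), (2, 0)]
```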

fit_transcripts(/, min_transcripts=10, *, genes=None, fit_umap=True)

Fits a spatial transcripts dataset using the SSAM algorithm.

Parameters:
  • min_transcripts (float) – Minimum expression for a local maximum to be considered. Expressed in terms of transcripts.

  • genes (Iterable[str] | None) – A list of genes to utilize in the model. None uses all genes. Local maxima are always detected based on all genes.

  • fit_umap (bool) – Whether to fit the UMAP to the data.
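fit_transcripts places pseudocells at local maxima of the signal summed over all genes, spaced by min_distance and thresholded by min_transcripts. A minimal sketch of that kind of peak detection, similar in spirit to scikit-image's peak_local_max: a pixel is kept if it reaches the threshold, is the largest value in a square window of radius min_distance, and is not within min_distance of a stronger peak already kept. local_maxima is hypothetical, and its threshold is assumed to be in KDE units already; the documented min_transcripts is expressed in transcripts, and its conversion is not shown here.

```python
def local_maxima(grid, min_distance=8, min_value=10.0):
    """Return (row, col) peaks of a 2D list of summed KDE values (hypothetical)."""
    height, width = len(grid), len(grid[0])

    def is_window_max(r, c):
        # Compare against every pixel within a square window of radius min_distance.
        rows = range(max(0, r - min_distance), min(height, r + min_distance + 1))
        cols = range(max(0, c - min_distance), min(width, c + min_distance + 1))
        return grid[r][c] >= max(grid[rr][cc] for rr in rows for cc in cols)

    # Candidates above the threshold, strongest first.
    candidates = sorted(
        (
            (grid[r][c], r, c)
            for r in range(height)
            for c in range(width)
            if grid[r][c] >= min_value and is_window_max(r, c)
        ),
        reverse=True,
    )
    peaks = []
    for _, r, c in candidates:
        # Suppress plateau ties: keep a peak only if it is farther than
        # min_distance from every stronger peak already kept.
        if all(max(abs(r - pr), abs(c - pc)) > min_distance for pr, pc in peaks):
            peaks.append((r, c))
    return peaks


grid = [[0.0] * 20 for _ in range(20)]
grid[5][5], grid[5][8], grid[15][15], grid[2][18] = 15.0, 12.0, 11.0, 9.0
peaks = local_maxima(grid, min_distance=4, min_value=10.0)
# (5, 8) is shadowed by the stronger (5, 5); (2, 18) is below the threshold.
```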

fit_pseudocells(pseudocells, *, fit_umap=True)

Fits the expression of pseudocells.

Parameters:
  • pseudocells (AnnData) – Gene expression to use for fitting.

  • fit_umap (bool) – Whether to fit the UMAP to the data.

fit_signatures(signatures, key)
fit_signatures(signatures: DataFrame, key: None = None)

Fits a signature matrix.

Parameters:
  • signatures (polars.DataFrame | DataFrame) – A matrix of celltypes x gene signatures to use to annotate the UMAP.

  • key (str | None) – Name of the column containing the signature names. Only used if signatures is a polars.DataFrame; for a pandas.DataFrame the names are expected in the index.
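How the signatures annotate the pseudocells is not spelled out above. One common approach, assumed here purely for illustration, is to label each expression vector with the signature it correlates with best (Pearson correlation). The dictionary keys play the role of the signature names taken from the key column (polars) or the index (pandas). assign_celltypes, pearson and the gene values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def assign_celltypes(pseudocells, signatures):
    """Label each expression vector with its best-correlated signature (hypothetical)."""
    return [
        max(signatures, key=lambda name: pearson(expr, signatures[name]))
        for expr in pseudocells
    ]


# Signature matrix: cell types x genes (columns: CD3E, COL1A1, EPCAM).
signatures = {
    "T cell": [5.0, 0.1, 0.2],
    "Fibroblast": [0.1, 6.0, 0.3],
    "Epithelial": [0.2, 0.1, 7.0],
}
pseudocells = [[4.0, 0.5, 0.0], [0.0, 1.0, 9.0]]
labels = assign_celltypes(pseudocells, signatures)
# labels == ["T cell", "Epithelial"]
```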

compute_VSI(*, min_transcripts=2)

Calculate the vertical signal integrity (VSI).

Parameters:

  • min_transcripts (float | None, optional) – Minimum expression value to consider in the calculation. Defaults to 110% of the maximum expression profile of two molecules in the KDE.
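The sketch below illustrates the idea behind VSI under a stated assumption: at each pixel, the expression vector of the upper part of the tissue is compared with that of the lower part via cosine similarity, so a value of 1 means both layers carry the same profile and lower values point to vertically overlapping cells. Pixels whose combined signal falls below min_transcripts are skipped. The function is hypothetical; since the pca attribute is used for the VSI score, ovrlpy likely compares PCA-reduced vectors rather than raw gene counts.

```python
import math


def vertical_signal_integrity(top, bottom, min_transcripts=2.0):
    """Cosine similarity between the top and bottom expression of one pixel.

    Hypothetical helper: returns None when the combined signal is below
    min_transcripts, and 0.0 when one of the two layers is empty.
    """
    if sum(top) + sum(bottom) < min_transcripts:
        return None
    dot = sum(t * b for t, b in zip(top, bottom))
    norm = math.sqrt(sum(t * t for t in top)) * math.sqrt(sum(b * b for b in bottom))
    return dot / norm if norm > 0 else 0.0


# Gene order: CD3E, COL1A1, EPCAM (illustrative values).
consistent = vertical_signal_integrity([3.0, 1.0, 0.0], [3.0, 1.0, 0.0])  # close to 1
overlap = vertical_signal_integrity([3.0, 0.0, 0.0], [0.0, 2.0, 0.0])  # 0.0: different profiles stacked
too_weak = vertical_signal_integrity([0.5, 0.0, 0.0], [0.0, 0.5, 0.0])  # None: below threshold
```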

analyse(gridsize=1, min_transcripts=10, genes=None, fit_umap=True)

Run the main ovrlpy analysis.

Parameters:
  • gridsize (float, optional) – The size of each pixel in the grid.

  • min_transcripts (float) – Minimum expression for a local maximum to be considered. Expressed in terms of transcripts.

  • genes (Iterable[str] | None) – A list of genes to utilize in the model. None uses all genes. Local maxima are always detected based on all genes.

  • fit_umap (bool) – Whether to fit the UMAP to the data.

detect_doublets(min_distance=10, min_integrity=0.7, min_signal=3, integrity_sigma=None)

Finds individual low peaks (local minima) of signal integrity in the tissue map, each of which indicates a single occurrence of overlapping cells.

Parameters:
  • min_distance (int, optional) – Minimum distance between reported peaks.

  • min_integrity (float, optional) – Threshold of the signal integrity value. Only peaks with a signal integrity below min_integrity are considered.

  • min_signal (float, optional) – Minimum signal value for a peak to be considered.

  • integrity_sigma (float, optional) – Optional sigma value for Gaussian filtering of the integrity map, which leads to the detection of overlap regions with larger spatial extent.

Return type:

polars.DataFrame
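integrity_sigma smooths the integrity map before peaks are searched, so a single-pixel dip becomes shallower but wider, favoring overlap regions with a larger spatial extent. Below is a stdlib sketch of a separable Gaussian filter (truncated at 4 sigma, border pixels clamped); scipy.ndimage.gaussian_filter performs the same kind of operation on arrays. Both helpers are hypothetical, and whether ovrlpy uses SciPy for this step is not stated here.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian weights, truncated at 4 sigma."""
    radius = max(1, math.ceil(4 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(grid, sigma):
    """Separable Gaussian blur of a 2D list; border pixels are clamped (repeated)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(grid), len(grid[0])

    def clamp(i, n):
        return min(max(i, 0), n - 1)

    # First pass: blur every row horizontally.
    rows = [
        [sum(k * grid[r][clamp(c + j - radius, width)] for j, k in enumerate(kernel))
         for c in range(width)]
        for r in range(height)
    ]
    # Second pass: blur the result vertically.
    return [
        [sum(k * rows[clamp(r + j - radius, height)][c] for j, k in enumerate(kernel))
         for c in range(width)]
        for r in range(height)
    ]


# An otherwise intact integrity map with a single-pixel dip.
integrity = [[1.0] * 11 for _ in range(11)]
integrity[5][5] = 0.2
smoothed = smooth(integrity, sigma=1.0)
# The dip at (5, 5) is now shallower, and its neighbors dip slightly too.
```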

subset_transcripts(x, y, *, window_size=30)

Subset the transcript dataframe spatially based on given x, y coordinates and window size.

Parameters:
  • x (float) – x-coordinate to center the region

  • y (float) – y-coordinate to center the region

  • window_size (int, optional) – The window size of the region. Molecules within this window around (x, y) are returned as a new DataFrame.

Return type:

polars.DataFrame
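A minimal sketch of the windowing, assuming window_size is the half-width of a square centered on (x, y) with inclusive bounds (the description above does not say whether it is the half-width or the full width). It uses a plain list of dicts instead of a polars.DataFrame; with polars, the same filter could be expressed with pl.col("x").is_between(...) on both axes. subset_window is a hypothetical stand-in, not the method itself.

```python
def subset_window(transcripts, x, y, window_size=30):
    """Keep transcripts within a square of half-width window_size around (x, y).

    Hypothetical stand-in: each transcript is a dict with 'gene', 'x' and 'y',
    and the bounds are treated as inclusive.
    """
    return [
        t
        for t in transcripts
        if x - window_size <= t["x"] <= x + window_size
        and y - window_size <= t["y"] <= y + window_size
    ]


transcripts = [
    {"gene": "CD3E", "x": 100.0, "y": 200.0},
    {"gene": "EPCAM", "x": 125.0, "y": 190.0},
    {"gene": "COL1A1", "x": 140.0, "y": 200.0},
]
nearby = subset_window(transcripts, x=110.0, y=200.0, window_size=20)
# Keeps CD3E and EPCAM; COL1A1 lies outside x in [90, 130].
```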

transform_transcripts(transcripts, *, gene_key='gene', coordinate_keys=['x', 'y', 'z'])

Transforms the coordinate dataframe to the 2D and 3D embedding space.

Parameters:
  • transcripts (polars.DataFrame | DataFrame) – Dataframe of transcript coordinates to transform.

  • gene_key (str) – Name of the gene column.

  • coordinate_keys (Sequence[str]) – Names of the coordinate columns.

Returns:

  • embedding (numpy.ndarray) – 2D UMAP embedding

  • rgb (numpy.ndarray) – 3D RGB UMAP embedding

transform_pseudocells(pseudocells)

Transforms a matrix of gene expression to the 2D and 3D embedding space.

Parameters:

  • pseudocells (polars.DataFrame | DataFrame) – A cell x gene matrix of gene expression.

Returns:

  • embedding (numpy.ndarray) – 2D UMAP embedding

  • rgb (numpy.ndarray) – 3D RGB UMAP embedding
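Both transform methods return a 3D embedding that is used as an RGB color. If raw 3D coordinates need to be turned into displayable colors, per-channel min-max scaling to [0, 1] is one simple option; the sketch assumes that approach, which is not necessarily how ovrlpy derives rgb. embedding_to_rgb is hypothetical.

```python
def embedding_to_rgb(embedding):
    """Min-max scale each of the three embedding axes to [0, 1] (hypothetical).

    A constant axis is mapped to 0.5 to avoid dividing by zero.
    """
    lows = [min(p[i] for p in embedding) for i in range(3)]
    highs = [max(p[i] for p in embedding) for i in range(3)]
    return [
        tuple(
            (p[i] - lows[i]) / (highs[i] - lows[i]) if highs[i] > lows[i] else 0.5
            for i in range(3)
        )
        for p in embedding
    ]


# Three points of a 3D embedding, one tuple per pseudocell.
rgb = embedding_to_rgb([(0.0, 10.0, -1.0), (5.0, 20.0, 1.0), (10.0, 30.0, 0.0)])
# rgb == [(0.0, 0.0, 0.0), (0.5, 0.5, 1.0), (1.0, 1.0, 0.5)]
```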

pseudocell_integrity()

Returns a DataFrame containing the gene-count matrix of the pseudocells determined for the fitted tissue.

Return type:

polars.DataFrame