ovrlpy.Ovrlp
- class ovrlpy.Ovrlp(transcripts, /, KDE_bandwidth=2.5, min_distance=8, n_components=30, *, gene_key='gene', coordinate_keys=('x', 'y', 'z'), n_workers=1, dtype=np.float32, patch_length=500, umap_kwargs=UMAP_2D_PARAMS, cumap_kwargs=UMAP_RGB_PARAMS, random_state=None)
Main analysis class for spatial overlap analysis.
- Parameters:
transcripts (polars.DataFrame | pandas.DataFrame) – Transcript information containing coordinates and gene name/id.
KDE_bandwidth (float, optional) – Bandwidth of the kernel density estimation (KDE).
min_distance (float, optional) – Minimum distance for cell typing.
n_components (int, optional) – Number of components for PCA.
gene_key (str) – Name of the gene column.
coordinate_keys (Iterable[str]) – Names of the coordinate columns.
n_workers (int) – Number of threads used in parallel processing.
dtype (numpy.typing.DTypeLike) – Datatype used for KDE calculations.
patch_length (int) – Upper bound for the size of each patch. Only relevant for processing.
umap_kwargs (dict, optional) – Keyword arguments for 2D UMAP embedding.
cumap_kwargs (dict, optional) – Keyword arguments for 3D UMAP embedding.
random_state (int | RandomState | None) – Random state used to seed UMAP and PCA.
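To illustrate what KDE_bandwidth and dtype control, the sketch below estimates a 2D transcript density on a unit pixel grid by binning points and smoothing the counts with a separable Gaussian kernel whose standard deviation is the bandwidth (in pixels). This is a minimal sketch under that assumption, not ovrlpy's internal KDE; the helpers gaussian_kernel_1d and kde_2d are hypothetical and depend only on NumPy.

```python
import numpy as np


def gaussian_kernel_1d(bandwidth: float, truncate: float = 4.0) -> np.ndarray:
    """Normalized 1D Gaussian kernel with standard deviation `bandwidth` (pixels)."""
    radius = int(truncate * bandwidth + 0.5)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / bandwidth) ** 2)
    return kernel / kernel.sum()


def kde_2d(x, y, shape, bandwidth=2.5, dtype=np.float32):
    """Hypothetical helper: bin points on a pixel grid, then Gaussian-smooth them.

    The grid must be at least as wide as the kernel (2 * radius + 1 pixels).
    """
    counts = np.zeros(shape, dtype=dtype)
    # np.add.at accumulates correctly when several points share a pixel
    np.add.at(counts, (np.asarray(y, dtype=int), np.asarray(x, dtype=int)), 1)
    kernel = gaussian_kernel_1d(bandwidth).astype(dtype)
    # Separable convolution: rows first, then columns; "same" keeps the grid shape
    smoothed = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, counts)
    smoothed = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, smoothed)
    return smoothed.astype(dtype)


density = kde_2d([25], [25], (50, 50), bandwidth=2.5)
```

Because each transcript contributes a total mass of 1 in this sketch, a larger bandwidth spreads it over more pixels and lowers the peak density of an isolated transcript.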
- transcripts
Transcript information containing coordinates and gene name/id.
- Type:
polars.DataFrame
- signatures
A dataframe of celltypes x gene signatures used to annotate the UMAP.
- Type:
polars.DataFrame
- celltype_centers
The center of gravity of each celltype in the 2D embedding, used for UMAP annotation.
- Type:
- umap_2d
The UMAP object used for the 2D embedding.
- Type:
umap.UMAP
- umap_rgb
The UMAP object used for the 3D RGB embedding.
- Type:
umap.UMAP
- signal_map
A pixel map of overall signal strength in the tissue, used to mask out low-signal regions.
- Type:
- dtype
Datatype used for KDE calculations.
- Type:
numpy.typing.DTypeLike
- process_coordinates(gridsize=1, **kwargs)
Process the coordinates of the transcripts dataframe.
- Parameters:
gridsize (float, optional) – The size of the pixel grid.
kwargs – Other keyword arguments are passed to ovrlpy.process_coordinates().
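The role of gridsize can be pictured as binning continuous coordinates into integer pixel indices. The following is a hedged sketch of that idea only; to_pixel_indices is a hypothetical helper, and ovrlpy.process_coordinates() may shift, round or store coordinates differently.

```python
import numpy as np


def to_pixel_indices(coords, gridsize: float = 1.0) -> np.ndarray:
    """Hypothetical helper: map (n, d) coordinates to non-negative pixel indices.

    Each axis is shifted so that its minimum becomes 0, then binned by `gridsize`.
    """
    coords = np.asarray(coords, dtype=float)
    shifted = coords - coords.min(axis=0)
    return np.floor(shifted / gridsize).astype(np.int64)


points = np.array([[10.0, 5.0], [12.4, 5.9], [13.0, 8.0]])
pixels = to_pixel_indices(points, gridsize=1.0)
```

A larger gridsize merges more of the coordinate space into each pixel, producing a coarser grid.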
- fit_transcripts(/, min_transcripts=10, *, genes=None, fit_umap=True)
Fits a spatial transcripts dataset using the SSAM algorithm.
- Parameters:
min_transcripts (float) – Minimum expression for a local maximum to be considered. Expressed in terms of transcripts.
genes (Iterable[str] | None) – A list of genes to utilize in the model. None uses all genes. Local maxima are always detected based on all genes.
fit_umap (bool) – Whether to fit the UMAP to the data.
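The note that local maxima are always detected on all genes, while the model itself can be restricted with genes, can be illustrated as follows. This sketch assumes a hypothetical per-gene density array of shape (n_genes, height, width) and uses a plain 3 x 3 local-maximum test, so it is a simplification rather than the SSAM procedure used by fit_transcripts; sample_local_maxima is not part of ovrlpy.

```python
import numpy as np


def sample_local_maxima(kde, gene_names, min_transcripts=10, genes=None):
    """Hypothetical helper returning (pixel coordinates, expression) at local maxima.

    kde: (n_genes, height, width) per-gene densities; gene_names: list of str.
    """
    # Maxima are detected on the summed density of *all* genes
    total = kde.sum(axis=0).astype(float)
    padded = np.pad(total, 1, mode="constant", constant_values=-np.inf)
    h, w = total.shape
    # Shifted views of the 8 neighbours of every pixel (3 x 3 window)
    neighbours = np.stack([
        padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    ])
    is_peak = (total >= neighbours.max(axis=0)) & (total >= min_transcripts)
    ys, xs = np.nonzero(is_peak)
    # Only the selected genes enter the expression matrix
    if genes is None:
        gene_idx = np.arange(len(gene_names))
    else:
        gene_idx = np.array([gene_names.index(g) for g in genes], dtype=int)
    expression = kde[gene_idx][:, ys, xs].T  # (n_maxima, n_selected_genes)
    return np.column_stack([ys, xs]), expression
```

Pixels on a flat plateau are all reported by the >= comparison; this sketch does not break such ties.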
- fit_pseudocells(pseudocells, *, fit_umap=True)
Fits the expression of pseudocells.
- fit_signatures(signatures, key)
- fit_signatures(signatures: DataFrame, key: None = None)
Fits a signature matrix.
- Parameters:
signatures (polars.DataFrame | pandas.DataFrame) – A celltypes x gene signature matrix used to annotate the UMAP.
key (str | None) – Name of the column containing the signature names. Only used if signatures is a polars.DataFrame; for a pandas.DataFrame, the names are expected in the index.
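A signature matrix ends up as labels on the UMAP via celltype_centers. One plausible way to compute such centers, assumed here rather than taken from ovrlpy, is to assign each embedded point to its best-correlated signature (Pearson correlation) and average the 2D coordinates per signature. signature_centers is a hypothetical helper operating on plain NumPy arrays.

```python
import numpy as np


def _center_and_normalize(a):
    """Row-center and L2-normalize so that dot products equal Pearson correlations."""
    a = np.asarray(a, dtype=float)
    centered = a - a.mean(axis=1, keepdims=True)
    norm = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.where(norm == 0, 1.0, norm)


def signature_centers(expression, embedding, signatures):
    """Hypothetical helper: per-signature center of gravity in a 2D embedding.

    expression: (n_points, n_genes); embedding: (n_points, 2);
    signatures: (n_types, n_genes). Rows with no assigned points are NaN.
    """
    embedding = np.asarray(embedding, dtype=float)
    corr = _center_and_normalize(expression) @ _center_and_normalize(signatures).T
    labels = corr.argmax(axis=1)  # best-matching signature per point
    centers = np.full((len(signatures), 2), np.nan)
    for t in range(len(signatures)):
        members = labels == t
        if members.any():
            centers[t] = embedding[members].mean(axis=0)
    return centers
```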
- compute_VSI(*, min_transcripts=2)
Calculate the vertical signal integrity (VSI).
- Parameters:
min_transcripts (float | None, optional) – Minimum expression value to consider in the calculation. Defaults to 2. If None, the threshold is set to 110% of the maximum KDE value produced by two molecules.
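The sketch below shows one way to quantify vertical signal integrity, assumed here rather than taken from the ovrlpy source: at each pixel, compare the gene-expression vectors of the top and bottom parts of the tissue by cosine similarity, and mask pixels whose combined signal falls below min_transcripts. vertical_signal_integrity is a hypothetical helper; how the top and bottom densities are obtained is outside its scope.

```python
import numpy as np


def vertical_signal_integrity(top, bottom, min_transcripts=2.0):
    """Hypothetical helper: per-pixel cosine similarity of top vs bottom expression.

    top, bottom: (n_genes, height, width). Low-signal pixels are set to NaN;
    pixels with signal on only one side get an integrity of 0.
    """
    top = np.asarray(top, dtype=float)
    bottom = np.asarray(bottom, dtype=float)
    dot = (top * bottom).sum(axis=0)
    norms = np.linalg.norm(top, axis=0) * np.linalg.norm(bottom, axis=0)
    signal = top.sum(axis=0) + bottom.sum(axis=0)
    # Avoid 0/0 where one side is empty
    integrity = np.divide(dot, norms, out=np.zeros_like(dot), where=norms > 0)
    integrity[signal < min_transcripts] = np.nan
    return integrity
```

Values near 1 mean the top and bottom share the same expression profile; low values suggest different cells stacked on top of each other.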
- analyse(gridsize=1, min_transcripts=10, genes=None, fit_umap=True)
Run the main ovrlpy analysis.
- Parameters:
gridsize (float, optional) – The size of the pixel grid.
min_transcripts (float) – Minimum expression for a local maximum to be considered. Expressed in terms of transcripts.
genes (Iterable[str] | None) – A list of genes to utilize in the model. None uses all genes. Local maxima are always detected based on all genes.
fit_umap (bool) – Whether to fit the UMAP to the data.
- detect_doublets(min_distance=10, min_integrity=0.7, min_signal=3, integrity_sigma=None)
Find individual low points (local minima) of signal integrity in the tissue map; each one indicates an isolated occurrence of overlapping cells.
- Parameters:
min_distance (int, optional) – Minimum distance between reported peaks.
min_integrity (float, optional) – Signal integrity threshold. Only peaks with a signal integrity below min_integrity are considered.
min_signal (float, optional) – Minimum signal value for a peak to be considered.
integrity_sigma (float, optional) – Optional sigma for Gaussian filtering of the integrity map, which leads to the detection of overlap regions with a larger spatial extent.
- Return type:
polars.DataFrame
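The peak search described above can be sketched as a windowed minimum filter on the integrity map. This hedged sketch assumes a pixel is reported when it has the lowest integrity within min_distance pixels, its integrity is below min_integrity, and its signal is at least min_signal. find_integrity_minima is hypothetical: it omits the Gaussian pre-filtering controlled by integrity_sigma and returns an array of (row, column) indices rather than a polars.DataFrame.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_integrity_minima(integrity, signal, min_distance=10, min_integrity=0.7, min_signal=3):
    """Hypothetical helper: (row, col) indices of thresholded local integrity minima."""
    integrity = np.asarray(integrity, dtype=float)
    signal = np.asarray(signal, dtype=float)
    # Masked (NaN) pixels are treated as maximal integrity so they never qualify
    filled = np.where(np.isnan(integrity), np.inf, integrity)
    size = 2 * min_distance + 1
    padded = np.pad(filled, min_distance, mode="constant", constant_values=np.inf)
    # Minimum over a (size x size) window centred on every pixel
    window_min = sliding_window_view(padded, (size, size)).min(axis=(-2, -1))
    is_min = (filled == window_min) & (filled < min_integrity) & (signal >= min_signal)
    return np.argwhere(is_min)
```

Increasing min_distance suppresses weaker minima that lie close to a deeper one.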
- subset_transcripts(x, y, *, window_size=30)
Subset the transcript dataframe spatially based on given x, y coordinates and window size.
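A spatial subset of this kind amounts to a boolean mask over the coordinates. In the sketch below, subset_window is hypothetical, and treating window_size as the half-width of a square window centred on (x, y) is an assumption; the method may interpret it differently.

```python
import numpy as np


def subset_window(x, y, x0, y0, window_size=30):
    """Hypothetical helper: mask of points inside a square window around (x0, y0).

    Assumes window_size is the half-width of the window (inclusive bounds).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (np.abs(x - x0) <= window_size) & (np.abs(y - y0) <= window_size)


mask = subset_window([0, 10, 31, -29], [0, 25, 0, 30], x0=0, y0=0)
```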
- transform_transcripts(transcripts, *, gene_key='gene', coordinate_keys=['x', 'y', 'z'])
Transforms the coordinate dataframe to the 2D and 3D embedding space.
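- Parameters:
transcripts (polars.DataFrame | pandas.DataFrame) – Transcript information containing coordinates and gene name/id.
gene_key (str) – Name of the gene column.
coordinate_keys (Iterable[str]) – Names of the coordinate columns.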
- Returns:
embedding (numpy.ndarray) – 2D UMAP embedding
rgb (numpy.ndarray) – 3D RGB UMAP embedding
- transform_pseudocells(pseudocells)
Transforms a matrix of gene expression to the 2D and 3D embedding space.
- Parameters:
pseudocells (polars.DataFrame | pandas.DataFrame) – A cell x gene matrix of gene expression.
- Returns:
embedding (numpy.ndarray) – 2D UMAP embedding
rgb (numpy.ndarray) – 3D RGB UMAP embedding
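The rgb output assigns each cell a color derived from its 3D embedding. A simple way to do this, assumed here for illustration, is to min-max scale each of the three embedding axes to [0, 1] so they can be used directly as red, green and blue channels; embedding_to_rgb is a hypothetical helper and ovrlpy's exact scaling may differ.

```python
import numpy as np


def embedding_to_rgb(embedding_3d):
    """Hypothetical helper: min-max scale each of 3 axes to [0, 1] for RGB display."""
    emb = np.asarray(embedding_3d, dtype=float)
    lo = emb.min(axis=0)
    span = emb.max(axis=0) - lo
    # Constant axes (span 0) map to 0 instead of dividing by zero
    return (emb - lo) / np.where(span == 0, 1.0, span)


rgb = embedding_to_rgb([[0, 10, 5], [2, 20, 5], [1, 15, 5]])
```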
- pseudocell_integrity()
Returns a DataFrame containing the gene-count matrix of the fitted tissue’s determined pseudo-cells.
- Return type:
polars.DataFrame