ovrlpy.Visualizer

class ovrlpy.Visualizer(KDE_bandwidth=1.5, celltyping_min_expression=10, celltyping_min_distance=5, n_components_pca=0.7, dtype=np.float32, umap_kwargs=UMAP_2D_PARAMS, cumap_kwargs=UMAP_RGB_PARAMS)

A class to visualize spatial transcriptomics data. Contains a latent gene expression UMAP and RGB embedding.

Parameters:

KDE_bandwidth (float, optional) – The bandwidth of the KDE.
celltyping_min_expression (int, optional) – Minimum expression level for cell typing.
celltyping_min_distance (int, optional) – Minimum distance for cell typing.
n_components_pca (float, optional) – Number of components for PCA.
dtype – Datatype for the KDE.
umap_kwargs (dict, optional) – Keyword arguments for 2D UMAP embedding.
cumap_kwargs (dict, optional) – Keyword arguments for 3D UMAP embedding.
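
The default n_components_pca=0.7 is a float rather than an integer count. Assuming the value is forwarded unchanged to scikit-learn's PCA (the source does not state this), a float between 0 and 1 is read as the fraction of variance the retained components must explain. The sketch below illustrates that selection rule in plain Python; the function name and the explained-variance ratios are hypothetical and only serve the illustration.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, target):
    """Return the smallest number of leading components whose cumulative
    explained variance exceeds `target` (a fraction between 0 and 1)."""
    cumulative = accumulate(explained_variance_ratio)
    for n_components, total in enumerate(cumulative, start=1):
        if total > target:
            return n_components
    # The target was never exceeded: keep every component.
    return len(explained_variance_ratio)


# Hypothetical per-component variance ratios, sorted in decreasing order.
ratios = [0.40, 0.25, 0.15, 0.10, 0.10]
n_kept = components_for_variance(ratios, 0.7)
```

With these made-up ratios, the first two components explain 65% of the variance, so a third is needed to pass the 0.7 threshold.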

KDE_bandwidth

The bandwidth of the KDE.

Type: float

celltyping_min_expression

Minimum expression level for cell typing.

Type: int

celltyping_min_distance

Minimum distance for cell typing.

Type: int

pseudocell_locations_x

x-coordinates of cell typing regions of interest obtained through gene expression localmax sampling.

Type: ndarray

pseudocell_locations_y

y-coordinates of cell typing regions of interest obtained through gene expression localmax sampling.

Type: ndarray
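
The pseudocell locations are described as coming from local-maximum sampling of gene expression. The source does not specify the exact rule, so the following is only a minimal sketch under stated assumptions: a small 2D grid stands in for the expression density, a position must reach a minimum value (analogous to celltyping_min_expression), and it must be strictly larger than every other position within a square neighbourhood (analogous to celltyping_min_distance). The function name and grid values are hypothetical.

```python
def local_maxima(grid, min_value, min_distance):
    """Return (x, y) positions whose value is at least `min_value` and strictly
    larger than every other value within `min_distance` pixels along each axis."""
    height, width = len(grid), len(grid[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = grid[y][x]
            if value < min_value:
                continue
            # Scan the square neighbourhood, clipped to the grid borders.
            is_peak = all(
                grid[ny][nx] < value
                for ny in range(max(0, y - min_distance), min(height, y + min_distance + 1))
                for nx in range(max(0, x - min_distance), min(width, x + min_distance + 1))
                if (ny, nx) != (y, x)
            )
            if is_peak:
                peaks.append((x, y))
    return peaks


# Hypothetical expression density on a 5 x 5 pixel grid (rows are y, columns are x).
grid = [
    [0, 1, 0, 0, 0],
    [1, 12, 2, 0, 0],
    [0, 2, 1, 0, 0],
    [0, 0, 0, 3, 0],
    [0, 0, 0, 0, 15],
]
peaks = local_maxima(grid, min_value=10, min_distance=1)
locations_x = [x for x, _ in peaks]
locations_y = [y for _, y in peaks]
```

Raising min_distance in this sketch merges nearby peaks, leaving only the strongest one in each neighbourhood.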

pseudocell_expression_samples

Gene expression matrix of the cell typing regions of interest.

Type: DataFrame

signatures

A matrix of celltypes x gene signatures to use to annotate the UMAP.

Type: DataFrame

celltype_centers

The center of gravity of each celltype in the 2d embedding, used for UMAP annotation.

Type: ndarray
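
As a rough illustration of a per-celltype center of gravity in a 2d embedding, the sketch below averages the coordinates of the points assigned to each label. This is an assumption made for illustration: the source does not say whether the centers are weighted or unweighted, and the function name, points, and labels are hypothetical.

```python
from collections import defaultdict


def centers_of_gravity(points, labels):
    """Return {label: (mean_x, mean_y)} for 2D points grouped by label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # running sum of x, sum of y, count
    for (x, y), label in zip(points, labels):
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


# Hypothetical 2D embedding coordinates and their celltype assignments.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (5.0, 5.0), (7.0, 9.0)]
labels = ["A", "A", "A", "B", "B"]
centers = centers_of_gravity(points, labels)
```

Each resulting coordinate pair is where a celltype label could be drawn on the embedding plot.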

celltype_class_assignments

The class assignments of the cell types.

Type: ndarray

pca_2d

The PCA object used for the 2d embedding.

Type: PCA

embedder_2d

The UMAP object used for the 2d embedding.

Type: umap.UMAP

pca_3d

The PCA object used for the 3d RGB embedding.

Type: PCA

embedder_3d

The UMAP object used for the 3d RGB embedding.

Type: umap.UMAP

n_components_pca

Number of components for PCA.

Type: float

umap_kwargs

Keyword arguments for 2D UMAP embedding object.

Type: dict

cumap_kwargs

Keyword arguments for 3D UMAP RGB embedding object.

Type: dict

genes

A list of genes to utilize in the model.

Type: list

embedding

The 2d embedding of pseudocell gene expression.

Type: ndarray

colors

The RGB embedding.

Type: ndarray

colors_min_max

The minimum and maximum values of the RGB embedding, needed to normalize the colors produced by the transform methods.

Type: list
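
Storing the fitted minimum and maximum lets newly transformed values be scaled onto the same color range as the fitted data. The sketch below shows plain min-max normalization per RGB channel with bounds computed once and reused later. It is an illustration under assumptions, not the library's code: the function names are hypothetical, and clipping out-of-range values to [0, 1] is a choice made here.

```python
def fit_min_max(values):
    """Return per-channel minima and maxima for a list of (r, g, b) tuples."""
    channels = list(zip(*values))
    return [min(c) for c in channels], [max(c) for c in channels]


def apply_min_max(values, lows, highs):
    """Scale each channel to [0, 1] using previously fitted bounds."""
    normalized = []
    for row in values:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            fraction = (value - low) / span if span else 0.0
            # Values outside the fitted range are clipped so they stay valid colors.
            scaled.append(min(1.0, max(0.0, fraction)))
        normalized.append(tuple(scaled))
    return normalized


# Hypothetical raw 3D embedding values seen during fitting.
fitted = [(-2.0, 0.0, 1.0), (2.0, 4.0, 3.0), (0.0, 2.0, 2.0)]
lows, highs = fit_min_max(fitted)

# New values are normalized with the stored bounds, not with their own range.
new_colors = apply_min_max([(0.0, 5.0, 1.5)], lows, highs)
```

Reusing the fitted bounds keeps a given embedding value mapped to the same color regardless of which subset of the data is being transformed.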

integrity_map

The integrity map of the tissue.

Type: ndarray

signal_map

A pixel map of overall signal strength in the tissue, used to mask out low-signal regions that are difficult to interpret.

Type: ndarray

fit_transcripts(coordinate_df, genes=None, gene_key='gene', signature_matrix=None, fit_umap=True, patch_length=500, n_workers=8)

Fits the visualizer to a spatial transcripts dataset using the SSAM algorithm.

Parameters:

coordinate_df (DataFrame) – A dataframe of coordinates.
genes (list) – A list of genes to utilize in the model. None uses all genes.
gene_key (str) – The key in the dataframe containing the gene names.
signature_matrix (DataFrame) – A matrix of celltypes x gene signatures to use to annotate the UMAP.
fit_umap (bool) – Whether to fit the UMAP to the data.
patch_length (int) – Side length of the patches, in each dimension, used when calculating signal integrity. Smaller values will use less memory, but may take longer to compute.
n_workers (int) – The number of workers to use in the SSAM algorithm.

fit_pseudocells(pseudocell_expression_samples, *, genes=None, fit_umap=True)

Fits the visualizer to a given pseudocell expression sample.

Parameters:

pseudocell_expression_samples (DataFrame) – A cell x gene matrix of gene expression.
genes (list, optional) – A list of genes to utilize in the model.
fit_umap (bool) – Whether to fit the UMAP to the data.

fit_signatures(signature_matrix=None)

Fits the visualizer with a given signature matrix.

Parameters: signature_matrix (DataFrame) – A matrix of celltypes x gene signatures to use to annotate the UMAP. None defaults to displaying individual genes.

subsample_df(x, y, coordinate_df, window_size=30)

Subsamples the coordinate dataframe spatially based on given x, y coordinates and window size.

Parameters:

x (float) – x-coordinate to center the sampling window.
y (float) – y-coordinate to center the sampling window.
coordinate_df (DataFrame) – DataFrame of gene-annotated molecule coordinates to create the subsample from.
window_size (int, optional) – The window size of the sampling window. Molecules within this window around (x,y) are sampled and returned as a new DataFrame.
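
The windowed subsampling described above can be sketched in plain Python as a filter on molecule coordinates. This is not the library's implementation: it assumes a square window in which window_size is the maximum offset from the center along each axis, and it uses hypothetical "x", "y", and "gene" keys on plain dictionaries in place of DataFrame columns.

```python
def subsample_window(molecules, x, y, window_size=30):
    """Return the molecules lying within `window_size` of (x, y) along both axes."""
    return [
        molecule
        for molecule in molecules
        if abs(molecule["x"] - x) <= window_size and abs(molecule["y"] - y) <= window_size
    ]


# Hypothetical gene-annotated molecule coordinates.
molecules = [
    {"x": 100, "y": 100, "gene": "A"},
    {"x": 125, "y": 90, "gene": "B"},
    {"x": 140, "y": 100, "gene": "C"},  # 40 units away in x: outside the window
    {"x": 100, "y": 60, "gene": "D"},   # 40 units away in y: outside the window
]
subsample = subsample_window(molecules, x=100, y=100, window_size=30)
genes_in_window = [molecule["gene"] for molecule in subsample]
```

The filter returns a new list and leaves the input untouched, mirroring the description that the sampled molecules are returned as a new DataFrame.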

transform_transcripts(coordinate_df)

Transforms the coordinate dataframe to the visualizer’s 2d and 3d embedding space.

Parameters: coordinate_df (DataFrame) – DataFrame of gene-annotated molecule coordinates to transform.

transform_pseudocells(pseudocell_expression_samples)

Transforms a matrix of gene expression to the visualizer’s 2d and 3d embedding space.

Parameters: pseudocell_expression_samples (DataFrame) – A cell x gene matrix of gene expression.

pseudocell_df()

Returns a pandas.DataFrame containing the gene-count matrix of the fitted tissue’s determined pseudo-cells.

Return type: DataFrame

plot_region_of_interest(subsample, subsample_embedding_color, x=None, y=None, window_size=None, rasterized=True, scalebar=SCALEBAR_PARAMS)

Plots an instance of the visualized data.

Parameters:

subsample (DataFrame) – A dataframe of molecule coordinates and gene assignments.
subsample_embedding_color (DataFrame) – A list of rgb values for each molecule.
x (float) – Center x-coordinate for the region-of-interest.
y (float) – Center y-coordinate for the region-of-interest.
window_size (float, optional) – Window size of the region-of-interest.
rasterized (bool, optional) – If True all plots will be rasterized.
scalebar (dict[str, Any] | None) – If None, no scalebar will be plotted. Otherwise, a dictionary with additional kwargs for matplotlib_scalebar.scalebar.ScaleBar. Defaults to ovrlpy.SCALEBAR_PARAMS.

plot_umap(ax=None, rasterized=False, **kwargs)

Plots the UMAP embedding.

Parameters:

ax (Optional[Axes]) – Axis object to plot on.
rasterized (bool, optional) – If True the plot will be rasterized.
kwargs – Keyword arguments for matplotlib.pyplot.scatter().

plot_tissue(rasterized=False, scalebar=SCALEBAR_PARAMS, **kwargs)

Plots the tissue embedding.

Parameters:

rasterized (bool, optional) – If True the plot will be rasterized.
scalebar (dict[str, Any] | None) – If None, no scalebar will be plotted. Otherwise, a dictionary with additional kwargs for matplotlib_scalebar.scalebar.ScaleBar. Defaults to ovrlpy.SCALEBAR_PARAMS.
kwargs – Keyword arguments for matplotlib’s scatter plot function.

plot_fit(rasterized=True, umap_kwargs={'scatter_kwargs': {'s': 1}}, tissue_kwargs={'s': 1})

Plots the fitted model.

Parameters: rasterized (bool, optional) – If True all plots will be rasterized.