ovrlpy.Visualizer

class ovrlpy.Visualizer(KDE_bandwidth=1.5, celltyping_min_expression=10, celltyping_min_distance=5, n_components_pca=0.7, dtype=np.float32, umap_kwargs=UMAP_2D_PARAMS, cumap_kwargs=UMAP_RGB_PARAMS)

A class to visualize spatial transcriptomics data. Contains a latent gene expression UMAP and RGB embedding.

Parameters:
  • KDE_bandwidth (float, optional) – The bandwidth of the KDE.

  • celltyping_min_expression (int, optional) – Minimum expression level for cell typing.

  • celltyping_min_distance (int, optional) – Minimum distance for cell typing.

  • n_components_pca (float, optional) – Number of components for PCA.

  • dtype – Datatype for the KDE.

  • umap_kwargs (dict, optional) – Keyword arguments for 2D UMAP embedding.

  • cumap_kwargs (dict, optional) – Keyword arguments for 3D UMAP embedding.
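To give a feel for what KDE_bandwidth and dtype control, here is a minimal NumPy sketch of a 2D Gaussian kernel density estimate on a pixel grid. It is an illustration only and not necessarily how ovrlpy computes its KDE: the helper names `gaussian_kernel_1d` and `kde_2d`, the unit pixel size, and the truncation at four standard deviations are all assumptions.

```python
import numpy as np


def gaussian_kernel_1d(bandwidth, truncate=4.0):
    """Normalized 1D Gaussian kernel, cut off at `truncate` standard deviations."""
    radius = int(np.ceil(truncate * bandwidth))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / bandwidth) ** 2)
    return kernel / kernel.sum()


def kde_2d(x, y, shape, bandwidth=1.5, dtype=np.float32):
    """Bin molecules into unit pixels, then smooth with a separable Gaussian.

    Hypothetical helper; each axis must be longer than the kernel.
    """
    counts, _, _ = np.histogram2d(
        x, y, bins=list(shape), range=[[0, shape[0]], [0, shape[1]]]
    )
    kernel = gaussian_kernel_1d(bandwidth)
    # Smooth along the x axis, then along the y axis (zero-padded at the borders).
    smoothed = np.apply_along_axis(np.convolve, 0, counts, kernel, mode="same")
    smoothed = np.apply_along_axis(np.convolve, 1, smoothed, kernel, mode="same")
    return smoothed.astype(dtype)


# One molecule in the middle of a 40 x 40 pixel field.
narrow = kde_2d([20.5], [20.5], shape=(40, 40), bandwidth=1.5)
wide = kde_2d([20.5], [20.5], shape=(40, 40), bandwidth=3.0)
```

A larger bandwidth spreads each molecule's signal over more pixels, so `wide` has a lower, broader peak than `narrow` while both keep the same total mass.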

KDE_bandwidth

The bandwidth of the KDE.

Type:

float

celltyping_min_expression

Minimum expression level for cell typing.

Type:

int

celltyping_min_distance

Minimum distance for cell typing.

Type:

int

pseudocell_locations_x

x-coordinates of cell typing regions of interest obtained through gene expression localmax sampling.

Type:

ndarray

pseudocell_locations_y

y-coordinates of cell typing regions of interest obtained through gene expression localmax sampling.

Type:

ndarray
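The local-maximum sampling behind pseudocell_locations_x and pseudocell_locations_y can be pictured with the sketch below. It assumes that celltyping_min_expression acts as a threshold on a total-expression pixel map and that celltyping_min_distance is the radius within which a pixel must be the maximum; the function name `local_maxima` and the exact comparison rules are hypothetical, so treat this as an illustration rather than the library's implementation.

```python
import numpy as np


def local_maxima(total_expression, min_expression=10, min_distance=5):
    """Return (xs, ys) of pixels that dominate their neighborhood.

    A pixel is kept if it equals the maximum of the square window of
    radius `min_distance` around it and reaches `min_expression`.
    Tied pixels on a plateau are all returned.
    """
    values = np.asarray(total_expression, dtype=float)
    # Pad with -inf so border pixels compare only against real neighbors.
    padded = np.pad(values, min_distance, mode="constant", constant_values=-np.inf)
    size = 2 * min_distance + 1
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    neighborhood_max = windows.max(axis=(-2, -1))
    is_peak = (values == neighborhood_max) & (values >= min_expression)
    xs, ys = np.nonzero(is_peak)
    return xs, ys


signal = np.zeros((30, 30))
signal[10, 10] = 20  # strong peak, kept
signal[12, 12] = 15  # within min_distance of a stronger peak, suppressed
signal[25, 5] = 12   # isolated peak above the threshold, kept
signal[5, 25] = 8    # isolated peak below the threshold, dropped
xs, ys = local_maxima(signal, min_expression=10, min_distance=5)
```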

pseudocell_expression_samples

Gene expression matrix of the cell typing regions of interest.

Type:

DataFrame

signatures

A matrix of celltypes x gene signatures to use to annotate the UMAP.

Type:

DataFrame

celltype_centers

The center of gravity of each celltype in the 2d embedding, used for UMAP annotation.

Type:

ndarray

celltype_class_assignments

The class assignments of the cell types.

Type:

ndarray

pca_2d

The PCA object used for the 2d embedding.

Type:

PCA

embedder_2d

The UMAP object used for the 2d embedding.

Type:

umap.UMAP

pca_3d

The PCA object used for the 3d RGB embedding.

Type:

PCA

embedder_3d

The UMAP object used for the 3d RGB embedding.

Type:

umap.UMAP

n_components_pca

Number of components for PCA.

Type:

float
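The default value of 0.7 is a fraction rather than an integer count. In scikit-learn's PCA, a float between 0 and 1 selects the smallest number of components whose cumulative explained variance exceeds that fraction. Assuming n_components_pca is forwarded to such a PCA (an assumption, not confirmed by this page), the selection rule can be sketched in plain NumPy as follows; `n_components_for_variance` is a hypothetical helper.

```python
import numpy as np


def n_components_for_variance(X, fraction):
    """Smallest number of principal components explaining more than `fraction`."""
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2
    cumulative = np.cumsum(explained / explained.sum())
    return int(np.searchsorted(cumulative, fraction, side="right") + 1)


# Three orthogonal, zero-mean columns with variances in the ratio 100 : 25 : 1.
X = np.array(
    [
        [10.0, 5.0, 1.0],
        [-10.0, 5.0, -1.0],
        [10.0, -5.0, -1.0],
        [-10.0, -5.0, 1.0],
    ]
)
k_default = n_components_for_variance(X, 0.7)   # first component explains ~79 %
k_strict = n_components_for_variance(X, 0.9)    # needs the second component too
```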

umap_kwargs

Keyword arguments for 2D UMAP embedding object.

Type:

dict

cumap_kwargs

Keyword arguments for 3D UMAP RGB embedding object.

Type:

dict

genes

A list of genes to utilize in the model.

Type:

list

embedding

The 2d embedding of pseudocell gene expression.

Type:

ndarray

colors

The RGB embedding.

Type:

ndarray

colors_min_max

The minimum and maximum values of the RGB embedding, necessary for normalization of the transform method.

Type:

list
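Why the stored minimum and maximum matter: colors for new data have to be scaled with the range of the fitted embedding, otherwise the same expression profile would receive different colors in different regions. The sketch below assumes a per-channel minimum and maximum and clips out-of-range values; `fit_color_range` and `to_rgb` are hypothetical helpers, not ovrlpy functions.

```python
import numpy as np


def fit_color_range(embedding_3d):
    """Record the per-channel minimum and maximum of the fitted 3D embedding."""
    return [embedding_3d.min(axis=0), embedding_3d.max(axis=0)]


def to_rgb(embedding_3d, colors_min_max):
    """Scale a 3D embedding into [0, 1] using a previously fitted range."""
    lo, hi = colors_min_max
    scaled = (embedding_3d - lo) / (hi - lo)
    # Points outside the fitted range are clipped to valid RGB values.
    return np.clip(scaled, 0.0, 1.0)


fitted = np.array([[0.0, 10.0, -1.0], [2.0, 20.0, 1.0], [1.0, 15.0, 0.0]])
colors_min_max = fit_color_range(fitted)
fitted_rgb = to_rgb(fitted, colors_min_max)

# A new point is normalized with the stored range, not with its own.
new_rgb = to_rgb(np.array([[3.0, 5.0, 0.0]]), colors_min_max)
```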

integrity_map

The integrity map of the tissue.

Type:

ndarray

signal_map

A pixel map of overall signal strength in the tissue, used to mask out low-signal regions that are difficult to interpret.

Type:

ndarray
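As a rough illustration of how signal_map can mask integrity_map, the sketch below blanks out pixels whose signal falls under a cutoff. The function `mask_low_signal` and its `signal_threshold` parameter are hypothetical and chosen for this example only.

```python
import numpy as np


def mask_low_signal(integrity_map, signal_map, signal_threshold):
    """Replace integrity values with NaN wherever the signal is too weak."""
    return np.where(signal_map >= signal_threshold, integrity_map, np.nan)


integrity = np.array([[0.9, 0.4], [0.7, 0.2]])
signal = np.array([[5.0, 0.5], [3.0, 0.1]])
masked = mask_low_signal(integrity, signal, signal_threshold=1.0)
```

Plotting functions such as matplotlib's `imshow` leave NaN pixels blank, so the low-signal regions drop out of the picture.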

fit_transcripts(coordinate_df, genes=None, gene_key='gene', signature_matrix=None, fit_umap=True, patch_length=500, n_workers=8)

Fits the visualizer to a spatial transcripts dataset using the SSAM algorithm.

Parameters:
  • coordinate_df (DataFrame) – A dataframe of coordinates.

  • genes (list) – A list of genes to utilize in the model. None uses all genes.

  • gene_key (str) – The key in the dataframe containing the gene names.

  • signature_matrix (DataFrame) – A matrix of celltypes x gene signatures to use to annotate the UMAP.

  • fit_umap (bool) – Whether to fit the UMAP to the data.

  • patch_length (int) – Side length, in each dimension, of the patches in which signal integrity is calculated. Smaller values use less memory but may take longer to compute.

  • n_workers (int) – The number of workers to use in the SSAM algorithm.
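The memory trade-off of patch_length can be seen in a simple tiling sketch: the tissue is cut into square patches that are processed one at a time, so a smaller side length means less data per patch but more patches overall. The generator `iter_patches` is hypothetical; a real implementation may, for example, pad patches to avoid edge effects.

```python
def iter_patches(x_max, y_max, patch_length=500):
    """Yield (x0, x1, y0, y1) bounds of square patches covering the tissue."""
    for x0 in range(0, x_max, patch_length):
        for y0 in range(0, y_max, patch_length):
            # The last patch in each direction is cut off at the tissue border.
            yield x0, min(x0 + patch_length, x_max), y0, min(y0 + patch_length, y_max)


# A 1200 x 500 tissue with the default patch length gives three patches.
patches = list(iter_patches(1200, 500, patch_length=500))

# Halving the patch length produces more, smaller patches.
small_patches = list(iter_patches(1200, 500, patch_length=250))
```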

fit_pseudocells(pseudocell_expression_samples, *, genes=None, fit_umap=True)

Fits the visualizer to a given pseudocell expression sample.

Parameters:
  • pseudocell_expression_samples (DataFrame) – A cell x gene matrix of gene expression.

  • genes (list, optional) – A list of genes to utilize in the model.

  • fit_umap (bool) – Whether to fit the UMAP to the data.

fit_signatures(signature_matrix=None)

Fits the visualizer with a given signature matrix.

Parameters:

signature_matrix (DataFrame) – A matrix of celltypes x gene signatures to use to annotate the UMAP. None defaults to displaying individual genes.

subsample_df(x, y, coordinate_df, window_size=30)

Subsamples the coordinate dataframe spatially based on given x, y coordinates and window size.

Parameters:
  • x (float) – x-coordinate to center the sampling window.

  • y (float) – y-coordinate to center the sampling window.

  • coordinate_df (DataFrame) – DataFrame of gene-annotated molecule coordinates to create the subsample from.

  • window_size (int, optional) – The window size of the sampling window. Molecules within this window around (x,y) are sampled and returned as a new DataFrame.
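The spatial subsampling described above amounts to a box filter on the coordinate columns. The following pandas sketch assumes columns named `x`, `y` and `gene`, and treats window_size as the distance from the center to each edge of the window; both are assumptions for illustration, and `subsample_window` is a hypothetical stand-in rather than the method itself.

```python
import pandas as pd


def subsample_window(coordinate_df, x, y, window_size=30):
    """Return the molecules inside a square window centered on (x, y)."""
    in_window = coordinate_df["x"].between(
        x - window_size, x + window_size
    ) & coordinate_df["y"].between(y - window_size, y + window_size)
    # Copy so that later edits to the subsample do not touch the full dataset.
    return coordinate_df.loc[in_window].copy()


coordinate_df = pd.DataFrame(
    {
        "x": [100.0, 120.0, 200.0, 95.0],
        "y": [100.0, 90.0, 100.0, 140.0],
        "gene": ["Gad1", "Slc17a7", "Gad1", "Pvalb"],
    }
)
subsample = subsample_window(coordinate_df, x=100, y=100, window_size=30)
```

Only the first two molecules fall inside the window; the third is too far away in x and the fourth too far away in y.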

transform_transcripts(coordinate_df)

Transforms the coordinate dataframe to the visualizer’s 2d and 3d embedding space.

Parameters:

coordinate_df (DataFrame) – Data frame of gene-annotated molecule coordinates to transform.

transform_pseudocells(pseudocell_expression_samples)

Transforms a matrix of gene expression to the visualizer’s 2d and 3d embedding space.

Parameters:

pseudocell_expression_samples (DataFrame) – A cell x gene matrix of gene expression.

pseudocell_df()

Returns a pandas.DataFrame containing the gene-count matrix of the fitted tissue’s determined pseudo-cells.

Return type:

DataFrame

plot_region_of_interest(subsample, subsample_embedding_color, x=None, y=None, window_size=None, rasterized=True, scalebar=SCALEBAR_PARAMS)

Plots an instance of the visualized data.

Parameters:
  • subsample (DataFrame) – A dataframe of molecule coordinates and gene assignments.

  • subsample_embedding_color (DataFrame) – RGB values for each molecule.

  • x (float) – Center x-coordinate for the region-of-interest.

  • y (float) – Center y-coordinate for the region-of-interest.

  • window_size (float, optional) – Window size of the region-of-interest.

  • rasterized (bool, optional) – If True all plots will be rasterized.

  • scalebar (dict[str, Any] | None) – If None, no scalebar will be plotted. Otherwise, a dictionary with additional kwargs for matplotlib_scalebar.scalebar.ScaleBar. Defaults to ovrlpy.SCALEBAR_PARAMS.

plot_umap(ax=None, rasterized=False, **kwargs)

Plots the UMAP embedding.

plot_tissue(rasterized=False, scalebar=SCALEBAR_PARAMS, **kwargs)

Plots the tissue embedding.

Parameters:
  • rasterized (bool, optional) – If True, the plot will be rasterized.

  • scalebar (dict[str, Any] | None) – If None, no scalebar will be plotted. Otherwise, a dictionary with additional kwargs for matplotlib_scalebar.scalebar.ScaleBar. Defaults to ovrlpy.SCALEBAR_PARAMS.

  • kwargs – Keyword arguments for matplotlib’s scatter plot function.

plot_fit(rasterized=True, umap_kwargs={'scatter_kwargs': {'s': 1}}, tissue_kwargs={'s': 1})

Plots the fitted model.

Parameters:

rasterized (bool, optional) – If True all plots will be rasterized.