ovrlpy.Visualizer
- class ovrlpy.Visualizer(KDE_bandwidth=1.5, celltyping_min_expression=10, celltyping_min_distance=5, n_components_pca=0.7, dtype=np.float32, umap_kwargs=UMAP_2D_PARAMS, cumap_kwargs=UMAP_RGB_PARAMS)
A class to visualize spatial transcriptomics data. Contains a latent gene expression UMAP and RGB embedding.
- Parameters:
KDE_bandwidth (float, optional) – The bandwidth of the KDE.
celltyping_min_expression (int, optional) – Minimum expression level for cell typing.
celltyping_min_distance (int, optional) – Minimum distance for cell typing.
n_components_pca (float, optional) – Number of components for PCA.
dtype – Datatype for the KDE.
umap_kwargs (dict, optional) – Keyword arguments for 2D UMAP embedding.
cumap_kwargs (dict, optional) – Keyword arguments for 3D UMAP embedding.
- pseudocell_locations_x
x-coordinates of cell typing regions of interest obtained through gene expression localmax sampling.
- Type:
- pseudocell_locations_y
y-coordinates of cell typing regions of interest obtained through gene expression localmax sampling.
- Type:
- pseudocell_expression_samples
Gene expression matrix of the cell typing regions of interest.
- Type:
- celltype_centers
The center of gravity of each celltype in the 2d embedding, used for UMAP annotation.
- Type:
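The "center of gravity" above can be read as the mean position of each cell type's points in the 2d embedding. The sketch below illustrates that reading only and is not the library's implementation; the column names `umap_1`, `umap_2` and `celltype` and the toy values are assumptions.

```python
import pandas as pd

# Hypothetical 2D embedding of four pseudocells with assigned cell types.
embedding = pd.DataFrame(
    {
        "umap_1": [0.0, 2.0, 10.0, 12.0],
        "umap_2": [1.0, 3.0, -4.0, -2.0],
        "celltype": ["neuron", "neuron", "astrocyte", "astrocyte"],
    }
)

# Center of gravity: per-celltype mean of the embedding coordinates.
# Each row of `centers` is where a celltype label could be drawn on the UMAP.
centers = embedding.groupby("celltype")[["umap_1", "umap_2"]].mean()
print(centers)
```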
- embedder_2d
The UMAP object used for the 2d embedding.
- Type:
umap.UMAP
- embedder_3d
The UMAP object used for the 3d RGB embedding.
- Type:
umap.UMAP
- colors_min_max
The minimum and maximum values of the RGB embedding, necessary for normalization of the transform method.
- Type:
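To show why stored minimum and maximum values matter for a later transform, here is a minimal sketch of min-max normalization of a 3d embedding into RGB. It assumes per-channel scaling and clipping of out-of-range values; both are assumptions made for illustration, and `to_rgb` is a hypothetical helper, not part of ovrlpy.

```python
import numpy as np

rng = np.random.default_rng(0)
fit_embedding = rng.normal(size=(100, 3))  # 3D embedding of the fitted data
new_embedding = rng.normal(size=(20, 3))   # 3D embedding of data transformed later

# Store the per-channel minimum and maximum once, at fit time.
colors_min_max = (fit_embedding.min(axis=0), fit_embedding.max(axis=0))


def to_rgb(embedding, min_max):
    """Scale an embedding to [0, 1] per channel using stored min/max values."""
    lo, hi = min_max
    scaled = (embedding - lo) / (hi - lo)
    # New data can fall outside the fitted range, so clip to valid RGB (assumption).
    return np.clip(scaled, 0.0, 1.0)


fit_rgb = to_rgb(fit_embedding, colors_min_max)
new_rgb = to_rgb(new_embedding, colors_min_max)
```

Reusing the fitted range keeps colors comparable: the same embedding position maps to the same color in both datasets.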
- signal_map
A pixel map of overall signal strength in the tissue, used to mask out low-signal regions that are difficult to interpret.
- Type:
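As an illustration of masking with a signal-strength map, the sketch below blanks out pixels whose signal falls under a cutoff. The `threshold` value, the toy arrays, and blanking to black are all assumptions; how ovrlpy applies `signal_map` internally is not described here.

```python
import numpy as np

# Toy 2 x 3 pixel map of overall signal strength.
signal_map = np.array([[0.1, 0.5, 3.0],
                       [0.0, 2.5, 4.0]])
threshold = 1.0  # hypothetical cutoff for "low signal"

# Boolean mask of pixels with enough signal to interpret.
mask = signal_map >= threshold

# Toy RGB image (all white); low-signal pixels are set to black.
rgb = np.ones(signal_map.shape + (3,))
masked_rgb = np.where(mask[..., None], rgb, 0.0)
```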
- fit_transcripts(coordinate_df, genes=None, gene_key='gene', signature_matrix=None, fit_umap=True, patch_length=500, n_workers=8)
Fits the visualizer to a spatial transcripts dataset using the SSAM algorithm.
- Parameters:
coordinate_df (DataFrame) – A dataframe of coordinates.
genes (list) – A list of genes to utilize in the model. None uses all genes.
gene_key (str) – The key in the dataframe containing the gene names.
signature_matrix (DataFrame) – A matrix of celltypes x gene signatures to use to annotate the UMAP.
fit_umap (bool) – Whether to fit the UMAP to the data.
patch_length (int) – Patch side length in each dimension, used when calculating signal integrity in patches. Smaller values use less memory but may take longer to compute.
n_workers (int) – The number of workers to use in the SSAM algorithm.
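A possible call sequence, using only the parameters listed above, might look like the following sketch. The coordinate column names `x` and `y` and the toy gene names are assumptions, and `fit_visualizer` is a hypothetical wrapper; ovrlpy is imported inside it so that the data preparation can be run without the package installed.

```python
import pandas as pd

# Toy transcript table: one row per molecule, with a gene name column.
# The coordinate column names "x" and "y" are an assumption.
coordinate_df = pd.DataFrame(
    {
        "x": [1.0, 2.5, 3.0, 40.0],
        "y": [1.0, 1.5, 2.0, 38.0],
        "gene": ["Gfap", "Gfap", "Snap25", "Snap25"],
    }
)


def fit_visualizer(coordinate_df):
    """Fit a Visualizer to a transcript table (hypothetical helper)."""
    import ovrlpy  # imported lazily; requires ovrlpy to be installed

    vis = ovrlpy.Visualizer(KDE_bandwidth=1.5)
    vis.fit_transcripts(
        coordinate_df,
        genes=None,       # None uses all genes
        gene_key="gene",  # column holding the gene names
        fit_umap=True,
        n_workers=2,
    )
    return vis
```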
- fit_pseudocells(pseudocell_expression_samples, *, genes=None, fit_umap=True)
Fits the visualizer to a given pseudocell expression sample.
- fit_signatures(signature_matrix=None)
Fits the visualizer with a given signature matrix.
- Parameters:
signature_matrix (DataFrame) – A matrix of celltypes x gene signatures to use to annotate the UMAP. None defaults to displaying individual genes.
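For orientation, a signature matrix could be assembled as in the sketch below, which averages annotated expression profiles per cell type. Reading "celltypes x gene signatures" as rows of cell types and columns of genes is an assumption, as are the toy gene names and values.

```python
import pandas as pd

# Toy cell x gene expression matrix with a cell type label per cell.
expression = pd.DataFrame(
    {"Gfap": [9, 7, 0, 1], "Snap25": [0, 1, 8, 6]},
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)
labels = pd.Series(
    ["astrocyte", "astrocyte", "neuron", "neuron"], index=expression.index
)

# Mean expression per cell type: rows are cell types, columns are genes.
signature_matrix = expression.groupby(labels).mean()
print(signature_matrix)
```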
- subsample_df(x, y, coordinate_df, window_size=30)
Subsamples the coordinate dataframe spatially based on given x, y coordinates and window size.
- Parameters:
x (float) – x-coordinate to center the sampling window
y (float) – y-coordinate to center the sampling window
coordinate_df (DataFrame) – DataFrame of gene-annotated molecule coordinates to create the subsample from
window_size (int, optional) – The window size of the sampling window. Molecules within this window around (x,y) are sampled and returned as a new DataFrame.
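The spatial windowing described above can be sketched with plain pandas filtering. This is a stand-in rather than the library's code: `subsample_window` is a hypothetical name, the columns `x` and `y` are assumed, and `window_size` is treated here as the maximum distance from the center along each axis, which may differ from how ovrlpy interprets it.

```python
import pandas as pd


def subsample_window(x, y, coordinate_df, window_size=30):
    """Return molecules within `window_size` of (x, y) along both axes."""
    in_x = (coordinate_df["x"] - x).abs() <= window_size
    in_y = (coordinate_df["y"] - y).abs() <= window_size
    return coordinate_df[in_x & in_y]


coordinate_df = pd.DataFrame(
    {
        "x": [0.0, 10.0, 60.0, 95.0],
        "y": [0.0, 20.0, 50.0, 5.0],
        "gene": ["A", "B", "A", "C"],
    }
)

# Keep only the molecules near (20, 20); the two distant molecules are dropped.
roi = subsample_window(x=20, y=20, coordinate_df=coordinate_df, window_size=30)
print(roi)
```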
- transform_transcripts(coordinate_df)
Transforms the coordinate dataframe to the visualizer’s 2d and 3d embedding space.
- Parameters:
coordinate_df (DataFrame) – Data frame of gene-annotated molecule coordinates to transform.
- transform_pseudocells(pseudocell_expression_samples)
Transforms a matrix of gene expression to the visualizer’s 2d and 3d embedding space.
- Parameters:
pseudocell_expression_samples (DataFrame) – A cell x gene matrix of gene expression
- pseudocell_df()
Returns a pandas.DataFrame containing the gene-count matrix of the fitted tissue’s determined pseudo-cells.
- Return type:
pandas.DataFrame
- plot_region_of_interest(subsample, subsample_embedding_color, x=None, y=None, window_size=None, rasterized=True, scalebar=SCALEBAR_PARAMS)
Plots the visualized data for a region of interest.
- Parameters:
subsample (DataFrame) – A dataframe of molecule coordinates and gene assignments.
subsample_embedding_color (DataFrame) – A list of rgb values for each molecule.
x (float) – Center x-coordinate for the region-of-interest.
y (float) – Center y-coordinate for the region-of-interest.
window_size (float, optional) – Window size of the region-of-interest.
rasterized (bool, optional) – If True all plots will be rasterized.
scalebar (dict[str, Any] | None) – If None, no scalebar will be plotted. Otherwise a dictionary with additional kwargs for matplotlib_scalebar.scalebar.ScaleBar. By default ovrlpy.SCALEBAR_PARAMS.
- plot_umap(ax=None, rasterized=False, **kwargs)
Plots the UMAP embedding.
- Parameters:
rasterized (bool, optional) – If True the plot will be rasterized.
kwargs – Keyword arguments for matplotlib.pyplot.scatter().
- plot_tissue(rasterized=False, scalebar=SCALEBAR_PARAMS, **kwargs)
Plots the tissue embedding.
- Parameters:
rasterized (bool, optional) – If True the plot will be rasterized.
scalebar (dict[str, Any] | None) – If None, no scalebar will be plotted. Otherwise a dictionary with additional kwargs for matplotlib_scalebar.scalebar.ScaleBar. By default ovrlpy.SCALEBAR_PARAMS.
kwargs – Keyword arguments for matplotlib’s scatter plot function.