ovrlpy.Ovrlp
============

.. py:class:: ovrlpy.Ovrlp(transcripts, /, KDE_bandwidth = 2.5, min_distance = 8, n_components = 30, *, gene_key = 'gene', coordinate_keys = ('x', 'y', 'z'), n_workers = 1, dtype = np.float32, patch_length = 500, umap_kwargs = UMAP_2D_PARAMS, cumap_kwargs = UMAP_RGB_PARAMS, random_state = None)

   Main analysis class for spatial overlap analysis.

   :param transcripts: Transcript information containing coordinates and gene name/id.
   :type transcripts: polars.DataFrame | pandas.DataFrame
   :param KDE_bandwidth: The bandwidth of the KDE.
   :type KDE_bandwidth: float, optional
   :param min_distance: Minimum distance for cell typing.
   :type min_distance: float, optional
   :param n_components: Number of components for PCA.
   :type n_components: int, optional
   :param gene_key: Name of the gene column.
   :type gene_key: str
   :param coordinate_keys: Names of the coordinate columns.
   :type coordinate_keys: collections.abc.Iterable[str]
   :param n_workers: Number of threads used in parallel processing.
   :type n_workers: int
   :param dtype: Datatype used for KDE calculations.
   :type dtype: numpy.typing.DTypeLike
   :param patch_length: Upper bound for size of each patch. (Only relevant for processing)
   :type patch_length: int
   :param umap_kwargs: Keyword arguments for 2D UMAP embedding.
   :type umap_kwargs: dict, optional
   :param cumap_kwargs: Keyword arguments for 3D UMAP embedding.
   :type cumap_kwargs: dict, optional
   :param random_state: Random state used to seed UMAP and PCA.
   :type random_state: int | numpy.random.RandomState | None

   .. attribute:: transcripts

      Transcript information containing coordinates and gene name/id.

      :type: polars.DataFrame

   .. attribute:: KDE_bandwidth

      The bandwidth of the KDE.

      :type: float

   .. attribute:: min_distance

      Minimum distance between pseudocells (local maxima).

      :type: int

   .. attribute:: pseudocells

      Gene expression matrix of the pseudocells.

      :type: anndata.AnnData

   .. attribute:: signatures

      A dataframe of celltypes x gene signatures used to annotate the UMAP.

      :type: polars.DataFrame

   .. attribute:: celltype_centers

      The center of gravity of each celltype in the 2D embedding, used for UMAP annotation.

      :type: numpy.ndarray

   .. attribute:: celltype_assignments

      The assignments of the cell types.

      :type: numpy.ndarray

   .. attribute:: pca

      The PCA object used for the 2D embedding and calculating the VSI score.

      :type: sklearn.decomposition.PCA

   .. attribute:: umap_2d

      The UMAP object used for the 2D embedding.

      :type: umap.UMAP

   .. attribute:: pca_rgb

      The PCA object used for the 3D RGB embedding.

      :type: sklearn.decomposition.PCA

   .. attribute:: umap_rgb

      The UMAP object used for the 3D RGB embedding.

      :type: umap.UMAP

   .. attribute:: genes

      A list of genes to utilize in the model.

      :type: list

   .. attribute:: gridsize

      The size of a pixel.

      :type: float

   .. attribute:: origin

      The origin of the grid (corresponds to the amount the coordinates have been shifted to minimize the analysis area).

      :type: tuple[float, float]

   .. attribute:: integrity_map

      The integrity map of the tissue.

      :type: numpy.ndarray

   .. attribute:: signal_map

      A pixel map of overall signal strength in the tissue, used to mask out low-signal regions.

      :type: numpy.ndarray

   .. attribute:: dtype

      Datatype used for KDE calculations.

      :type: numpy.typing.DTypeLike

   .. attribute:: patch_length

      Upper bound for size of each patch. (Only relevant for processing)

      :type: int

   .. attribute:: n_workers

      Number of threads used in parallel processing.

      :type: int

   .. py:method:: process_coordinates(gridsize = 1, **kwargs)

      Process the coordinates of the transcripts dataframe.

      :param gridsize: The size of each pixel in the grid. Measured in units of the transcript locations (usually µm).
      :type gridsize: float, optional
      :param kwargs: Other keyword arguments are passed to :py:func:`ovrlpy.process_coordinates`.

   .. py:method:: fit_transcripts(/, min_transcripts=10, *, genes = None, fit_umap = True)

      Fits a spatial transcripts dataset using the SSAM algorithm.

      :param min_transcripts: Minimum expression for a local maximum to be considered. Expressed in terms of transcripts.
      :type min_transcripts: float
      :param genes: A list of genes to utilize in the model. `None` uses all genes. Local maxima are always detected based on all genes.
      :type genes: collections.abc.Iterable[str] | None
      :param fit_umap: Whether to fit the UMAP to the data.
      :type fit_umap: bool

   .. py:method:: fit_pseudocells(pseudocells, *, fit_umap = True)

      Fits the expression of pseudocells.

      :param pseudocells: Gene expression to use for fitting.
      :type pseudocells: anndata.AnnData
      :param fit_umap: Whether to fit the UMAP to the data.
      :type fit_umap: bool

   .. py:method:: fit_signatures(signatures, key)
                  fit_signatures(signatures: pandas.DataFrame, key: None = None)

      Fits a signature matrix.

      :param signatures: A matrix of celltypes x gene signatures to use to annotate the UMAP.
      :type signatures: polars.DataFrame | pandas.DataFrame
      :param key: Name of the column holding the signature names. Only used if `signatures` is a :py:class:`polars.DataFrame`; for a :py:class:`pandas.DataFrame` the names are expected as the index.
      :type key: str | None

   .. py:method:: compute_VSI(*, min_transcripts = 2)

      Calculate the vertical signal integrity (VSI).

      :param min_transcripts: Minimum expression value to consider in the calculation. Defaults to 110% of the maximum expression profile of two molecules in the KDE.
      :type min_transcripts: float | None, optional

   .. py:method:: analyse(gridsize = 1, min_transcripts = 10, genes = None, fit_umap = True)

      Run main ovrlpy analysis.

      :param gridsize: The size of the pixel grid.
      :type gridsize: float, optional
      :param min_transcripts: Minimum expression for a local maximum to be considered. Expressed in terms of transcripts.
      :type min_transcripts: float
      :param genes: A list of genes to utilize in the model. `None` uses all genes. Local maxima are always detected based on all genes.
      :type genes: collections.abc.Iterable[str] | None
      :param fit_umap: Whether to fit the UMAP to the data.
      :type fit_umap: bool

   .. py:method:: detect_doublets(min_distance = 10, min_integrity = 0.7, min_signal = 3, integrity_sigma = None)

      Find individual low peaks of signal integrity in the tissue map, which indicate single occurrences of overlapping cells.

      :param min_distance: Minimum distance between reported peaks.
      :type min_distance: int, optional
      :param min_integrity: Threshold of signal integrity value. A peak with a `signal_integrity < min_integrity` is not considered.
      :type min_integrity: float, optional
      :param min_signal: Minimum signal value for a peak to be considered.
      :type min_signal: float, optional
      :param integrity_sigma: Optional sigma value for Gaussian filtering of the integrity map, which leads to the detection of overlap regions with larger spatial extent.
      :type integrity_sigma: float, optional

      :rtype: polars.DataFrame

   .. py:method:: subset_transcripts(x, y, *, window_size = 30)

      Subset the transcript dataframe spatially based on given x, y coordinates and window size.

      :param x: x-coordinate to center the region.
      :type x: float
      :param y: y-coordinate to center the region.
      :type y: float
      :param window_size: The window size of the region. Molecules within this window around (x, y) are returned as a new DataFrame.
      :type window_size: int, optional

      :rtype: polars.DataFrame

   .. py:method:: transform_transcripts(transcripts, *, gene_key = 'gene', coordinate_keys = ['x', 'y', 'z'])

      Transforms the coordinate dataframe to the 2D and 3D embedding space.

      :param transcripts: Dataframe of transcript coordinates to transform.
      :type transcripts: polars.DataFrame | pandas.DataFrame
      :param gene_key: Name of the gene column.
      :type gene_key: str
      :param coordinate_keys: Names of the coordinate columns.
      :type coordinate_keys: collections.abc.Sequence[str]

      :returns: * **embedding** (*numpy.ndarray*) -- 2D UMAP embedding
                * **rgb** (*numpy.ndarray*) -- 3D RGB UMAP embedding

   .. py:method:: transform_pseudocells(pseudocells)

      Transforms a matrix of gene expression to the 2D and 3D embedding space.

      :param pseudocells: A cell x gene matrix of gene expression.
      :type pseudocells: polars.DataFrame | pandas.DataFrame

      :returns: * **embedding** (*numpy.ndarray*) -- 2D UMAP embedding
                * **rgb** (*numpy.ndarray*) -- 3D RGB UMAP embedding

   .. py:method:: pseudocell_integrity()

      Returns a DataFrame containing the gene-count matrix of the fitted tissue's determined pseudo-cells.

      :rtype: polars.DataFrame
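
The methods above could be combined roughly as follows. This is a minimal sketch, not an official recipe: the helper name ``run_overlap_analysis`` is hypothetical, the input is assumed to use the default column names (``gene``, ``x``, ``y``, ``z``), and it assumes that ``analyse()`` performs every step ``detect_doublets()`` depends on, which this page does not state explicitly.

```python
def run_overlap_analysis(transcripts, n_workers=1, random_state=0):
    """Hypothetical helper: run the documented workflow on a transcript table.

    `transcripts` is assumed to be a polars or pandas DataFrame with the
    default column names 'gene', 'x', 'y' and 'z'.
    """
    # Imported lazily so the sketch can be loaded without ovrlpy installed.
    import ovrlpy

    # Only documented constructor arguments are used here.
    model = ovrlpy.Ovrlp(
        transcripts,
        n_workers=n_workers,
        random_state=random_state,
    )

    # analyse() is documented as the main analysis entry point; the values
    # below are the defaults from its signature, spelled out for clarity.
    model.analyse(gridsize=1, min_transcripts=10)

    # Assumption: the integrity and signal maps exist after analyse().
    # Thresholds are the documented defaults of detect_doublets().
    doublets = model.detect_doublets(
        min_distance=10,
        min_integrity=0.7,
        min_signal=3,
    )
    return model, doublets
```

The returned ``doublets`` table is documented as a ``polars.DataFrame``. Its column layout is not described on this page, so inspect it before passing coordinates on to ``subset_transcripts``.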