ovrlpy.Ovrlp
============

.. py:class:: ovrlpy.Ovrlp(transcripts, /, KDE_bandwidth = 2.5, min_distance = 8, n_components = 30, *, gene_key = 'gene', coordinate_keys = ('x', 'y', 'z'), n_workers = 1, dtype = np.float32, patch_length = 500, umap_kwargs = UMAP_2D_PARAMS, cumap_kwargs = UMAP_RGB_PARAMS, random_state = None)

   Main analysis class for spatial overlap analysis.

   :param transcripts: Transcript information containing coordinates and gene name/id.
   :type transcripts: polars.DataFrame | pandas.DataFrame
   :param KDE_bandwidth: The bandwidth of the KDE.
   :type KDE_bandwidth: float, optional
   :param min_distance: Minimum distance for cell typing.
   :type min_distance: float, optional
   :param n_components: Number of components for PCA.
   :type n_components: int, optional
   :param gene_key: Name of the gene column.
   :type gene_key: str
   :param coordinate_keys: Names of the coordinate columns.
   :type coordinate_keys: collections.abc.Iterable[str]
   :param n_workers: Number of threads used in parallel processing.
   :type n_workers: int
   :param dtype: Datatype used for KDE calculations.
   :type dtype: numpy.typing.DTypeLike
   :param patch_length: Upper bound for the size of each patch (only relevant for processing).
   :type patch_length: int
   :param umap_kwargs: Keyword arguments for 2D UMAP embedding.
   :type umap_kwargs: dict, optional
   :param cumap_kwargs: Keyword arguments for 3D UMAP embedding.
   :type cumap_kwargs: dict, optional
   :param random_state: Random state used to seed UMAP and PCA.
   :type random_state: int | numpy.random.RandomState | None
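
   A minimal usage sketch of the workflow documented on this page. It assumes
   that ``ovrlpy`` is installed and that :py:meth:`analyse` leaves the integrity
   and signal maps populated, so that :py:meth:`detect_doublets` can be called
   directly afterwards. The helper name ``run_overlap_analysis`` and the chosen
   argument values are illustrative only.

   .. code-block:: python

      def run_overlap_analysis(transcripts, n_workers=4, random_state=42):
          """Run the documented workflow on a transcripts DataFrame."""
          # Imported lazily so that defining this helper does not require ovrlpy.
          import ovrlpy

          # Keyword names follow the class signature documented above.
          model = ovrlpy.Ovrlp(
              transcripts,
              KDE_bandwidth=2.5,
              n_workers=n_workers,
              random_state=random_state,
          )

          # Assumption: analyse() runs the main steps, including the VSI.
          model.analyse(gridsize=1, min_transcripts=10)

          # Report candidate overlap locations as a DataFrame.
          doublets = model.detect_doublets(min_signal=3)
          return model, doublets

   Passing an integer ``random_state`` seeds UMAP and PCA, which should make
   repeated runs on the same data easier to compare.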

   .. attribute:: transcripts

      Transcript information containing coordinates and gene name/id.

      :type: polars.DataFrame

   .. attribute:: KDE_bandwidth

      The bandwidth of the KDE.

      :type: float

   .. attribute:: min_distance

      Minimum distance between pseudocells (local maxima).

      :type: int

   .. attribute:: pseudocells

      Gene expression matrix of the pseudocells.

      :type: anndata.AnnData

   .. attribute:: signatures

      A dataframe of celltypes x gene signatures used to annotate the UMAP.

      :type: polars.DataFrame

   .. attribute:: celltype_centers

      The center of gravity of each celltype in the 2D embedding, used for UMAP annotation.

      :type: numpy.ndarray

   .. attribute:: celltype_assignments

      The assignments of the cell types.

      :type: numpy.ndarray

   .. attribute:: pca

      The PCA object used for the 2D embedding and calculating the VSI score.

      :type: sklearn.decomposition.PCA

   .. attribute:: umap_2d

      The UMAP object used for the 2D embedding.

      :type: umap.UMAP

   .. attribute:: pca_rgb

      The PCA object used for the 3D RGB embedding.

      :type: sklearn.decomposition.PCA

   .. attribute:: umap_rgb

      The UMAP object used for the 3D RGB embedding.

      :type: umap.UMAP

   .. attribute:: genes

      A list of genes to utilize in the model.

      :type: list

   .. attribute:: gridsize

      The size of a pixel.

      :type: float

   .. attribute:: origin

      The origin of the grid (corresponds to the amount the coordinates
      have been shifted to minimize the analysis area).

      :type: tuple[float, float]

   .. attribute:: integrity_map

      The integrity map of the tissue.

      :type: numpy.ndarray

   .. attribute:: signal_map

      A pixel map of overall signal strength in the tissue, used to mask out low-signal regions.

      :type: numpy.ndarray

   .. attribute:: dtype

      Datatype used for KDE calculations.

      :type: numpy.typing.DTypeLike

   .. attribute:: patch_length

      Upper bound for the size of each patch (only relevant for processing).

      :type: int

   .. attribute:: n_workers

      Number of threads used in parallel processing.

      :type: int


   .. py:method:: process_coordinates(gridsize = 1, **kwargs)

      Process the coordinates of the transcripts dataframe.

      :param gridsize: The size of each pixel in the grid.
                       Measured in units of the transcript locations (usually µm).
      :type gridsize: float, optional
      :param kwargs: Other keyword arguments are passed to :py:func:`ovrlpy.process_coordinates`
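
      To illustrate how ``gridsize`` and the ``origin`` attribute relate, the
      sketch below bins coordinates into pixels using only the standard library.
      It is not the ovrlpy implementation: ``to_pixel_grid`` is a hypothetical
      helper, and taking the per-axis minimum as the origin is an assumption.

      .. code-block:: python

         import math

         def to_pixel_grid(xs, ys, gridsize=1.0):
             """Shift coordinates to a common origin and bin them into pixels."""
             # Assumption: the origin is the smallest coordinate on each axis.
             origin = (min(xs), min(ys))
             pixels = [
                 (
                     math.floor((x - origin[0]) / gridsize),
                     math.floor((y - origin[1]) / gridsize),
                 )
                 for x, y in zip(xs, ys)
             ]
             return origin, pixels

         origin, pixels = to_pixel_grid(
             [10.2, 12.9, 15.0], [5.0, 5.4, 9.9], gridsize=2.0
         )
         # origin == (10.2, 5.0); pixels == [(0, 0), (1, 0), (2, 2)]

      A larger ``gridsize`` yields fewer, coarser pixels for the same coordinates.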


   .. py:method:: fit_transcripts(/, min_transcripts=10, *, genes = None, fit_umap = True)

      Fits a spatial transcripts dataset using the SSAM algorithm.

      :param min_transcripts: Minimum expression for a local maximum to be considered,
                              expressed as a number of transcripts.
      :type min_transcripts: float
      :param genes: A list of genes to utilize in the model. `None` uses all genes.
                    Local maxima are always detected based on all genes.
      :type genes: collections.abc.Iterable[str] | None
      :param fit_umap: Whether to fit the UMAP to the data.
      :type fit_umap: bool


   .. py:method:: fit_pseudocells(pseudocells, *, fit_umap = True)

      Fits the expression of pseudocells.

      :param pseudocells: Gene expression to use for fitting.
      :type pseudocells: anndata.AnnData
      :param fit_umap: Whether to fit the UMAP to the data.
      :type fit_umap: bool


   .. py:method:: fit_signatures(signatures, key)
                  fit_signatures(signatures: pandas.DataFrame, key: None = None)

      Fits a signature matrix.

      :param signatures: A matrix of celltypes x gene signatures to use to annotate the UMAP.
      :type signatures: polars.DataFrame | pandas.DataFrame
      :param key: Name of the column with name of the signature.
                  Only used if `signatures` is a :py:class:`polars.DataFrame`,
                  for :py:class:`pandas.DataFrame` the names are expected as index.
      :type key: str | None


   .. py:method:: compute_VSI(*, min_transcripts = 2)

      Calculate the vertical signal integrity (VSI).

      :param min_transcripts: Minimum expression value to consider in calculation.
                              Defaults to 110% of the maximum expression profile of two molecules in the KDE.
      :type min_transcripts: float | None, optional


   .. py:method:: analyse(gridsize = 1, min_transcripts = 10, genes = None, fit_umap = True)

      Run main ovrlpy analysis.

      :param gridsize: The size of the pixel grid.
      :type gridsize: float, optional
      :param min_transcripts: Minimum expression for a local maximum to be considered,
                              expressed as a number of transcripts.
      :type min_transcripts: float
      :param genes: A list of genes to utilize in the model. `None` uses all genes.
                    Local maxima are always detected based on all genes.
      :type genes: collections.abc.Iterable[str] | None
      :param fit_umap: Whether to fit the UMAP to the data.
      :type fit_umap: bool


   .. py:method:: detect_doublets(min_distance = 10, min_integrity = 0.7, min_signal = 3, integrity_sigma = None)

      Find individual low peaks of signal integrity in the tissue map,
      each indicating a single occurrence of overlapping cells.

      :param min_distance: Minimum distance between reported peaks.
      :type min_distance: int, optional
      :param min_integrity: Threshold for the signal integrity value. A peak with
                            `signal_integrity < min_integrity` is not considered.
      :type min_integrity: float, optional
      :param min_signal: Minimum signal value for a peak to be considered.
      :type min_signal: float, optional
      :param integrity_sigma: Optional sigma value for gaussian filtering of the integrity map,
                              which leads to the detection of overlap regions with larger spatial extent.
      :type integrity_sigma: float, optional

      :rtype: polars.DataFrame
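
      The sketch below illustrates what a minimum distance between reported
      peaks can mean in practice. It is a conceptual illustration and not the
      detection routine used by ovrlpy: ``thin_peaks`` is a hypothetical helper,
      and the greedy, lowest-integrity-first strategy is an assumption.

      .. code-block:: python

         import math

         def thin_peaks(peaks, min_distance=10):
             """Keep low-integrity peaks that are at least min_distance apart.

             peaks is a list of (x, y, integrity) tuples.
             """
             kept = []
             # Visit candidates from lowest to highest integrity.
             for x, y, integrity in sorted(peaks, key=lambda p: p[2]):
                 far_enough = all(
                     math.hypot(x - kx, y - ky) >= min_distance
                     for kx, ky, _ in kept
                 )
                 if far_enough:
                     kept.append((x, y, integrity))
             return kept

         candidates = [(0, 0, 0.2), (3, 4, 0.3), (20, 0, 0.5)]
         kept = thin_peaks(candidates, min_distance=10)
         # (3, 4) lies 5 units from (0, 0) and is dropped.

      Under this assumption, a larger ``min_distance`` reports fewer peaks in
      densely affected regions.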


   .. py:method:: subset_transcripts(x, y, *, window_size = 30)

      Subset the transcript dataframe spatially based on given x, y coordinates and window
      size.

      :param x: x-coordinate of the center of the region.
      :type x: float
      :param y: y-coordinate of the center of the region.
      :type y: float
      :param window_size: The window size of the region. Molecules within this window around (x, y)
                          are returned as a new DataFrame.
      :type window_size: int, optional

      :rtype: polars.DataFrame
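
      The selection can be pictured with the plain-Python sketch below. It is
      not the library code: ``subset_window`` is a hypothetical helper operating
      on a list of dictionaries, and it assumes that ``window_size`` extends in
      each direction from the center, which this page does not state.

      .. code-block:: python

         def subset_window(rows, x, y, window_size=30):
             """Return the rows whose x and y lie within the window around (x, y)."""
             # Assumption: window_size is a half-width on each axis.
             return [
                 row
                 for row in rows
                 if abs(row["x"] - x) <= window_size
                 and abs(row["y"] - y) <= window_size
             ]

         rows = [
             {"gene": "A", "x": 0.0, "y": 0.0},
             {"gene": "B", "x": 25.0, "y": -10.0},
             {"gene": "C", "x": 40.0, "y": 0.0},
         ]
         nearby = subset_window(rows, 0.0, 0.0, window_size=30)
         # Genes A and B fall inside the window; C does not.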


   .. py:method:: transform_transcripts(transcripts, *, gene_key = 'gene', coordinate_keys = ['x', 'y', 'z'])

      Transforms the coordinate dataframe to the 2D and 3D embedding space.

      :param transcripts: Dataframe of transcript coordinates to transform.
      :type transcripts: polars.DataFrame | pandas.DataFrame
      :param gene_key: Name of the gene column.
      :type gene_key: str
      :param coordinate_keys: Names of the coordinate columns.
      :type coordinate_keys: collections.abc.Sequence[str]

      :returns: * **embedding** (*numpy.ndarray*) -- 2D UMAP embedding
                * **rgb** (*numpy.ndarray*) -- 3D RGB UMAP embedding
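
      A short usage sketch, assuming ``model`` is an already fitted
      :py:class:`ovrlpy.Ovrlp` instance. The wrapper name
      ``embed_new_transcripts`` is illustrative; the call itself uses only the
      signature and return values documented above.

      .. code-block:: python

         def embed_new_transcripts(model, transcripts):
             """Project new transcripts into the fitted 2D and RGB embeddings."""
             # The method returns two arrays: the 2D UMAP and the 3D RGB UMAP.
             embedding, rgb = model.transform_transcripts(
                 transcripts,
                 gene_key="gene",
                 coordinate_keys=["x", "y", "z"],
             )
             return embedding, rgb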


   .. py:method:: transform_pseudocells(pseudocells)

      Transforms a matrix of gene expression to the 2D and 3D embedding space.

      :param pseudocells: A cell x gene matrix of gene expression.
      :type pseudocells: polars.DataFrame | pandas.DataFrame

      :returns: * **embedding** (*numpy.ndarray*) -- 2D UMAP embedding
                * **rgb** (*numpy.ndarray*) -- 3D RGB UMAP embedding


   .. py:method:: pseudocell_integrity()

      Returns a DataFrame containing the gene-count matrix of the pseudocells
      determined for the fitted tissue.

      :rtype: polars.DataFrame