Xenium mouse brain

In this notebook, we use ovrlpy to investigate a Xenium mouse brain dataset.

We want to create a signal embedding of the transcriptome, and a vertical signal incoherence map to identify locations with a high risk of containing spatial doublets.

Settings and Imports

First, we import the relevant analysis packages and set the paths to the data files.

from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import ovrlpy

data_path = Path(
    "/dh-projects/ag-ishaque/raw_data/tiesmeys-ovrlpy/Xenium-brain-2024/replicate1"
)

signature_matrix_file = Path(
    "/dh-projects/ag-ishaque/raw_data/Xenium-benchmark/scRNAseq/trimmed_means.csv"
)

result_folder = Path("results")
result_folder.mkdir(exist_ok=True, parents=True)

Loading the data

Now, we want to load the data and prepare it for analysis.

coordinate_df = ovrlpy.io.read_Xenium(data_path / "transcripts.parquet")

coordinate_df.head()
      gene            x            y          z
0  Bhlhe40  4843.045898  6427.729980  19.068869
1    Parm1  4844.632812  6223.182617  18.520161
2  Bhlhe40  4842.943359  6478.310547  18.500109
3     Lyz2  4843.941406  6344.550293  15.016154
4     Dkk3  4843.162598  6632.111816  15.394680
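
Before running the pipeline, it can help to confirm that the loaded table has the expected shape. The sketch below is an illustration only: `summarize_transcripts` is a hypothetical helper, not part of ovrlpy, and `toy_df` is a stand-in built from the five rows shown above.

```python
import pandas as pd

# Toy stand-in for coordinate_df, built from the five rows shown above.
toy_df = pd.DataFrame(
    {
        "gene": ["Bhlhe40", "Parm1", "Bhlhe40", "Lyz2", "Dkk3"],
        "x": [4843.045898, 4844.632812, 4842.943359, 4843.941406, 4843.162598],
        "y": [6427.729980, 6223.182617, 6478.310547, 6344.550293, 6632.111816],
        "z": [19.068869, 18.520161, 18.500109, 15.016154, 15.394680],
    }
)


def summarize_transcripts(df: pd.DataFrame) -> dict:
    """Hypothetical helper: basic sanity checks on a transcript table."""
    required = {"gene", "x", "y", "z"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return {
        "n_transcripts": len(df),
        "n_genes": df["gene"].nunique(),
        "z_range": (float(df["z"].min()), float(df["z"].max())),
    }


summary = summarize_transcripts(toy_df)
print(summary)
```

In the notebook itself, the same helper could be called on `coordinate_df` instead of the toy table.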

Let’s get a quick overview of the tissue:

_ = coordinate_df[::1000].plot.scatter(x="x", y="y", s=0.1)

Running the ovrlpy pipeline

ovrlpy provides a convenience function, run, that executes the entire pipeline. It creates a signal integrity map, a signal strength map, and a Visualizer object for visualizing the results.

signal_integrity, signal_strength, visualizer = ovrlpy.run(
    df=coordinate_df, cell_diameter=10, n_expected_celltypes=30, n_workers=8
)
Running vertical adjustment
Creating gene expression embeddings for visualization:
Analyzing in 3d mode:
determining pseudocells:
found 62341 pseudocells
sampling expression:
100%|██████████| 88/88 [02:21<00:00,  1.61s/it]
Modeling 30 pseudo-celltype clusters;
Creating signal integrity map:
 94%|█████████▍| 296/315 [06:33<00:21,  1.15s/it]
/dh-projects/ag-ishaque/analysis/muellni/envs/ovrlpy/lib/python3.12/site-packages/ovrlpy/_utils.py:397: RuntimeWarning: invalid value encountered in divide
  spatial_patch_cosine_similarity[patch_signal_mask] = np.sum(
100%|██████████| 315/315 [06:41<00:00,  1.28s/it]
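
The RuntimeWarning in the log is raised while a cosine similarity is computed (the variable is named `spatial_patch_cosine_similarity`). One plausible cause, assumed here and not confirmed from the ovrlpy source, is a division by a zero norm at locations without signal. The sketch below is not ovrlpy's implementation. It only illustrates, on hypothetical toy count vectors for an upper and a lower tissue layer, how a cosine similarity can serve as a coherence score and why it is undefined where one layer is empty.

```python
import math

import numpy as np


def cosine_similarity(top: np.ndarray, bottom: np.ndarray) -> float:
    """Cosine similarity of two expression vectors; NaN if either is all zero."""
    norm = np.linalg.norm(top) * np.linalg.norm(bottom)
    if norm == 0:
        return math.nan  # no signal in one layer: similarity is undefined
    return float(np.dot(top, bottom) / norm)


# Same expression profile in both layers: coherent signal.
same = cosine_similarity(np.array([3.0, 0.0, 1.0]), np.array([6.0, 0.0, 2.0]))
# Disjoint genes in the two layers: incoherent signal, as for a spatial doublet.
mixed = cosine_similarity(np.array([3.0, 0.0, 0.0]), np.array([0.0, 4.0, 0.0]))
# One empty layer: undefined, which a plain division would report as NaN.
empty = cosine_similarity(np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 0.0]))

print(same, mixed, empty)
```

Under this assumption the warning is harmless, because NaN values occur only where there is no signal to evaluate.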

Visualizing results

The visualizer object has a plotting method to show the embeddings of the sampled gene expression signal.

visualizer.plot_fit()

We can annotate the UMAP using external, single-cell-derived cell type signatures to help interpret the cell type clusters in the gene expression embedding:

signatures = pd.read_csv(signature_matrix_file, index_col=0).T.loc[
    :, lambda df: df.columns.isin(visualizer.genes)
]

signatures = signatures.groupby(
    lambda x: x.split("_")[1].split(" ")[0].split("-")[0]
).mean()

signatures.index = signatures.index.str.replace("/", "-")
signatures.index.name = "celltype"

signatures.iloc[:5, :5]
feature   Sox17   Col19a1  2010300C02Rik     Satb2      Nrp2
celltype
Astro       0.0  0.000000       0.000000  0.000000  1.316010
CA1         0.0  0.000000       7.463820  2.937198  0.693990
CA2         0.0  0.000000       9.586734  0.000000  2.942782
CA3         0.0  0.051941       6.305638  0.000000  6.810690
CR          0.0  0.000000       0.000000  0.000000  0.000000

visualizer.fit_signatures(signatures)
visualizer.plot_fit()
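
The grouping key used when building the signature table chains three splits, which can be hard to read at a glance. The sketch below applies the same expression to a few labels. The labels are hypothetical examples chosen to show the behavior; the real ones come from trimmed_means.csv and may look different.

```python
def coarse_label(name: str) -> str:
    """Same key expression as the groupby lambda used for the signatures."""
    # Drop the prefix before the first underscore, keep the first word,
    # then keep the part before the first hyphen.
    return name.split("_")[1].split(" ")[0].split("-")[0]


# Hypothetical labels, for illustration only.
examples = ["12_Astro Aqp4", "45_CA1-ProS", "46_CA1-do", "7_CR"]
labels = [coarse_label(name) for name in examples]
print(labels)  # ['Astro', 'CA1', 'CA1', 'CR']
```

Labels that map to the same coarse name are averaged by the subsequent `.mean()`. A label without an underscore would raise an IndexError with this key.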

The signal integrity map can be visualized in the same way. Regions whose signal strength falls below a threshold (here, signal_threshold=3) are left out of the plot:

fig, ax = ovrlpy.plot_signal_integrity(
    signal_integrity, signal_strength, signal_threshold=3
)
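
The effect of a signal strength threshold can be sketched with plain NumPy. This is an illustration under assumptions, not the code behind `ovrlpy.plot_signal_integrity`: `mask_low_signal` is a hypothetical helper, the arrays are toy values, and treating the threshold as inclusive is a guess.

```python
import numpy as np


def mask_low_signal(integrity, strength, signal_threshold=3.0):
    """Return integrity with pixels below the strength threshold set to NaN."""
    integrity = np.asarray(integrity, dtype=float)
    strength = np.asarray(strength, dtype=float)
    # Assumption: pixels exactly at the threshold are kept.
    return np.where(strength >= signal_threshold, integrity, np.nan)


# Toy 2x2 maps; the real maps are the arrays returned by ovrlpy.run.
integrity = np.array([[0.9, 0.2], [0.5, 0.8]])
strength = np.array([[5.0, 4.0], [1.0, 3.0]])

masked = mask_low_signal(integrity, strength)
print(masked)
```

NaN pixels are typically drawn as empty by image plotting functions, so only regions with enough transcripts contribute to the picture.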

Detecting doublets

We can detect individual doublet events with ovrlpy, again setting a signal strength threshold to filter out low-transcript regions:

doublet_df = ovrlpy.detect_doublets(
    signal_integrity, signal_strength, minimum_signal_strength=3, integrity_sigma=2
)

_ = plt.scatter(
    doublet_df["x"],
    doublet_df["y"],
    c=doublet_df["integrity"],
    s=0.2,
    cmap="viridis",
    vmin=0,
    vmax=1,
)
_ = plt.gca().set_aspect("equal")
_ = plt.colorbar()

Having sampled regions of potential doublets, we can visualize them as close-up clouds of transcript molecules, colored by the Visualizer’s learned color embedding, by passing their (x, y) locations to ovrlpy.plot_region_of_interest:

doublet_case = 0

x, y = doublet_df.loc[doublet_case, ["x", "y"]]

_ = ovrlpy.plot_region_of_interest(
    x, y, coordinate_df, visualizer, signal_integrity, signal_strength, window_size=60
)
/dh-projects/ag-ishaque/analysis/muellni/envs/ovrlpy/lib/python3.12/site-packages/sklearn/base.py:493: UserWarning: X does not have valid feature names, but PCA was fitted with feature names
  warnings.warn(
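
To inspect several candidates and not just a single index, one option is to rank the detected events by their integrity score. The sketch below uses a toy table with the same columns that are read from `doublet_df` above (x, y, integrity). Whether `doublet_df` is already sorted is not stated here, so the explicit ranking is a precaution.

```python
import pandas as pd

# Toy stand-in for doublet_df; values are made up for illustration.
toy_doublets = pd.DataFrame(
    {
        "x": [10.0, 55.0, 32.0, 80.0],
        "y": [5.0, 40.0, 12.0, 70.0],
        "integrity": [0.45, 0.12, 0.30, 0.50],
    }
)

# The lowest integrity values mark the strongest doublet candidates.
worst = toy_doublets.nsmallest(2, "integrity")
centers = list(zip(worst["x"], worst["y"]))
print(centers)  # [(55.0, 40.0), (32.0, 12.0)]
```

Each (x, y) pair in `centers` could then be passed to `ovrlpy.plot_region_of_interest` in a loop, using the call shown above.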

Other functionality

Furthermore, we can save the visualizer object to a file for later use with the pickle module:

import pickle

with open("my_analysis.pickle", "wb") as file:
    pickle.dump(visualizer, file)

… and easily reload it if needed.

with open("my_analysis.pickle", "rb") as file:
    visualizer = pickle.load(file)
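
A quick round-trip check can confirm that a saved object reloads as expected. The sketch below uses a plain dictionary as a stand-in for the visualizer and writes to a temporary directory, so it runs without the analysis results; the dictionary contents are illustrative only.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in object; in the notebook this would be the fitted visualizer.
analysis = {"genes": ["Bhlhe40", "Parm1"], "n_expected_celltypes": 30}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_analysis.pickle"
    with path.open("wb") as file:
        pickle.dump(analysis, file)
    with path.open("rb") as file:
        restored = pickle.load(file)

print(restored == analysis)  # True
```

Pickle files should only be loaded from trusted sources, and loading a pickled object requires the defining package (here ovrlpy) to be importable.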

Additionally, the analysis has produced a global z-level adjustment of the transcriptome coordinates, which can be used to create a z-stack of adjacent, aligned sections in silico:

plt.figure(figsize=(20, 5))

ax = plt.subplot(111, projection="3d")

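for i in range(-2, 3):
    # transcripts whose adjusted depth (z - z_delim) lies in the slab [i, i + 1]
    subset = coordinate_df[(coordinate_df.z - coordinate_df.z_delim).between(i, i + 1)]
    # subsample every 100th transcript to keep the plot light
    sampled = subset[::100]

    ax.scatter(
        sampled.x,
        sampled.y,
        # constant height per slab; sized from the sample so lengths always match
        np.full(len(sampled), float(i)),
        s=1,
        alpha=0.1,
    )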
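
The slab selection in the loop above can also be expressed as a single binning step. The sketch below assigns each transcript a slab index by flooring its adjusted depth, using a toy table with the `z` and `z_delim` columns from the notebook; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for coordinate_df with only the columns needed here.
toy = pd.DataFrame(
    {
        "z": [19.1, 18.5, 15.0, 15.4, 17.2],
        "z_delim": [17.0, 17.0, 16.5, 16.5, 17.0],
    }
)

# Depth relative to the local adjustment, binned into slabs of height 1.
relative_z = toy["z"] - toy["z_delim"]
toy["slab"] = np.floor(relative_z).astype(int)

counts = toy["slab"].value_counts().sort_index()
print(counts)
```

One difference from the loop: `Series.between` is inclusive on both ends by default, so a transcript that lies exactly on a slab boundary is drawn in two slabs there, while the floor-based binning assigns it to exactly one.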