Xenium mouse brain

In this notebook, we will use ovrlpy to investigate the Xenium’s mouse brain dataset.

We want to create a signal embedding of the transcriptome, and a vertical signal incoherence map to identify locations with a high risk of containing spatial doublets.

We want to create a signal embedding of the transcriptome, and a vertical signal incoherence map to identify locations with a high risk of containing spatial doublets.

Settings and Imports

First we import relevant analysis packages and set the paths to the data files.

from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

import ovrlpy

The data file can be downloaded from the link above, and the signatures are available from the GitHub repository (see here).

# TODO: adjust paths
data_path = Path("path/to/downloaded_data/replicate1")

signature_matrix_file = Path(
    "./xenium_mouse_brain_data/celltype_signatures_brain.csv.gz"
)

Loading the data

Now, we want to load the data and prepare it for analysis.

coordinate_df = ovrlpy.io.read_Xenium(data_path / "transcripts.parquet")

coordinate_df.head()
shape: (5, 4)
genexyz
catf32f32f32
"Bhlhe40"4843.0458986427.7299819.068869
"Parm1"4844.6328126223.18261718.520161
"Bhlhe40"4842.9433596478.31054718.500109
"Lyz2"4843.9414066344.55029315.016154
"Dkk3"4843.1625986632.11181615.39468

Let’s get a quick overview of the tissue

# only show every 1,000th transcript
n = 1_000

fig, ax = plt.subplots()
ax.scatter(coordinate_df[::n, "x"], coordinate_df[::n, "y"], s=0.1)
_ = ax.set(aspect="equal")
../_images/97a460b83ba5adc9aa57dd554d3c3bab7597a91ecbe4c7540245e4cf5c6a60a9.png

Running the ovrlpy pipeline

ovrlpy provides a convenience function run to run the entire pipeline. The function creates a signal integrity map, a signal strength map and a Visualizer obejcet to visualize the results.

# ensure reproducibility by setting random_state
brain = ovrlpy.Ovrlp(coordinate_df, n_workers=8, random_state=42)
brain.analyse()
Running vertical adjustment
Creating gene expression embeddings for visualization
determining pseudocells
found 59237 pseudocells
sampling expression:
100%|██████████| 54/54 [01:15<00:00,  1.40s/it]
Modeling 30 pseudo-celltype clusters;
Creating signal integrity map
100%|██████████| 315/315 [07:51<00:00,  1.50s/it]

Visualizing results

The visualizer object has a plotting method to show the embeddings of the sampled gene expression signal.

_ = ovrlpy.plot_pseudocells(brain)
../_images/47d71a6f33992989dfa0da8c73a7d5703b0fe0ae7f12839b037377cbaeac7265.png

We can annotate the UMAP using external, single-cell derived cell type signatures to help interpret the cell type clusters in the gene-expression embedding. Cell type signatures can be created by using the mean gene expression per cell type (or annotation of interest) of a reference dataset.

signatures = pd.read_csv(signature_matrix_file, index_col="gene").T
signatures = signatures.loc[:, signatures.columns.isin(brain.genes)]
signatures.index.name = "celltype"

signatures.iloc[:5, :5]
gene 2010300C02Rik Acsbg1 Acta2 Acvrl1 Adamts2
celltype
Astro 0.000 3.593 0.0 0.0 0.00000
CA1 7.464 0.000 0.0 0.0 0.69060
CA2 9.587 0.000 0.0 0.0 0.00000
CA3 6.306 0.000 0.0 0.0 0.09887
CR 0.000 0.000 0.0 0.0 0.00000
brain.fit_signatures(signatures)
_ = ovrlpy.plot_pseudocells(brain)
../_images/d4be8637551dde73b9c6ce841adbbc68c0736656e1cf6ce4eedcf447adbb09a4.png

In the same way, the signal integrity map can be visualized, where visualization is cut off at regions below a certain signal strength threshold:

fig = ovrlpy.plot_signal_integrity(brain, signal_threshold=3)
../_images/f86cae96bdd87887a63b74daf4933ac1d744c648ac17a66d2eaf0a2d47eb4f2e.png

Detecting doublets

We can detect individual doublet events with ovrlpy, again setting a signal strength threshold to filter out low-transcript regions:

doublets = brain.detect_doublets(min_signal=3, integrity_sigma=2)
fig, ax = plt.subplots()
_scatter = ax.scatter(
    doublets["x"], doublets["y"], c=doublets["integrity"], s=0.2, cmap="viridis"
)
_ = ax.set_aspect("equal")
_ = fig.colorbar(_scatter, ax=ax)
../_images/7b3ff2c1e704621d8b09e30e38687c4bc32a61a6db05440acc71c3de24d4c0e7.png

Having sampled regions of potential doublets, we can visualize them as close-up transcriptome molecule clouds through the Visualizer’s learned color embeddings - by providing their (x, y) locations to ovrlpy.plot_region_of_interest

doublet_case = 0

x, y = doublets["x", "y"].row(doublet_case)

_ = ovrlpy.plot_region_of_interest(brain, x, y, window_size=60)
../_images/26ba2a18b926104a29866560b30e907853369be6e8075af08878276d35fc5525.png

Other functionality

Furthermore, we can save the visualizer object to file for later use leveraging the pickle module

import pickle

with open("my_analysis.pickle", "wb") as file:
    pickle.dump(brain, file)

… and easily reload it if needed.

with open("my_analysis.pickle", "rb") as file:
    brain = pickle.load(file)

Additionally, the analysis has produced a global z-level adjustment of the transcriptome coordinates, which can be used to create a z-stack of adjacent, aligned sections in silico:

import polars as pl

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

for i in range(-2, 3):
    subset = brain.transcripts.filter(
        (pl.col("z") - pl.col("z_center")).is_between(i, i + 1)
    )
    # downsample the number of transcripts
    subset = subset[::100]

    ax.scatter(subset["x"], subset["y"], i, s=1, alpha=0.1)

ratio = brain.transcripts["x"].max() / brain.transcripts["y"].max()
ax.set_box_aspect([ratio, 1, 0.75])
../_images/75c8a97ec4d9ebfea57eab552cbc29bc2f34ec4b34aebdbf30c439b44170cffd.png