MERFISH mouse liver

In this notebook, we use ovrlpy to investigate Vizgen's MERFISH mouse liver dataset.

The goal is to create a signal embedding of the transcriptome and a vertical signal incoherence map that identifies locations with a high risk of containing spatial doublets.

Settings and Imports

First, let’s define settings and input files.

from pathlib import Path

import matplotlib.pyplot as plt

import ovrlpy

sample_nr = 1
slice_nr = 1

data_path = Path("/dh-projects/ag-ishaque/raw_data/vizgen-merfish/vz-liver-showcase")

coordinate_file = (
    data_path / f"Liver{sample_nr}Slice{slice_nr}" / "detected_transcripts.csv"
)

Loading the data

Next, we load the transcript coordinates.

coordinate_df = ovrlpy.io.read_MERFISH(coordinate_file)

print(f"Number of transcripts: {len(coordinate_df):,}")
Number of transcripts: 417,243,171
coordinate_df.head()
           x          y    z  gene
0  2506.4070 -95.451480  0.0  Comt
1  2531.8447 -95.187020  0.0  Comt
2  2483.7969 -91.360115  0.0  Comt
3  2505.7693 -84.081650  0.0  Comt
4  2501.3940 -81.387090  0.0  Comt
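
For readers without ovrlpy at hand, the loading step can be approximated with plain pandas. The sketch below assumes that detected_transcripts.csv provides the columns global_x, global_y, global_z and gene; the helper name read_transcripts and the two toy rows are hypothetical, and this is not necessarily how ovrlpy.io.read_MERFISH is implemented.

```python
import io

import pandas as pd

# Hypothetical stand-in for the first rows of a detected_transcripts.csv file.
# The column layout is an assumption; the values are made up for illustration.
csv_text = """barcode_id,global_x,global_y,global_z,x,y,fov,gene
0,2506.4,-95.4,0.0,110.0,20.0,0,Comt
0,2531.8,-95.1,0.0,135.0,21.0,0,Comt
"""


def read_transcripts(source):
    """Read only the global coordinates and the gene label, with compact dtypes."""
    columns = {"global_x": "x", "global_y": "y", "global_z": "z"}
    df = pd.read_csv(
        source,
        usecols=[*columns, "gene"],  # skip columns that are not needed
        dtype={"global_x": "float32", "global_y": "float32",
               "global_z": "float32", "gene": "category"},
    )
    # Rename to the short column names used throughout this notebook.
    return df.rename(columns=columns)[["x", "y", "z", "gene"]]


df = read_transcripts(io.StringIO(csv_text))
print(df.dtypes)
```

Reading only the required columns and storing genes as a categorical column keeps memory use down, which matters for a table with hundreds of millions of rows.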

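The dataset is quite large (more than 400 million transcripts), so we restrict the analysis to a smaller region and shift the coordinates so that the region starts at the origin.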

# subset to region
x_lims = (2000, 9000)
y_lims = (1000, 9000)

coordinate_df = coordinate_df.loc[
    lambda df: (df["x"] > x_lims[0])
    & (df["x"] < x_lims[1])
    & (df["y"] > y_lims[0])
    & (df["y"] < y_lims[1])
]

coordinate_df = coordinate_df.assign(
    x=lambda df: df["x"] - df["x"].min(), y=lambda df: df["y"] - df["y"].min()
)

print(f"Number of transcripts: {len(coordinate_df):,}")
Number of transcripts: 300,693,961
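
The crop-and-shift step can be checked on a toy table. The following is a minimal sketch of the same logic; the helper name crop_and_shift and the four toy transcripts are hypothetical and only serve to make the effect of the two operations visible.

```python
import pandas as pd


def crop_and_shift(df, x_lims, y_lims):
    """Keep rows strictly inside the limits, then move the minimum to (0, 0)."""
    inside = df["x"].between(*x_lims, inclusive="neither") & df["y"].between(
        *y_lims, inclusive="neither"
    )
    cropped = df.loc[inside]
    return cropped.assign(
        x=cropped["x"] - cropped["x"].min(),
        y=cropped["y"] - cropped["y"].min(),
    )


# Toy data: the first and last transcripts lie outside the x limits.
toy = pd.DataFrame(
    {
        "x": [1.0, 5.0, 7.0, 12.0],
        "y": [2.0, 4.0, 9.0, 3.0],
        "gene": ["A", "B", "C", "D"],
    }
)

result = crop_and_shift(toy, x_lims=(2, 10), y_lims=(1, 10))
print(result)
```

Note that the shift uses the minimum of the retained transcripts, not the lower limit itself, so the cropped region always starts exactly at zero on both axes.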

Tissue overview

# plot every 100th transcript to keep the figure lightweight
plt.scatter(coordinate_df.loc[::100, "x"], coordinate_df.loc[::100, "y"], s=0.1)
_ = plt.gca().set_aspect("equal", adjustable="box")

Compute & Visualize coherence map

integrity, signal, visualizer = ovrlpy.run(
    df=coordinate_df, cell_diameter=7, n_expected_celltypes=10, n_workers=8
)
Running vertical adjustment
Creating gene expression embeddings for visualization:
Analyzing in 3d mode:
determining pseudocells:
found 137299 pseudocells
sampling expression:
100%|██████████| 120/120 [08:21<00:00,  4.18s/it]
Modeling 10 pseudo-celltype clusters;
Creating signal integrity map:
  0%|          | 1/224 [00:03<14:37,  3.94s/it]
/dh-projects/ag-ishaque/analysis/muellni/envs/ovrlpy/lib/python3.12/site-packages/ovrlpy/_utils.py:397: RuntimeWarning: invalid value encountered in divide
  spatial_patch_cosine_similarity[patch_signal_mask] = np.sum(
100%|██████████| 224/224 [13:35<00:00,  3.64s/it]
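
The RuntimeWarning in the output above is the message NumPy emits for a 0/0 division. A plausible cause, stated here as an assumption and not verified against the ovrlpy source, is that a cosine similarity is computed for pixels whose expression vectors are all zero. The toy arrays top and bottom below are hypothetical and merely reproduce that situation.

```python
import numpy as np

# Two pixels with two genes each; the second pixel carries no signal at all.
top = np.array([[1.0, 2.0], [0.0, 0.0]])
bottom = np.array([[2.0, 4.0], [0.0, 0.0]])

dot = np.sum(top * bottom, axis=1)
norms = np.linalg.norm(top, axis=1) * np.linalg.norm(bottom, axis=1)

# Without errstate, the 0/0 entry triggers
# "RuntimeWarning: invalid value encountered in divide".
with np.errstate(invalid="ignore"):
    similarity = dot / norms

print(similarity)  # first entry is close to 1, second entry is NaN
```

Such NaN entries only mark locations without any signal, which is consistent with the run completing normally despite the warnings.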
visualizer.plot_fit()

Signal integrity of the tissue sample

fig, ax = ovrlpy.plot_signal_integrity(integrity, signal, signal_threshold=3)

Doublet probability

doublet_df = ovrlpy.detect_doublets(
    integrity, signal, minimum_signal_strength=3, integrity_sigma=3
)
# plt.scatter returns the plotted collection (not an Axes), which colorbar needs
scatter = plt.scatter(
    doublet_df["x"], doublet_df["y"], c=doublet_df["integrity"], s=0.1, cmap="viridis_r"
)
plt.gca().set_aspect("equal", adjustable="box")
_ = plt.colorbar(scatter)
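
To build intuition for what doublet detection on an integrity map involves, the following toy sketch marks pixels that are local minima of integrity and carry enough signal. It is a deliberately simplified illustration and not the algorithm used by ovrlpy.detect_doublets; the function name toy_low_integrity_peaks, the min_signal parameter and the 5x5 arrays are all hypothetical, and no smoothing (as controlled by integrity_sigma above) is applied.

```python
import numpy as np


def toy_low_integrity_peaks(integrity, signal, min_signal=3.0):
    """Return (row, col) of strict local minima of `integrity` (8-neighbourhood)
    whose `signal` exceeds `min_signal`."""
    rows, cols = integrity.shape
    # Pad with +inf so that border pixels can also qualify as minima.
    padded = np.pad(integrity, 1, mode="constant", constant_values=np.inf)
    is_min = np.ones(integrity.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
            is_min &= integrity < neighbour
    return np.argwhere(is_min & (signal > min_signal))


integrity = np.ones((5, 5))
integrity[2, 2] = 0.2  # low integrity, strong signal: reported
integrity[0, 4] = 0.3  # low integrity, weak signal: filtered out
signal = np.full((5, 5), 5.0)
signal[0, 4] = 1.0

peaks = toy_low_integrity_peaks(integrity, signal)
print(peaks)
```

The signal filter plays the same role as minimum_signal_strength in the call above: low integrity in regions with almost no transcripts is not informative and is therefore ignored.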

To inspect an individual doublet event, plot the region of interest around it:

doublet_case = 100

x, y = doublet_df.loc[doublet_case, ["x", "y"]]

_ = ovrlpy.plot_region_of_interest(
    x, y, coordinate_df, visualizer, integrity, signal, window_size=40
)
/dh-projects/ag-ishaque/analysis/muellni/envs/ovrlpy/lib/python3.12/site-packages/sklearn/base.py:493: UserWarning: X does not have valid feature names, but PCA was fitted with feature names
  warnings.warn(