{ "cells": [ { "cell_type": "markdown", "id": "e0d62664", "metadata": {}, "source": [ "# X-Set Enrichment Analysis (XSEA; cf. ABAnnotate)\n", "\n", "So far, we've been computing colocalization values between a brain map and individual reference maps: one correlation per receptor or gene. But sometimes we don't care about individual maps. Instead, we want to know whether the brain pattern of interest is broadly associated with a *group* of maps (say, all serotonin receptors, or all genes expressed in excitatory neurons).\n", "\n", "This is the idea behind **X-Set Enrichment Analysis (XSEA)**. It is closely related to Gene Set Enrichment Analysis (GSEA) from transcriptomics, adapted for brain maps.\n", "\n", "The XSEA approach used here was first implemented in the MATLAB toolbox [**ABAnnotate**](https://github.com/leondlotter/ABAnnotate), a predecessor of NiSpace. If you've used ABAnnotate, the concepts will be familiar.\n", "\n", "## The idea\n", "\n", "Given a brain map and a set of reference maps (e.g., all receptor maps for serotonin), XSEA computes:\n", "\n", "1. The mean colocalization between the brain map and all maps in the set.\n", "2. A p-value asking: is this mean colocalization more extreme than expected when the brain map is replaced by spatially constrained random (surrogate) maps?\n", "\n", "The result is one statistic per *set*, not per individual map.\n", "\n", "## A note on null models\n", "\n", "There are two natural null hypotheses for XSEA:\n", "\n", "1. **Permute the input map** (default): generate spatially constrained surrogates of the brain map and recompute the mean colocalization for each. This tests whether the observed mean is more extreme than expected for random maps with comparable spatial autocorrelation.\n", "2. **Permute the sets**: randomly sample sets from a larger background population of maps and ask whether the observed set mean is more extreme than the means of those random sets.\n", "\n", "Ben Fulcher et al. 
(2021, *Nature Communications*) showed that random set sampling can inflate false positives when maps within a set are highly co-expressed (correlated). NiSpace therefore defaults to map permutation, following ABAnnotate. Set permutation can be enabled with `permute_sets=True`, but this is currently not recommended. A future version will implement set resampling matched for within-set correlation to address this issue." ] }, { "cell_type": "code", "execution_count": 1, "id": "6120975d", "metadata": { "execution": { "iopub.execute_input": "2026-06-01T13:06:41.215586Z", "iopub.status.busy": "2026-06-01T13:06:41.215454Z", "iopub.status.idle": "2026-06-01T13:06:41.628207Z", "shell.execute_reply": "2026-06-01T13:06:41.627893Z" } }, "outputs": [], "source": [ "# Replace tqdm's notebook (widget) progress bar class with the plain-text one\n", "import tqdm.notebook\n", "tqdm.notebook.tqdm = tqdm.tqdm\n", "\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "5aab78c1", "metadata": { "execution": { "iopub.execute_input": "2026-06-01T13:06:41.629849Z", "iopub.status.busy": "2026-06-01T13:06:41.629701Z", "iopub.status.idle": "2026-06-01T13:06:43.536677Z", "shell.execute_reply": "2026-06-01T13:06:43.536382Z" } }, "outputs": [], "source": [ "from nispace.datasets import fetch_reference, fetch_example\n", "from nispace.io import load_img\n", "from nispace.api import NiSpace" ] }, { "cell_type": "markdown", "id": "db130d48", "metadata": {}, "source": [ "## Dataset setup\n", "\n", "We'll run XSEA on the pain map against mRNA gene expression data to test whether pain-related brain activation broadly aligns with the expression of particular cell types.\n", "\n", "The mRNA reference dataset is derived from the Allen Human Brain Atlas (AHBA). It provides mean gene expression per brain region for thousands of genes, organized into sets such as cell-type markers."
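The two null models described above can be sketched in plain NumPy/SciPy. This is a minimal illustration under stated assumptions, not NiSpace's implementation. The helper names (`spearman_to_refs`, `perm_p`, `xsea_map_permutation`, `xsea_set_permutation`) are hypothetical. Spearman's rho is assumed as the colocalization metric, matching the `mean_rho` column in the results. P-values use a two-sided (1 + count) / (1 + n_null) convention, which is also an assumption. The demo surrogates are plain shuffles and are *not* spatially constrained, so a real analysis would pass spatially constrained null maps in their place.

```python
import numpy as np
from scipy.stats import rankdata


def spearman_to_refs(brain, refs):
    """Spearman rho between one map (n_parcels,) and each row of refs (n_maps, n_parcels)."""
    rb = rankdata(brain)
    rr = rankdata(refs, axis=1)
    # Spearman rho = Pearson r on ranks; with population-std z-scores, r = mean(z_x * z_y)
    zb = (rb - rb.mean()) / rb.std()
    zr = (rr - rr.mean(axis=1, keepdims=True)) / rr.std(axis=1, keepdims=True)
    return zr @ zb / zb.size


def perm_p(observed, null):
    """Two-sided empirical p-value using the (1 + count) / (1 + n_null) convention."""
    return (1 + np.sum(np.abs(null) >= np.abs(observed))) / (1 + null.size)


def xsea_map_permutation(brain, set_refs, surrogates):
    """Null 1: keep the set fixed and replace the brain map by surrogate maps."""
    observed = spearman_to_refs(brain, set_refs).mean()
    null = np.array([spearman_to_refs(s, set_refs).mean() for s in surrogates])
    return observed, perm_p(observed, null)


def xsea_set_permutation(brain, set_refs, background, n_perm, rng):
    """Null 2: keep the brain map fixed and draw random same-size sets from a background."""
    observed = spearman_to_refs(brain, set_refs).mean()
    rho_bg = spearman_to_refs(brain, background)  # one rho per background map
    k = set_refs.shape[0]
    null = np.array([rng.choice(rho_bg, size=k, replace=False).mean() for _ in range(n_perm)])
    return observed, perm_p(observed, null)


# Toy data (not real brain maps): 116 parcels, one 5-map set sharing signal with the brain map
rng = np.random.default_rng(0)
n_parcels = 116
brain = rng.normal(size=n_parcels)
set_refs = brain + rng.normal(size=(5, n_parcels))
background = rng.normal(size=(200, n_parcels))
# Stand-in surrogates: plain shuffles, which are NOT spatially constrained
surrogates = np.array([rng.permutation(brain) for _ in range(999)])

obs, p_map = xsea_map_permutation(brain, set_refs, surrogates)
_, p_set = xsea_set_permutation(brain, set_refs, background, 999, rng)
print(f"mean rho = {obs:.3f}, p (map perm) = {p_map:.3f}, p (set perm) = {p_set:.3f}")
```

The two functions differ in which source of variation forms the null distribution. In map permutation, the set stays intact, so correlated set members move together across surrogates and the null spread of their mean reflects that co-expression. Random background sets lack this within-set correlation. This is the false-positive concern raised above.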
] }, { "cell_type": "code", "execution_count": 3, "id": "f6a4ef34", "metadata": { "execution": { "iopub.execute_input": "2026-06-01T13:06:43.538369Z", "iopub.status.busy": "2026-06-01T13:06:43.538191Z", "iopub.status.idle": "2026-06-01T13:06:43.793287Z", "shell.execute_reply": "2026-06-01T13:06:43.792943Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mINFO | 01/06/26 15:06:43 | nispace.datasets: Loading mrna maps.\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mINFO | 01/06/26 15:06:43 | nispace.datasets: Loading integrated collection 'CellTypesPsychEncodeTPM' for dataset 'mrna'.\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mINFO | 01/06/26 15:06:43 | nispace.datasets: Filtering maps by collection.\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[32mINFO | 01/06/26 15:06:43 | nispace.datasets: Loading and inner-merging data parcellated with 'Schaefer100Parcels7Networks' and 'TianS1'\u001b[0m\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "mRNA data: 465 genes across 24 sets x 116 parcels\n", "Set names: ['Ex1 CortProject (L2/3)', 'Ex2 Granule (L3/4)', 'Ex3 Granule (L4)', 'Ex4 SubcortProject (L4)', 'Ex5 SubcortProject (L4-6)', 'Ex6 SubcortProject (L5-6)', 'Ex7 Corticothalamic', 'Ex8 Corticothalamic (L6)', 'In1 VIP+RELN+NDNF+ (L1/2)', 'In2 VIP+RELN-NDNF- (L6)', 'In3 VIP+RELN+NDNF- (L6)', 'In4 VIP-RELN+NDNF+ (L1-3)', 'In5 CCK+NOS1+CALB2+ (L2/3)', 'In6 PVALB+CRHBP+ (L4/5)', 'In7 SST+CALB1+NPY+ (L5/6)', 'In8 SST+NOS1+ (L6)', 'Astrocyte', 'Endothelial', 'Developing-quiescent', 'Developing-replicating', 'Microglia', 'Other Neurons', 'OPC', 'Oligodendrocyte']\n" ] }, { "data": { "text/html": [ "
| set | map | hemi-L_div-Vis_lab-1 | hemi-L_div-Vis_lab-2 | hemi-L_div-Vis_lab-3 | hemi-L_div-Vis_lab-4 | hemi-L_div-Vis_lab-5 | hemi-L_div-Vis_lab-6 | hemi-L_div-Vis_lab-7 | hemi-L_div-Vis_lab-8 | hemi-L_div-Vis_lab-9 | hemi-L_div-SomMot_lab-1 | ... | hemi-L_lab-PUT | hemi-L_lab-CAU | hemi-R_lab-HIP | hemi-R_lab-AMY | hemi-R_lab-pTHA | hemi-R_lab-aTHA | hemi-R_lab-NAc | hemi-R_lab-GP | hemi-R_lab-PUT | hemi-R_lab-CAU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ex1 CortProject (L2/3) | CAMK2A | 0.842136 | 0.826765 | 0.826967 | 0.827184 | 0.817241 | 0.822723 | 0.815382 | 0.818394 | 0.815004 | 0.819051 | ... | 0.452723 | 0.571612 | 0.792884 | 0.710187 | 0.414296 | 0.366631 | 0.485376 | 0.288345 | 0.537562 | 0.596598 |
| | CCDC88C | 0.490978 | 0.415965 | 0.401705 | 0.323039 | 0.311288 | 0.341007 | 0.394077 | 0.361664 | 0.407064 | 0.432452 | ... | 0.784354 | 0.905244 | 0.417680 | 0.772467 | 0.433421 | 0.556930 | 0.916510 | 0.728213 | 0.748571 | 0.981271 |
| | CDH9 | 0.742186 | 0.618615 | 0.623595 | 0.567650 | 0.546706 | 0.577347 | 0.681953 | 0.648653 | 0.656170 | 0.708508 | ... | 0.483765 | 0.509307 | 0.638481 | 0.786927 | 0.225884 | 0.266731 | 0.684596 | 0.402965 | 0.512751 | 0.539964 |

3 rows × 116 columns
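The reference table above has one row per (set, map) pair and one column per parcel. Given that layout, the observed set-level statistic amounts to a per-row correlation with the brain map followed by a mean within each set. Below is a minimal pandas sketch on toy data. The set, gene, and parcel labels are hypothetical placeholders, and Spearman's rho is an assumed metric. It computes only the observed statistic, not its permutation p-value.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Toy reference table shaped like the mRNA data: rows = (set, map), columns = parcels.
# All labels here are hypothetical placeholders.
parcels = [f"parcel_{i}" for i in range(116)]
index = pd.MultiIndex.from_tuples(
    [("SetA", "GENE1"), ("SetA", "GENE2"),
     ("SetB", "GENE3"), ("SetB", "GENE4"), ("SetB", "GENE5")],
    names=["set", "map"],
)
mrna = pd.DataFrame(rng.random((len(index), len(parcels))), index=index, columns=parcels)
brain = pd.Series(rng.normal(size=len(parcels)), index=parcels)

# Spearman rho between the brain map and every (set, map) row, aligned on parcel labels
rho = mrna.corrwith(brain, axis=1, method="spearman")

# One statistic per set: the mean colocalization across its member maps
set_means = rho.groupby(level="set").mean()
print(set_means)
```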
| set | mean_rho | p | p_corrected |
|---|---|---|---|
| Ex7 Corticothalamic | 0.087049 | 0.001 | 0.010495 |
| Ex8 Corticothalamic (L6) | 0.127390 | 0.001 | 0.010495 |
| OPC | 0.168996 | 0.002 | 0.020890 |
| Microglia | 0.250146 | 0.002 | 0.020890 |
| Developing-replicating | 0.105683 | 0.014 | 0.138153 |
| Ex6 SubcortProject (L5-6) | 0.131066 | 0.060 | 0.479255 |
| Ex3 Granule (L4) | -0.215634 | 0.068 | 0.524137 |
| Oligodendrocyte | 0.118138 | 0.112 | 0.714241 |
| Endothelial | 0.141804 | 0.162 | 0.844909 |
| Developing-quiescent | 0.059943 | 0.170 | 0.859830 |
| In2 VIP+RELN-NDNF- (L6) | 0.042509 | 0.194 | 0.897133 |
| In1 VIP+RELN+NDNF+ (L1/2) | 0.142336 | 0.202 | 0.907404 |
| Astrocyte | 0.175766 | 0.242 | 0.946164 |
| In7 SST+CALB1+NPY+ (L5/6) | -0.024495 | 0.458 | 0.998433 |
| Ex2 Granule (L3/4) | -0.024170 | 0.672 | 0.999992 |
| Other Neurons | 0.017340 | 0.684 | 0.999995 |
| In3 VIP+RELN+NDNF- (L6) | 0.011462 | 0.688 | 0.999995 |
| In5 CCK+NOS1+CALB2+ (L2/3) | -0.031559 | 0.750 | 1.000000 |
| In4 VIP-RELN+NDNF+ (L1-3) | -0.036649 | 0.808 | 1.000000 |
| Ex4 SubcortProject (L4) | -0.033283 | 0.830 | 1.000000 |
| Ex1 CortProject (L2/3) | 0.017241 | 0.844 | 1.000000 |
| In8 SST+NOS1+ (L6) | -0.009339 | 0.878 | 1.000000 |
| In6 PVALB+CRHBP+ (L4/5) | -0.019250 | 0.882 | 1.000000 |
| Ex5 SubcortProject (L4-6) | -0.018907 | 0.916 | 1.000000 |
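How `p_corrected` is derived is not spelled out in this section. Judging only from the printed numbers, each (p, p_corrected) pair fits a Šidák-type formula, p_corrected = 1 − (1 − p)^m. The same m ≈ 10.5 appears in every row, rather than the 24 sets tested. This would be consistent with correcting for a smaller effective number of tests. The check below uses six rows copied from the table above. It is an inference from the output, not NiSpace's documented procedure.

```python
import numpy as np

# Six (p, p_corrected) pairs copied from the results table above
p = np.array([0.001, 0.002, 0.014, 0.060, 0.068, 0.112])
p_corrected = np.array([0.010495, 0.020890, 0.138153, 0.479255, 0.524137, 0.714241])

# Solve p_corrected = 1 - (1 - p)**m for m, row by row (log1p(-x) computes log(1 - x))
m = np.log1p(-p_corrected) / np.log1p(-p)
print(np.round(m, 2))
```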