nispace.datasets.fetch_collection
- nispace.datasets.fetch_collection(collection, dataset=None, maps=None, set_size_range=None, weight_range=None, weight_quantile=None, set_specificity=None, return_maps=False, nispace_data_dir=None, overwrite=False, check_file_hash=True, verbose=True)
Fetch a collection that defines a subset (and optional grouping) of maps.
A collection is a mapping from map IDs to optional set labels and weights. The result is a DataFrame with columns ["map"], ["set", "map"], or ["set", "map", "weight"], depending on the collection content.
Three .collect file formats are supported:
- Simple list: plain text, one map ID per line, map header.
- JSON set dict: {"set_name": ["map1", "map2", ...], ...}.
- CSV set table: columns map; set, map; or set, map, weight.
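The three layouts can be illustrated with a small standard-library sketch. It is not NiSpace's own parser: the helper names, map IDs, and set names are made up, and it assumes that "map header" means the simple list starts with a line reading map.

```python
import csv
import io
import json

# Hypothetical contents of the three layouts; all map and set names are made up.
simple_list = "map\nmapA\nmapB\n"
json_sets = '{"set1": ["mapA", "mapB"], "set2": ["mapB"]}'
csv_table = "set,map,weight\nset1,mapA,0.9\nset1,mapB,0.4\nset2,mapB,0.7\n"


def rows_from_simple_list(text):
    """One map ID per line; a leading 'map' header line (assumed) is skipped."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if lines and lines[0] == "map":
        lines = lines[1:]
    return [{"map": m} for m in lines]


def rows_from_json_sets(text):
    """{"set_name": [map IDs]} becomes one row per (set, map) pair."""
    return [{"set": s, "map": m} for s, maps in json.loads(text).items() for m in maps]


def rows_from_csv_table(text):
    """CSV with a header row; the weight column, if present, is cast to float."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        if "weight" in row:
            row["weight"] = float(row["weight"])
    return rows


def columns(rows):
    """Column layout of the parsed rows, mirroring the DataFrame columns."""
    return list(rows[0].keys())
```

Under these assumptions, the simple list yields the column layout ["map"], the JSON dict yields ["set", "map"], and the weighted CSV yields ["set", "map", "weight"].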
- Parameters:
collection (str, Path, ndarray, DataFrame, Series, or list) – When dataset is given: the name of an integrated collection (e.g. "All", "BrainSpanWeights"). When dataset is None: a path to a .collect file, or an in-memory array-like / DataFrame that is used directly.
dataset (str, optional) – Name of an integrated NiSpace reference dataset (e.g. "mrna", "pet"). If provided, collection must be the name of one of that dataset’s registered collections.
maps (list, optional) – Restrict to this subset of map IDs after loading.
set_size_range (tuple (int, int), optional) – Keep only sets whose membership count falls within [min, max] (inclusive).
weight_range (tuple (float, float), optional) – Keep only entries whose weight is within [min, max] (inclusive). Ignored when the collection has no weights.
weight_quantile (float, optional) – Within each set, keep only entries with weight ≥ this quantile. Ignored when the collection has no weights.
set_specificity (float in (0, 1], optional) – Keep only maps that appear in ≤ set_specificity fraction of all sets, i.e. discard ubiquitous maps.
return_maps (bool, default False) – If True, return a tuple (collection_df, maps_avail) where maps_avail is the deduplicated list of map IDs after all filters.
nispace_data_dir (str or Path, optional) – Override the NiSpace data directory (default: $NISPACE_DATA_DIR).
overwrite (bool, default False) – Re-download the collection file even if it is already cached.
check_file_hash (bool, default True) – Verify the SHA-256 hash of the downloaded file.
verbose (bool, default True) – Print progress messages.
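The filter semantics described for set_size_range, weight_range, and set_specificity can be sketched in plain Python. This is an illustration of the documented behavior only, not the library's implementation: the rows and helper names are hypothetical, and each filter is applied independently to the same input because the order in which NiSpace combines them is not stated here.

```python
from collections import Counter

# Hypothetical weighted collection as (set, map, weight) rows.
rows = [
    ("set1", "mapA", 0.9),
    ("set1", "mapB", 0.4),
    ("set2", "mapB", 0.7),
    ("set3", "mapB", 0.2),
    ("set3", "mapC", 0.8),
    ("set3", "mapD", 0.6),
]


def filter_set_size(rows, size_range):
    """Keep rows whose set has a member count within [min, max] (inclusive)."""
    lo, hi = size_range
    sizes = Counter(s for s, _, _ in rows)
    return [r for r in rows if lo <= sizes[r[0]] <= hi]


def filter_weight(rows, weight_range):
    """Keep rows whose weight lies within [min, max] (inclusive)."""
    lo, hi = weight_range
    return [r for r in rows if lo <= r[2] <= hi]


def filter_specificity(rows, max_fraction):
    """Keep maps that appear in at most max_fraction of all sets."""
    n_sets = len({s for s, _, _ in rows})
    sets_per_map = {}
    for s, m, _ in rows:
        sets_per_map.setdefault(m, set()).add(s)
    return [r for r in rows if len(sets_per_map[r[1]]) / n_sets <= max_fraction]
```

With this toy input, filter_set_size(rows, (2, 3)) drops the single-member set2, filter_weight(rows, (0.5, 1.0)) drops the two rows with weights 0.4 and 0.2, and filter_specificity(rows, 0.5) drops mapB because it appears in all three sets.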
- Returns:
collection_df (DataFrame) – Columns: ["map"] for unstructured collections; ["set", "map"] for grouped collections; ["set", "map", "weight"] for weighted collections.
maps_avail (list of str) – Only returned when return_maps=True. Deduplicated map IDs present in collection_df after filtering.
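A minimal usage sketch of the two-value return, based only on the signature documented above. The wrapper name fetch_with_maps is hypothetical, and the import is done lazily inside the function so the snippet loads even where NiSpace is not installed.

```python
def fetch_with_maps(collection, dataset=None, **filters):
    """Call fetch_collection with return_maps=True and unpack the result.

    Hypothetical convenience wrapper; `filters` is forwarded unchanged and
    may hold documented keywords such as set_size_range or weight_range.
    """
    # Lazy import: requires NiSpace to be installed at call time.
    from nispace.datasets import fetch_collection

    collection_df, maps_avail = fetch_collection(
        collection, dataset=dataset, return_maps=True, **filters
    )
    return collection_df, maps_avail
```

A call might look like fetch_with_maps("All", dataset="mrna", set_size_range=(5, 500)). Whether "All" is registered for the "mrna" dataset is an assumption drawn from the example names above, and the range values are arbitrary.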