Αlpha¹ Review in Progress
A gene program dictionary of human cells
Xu, Y.; Wang, Y.; Geng, Z.; Qin, Y.; Ma, S.
Defining all human cell types and their roles in health and disease is a central goal of biology. Single-cell RNA sequencing (scRNA-seq) has enabled the construction of organ-specific cell atlases, but building a comprehensive organism-wide atlas spanning multiple organs remains challenging because of batch effects, study biases, and inter-organ complexity. Here, we present the Gene Program Dictionary (GPD), a framework that overcomes these barriers by leveraging robust gene co-expression programs rather than direct cell integration. Using SpacGPA, a partial correlation-based network method, we analyzed 466 scRNA-seq datasets, generating 1,975 independent networks and 90,701 gene co-expression modules, which were consolidated into 1,534 consensus gene programs representing a wide range of human tissues and cell types. Each program serves as a composite marker, capturing both cell-type-specific and shared biological processes. We demonstrate the utility of these programs in three ways: mapping endothelial cell subtypes across tissues to reveal their heterogeneity, including tumor-specific programs; annotating colorectal cancer spatial transcriptomes; and linking programs and their corresponding cell types to disease loci, which reveals hotspots such as neuronal programs in psychiatric disorders and a proximal tubule program in kidney diseases. GPD provides an organism-wide reference for studying cellular diversity and disease mechanisms.