Αlpha¹ Review in Progress
Batch Effects Remain a Fundamental Barrier to Universal Embeddings in Single-Cell Foundation Models
Wang, L.; Zhang, C.; Zhang, S.
Constructing a cell universe requires integrating heterogeneous single-cell RNA-seq datasets, but this integration is hindered by diverse batch effects. Single-cell foundation models (scFMs), inspired by large language models, aim to learn universal cellular embeddings from large-scale single-cell data. However, unlike language, single-cell data are sparse, noisy, and strongly affected by batch effects that limit cross-dataset transferability. Our systematic evaluation across diverse batch scenarios reveals that current scFMs fail to remove batch effects effectively: batch signals persist in the pretrained embeddings. Post-hoc batch-centering partially improves alignment, which highlights the need for future scFMs to integrate explicit batch-effect correction mechanisms in order to achieve truly universal cellular embeddings.
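The abstract does not spell out how post-hoc batch-centering is performed. A minimal sketch, assuming it means subtracting each batch's mean embedding from the cells in that batch, might look like the following. The function name `batch_center` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def batch_center(embeddings, batch_labels):
    """Subtract the per-batch mean embedding from every cell in that batch.

    Assumption: "batch-centering" is plain per-batch mean subtraction;
    the authors' exact procedure may differ.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    batch_labels = np.asarray(batch_labels)
    centered = np.empty_like(embeddings)
    for batch in np.unique(batch_labels):
        mask = batch_labels == batch
        # Remove the additive offset shared by all cells of this batch.
        centered[mask] = embeddings[mask] - embeddings[mask].mean(axis=0)
    return centered


# Toy example: 6 cells, 4 embedding dimensions, two batches,
# with the second batch shifted by a constant offset.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 4))
emb[3:] += 5.0
batches = np.array(["a", "a", "a", "b", "b", "b"])
centered = batch_center(emb, batches)
```

Centering of this kind removes only an additive per-batch shift. If batches differ in cell-type composition, it can also remove some biological signal, which is consistent with the abstract describing the improvement as partial.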