RECORD ID: 428CD35B
Peer-Reviewed Manuscript

Batch Effects Remain a Fundamental Barrier to Universal Embeddings in Single-Cell Foundation Models

Authors

Wang, L.; Zhang, C.; Zhang, S.

Abstract

Constructing a cell universe requires integrating heterogeneous single-cell RNA-seq datasets, but this integration is hindered by diverse batch effects. Single-cell foundation models (scFMs), inspired by large language models, aim to learn universal cellular embeddings from large-scale single-cell data. Unlike language, however, single-cell data are sparse, noisy, and strongly affected by batch effects, which limit cross-dataset transferability. Our systematic evaluation across diverse batch scenarios reveals that current scFMs fail to remove batch effects effectively: batch signals persist in the pretrained embeddings. Post-hoc batch-centering only partially improves alignment, which highlights the need for future scFMs to integrate explicit batch-effect correction mechanisms in order to achieve truly universal cellular embeddings.
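
The abstract does not specify how the post-hoc batch-centering is implemented. As a minimal sketch of one common interpretation, assuming the embeddings are a cells-by-dimensions NumPy array and each cell carries a batch label, the per-batch mean can be subtracted and the global mean added back. The function name `batch_center` and the choice to restore the global mean are illustrative assumptions, not the authors' method.

```python
import numpy as np


def batch_center(embeddings, batch_labels):
    """Hypothetical helper: shift every batch so its mean matches the global mean.

    embeddings:   array of shape (n_cells, n_dims), e.g. scFM cell embeddings
    batch_labels: array-like of length n_cells with one batch label per cell
    """
    embeddings = np.asarray(embeddings, dtype=float)
    batch_labels = np.asarray(batch_labels)
    global_mean = embeddings.mean(axis=0)

    centered = embeddings.copy()
    for batch in np.unique(batch_labels):
        mask = batch_labels == batch
        # Remove the batch-specific offset (the mean of this batch's cells).
        centered[mask] -= embeddings[mask].mean(axis=0)

    # Assumption: add the global mean back so the overall location is preserved.
    return centered + global_mean


# Toy usage: two batches with clearly different offsets.
emb = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 10.0], [12.0, 14.0]])
labels = np.array(["a", "a", "b", "b"])
aligned = batch_center(emb, labels)
```

Centering of this kind removes only a per-batch shift in the mean. Batch differences in variance or nonlinear structure are left untouched, which is consistent with the abstract's statement that the improvement is partial.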
