Αlpha¹ Review in Progress

RECORD ID: E4B8432D
Peer-Reviewed Manuscript

Generative single-cell transcriptomics via large language models

Authors

Choi, H.; Shin, H.; Lee, D.; Lee, D.

Abstract

Single-cell and spatial transcriptomics have generated vast atlases of cellular states, yet these data are almost exclusively used for analysis rather than generation. Here we introduce PGL (Portraying Gene Language), an LLM-based framework that reframes single-cell transcriptome modeling as a language generation problem. PGL represents each cell as a long sequence of gene-expression tokens and uses a large language model to synthesize complete single-cell RNA-seq profiles from metadata alone, such as tissue and disease context. PGL-generated cells recapitulate dataset-specific transcriptomic structure, align with known cancer subtype biology, and mix coherently with real single-cell datasets. Notably, generated cells can be used as effective references for spatial transcriptomics, enabling accurate cell-type mapping without matched single-cell atlases. By shifting single-cell modeling from representation learning to cell generation, PGL enables virtual cohort construction, hypothesis generation, and reference-on-demand analysis, positioning generative language models as foundational tools for in silico single-cell biology.
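
As a rough illustration of what "representing a cell as a sequence of gene-expression tokens" conditioned on metadata could look like, the sketch below serializes one cell into a token list. The abstract does not specify PGL's tokenization, so everything here is an assumption made for illustration: the function name `cell_to_tokens`, the `<key=value>` metadata prefix, the ordering by descending expression, and the binning into `n_bins` levels are all hypothetical and should not be read as the authors' method.

```python
import math
from typing import Dict, List


def cell_to_tokens(
    expression: Dict[str, float],
    metadata: Dict[str, str],
    n_bins: int = 5,
) -> List[str]:
    """Serialize one cell as metadata tokens followed by gene-expression tokens.

    Hypothetical scheme for illustration only; not PGL's actual tokenizer.
    """
    # Metadata prefix acts as the conditioning context (e.g. tissue, disease).
    tokens = [f"<{key}={value}>" for key, value in sorted(metadata.items())]

    # Keep expressed genes only, ordered by descending expression
    # (ties broken alphabetically by gene name for determinism).
    expressed = sorted(
        ((gene, value) for gene, value in expression.items() if value > 0),
        key=lambda gv: (-gv[1], gv[0]),
    )
    if not expressed:
        return tokens

    max_value = expressed[0][1]
    for gene, value in expressed:
        # Discretize expression into a level in 1..n_bins, relative to the
        # most highly expressed gene in this cell.
        level = max(1, math.ceil(n_bins * value / max_value))
        tokens.append(f"{gene}:{level}")
    return tokens


example_tokens = cell_to_tokens(
    {"CD3E": 10.0, "GAPDH": 4.0, "MS4A1": 0.0, "ACTB": 4.0},
    {"tissue": "lung", "disease": "LUAD"},
)
print(" ".join(example_tokens))
```

In a generative setting of the kind the abstract describes, only the metadata prefix would be supplied as the prompt and a language model would be asked to continue the sequence; the model itself is out of scope for this sketch.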

Peer Reviews

Peer review in progress...
