Spotlight · 10 June 2026

New AI Framework Reveals Limits of Personal Genome-Based Gene Expression Prediction

From Pepkio Team · 10 June 2026 · 2 min read

Understanding exactly how our unique DNA variations dial gene expression up or down remains a major hurdle in personalized medicine and fundamental biology. To help bridge this gap, researchers have developed SAGE-net, a highly efficient deep-learning framework designed to train sequence-to-function models directly on personal genomes, scientists report today in Nature Methods. The work, led by senior author Sara Mostafavi at Genentech and the University of Washington, with first author Anna E. Spiro, shows that while training AI on personal genomes improves expression predictions for known genes, the models still struggle to learn a universal biological "grammar" that can be applied to entirely unseen genetic regions.

The research team tested the framework on genetic and brain tissue data from established cohorts such as ROSMAP and GTEx. SAGE-net matched the performance of standard linear models when predicting expression for genes it was trained on, while sharply reducing computing time: it ran up to 70 times faster than some larger reference models. However, when asked to predict the effects of variants in genes it had never seen, the model stumbled. In effect, it had identified specific predictive variants rather than learning the underlying, generalizable rules of human gene regulation.

This finding is critical for the field because deep neural networks are widely anticipated to be the key to interpreting rare or completely new genetic mutations—an area where standard statistical models fall short. SAGE-net provides the scalable software necessary to run rapid, iterative experiments, which will be essential for refining these AI architectures until they can accurately predict how any arbitrary sequence of DNA functions in the human body.

The study plainly notes that current deep-learning models have yet to surpass traditional linear methods for predicting personal gene expression. Interestingly, when the team applied SAGE-net to a simpler epigenomic mechanism—DNA methylation—the model successfully generalized its predictions to unseen genetic regions. This suggests that the sheer, multi-layered complexity of gene expression is the primary roadblock, rather than a fundamental flaw in the sequence-to-function modeling approach itself.

While AI has not yet fully cracked the code of individual gene expression, scalable platforms like SAGE-net provide the essential testing ground needed to eventually decode how our personal genetic variations drive health and disease.

Reference:
Spiro, A.E., Tu, X., Sheng, Y. et al. A scalable approach to investigating sequence-to-function predictions from personal genomes. Nat Methods (2026). https://doi.org/10.1038/s41592-026-03124-8
