Researchers Develop AI Model That Maps How Genes Work Together in Human Cells
A step toward better diagnostics and care
Scientists at the Icahn School of Medicine at Mount Sinai have created a new artificial intelligence (AI) model that helps reveal how genes function together inside human cells, offering a powerful new way to understand biology and disease.
The study, published in the May 21 online issue of Patterns, a Cell Press journal [DOI: https://doi.org/10.1016/j.patter.2026.101565], introduces a gene set foundation model (GSFM) designed to learn patterns in how genes are grouped and function across thousands of biological contexts. The work draws inspiration from advances in large language models (LLMs) such as ChatGPT, which learn how words gain meaning depending on their context. In a similar way, a GSFM learns how genes behave differently depending on their cellular “context.”
“Genes rarely act alone. Instead, they participate in multiple biological processes, forming different molecular groupings depending on where and when they are active in the cell. A single gene can play different roles in different settings, much like a word can have different meanings in different sentences,” says senior corresponding author Avi Ma’ayan, PhD, Professor of Pharmacological Sciences and Director of the Mount Sinai Center for Bioinformatics at the Icahn School of Medicine at Mount Sinai. “Just as modern language models learn the meaning of words from context, we asked whether AI could learn the ‘meaning’ of genes in the same way. Our GSFM was designed to do exactly that.”
The model provides a new way to understand the structural and functional organization of genes and their products inside human cells. This improved understanding could eventually support the development of better diagnostics, biomarkers, and therapies. By mapping how genes relate to one another across many biological situations, the GSFM creates a reference framework that can help scientists interpret complex multi-omics datasets more effectively, say the investigators.
“The organization of genes within cells remains one of the major unsolved questions in biology. The GSFM helps address this by learning from millions of gene groupings derived from published research and gene expression datasets,” says Dr. Ma’ayan.
The model can:
- Help identify the function of poorly understood genes without immediate laboratory experiments
- Highlight genes involved in disease processes
- Suggest potential new drug targets and biomarkers
- Provide a reusable knowledge system for many biomedical data analysis tasks, such as improved gene set enrichment analysis
In essence, say the investigators, GSFM offers a new “map” of how genes work together in different contexts.
To build the model, the researchers compiled millions of gene sets from published scientific studies and gene expression datasets. In total, the system learned from hundreds of thousands of independent research efforts.
The AI model was trained in a way similar to solving a puzzle: it was given part of a gene set and asked to predict the missing pieces. Over time, it learned underlying patterns that describe how genes are grouped and interact.
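The "missing pieces" idea can be illustrated with a very small sketch. The example below is not the GSFM itself: it swaps the neural network for simple co-occurrence counting, and the gene sets, function names, and `top_k` parameter are illustrative assumptions rather than anything taken from the paper. It only shows the shape of the task, in which part of a gene set is given and the remaining members must be ranked.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(gene_sets):
    """Count how often each ordered pair of genes appears in the same set."""
    cooc = defaultdict(Counter)
    for genes in gene_sets:
        for a, b in permutations(set(genes), 2):
            cooc[a][b] += 1
    return cooc


def predict_missing(partial_set, cooc, top_k=3):
    """Rank genes outside the partial set by co-occurrence with its members."""
    scores = Counter()
    for gene in partial_set:
        for other, count in cooc.get(gene, {}).items():
            if other not in partial_set:
                scores[other] += count
    return [gene for gene, _ in scores.most_common(top_k)]


# Toy gene sets, made up for illustration (not data from the study).
training_sets = [
    ["TP53", "MDM2", "CDKN1A", "BAX"],
    ["TP53", "MDM2", "CDKN1A"],
    ["INS", "GCG", "IAPP"],
    ["TP53", "BAX"],
]

cooc = build_cooccurrence(training_sets)
# Show the model part of a set and ask it to fill in the rest.
print(predict_missing({"TP53", "MDM2"}, cooc, top_k=2))
```

A trained foundation model replaces the counting step with learned representations, but the question it is asked during training has the same form.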
The AI model was then benchmarked against other approaches and demonstrated strong performance, including the ability to identify gene-gene and gene-function relationships before they were confirmed experimentally. To evaluate this, the model was trained using gene sets from publications up to a defined cutoff date, and then tested on whether it could predict discoveries reported in studies published after that cutoff date.
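A cutoff-based evaluation of this kind can be sketched in a few lines. The record layout, the placeholder gene names, and the recall metric below are assumptions chosen for illustration; the paper's actual benchmark and scoring may differ.

```python
from datetime import date
from itertools import combinations


def temporal_split(records, cutoff):
    """Split gene set records into those published up to, and after, the cutoff."""
    train = [r for r in records if r["published"] <= cutoff]
    test = [r for r in records if r["published"] > cutoff]
    return train, test


def gene_pairs(records):
    """Collect every unordered gene pair that shares at least one gene set."""
    pairs = set()
    for record in records:
        genes = sorted(set(record["genes"]))
        pairs.update(frozenset(pair) for pair in combinations(genes, 2))
    return pairs


def recall_on_novel_pairs(predicted_pairs, train, test):
    """Fraction of pairs first seen after the cutoff that were predicted."""
    novel = gene_pairs(test) - gene_pairs(train)
    if not novel:
        return 0.0
    hits = novel & {frozenset(pair) for pair in predicted_pairs}
    return len(hits) / len(novel)


# Hypothetical records with placeholder gene names.
records = [
    {"genes": ["GENE_A", "GENE_B"], "published": date(2020, 5, 1)},
    {"genes": ["GENE_B", "GENE_C"], "published": date(2021, 3, 1)},
    {"genes": ["GENE_A", "GENE_C"], "published": date(2023, 1, 15)},
    {"genes": ["GENE_C", "GENE_D"], "published": date(2023, 6, 30)},
]

train, test = temporal_split(records, cutoff=date(2022, 1, 1))
# Pretend a model trained only on `train` proposed this one relationship.
predictions = [("GENE_A", "GENE_C")]
print(recall_on_novel_pairs(predictions, train, test))
```

Scoring only the pairs that never appear before the cutoff is what keeps the test honest: the model gets no credit for relationships it could have memorized from its training data.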
“Unlike previous biological AI models that primarily rely on gene expression data, our GSFM is uniquely trained on gene sets, a different and largely underused type of biological information,” says Dr. Ma’ayan. “This approach allows the model to integrate diverse data from many diseases, experimental methods, and research conditions, creating a unified representation of gene relationships across biology.”
GSFMs could enhance existing bioinformatics tools and improve the interpretation of data collected with omics technologies. One immediate application is in gene set enrichment analysis, a widely used method in molecular biology research. By improving how scientists interpret gene groupings, the model may help uncover new biological insights from both existing and future datasets.
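For readers unfamiliar with the method, one classic form of enrichment analysis asks whether a list of genes from an experiment overlaps a known gene set more than chance would predict. The sketch below uses the standard hypergeometric (over-representation) test; it illustrates the conventional approach only, not how a GSFM would refine it, and the function name and toy inputs are assumptions.

```python
from math import comb


def enrichment_p_value(query, gene_set, background_size):
    """One-sided hypergeometric p-value for the overlap of two gene lists.

    Returns the probability of seeing at least the observed overlap if the
    query genes were drawn at random from a background of the given size.
    """
    query, gene_set = set(query), set(gene_set)
    overlap = len(query & gene_set)
    n, k, total_genes = len(query), len(gene_set), background_size
    tail = sum(
        comb(k, i) * comb(total_genes - k, n - i)
        for i in range(overlap, min(n, k) + 1)
    )
    return tail / comb(total_genes, n)


# Toy example: 3 query genes, all inside a 3-gene set, background of 10 genes.
p = enrichment_p_value(["G1", "G2", "G3"], ["G1", "G2", "G3"], background_size=10)
print(p)
```

A small p-value suggests the overlap is unlikely to be random. In practice the test is repeated across many gene sets and the results are corrected for multiple comparisons.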
The research team plans to expand the system by combining GSFM with other AI foundation models. One goal is to integrate it with language-based models to generate natural-language explanations of gene functions. Another future direction is combining GSFM with drug-focused AI models, with the long-term aim of predicting how drugs interact with cells and supporting the design of new therapeutics.
The gene pages and the GSFM model are accessible at https://gsfm.maayanlab.cloud and https://github.com/MaayanLab/gsfm.
The paper is titled “GSFM: A Gene Set Foundation Model Pre-Trained on a Massive Collection of Diverse Gene Sets.”
The study’s authors, as listed in the journal, are Daniel J. B. Clarke, Giacomo B. Marino, and Avi Ma’ayan.
This work was partially funded by NIH grants OT2OD036435, OT2OD030160, U24CA264250, U24CA271114, R01DK131525, and RC2DK131995.
About the Icahn School of Medicine at Mount Sinai
The Icahn School of Medicine at Mount Sinai is internationally renowned for its outstanding research, educational, and clinical care programs. It is the sole academic partner for the seven member hospitals* of the Mount Sinai Health System, one of the largest academic health systems in the United States, providing care to New York City’s large and diverse patient population.
The Icahn School of Medicine at Mount Sinai offers highly competitive MD, PhD, MD-PhD, and master’s degree programs, with enrollment of more than 1,200 students. It has the largest graduate medical education program in the country, with more than 2,700 clinical residents and fellows training throughout the Health System. The Graduate School of Biomedical Sciences offers 13 degree-granting programs, conducts innovative basic and translational research, and trains more than 470 postdoctoral research fellows.
Ranked 11th nationwide in National Institutes of Health (NIH) funding, the Icahn School of Medicine at Mount Sinai is in the 90th percentile of U.S. private medical schools in Sponsored Programs Direct Expenditures per Principal Investigator, according to the Association of American Medical Colleges. More than 6,900 scientists, educators, and clinicians work within and across dozens of academic departments and multidisciplinary institutes with an emphasis on translational research and therapeutics. Through Mount Sinai Innovation Partners (MSIP), the Health System facilitates the real-world application and commercialization of medical breakthroughs made at Mount Sinai.
-------------------------------------------------------
* Mount Sinai Health System member hospitals: The Mount Sinai Hospital; Mount Sinai Brooklyn; Mount Sinai Morningside; Mount Sinai Queens; Mount Sinai South Nassau; Mount Sinai West; and New York Eye and Ear Infirmary of Mount Sinai.
About the Mount Sinai Health System
Mount Sinai Health System is one of the largest academic medical systems in the New York metro area, with more than 47,000 employees working across seven hospitals, more than 400 outpatient practices, more than 600 research and clinical labs, a school of nursing, and leading schools of medicine and graduate education. Mount Sinai advances health for all people, everywhere, by taking on the most complex health care challenges of our time—discovering and applying new scientific learning and knowledge; developing safer, more effective treatments; educating the next generation of medical leaders and innovators; and supporting local communities by delivering high-quality care to all who need it.
Through the integration of its hospitals, labs, and schools, Mount Sinai offers comprehensive health care from conception through geriatrics, leveraging innovative approaches such as artificial intelligence and informatics while keeping patients’ medical and emotional needs at the center of all treatment. The Health System includes more than 6,400 primary and specialty care physicians and 10 free-standing joint-venture centers throughout the five boroughs of New York City, Westchester, Long Island, and Florida. Hospitals within the System are consistently ranked by Newsweek’s® “The World’s Best Smart Hospitals,” “Best in State Hospitals,” “World’s Best Hospitals,” and “Best Specialty Hospitals” and by U.S. News & World Report's® “Best Hospitals” and “Best Children’s Hospitals.” The Mount Sinai Hospital is on the U.S. News & World Report® “Best Hospitals” Honor Roll for 2025-2026.
For more information, visit https://www.mountsinai.org or find Mount Sinai on Facebook, Instagram, LinkedIn, X, and YouTube.