Alternate proteins from the same gene contribute differently to health and rare disease

Whitehead Institute Member Iain Cheeseman, graduate student Jimmy Ly, and colleagues propose that researchers and clinicians may be able to get more information from patients’ genomes by looking at them in a different way.

Greta Friar | Whitehead Institute
November 7, 2025

In a paper published in Molecular Cell on November 7, Whitehead Institute Member Iain Cheeseman, graduate student Jimmy Ly, and colleagues propose that researchers and clinicians may be able to get more information from patients’ genomes by looking at them in a different way.

The common wisdom is that each gene codes for one protein. Someone studying whether a patient has a mutation or version of a gene that contributes to their disease will therefore look for mutations that affect the “known” protein product of that gene. However, Cheeseman and others are finding that the majority of genes code for more than one protein. That means that a mutation that may seem insignificant because it does not appear to affect the known protein could nonetheless alter a different protein made by the same gene. Now, Cheeseman and Ly have shown that mutations affecting one or multiple proteins from the same gene can contribute differently to disease.

In their paper, the researchers first share what they have learned about how cells make use of the ability to generate different versions of proteins from the same gene. Then, they examine how mutations that affect these proteins contribute to disease. Through a collaboration with co-author Mark Fleming, the pathologist-in-chief at Boston Children’s Hospital, they provide two case studies of patients with atypical presentations of a rare anemia linked to mutations that selectively affect only one of two proteins produced by the gene implicated in the disease.

“We hope this work demonstrates the importance of considering whether a gene of interest makes multiple versions of a protein, and what the role of each version is in health and disease,” Ly says. “This information could lead to better understanding of the biology of disease, better diagnostics, and perhaps one day to tailored therapies to treat these diseases.”

Rethinking how cells use genes

Cells have several ways to make different versions of a protein, but the variation that Cheeseman and Ly study happens during protein production from genetic code. Cellular machines build each protein according to the instructions within a genetic sequence that begins at a “start codon” and ends at a “stop codon.” However, some genetic sequences contain more than one start codon, many of which are hiding in plain sight. If the cellular machinery skips the first start codon and detects a second one, it may build a shorter version of the protein. In other cases, the machinery may detect a section that closely resembles a start codon at a point earlier in the sequence than its typical starting place, and build a longer version of the protein.
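As a rough illustration of the mechanism described above, the following Python sketch scans a made-up coding sequence for in-frame ATG start codons and translates from each one. The sequence, the function names, and the simplification that only ATG counts as a start are assumptions for illustration; the paper's own analysis is not reproduced here, and near-matches to a start codon (the route to longer versions) are not modeled.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Invented toy sequence (not a real gene): a 3-base leader, then an open
# reading frame that contains a second in-frame ATG.
SEQ = "GGC" + "ATGAAACTGCTGGCCATGGATGAATTCTAA"


def in_frame_start_codons(seq, first_start):
    """Return positions of ATG codons in the reading frame set by first_start,
    stopping at the first in-frame stop codon."""
    starts = []
    for i in range(first_start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if CODON_TABLE[codon] == "*":
            break
        if codon == "ATG":
            starts.append(i)
    return starts


def translate(seq, start):
    """Translate from start until the first in-frame stop codon."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


for start in in_frame_start_codons(SEQ, SEQ.find("ATG")):
    print(start, translate(SEQ, start))
```

In this toy case, starting at the first ATG yields the nine-residue product MKLLAMDEF, while starting at the second yields the shorter MDEF. A third ATG at position 22 is ignored because it is out of frame.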

These events may sound like mistakes: the cell’s machinery accidentally creating the wrong version of the correct protein. On the contrary, protein production from these alternate starting places is an important feature of cell biology that exists across species. When Ly traced when certain genes evolved to produce multiple proteins, he found that this is a common, robust process that has been preserved over millions of years of evolutionary history.

Ly shows that one function this serves is to send versions of a protein to different parts of the cell. Many proteins contain zip code-like sequences that tell the cell’s machinery where to deliver them so the proteins can do their jobs. Ly found many examples in which longer and shorter versions of the same protein contained different zip codes and ended up in different places within the cell.

In particular, Ly found many cases in which one version of a protein ended up in mitochondria, structures that provide energy to cells, while another version ended up elsewhere. Because of the mitochondria’s role in the essential process of energy production, mutations to mitochondrial genes are often implicated in disease.

Ly wondered what would happen when a disease-causing mutation eliminates one version of a protein but leaves the other intact, causing the protein to only reach one of its two intended destinations. He looked through a database containing genetic information from people with rare diseases to see if such cases existed, and found that they did. In fact, there may be tens of thousands of such cases. However, without access to the people, Ly had no way of knowing what the consequences of this were in terms of symptoms and severity of disease.

Meanwhile, Cheeseman had begun working with Boston Children’s Hospital to foster collaborations between Whitehead Institute and the hospital’s researchers and clinicians to accelerate the pathway from research discovery to clinical application. Through these efforts, Cheeseman and Ly met Fleming.

One group of Fleming’s patients has a type of anemia called SIFD (sideroblastic anemia with B-cell immunodeficiency, periodic fevers, and developmental delay) that is caused by mutations to the TRNT1 gene. TRNT1 is one of the genes Ly had identified as producing a mitochondrial version of its protein and another version that ends up elsewhere: in the nucleus.

Fleming shared anonymized patient data with Ly, and Ly found two cases of interest in the genetic data. Most of the patients had mutations that impaired both versions of the protein, but one patient had a mutation that eliminated only the mitochondrial version of the protein, while another patient had a mutation that eliminated only the nuclear version.

When Ly shared his results, Fleming revealed that both of those patients had very atypical presentations of SIFD, supporting Ly’s hypothesis that mutations affecting different versions of a protein would have different consequences. The patient who had only the mitochondrial version of the protein was anemic but developmentally normal. The patient who was missing the mitochondrial version did not have developmental delays or chronic anemia but did have other immune symptoms, and was not correctly diagnosed until his fifties. There are likely other factors contributing to each patient’s exact presentation of the disease, but Ly’s work begins to unravel the mystery of their atypical symptoms.

Cheeseman and Ly want to make more clinicians aware of the prevalence of genes coding for more than one protein, so they know to check for mutations affecting any of the protein versions that could contribute to disease. For example, several TRNT1 mutations that only eliminate the shorter version of the protein are not flagged as disease-causing by current assessment tools. Cheeseman lab researchers including Ly and graduate student Matteo Di Bernardo are now developing a new assessment tool for clinicians, called SwissIsoform, that will identify relevant mutations that affect specific protein versions, including mutations that would otherwise be missed.
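The kind of check described above can be sketched in a few lines of Python. This is not SwissIsoform, whose design is not described here; the Variant type, the effect labels, and the codon position of the second start are all hypothetical, and the rules are deliberately simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    codon: int   # 1-based codon position, counted from the first start codon
    effect: str  # hypothetical labels: "start_lost", "stop_gained", "missense"


def eliminated_isoforms(variant, second_start_codon):
    """Return which protein versions ("long", "short") a variant is predicted
    to eliminate, assuming the short version starts at second_start_codon."""
    if variant.effect == "start_lost":
        if variant.codon == 1:
            return {"long"}
        if variant.codon == second_start_codon:
            # The long version is still made, with one altered residue.
            return {"short"}
        return set()
    if variant.effect == "stop_gained":
        # A stop before the second start truncates only the long version;
        # a stop at or after it truncates both.
        if variant.codon < second_start_codon:
            return {"long"}
        return {"long", "short"}
    return set()


SECOND_START = 30  # hypothetical position, not taken from any real gene

for v in (Variant(1, "start_lost"), Variant(30, "start_lost"),
          Variant(12, "stop_gained"), Variant(200, "stop_gained")):
    print(v, sorted(eliminated_isoforms(v, SECOND_START)))
```

A tool that considers only the first start codon would treat the second case as an ordinary single-residue change, which is the sort of variant the researchers say can be missed.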

“Jimmy and Iain’s work will globally support genetic disease variant interpretation and help with connecting genetic differences to variation in disease symptoms,” Fleming says. “In fact, we have recently identified two other patients with mutations affecting only the mitochondrial versions of two other proteins, who similarly have milder symptoms than patients with mutations that affect both versions.”

Long term, the researchers hope that their discoveries could aid in understanding the molecular basis of disease and in developing new gene therapies: once researchers understand what has gone wrong within a cell to cause disease, they are better equipped to devise a solution. More immediately, the researchers hope that their work will make a difference by providing better information to clinicians and people with rare diseases.

“As a basic researcher who doesn’t typically interact with patients, there’s something very satisfying about knowing that the work you are doing is helping specific people,” Cheeseman says. “As my lab transitions to this new focus, I’ve heard many stories from people trying to navigate a rare disease and just get answers, and that has been really motivating to us, as we work to provide new insights into the disease biology.”

Jimmy Ly, Matteo Di Bernardo, Yi Fei Tao, Ekaterina Khalizeva, Christopher J. Giuliano, Sebastian Lourido, Mark D. Fleming, Iain M. Cheeseman. “Alternative start codon selection shapes mitochondrial function and rare human diseases.” Molecular Cell, November 7, 2025. DOI: https://doi.org/10.1016/j.molcel.2025.10.013

A new way to understand and predict gene splicing

The KATMAP model, developed by researchers in the Department of Biology, can predict alternative splicing, which allows cells to create endless diversity from the same sets of genetic blueprints.

Lillian Eden | Department of Biology
November 4, 2025

Although heart cells and skin cells contain identical instructions for creating proteins encoded in their DNA, they’re able to fill such disparate niches because molecular machinery can cut out and stitch together different segments of those instructions to create endlessly unique combinations.

The ingenuity of using the same genes in different ways is made possible by a process called splicing and is controlled by splicing factors; which splicing factors a cell employs determines what sets of instructions that cell produces, which, in turn, gives rise to proteins that allow cells to fulfill different functions.

In an open-access paper published today in Nature Biotechnology, researchers in the MIT Department of Biology outlined a framework for parsing the complex relationship between sequences and splicing regulation. The framework investigates the regulatory activities of splicing factors and produces models that can be applied to interpret and predict splicing regulation across different cell types, and even different species. Called Knockdown Activity and Target Models from Additive regression Predictions, KATMAP draws on experimental data from disrupting the expression of a splicing factor and information on which sequences the splicing factor interacts with to predict its likely targets.

Beyond the benefits of a better understanding of gene regulation, the work has medical relevance. Splicing mutations, whether in the gene that is spliced or in the splicing factor itself, can give rise to diseases such as cancer by altering how genes are expressed, leading to the creation or accumulation of faulty or mutated proteins. Knowing how splicing is regulated is critical for developing therapeutic treatments for those diseases. The researchers also demonstrated that KATMAP can potentially be used to predict whether synthetic nucleic acids, a promising treatment option for disorders including a subset of muscular atrophy and epilepsy disorders, affect splicing.

Perturbing splicing 

In eukaryotic cells, including our own, splicing occurs after DNA is transcribed to produce an RNA copy of a gene, which contains both coding and non-coding regions of RNA. The noncoding intron regions are removed, and the coding exon segments are spliced back together to make a near-final blueprint, which can then be translated into a protein.
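The cut-and-join step described above can be sketched as a small Python function. The RNA sequence and exon coordinates below are invented for illustration; the only biological detail borrowed is that introns conventionally begin with GU and end with AG.

```python
# Invented pre-mRNA: exon, intron, exon, intron, exon.
PRE_MRNA = "AUGGCC" + "GUAAGUUUUCAG" + "GAAUUC" + "GUGAGUCCCCAG" + "UAA"

# Exons as half-open, 0-based (start, end) intervals on PRE_MRNA.
EXONS = [(0, 6), (18, 24), (36, 39)]


def splice(pre_mrna, exons):
    """Remove everything outside the listed exons and join the exons in order."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Constitutive splicing keeps all three exons.
print(splice(PRE_MRNA, EXONS))

# Alternative splicing: skipping the middle exon gives a different message
# from the same gene.
print(splice(PRE_MRNA, [EXONS[0], EXONS[2]]))
```

With all three exons the result is AUGGCCGAAUUCUAA; with the middle exon skipped it is AUGGCCUAA, which shows how one transcript can give rise to more than one blueprint.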

According to first author Michael P. McGurk, a postdoc in the lab of MIT Professor Christopher Burge, previous approaches could provide an average picture of regulation, but could not necessarily predict the regulation of splicing factors at particular exons in particular genes.

KATMAP draws on RNA sequencing data generated from perturbation experiments, which alter the expression level of a regulatory factor by either overexpressing it or knocking down its levels. The consequences of overexpression or knockdown are that the genes regulated by the splicing factor should exhibit different levels of splicing after perturbation, which helps the model identify the splicing factor’s targets.

Cells, however, are complex, interconnected systems, where one small change can cause a cascade of effects. KATMAP is also able to distinguish direct targets from indirect, downstream impacts by incorporating known information about the sequence the splicing factor is likely to interact with, referred to as a binding site or binding motif.

“In our analyses, we identify predicted targets as exons that have binding sites for this particular factor in the regions where this model thinks they need to be to impact regulation,” McGurk says, while non-targets may be affected by perturbation but don’t have the likely appropriate binding sites nearby.
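The distinction between direct targets and indirect effects can be illustrated with a deliberately crude Python sketch. It is not KATMAP: the article describes a regression model that learns how close a site must be to the known motif and how activity varies with position, whereas this toy uses an exact motif match and a fixed cutoff. The Exon type, the field names, the example motif string, and the threshold are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    name: str
    flanking_rna: str         # hypothetical window of sequence near the splice sites
    inclusion_change: float   # change in exon inclusion after knockdown (-1 to 1)


def classify(exon, motif, min_change=0.1):
    """Label an exon using two pieces of evidence: did its splicing change
    after the perturbation, and is a binding motif present nearby?"""
    changed = abs(exon.inclusion_change) >= min_change
    has_site = motif in exon.flanking_rna
    if changed and has_site:
        return "predicted direct target"
    if changed:
        return "indirect effect"
    return "unchanged"


MOTIF = "UGCAUG"  # example motif string chosen for illustration

EXAMPLES = [
    Exon("exon_a", "CCAUGCAUGGA", 0.35),   # changed, motif present
    Exon("exon_b", "CCAAAAAAGGA", -0.40),  # changed, no motif
    Exon("exon_c", "CCAUGCAUGGA", 0.02),   # motif present, no change
]

for exon in EXAMPLES:
    print(exon.name, classify(exon, MOTIF))
```

Requiring both kinds of evidence is what separates exon_a from exon_b here, mirroring the reasoning in the quote above.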

This is especially helpful for splicing factors that aren’t as well-studied.

“One of our goals with KATMAP was to try to make the model general enough that it can learn what it needs to assume for particular factors, like how similar the binding site has to be to the known motif or how regulatory activity changes with the distance of the binding sites from the splice sites,” McGurk says.

Starting simple

Although predictive models can be very powerful at presenting possible hypotheses, many are considered “black boxes,” meaning the rationale that gives rise to their conclusions is unclear. KATMAP, on the other hand, is an interpretable model that enables researchers to quickly generate hypotheses and interpret splicing patterns in terms of regulatory factors while also understanding how the predictions were made.

“I don’t just want to predict things, I want to explain and understand,” McGurk says. “We set up the model to learn from existing information about splicing and binding, which gives us biologically interpretable parameters.”

The researchers did have to make some simplifying assumptions in order to develop the model. KATMAP considers only one splicing factor at a time, although it is possible for splicing factors to work in concert with one another. The RNA target sequence could also be folded in such a way that the factor wouldn’t be able to access a predicted binding site, so the site is present but not utilized.

“When you try to build up complete pictures of complex phenomena, it’s usually best to start simple,” McGurk says. “A model that only considers one splicing factor at a time is a good starting point.”

David McWaters, another postdoc in the Burge Lab and a co-author on the paper, conducted key experiments to test and validate that aspect of the KATMAP model.

Future directions

The Burge lab is collaborating with researchers at Dana-Farber Cancer Institute to apply KATMAP to the question of how splicing factors are altered in disease contexts, as well as with other researchers at MIT as part of an MIT HEALS grant to model splicing factor changes in stress responses. McGurk also hopes to extend the model to incorporate cooperative regulation for splicing factors that work together.

“We’re still in a very exploratory phase, but I would like to be able to apply these models to try to understand splicing regulation in disease or development. In terms of variation of splicing factors, they are related, and we need to understand both,” McGurk says.

Burge, the Uncas (1923) and Helen Whitaker Professor and senior author of the paper, will continue to work on generalizing this approach to build interpretable models for other aspects of gene regulation.

“We now have a tool that can learn the pattern of activity of a splicing factor from types of data that can be readily generated for any factor of interest,” says Burge, who is also an extramural member of the Koch Institute for Integrative Cancer Research and an associate member of the Broad Institute of MIT and Harvard. “As we build up more of these models, we’ll be better able to infer which splicing factors have altered activity in a disease state from transcriptomic data, to help understand which splicing factors are driving pathology.”

A more precise way to edit the genome

MIT researchers have dramatically lowered the error rate of prime editing, a technique that holds potential for treating many genetic disorders.

Anne Trafton | MIT News
September 17, 2025

A genome-editing technique known as prime editing holds potential for treating many diseases by transforming faulty genes into functional ones. However, the process carries a small chance of inserting errors that could be harmful.

MIT researchers have now found a way to dramatically lower the error rate of prime editing, using modified versions of the proteins involved in the process. This advance could make it easier to develop gene therapy treatments for a variety of diseases, the researchers say.

“This paper outlines a new approach to doing gene editing that doesn’t complicate the delivery system and doesn’t add additional steps, but results in a much more precise edit with fewer unwanted mutations,” says Phillip Sharp, an MIT Institute Professor Emeritus, a member of MIT’s Koch Institute for Integrative Cancer Research, and one of the senior authors of the new study.

With their new strategy, the MIT team was able to improve the error rate of prime editors from about one error in seven edits to one in 101 for the most-used editing mode, or from one error in 122 edits to one in 543 for a high-precision mode.
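To put the "one in N" figures above on a common scale, the short Python calculation below converts them to percentages and to a fold reduction per editing mode. It uses only the four numbers quoted in this paragraph; the mode labels and function name are ad hoc.

```python
from fractions import Fraction

# (error rate before, error rate after) for each mode, as quoted above.
RATES = {
    "most-used mode": (Fraction(1, 7), Fraction(1, 101)),
    "high-precision mode": (Fraction(1, 122), Fraction(1, 543)),
}


def fold_reduction(before, after):
    """How many times lower the new error rate is than the old one."""
    return before / after


for mode, (before, after) in RATES.items():
    fold = fold_reduction(before, after)
    print(f"{mode}: {float(before):.2%} -> {float(after):.2%} "
          f"({float(fold):.1f}-fold fewer errors)")
```

By this arithmetic the most-used mode goes from roughly 14 percent of edits carrying an error to about 1 percent, and the high-precision mode from about 0.8 percent to about 0.2 percent.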

“For any drug, what you want is something that is effective, but with as few side effects as possible,” says Robert Langer, the David H. Koch Institute Professor at MIT, a member of the Koch Institute, and one of the senior authors of the new study. “For any disease where you might do genome editing, I would think this would ultimately be a safer, better way of doing it.”

Koch Institute research scientist Vikash Chauhan is the lead author of the paper, which appears today in Nature.

The potential for error

The earliest forms of gene therapy, first tested in the 1990s, involved delivering new genes carried by viruses. Subsequently, gene-editing techniques that use enzymes such as zinc finger nucleases to correct genes were developed. These nucleases are difficult to engineer, however, so adapting them to target different DNA sequences is a very laborious process.

Many years later, the CRISPR genome-editing system was discovered in bacteria, offering scientists a potentially much easier way to edit the genome. The CRISPR system consists of an enzyme called Cas9 that can cut double-stranded DNA at a particular spot, along with a guide RNA that tells Cas9 where to cut. Researchers have adapted this approach to cut out faulty gene sequences or to insert new ones, following an RNA template.

In 2019, researchers at the Broad Institute of MIT and Harvard reported the development of prime editing: a new system, based on CRISPR, that is more precise and has fewer off-target effects. A recent study reported that prime editors were successfully used to treat a patient with chronic granulomatous disease (CGD), a rare genetic disease that affects white blood cells.

“In principle, this technology could eventually be used to address many hundreds of genetic diseases by correcting small mutations directly in cells and tissues,” Chauhan says.

One of the advantages of prime editing is that it doesn’t require making a double-stranded cut in the target DNA. Instead, it uses a modified version of Cas9 that cuts just one of the complementary strands, opening up a flap where a new sequence can be inserted. A guide RNA delivered along with the prime editor serves as the template for the new sequence.

Once the new sequence has been copied, however, it must compete with the old DNA strand to be incorporated into the genome. If the old strand outcompetes the new one, the extra flap of new DNA hanging off may accidentally get incorporated somewhere else, giving rise to errors.

Many of these errors might be relatively harmless, but it’s possible that some could eventually lead to tumor development or other complications. With the most recent version of prime editors, this error rate ranges from one per seven edits to one per 121 edits for different editing modes.

“The technologies we have now are really a lot better than earlier gene therapy tools, but there’s always a chance for these unintended consequences,” Chauhan says.

Precise editing

To reduce those error rates, the MIT team decided to take advantage of a phenomenon they had observed in a 2023 study. In that paper, they found that while Cas9 usually cuts in the same DNA location every time, some mutated versions of the protein show a relaxation of those constraints. Instead of always cutting the same location, those Cas9 proteins would sometimes make their cut one or two bases further along the DNA sequence.

This relaxation, the researchers discovered, makes the old DNA strands less stable, so they get degraded, making it easier for the new strands to be incorporated without introducing any errors.

In the new study, the researchers were able to identify Cas9 mutations that dropped the error rate to 1/20th its original value. Then, by combining pairs of those mutations, they created a Cas9 editor that lowered the error rate even further, to 1/36th the original amount.

To make the editors even more accurate, the researchers incorporated their new Cas9 proteins into a prime editing system that has an RNA binding protein that stabilizes the ends of the RNA template more efficiently. This final editor, which the researchers call vPE, had an error rate just 1/60th of the original, ranging from one in 101 edits to one in 543 edits for different editing modes. These tests were performed in mouse and human cells.

The MIT team is now working on further improving the efficiency of prime editors, through further modifications of Cas9 and the RNA template. They are also working on ways to deliver the editors to specific tissues of the body, which is a longstanding challenge in gene therapy.

They also hope that other labs will begin using the new prime editing approach in their research studies. Prime editors are commonly used to explore many different questions, including how tissues develop, how populations of cancer cells evolve, and how cells respond to drug treatment.

“Genome editors are used extensively in research labs,” Chauhan says. “So the therapeutic aspect is exciting, but we are really excited to see how people start to integrate our editors into their research workflows.”

The research was funded by the Life Sciences Research Foundation, the National Institute of Biomedical Imaging and Bioengineering, the National Cancer Institute, and the Koch Institute Support (core) Grant from the National Cancer Institute.

Locally produced proteins help mitochondria function

One of the ways that cells ensure proteins end up where they're needed is creating them at that location, through a process called localized translation. New research from the Weissman Lab has expanded our understanding of localized translation at mitochondria and sheds light on the organizational principles of genes and the proteins they encode.

Greta Friar | Whitehead Institute
August 27, 2025

Whitehead Institute Member Jonathan Weissman, who is also a professor of biology at the Massachusetts Institute of Technology and an HHMI Investigator, and Jingchuan Luo, a postdoc in his lab, have expanded our knowledge of localized translation at mitochondria, structures that generate energy for the cell. In a paper published in Cell on August 27, they share a new tool, LOCL-TL, for studying localized translation in close detail, and describe the discoveries it enabled about two classes of proteins that are locally translated at mitochondria.

The importance of localized translation at mitochondria relates to their unusual origin. Mitochondria were once bacteria that lived within our ancestors’ cells. Over time the bacteria lost their autonomy and became part of the larger cells, a process that included the migration of most of their genes into the larger cell’s genome in the nucleus. Cells evolved processes to ensure that proteins needed by mitochondria that are encoded in genes in the larger cell’s genome get transported to the mitochondria. Mitochondria retain a few genes in their own genome, so protein production from the mitochondrial genome must be coordinated with protein production from the larger cell’s genome to avoid mismatched production of mitochondrial parts. Localized translation may help cells to manage the interplay between mitochondrial and nuclear protein production, among other purposes.

How to detect local protein production

For a protein to be made, genetic code stored in DNA is read into RNA, and then the RNA is read or translated by a ribosome, a cellular machine that builds a protein according to the RNA code. Weissman’s lab previously developed a method to study localized translation by tagging ribosomes near a structure of interest, and then capturing the tagged ribosomes in action and observing the proteins they are making. This approach, called proximity-specific ribosome profiling, allows researchers to see what proteins are being made where in the cell. The challenge that Luo faced was how to tweak this method to capture only ribosomes at work near mitochondria.

Ribosomes work quickly, so a ribosome that gets tagged while making a protein at the mitochondria can move on to making other proteins elsewhere in the cell in a matter of minutes. The only way researchers can guarantee that the ribosomes they capture are still working on proteins made near the mitochondria is if the experiment happens very quickly.

Weissman and colleagues had previously solved this time sensitivity problem in yeast cells with a ribosome-tagging tool called BirA that is activated by the presence of the molecule biotin. BirA is fused to the cellular structure of interest, and tags ribosomes it can touch—but only once activated. Researchers keep the cell depleted of biotin until they are ready to capture the ribosomes, to limit the time when tagging occurs. However, this approach does not work with mitochondria in mammalian cells because they need biotin to function normally, so it cannot be depleted.

Luo and Weissman adapted the existing tool to respond to blue light instead of biotin. The new tool, LOV-BirA, is fused to the mitochondria’s outer membrane. Cells are kept in the dark until the researchers are ready. Then they expose the cells to blue light, activating LOV-BirA to tag ribosomes. They give it a few minutes and then quickly extract the ribosomes. This approach proved very accurate at capturing only ribosomes working at mitochondria.

The researchers then used a method originally developed by the Weissman lab to extract the sections of RNA inside of the ribosomes. This allows them to see exactly how far along in the process of making a protein the ribosome is when captured, which can reveal whether the entire protein is made at the mitochondria, or whether it is partly produced elsewhere and only gets completed at the mitochondria.

“One advantage of our tool is the granularity it provides,” Luo says. “Being able to see what section of the protein is locally translated helps us understand more about how localized translation is regulated, which can then allow us to understand its dysregulation in disease and to control localized translation in future studies.”
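The idea of reading off how far along a protein is when its ribosome reaches the mitochondria can be sketched with a toy calculation in Python. This is not the LOCL-TL analysis pipeline: the footprint positions are invented, and the bin size, threshold, and function names are assumptions. Real data would also need normalization for sequencing depth.

```python
from collections import Counter


def binned_counts(codon_positions, bin_size):
    """Count ribosome footprints per bin of codon positions."""
    return Counter(position // bin_size for position in codon_positions)


def localization_onset(tagged, total, bin_size=50, min_ratio=0.5):
    """Return the first codon (start of a bin) at which the share of footprints
    coming from tagged ribosomes reaches min_ratio, or None if it never does."""
    tagged_bins = binned_counts(tagged, bin_size)
    total_bins = binned_counts(total, bin_size)
    for bin_index in sorted(total_bins):
        if tagged_bins[bin_index] / total_bins[bin_index] >= min_ratio:
            return bin_index * bin_size
    return None


# Invented footprints for one 500-codon transcript: ribosomes are found along
# the whole message, but tagged (mitochondria-proximal) ones only from codon 250.
ALL_FOOTPRINTS = list(range(0, 500, 10))
TAGGED_FOOTPRINTS = list(range(250, 500, 10))

print(localization_onset(TAGGED_FOOTPRINTS, ALL_FOOTPRINTS))
```

In this made-up example the onset is codon 250, meaning the first part of the protein was made elsewhere; a transcript translated entirely at the mitochondria would give an onset of 0.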

Two protein groups are made at mitochondria

Using these approaches, the researchers found that about twenty percent of the genes needed in mitochondria that are located in the main cellular genome are locally translated at mitochondria. These proteins can be divided into two distinct groups with different evolutionary histories and mechanisms for localized translation.

One group consists of relatively long proteins, each containing more than 400 amino acids or protein building blocks. These proteins tend to be of bacterial origin—present in the ancestor of mitochondria—and they are locally translated in both mammalian and yeast cells, suggesting that their localized translation has been maintained through a long evolutionary history.

Like many mitochondrial proteins encoded in the nucleus, these proteins contain a mitochondrial targeting sequence (MTS), a zip code that tells the cell where to bring them. The researchers discovered that most proteins containing an MTS also contain a nearby inhibitory sequence that prevents transportation until they are done being made. This group of locally translated proteins lacks the inhibitory sequence, so they are brought to the mitochondria during their production.

Production of these longer proteins begins anywhere in the cell, and then after approximately the first 250 amino acids are made, they get transported to the mitochondria. While the rest of the protein gets made, it is simultaneously fed into a channel that brings it inside the mitochondria. This ties up the channel for a long time, limiting import of other proteins, so cells can only afford to do this simultaneous production and import for select proteins. The researchers hypothesize that these bacterial-origin proteins are given priority as an ancient mechanism to ensure that they are accurately produced and placed within mitochondria.

The second locally translated group consists of short proteins, each fewer than 200 amino acids long. These proteins are more recently evolved, and correspondingly, the researchers found that the mechanism for their localized translation is not shared by yeast. Their mitochondrial recruitment happens at the RNA level. Two sequences within regulatory sections of each RNA molecule, which do not encode the final protein, instead direct the cell’s machinery to recruit the RNAs to the mitochondria.

The researchers searched for molecules that might be involved in this recruitment, and identified the RNA binding protein AKAP1, which exists at mitochondria. When they eliminated AKAP1, the short proteins were translated indiscriminately around the cell. This provided an opportunity to learn more about the effects of localized translation, by seeing what happens in its absence. When the short proteins were not locally translated, this led to the loss of various mitochondrial proteins, including those involved in oxidative phosphorylation, our cells’ main energy generation pathway.

In future research, Weissman and Luo will delve deeper into how localized translation affects mitochondrial function and dysfunction in disease. The researchers also intend to use LOCL-TL to study localized translation in other cellular processes, including in relation to embryonic development, neural plasticity, and disease.

“This approach should be broadly applicable to different cellular structures and cell types, providing many opportunities to understand how localized translation contributes to biological processes,” Weissman says. “We’re particularly interested in what we can learn about the roles it may play in diseases including neurodegeneration, cardiovascular diseases, and cancers.”

Luo et al. “Proximity-specific ribosome profiling reveals the logic of localized mitochondrial translation.” Cell, August 27, 2025. https://doi.org/10.1016/j.cell.2025.08.002

Can bacteria be used to clean up oil spills?

The Drennan Lab is working on insights into how nature performs challenging chemistry in oxygen-free environments, with potential applications for remediation, such as cleaning up oil spills, in situations where traditional approaches are ineffective.

Produced by Lillian Eden | Department of Biology
August 28, 2025

Can bacteria clean up oil spills? The short answer: no. Or, at least, not yet.

The Drennan Lab is working to understand how bacteria perform incredible, radical chemistry on inert compounds. Inert compounds, like those that make up crude oil, are challenging to break down because they contain very stable chains of carbon and hydrogen (hydrocarbons). Some microbes have special enzymes that attach another compound to these long hydrocarbon chains, which makes it possible for the previously inert compound to be degraded.

Using cryo-electron microscopy, the Drennan Lab recently determined the three-dimensional structure of a glycyl radical enzyme that catalyzes the formation of carbon-carbon bonds, outlined in a recent paper published in PNAS.

This work provides insights into how nature performs challenging chemistry in oxygen-free environments and has potential applications for remediation, such as cleaning up oil spills, in situations where traditional approaches are ineffective. 

This research was led by former postdoc Mary C. Andorfer, who will continue to explore the power of anaerobic microbes as an Assistant Professor at Michigan State University. This work was funded by the National Institutes of Health. Catherine Drennan is a Professor of Biology and Chemistry at MIT and a Howard Hughes Medical Institute Investigator. 

Mapping cells in time and space: a new tool reveals a detailed history of tumor growth

Weissman and colleagues have developed an advanced lineage tracing tool that not only captures an accurate family tree of cell divisions, but also combines that with spatial information: identifying where each cell ends up within a tissue.

Greta Friar | Whitehead Institute
July 24, 2025

All life is connected in a vast family tree. Every organism exists in relationship to its ancestors, descendants, and cousins, and the path between any two individuals can be traced. The same is true of cells within organisms: each of the trillions of cells in the human body is produced through successive divisions from a fertilized egg, and all of them can be related to one another through a cellular family tree. In simpler organisms such as the worm C. elegans, this cellular family tree has been fully mapped, but the cellular family tree of a human is many times larger and more complex.

In the past, Whitehead Institute Member Jonathan Weissman and other researchers have developed lineage tracing methods to track and reconstruct the family trees of cell divisions in model organisms in order to understand more about the relationships between cells and how they assemble into tissues, organs, and—in some cases—tumors. These methods could help to answer many questions about how organisms develop and diseases like cancer are initiated and progress.

Now, Weissman and colleagues have developed an advanced lineage tracing tool that not only captures an accurate family tree of cell divisions, but also combines that with spatial information: identifying where each cell ends up within a tissue. The researchers used their tool, PEtracer, to observe the growth of metastatic tumors in mice. Combining lineage tracing and spatial data provided the researchers with a detailed view of how elements intrinsic to the cancer cells and from their environments influenced tumor growth, as Weissman and postdocs in his lab Luke Koblan, Kathryn Yost, and Pu Zheng, and graduate student William Colgan share in a paper published in the journal Science on July 24.

“Developing this tool required combining diverse skillsets through the sort of ambitious interdisciplinary collaboration that’s only possible at a place like Whitehead Institute,” says Weissman, who is also a professor of biology at the Massachusetts Institute of Technology and an HHMI Investigator. “Luke came in with an expertise in genetic engineering, Pu in imaging, Katie in cancer biology, and William in computation but the real key to their success was their ability to work together to build PEtracer.”

“Understanding how cells move in time and space is an important way to look at biology, and here we were able to see both of those things in high resolution. The idea is that by understanding both a cell’s past and where it ends up, you can see how different factors throughout its life influenced its behaviors. In this study we use these approaches to look at tumor growth, though in principle we can now begin to apply these tools to study other biology of interest like embryonic development,” Koblan says.

Designing a tool to track cells in space and time

PEtracer tracks cells’ lineages by repeatedly adding short, predetermined codes to the DNA of cells over time. Each piece of code, called a lineage tracing mark, is made up of 5 bases, the building blocks of DNA. These marks are inserted using a gene editing technology called prime editing, which directly rewrites stretches of DNA with minimal undesired byproducts. Over time, each cell acquires more lineage tracing marks, while also maintaining the marks of its ancestors. The researchers can then compare cells’ combinations of marks to figure out relationships and reconstruct the family tree.
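The idea of comparing cells' combinations of marks can be illustrated with a small Python sketch. This is a toy illustration only, not the reconstruction method used in the paper: the cell names, site numbers, mark sequences, and the greedy merging rule are all assumptions made for the example. It treats each cell as a set of edit sites carrying 5-base marks, counts the marks two cells share, and repeatedly merges the two groups that share the most.

```python
from itertools import combinations

# Hypothetical toy data: each cell maps an edit-site index to the 5-base mark
# installed there. Sites that were never edited are simply absent.
cells = {
    "A": {0: "ACGTA", 1: "TTGCA", 2: "GGATC"},
    "B": {0: "ACGTA", 1: "TTGCA", 3: "CATGA"},
    "C": {0: "ACGTA", 4: "TGCAT"},
    "D": {5: "AACCG"},
}


def shared_marks(a, b):
    """Count the sites at which both cells carry the same mark."""
    return sum(1 for site, mark in a.items() if b.get(site) == mark)


def build_tree(cells):
    """Greedy bottom-up clustering (an assumed, simplified rule).

    Repeatedly merges the two groups that share the most marks. Each group
    keeps only the marks common to all of its members, which stand in for
    the marks inherited from the group's common ancestor.
    """
    clusters = [(name, dict(marks)) for name, marks in sorted(cells.items())]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: shared_marks(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (tree_a, marks_a), (tree_b, marks_b) = clusters[i], clusters[j]
        common = {s: m for s, m in marks_a.items() if marks_b.get(s) == m}
        merged = ((tree_a, tree_b), common)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


tree = build_tree(cells)
print(tree)
```

In this toy data, cells A and B share two marks and so are grouped first; C shares one mark with that pair, and D shares none, so it joins last. Real lineage reconstruction has to handle far more cells, missing data, and imperfect edits.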

“We used computational modeling to design the tool from first principles, to make sure that it was highly accurate, and compatible with imaging technology. We ran many simulations to land on the optimal parameters for a new lineage tracing tool, and then engineered our system to fit those parameters,” Colgan says.

When the tissue—in this case, a tumor growing in the lung of a mouse—had sufficiently grown, the researchers collected these tissues and used advanced imaging approaches to look at each cell’s lineage relationship to other cells via the lineage tracing marks, along with its spatial position within the imaged tissue and its identity (as determined by the levels of different RNAs expressed in each cell). PEtracer is compatible with both imaging approaches and sequencing methods that capture genetic information from single cells.

“Making it possible to collect and analyze all of this data from the imaging was a large challenge,” Zheng says. “What’s particularly exciting to me is not just that we were able to collect terabytes of data, but that we designed the project to collect data that we knew we could use to answer important questions and drive biological discovery.”

Reconstructing the history of a tumor

Combining the lineage tracing, gene expression, and spatial data let the researchers understand how the tumor grew. They could tell how closely related neighboring cells are and compare their traits. Using this approach, the researchers found that the tumors they were analyzing were made up of four distinct modules, or neighborhoods, of cells.

The tumor cells closest to the lung, the most nutrient-dense region, were the most fit, meaning their lineage history indicated the highest rate of cell division over time. Fitness in cancer cells tends to correlate with how aggressively tumors will grow.

The cells at the “leading edge” of the tumor, the far side from the lung, were more diverse and not as fit. Below the leading edge was a low-oxygen neighborhood of cells that might once have been leading edge cells, now trapped in a less desirable spot. Between these cells and the lung-adjacent cells was the tumor core, a region with both living and dead cells as well as cellular debris.

The researchers found that cancer cells across the family tree were equally likely to end up in most of the regions, with the exception of the lung-adjacent region, where a few branches of the family tree dominated. This suggests that the cancer cells’ differing traits were heavily influenced by their environments, or the conditions in their local neighborhoods, rather than their family history. Further evidence of this point was that expression of certain fitness-related genes, such as Fgf1/Fgfbp1, correlated with a cell’s location rather than its ancestry. However, lung-adjacent cells also had inherited traits that gave them an edge, including expression of the fitness-related gene Cldn4, showing that family history influenced outcomes as well.

These findings demonstrate how cancer growth is influenced both by factors intrinsic to certain lineages of cancer cells and by environmental factors that shape the behavior of cancer cells exposed to them.

“By looking at so many dimensions of the tumor in concert, we could gain insights that would not have been possible with a more limited view,” Yost says. “Being able to characterize different populations of cells within a tumor will enable researchers to develop therapies that target the most aggressive populations more effectively.”

“Now that we’ve done the hard work of designing the tool, we’re excited to apply it to look at all sorts of questions in health and disease, in embryonic development, and across other model species, with an eye toward understanding important problems in human health,” Koblan says. “The data we collect will also be useful for training AI models of cellular behavior. We’re excited to share this technology with other researchers and see what we all can discover.”

Luke W. Koblan, Kathryn E. Yost, Pu Zheng, William N. Colgan, Matthew G. Jones, Dian Yang, Arhan Kumar, Jaspreet Sandhu, Alexandra Schnell, Dawei Sun, Can Ergen, Reuben A. Saunders, Xiaowei Zhuang, William E. Allen, Nir Yosef, Jonathan S. Weissman. “High-resolution spatial mapping of cell state and lineage dynamics in vivo with PEtracer.” Science, online July 24, 2025. https://doi.org/10.1126/science.adx3800

Yunha Hwang

Education 

  • PhD, 2024, Evolutionary and Organismic Biology, Harvard University
  • MS, 2018, Earth Systems, Stanford University
  • B.Sc, 2018, Computer Science, Stanford University

Research Summary

Microbial genomes encode the largest molecular, biochemical, and functional diversity on Earth. We focus on developing machine learning models and experimental approaches to discover and design novel biological functions. We integrate computation with expertise in evolution, ecology, and biochemistry to characterize and harness the functional potential of microbes.

Putting liver cells in context: new method combines imaging and sequencing to study gene function in living tissue

Researchers in the Weissman Lab have developed a powerful approach that simultaneously measures how genetic changes such as turning off individual genes affect both gene expression and cell structure in intact liver tissue, with the goal of discovering how genes control organ function and disease.

Whitehead Institute
June 12, 2025

However, capturing both the “visuals and sound” of biological data, such as gene expression and cell structure data, from the same cells requires researchers to develop new approaches. They also have to make sure that the data they capture accurately reflects what happens in living organisms, including how cells interact with each other and their environments.

Whitehead Institute and Harvard University researchers have taken on these challenges and developed Perturb-Multimodal (Perturb-Multi), a powerful new approach that simultaneously measures how genetic changes such as turning off individual genes affect both gene expression and cell structure in intact liver tissue. The method, described in Cell on June 12, aims to accelerate discovery of how genes control organ function and disease.

The research team, led by Whitehead Institute Member Jonathan Weissman and then-graduate student in his lab Reuben Saunders, along with Xiaowei Zhuang, the David B. Arnold Professor of Science at Harvard University, and then-postdoc in her lab Will Allen, created a system that can test hundreds of different genetic modifications within a single mouse liver while capturing multiple types of data from the same cells.

“Understanding how our organs work requires looking at many different aspects of cell biology at once,” Saunders says. “With Perturb-Multi, we can see how turning off specific genes changes not just what other genes are active, but also how proteins are distributed within cells, how cellular structures are organized, and where cells are located in the tissue. It’s like having multiple specialized microscopes all focused on the same experiment.”

“This approach accelerates discovery by both allowing us to test the functions of many different genes at once, and then for each gene, allowing us to measure many different functional outputs or cell properties at once—and we do that in intact tissue from animals,” says Zhuang, who is also an HHMI Investigator.

A more efficient approach to genetic studies

Traditional genetic studies in mice often turn off one gene in an animal, and then observe what changes in that gene’s absence to learn about what the gene does. The researchers designed their approach to turn off hundreds of different genes across a single liver, while still only turning off one gene per cell—using what is known as a mosaic approach. This allowed them to study the roles of hundreds of individual genes at once in a single animal. The researchers then collected diverse types of data from cells across the same liver to get a full picture of the consequences of turning off the genes.
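The logic of a mosaic experiment can be sketched in a few lines of Python. This is a hedged illustration, not the team's analysis pipeline: the gene labels, the measurement values, and the function name `effect_vs_control` are hypothetical. Each cell carries exactly one perturbation, cells are grouped by the gene that was turned off, and each group is compared with unperturbed control cells from the same tissue.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (cell id, gene turned off in that cell, measured value).
# "control" marks cells in the same liver with no gene turned off.
cells = [
    ("c1", "control", 1.0),
    ("c2", "control", 1.2),
    ("c3", "geneA", 2.0),
    ("c4", "geneA", 2.4),
    ("c5", "geneB", 1.1),
]


def effect_vs_control(cells, control="control"):
    """Return, per perturbed gene, the mean measurement minus the control mean."""
    groups = defaultdict(list)
    for _cell_id, gene, value in cells:
        groups[gene].append(value)
    baseline = mean(groups[control])
    return {
        gene: mean(values) - baseline
        for gene, values in groups.items()
        if gene != control
    }


effects = effect_vs_control(cells)
for gene, delta in sorted(effects.items()):
    print(f"{gene}: {delta:+.2f} relative to control")
```

Because the control and perturbed cells come from the same animal, the comparison does not have to account for differences between mice, which is the advantage the researchers describe.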

“Each cell serves as its own experiment, and because all the cells are in the same animal, we eliminate the variability that comes from comparing different mice,” Saunders says. “Every cell experiences the same physiological conditions, diet, and environment, making our comparisons much more precise.”

“The challenge we faced was that tissues, to perform their functions, rely on thousands of genes, expressed in many different cells, working together. Each gene, in turn, can control many aspects of a cell’s function. Testing these hundreds of genes in mice using current methods would be extremely slow and expensive—near impossible in practice,” Allen says.

Revealing new biology through combined measurements

The team applied Perturb-Multi to study genetic controls of liver physiology and function. Their study led to discoveries in three important aspects of liver biology: fat accumulation in liver cells—a precursor to liver disease; stress responses; and hepatocyte zonation (how liver cells specialize, assuming different traits and functions, based on their location within the liver).

One striking finding emerged from studying genes that, when disrupted, cause fat accumulation in liver cells. The imaging data revealed that four different genes all led to similar fat droplet accumulation, but the sequencing data showed they did so through three completely different mechanisms.

“Without combining imaging and sequencing, we would have missed this complexity entirely,” Saunders says. “The imaging told us which genes affect fat accumulation, while the sequencing revealed whether this was due to increased fat production, cellular stress, or other pathways. This kind of mechanistic insight could be crucial for developing targeted therapies for fatty liver disease.”

The researchers also discovered new regulators of liver cell zonation. Unexpectedly, the newly discovered regulators include genes involved in modifying the extracellular matrix—the scaffolding between cells. “We found that cells can change their specialized functions without physically moving to a different zone,” Saunders says. “This suggests that liver cell identity is more flexible than previously thought.”

Technical innovation enables new science

Developing Perturb-Multi required solving several technical challenges. The team created new methods for preserving the content of interest in cells—RNA and proteins—during tissue processing, for collecting many types of imaging data and single-cell gene expression data from tissue samples that have been fixed with a preservative, and for integrating multiple types of data from the same cells.

“Overcoming the inherent complexity of biology in living animals required developing new tools that bridge multiple disciplines – including, in this case, genomics, imaging, and AI,” Allen says.

The two components of Perturb-Multi—the imaging and sequencing assays—together, applied to the same tissue, provide insights that are unattainable through either assay alone.

“Each component had to work perfectly while not interfering with the others,” says Weissman, who is also a professor of biology at the Massachusetts Institute of Technology and an HHMI Investigator. “The technical development took considerable effort, but the payoff is a system that can reveal biology we simply couldn’t see before.”

Expanding to new organs and other contexts

The researchers plan to expand Perturb-Multi to other organs, including the brain, and to study how genetic changes affect organ function under different conditions like disease states or dietary changes.

“We’re also excited about using the data we generate to train machine learning models,” adds Saunders. “With enough examples of how genetic changes affect cells, we could eventually predict the effects of mutations without having to test them experimentally—a ‘virtual cell’ that could accelerate both research and drug development.”

“Perturbation data are critical for training such AI models and the paucity of existing perturbation data represents a major hindrance in such ‘virtual cell’ efforts,” Zhuang says. “We hope Perturb-Multi will fill this gap by accelerating the collection of perturbation data.”

The approach is designed to be scalable, with the potential for genome-wide studies that test thousands of genes simultaneously. As sequencing and imaging technologies continue to improve, the researchers anticipate that Perturb-Multi will become even more powerful and accessible to the broader research community.

“Our goal is to keep scaling up. We plan to do genome-wide perturbations, study different physiological conditions, and look at different organs,” says Weissman. “That we can now collect so many types of data from so many cells, at speed, is going to be critical for building AI models like virtual cells, and I think it’s going to help us answer previously unsolvable questions about health and disease.”

Notes

Reuben A. Saunders, William E. Allen, Xingjie Pan, Jaspreet Sandhu, Jiaqi Lu, Thomas K. Lau, Karina Smolyar, Zuri A. Sullivan, Catherine Dulac, Jonathan S. Weissman, Xiaowei Zhuang. “Perturb-Multimodal: a Platform for Pooled Genetic Screens with Sequencing and Imaging in Intact Mammalian Tissue.” Cell, June 12, 2025. DOI: 10.1016/j.cell.2025.05.022.

MIT Down syndrome researchers work on ways to ensure a healthy lifespan

An Alana Down Syndrome Center webinar, co-sponsored by the Massachusetts Down Syndrome Congress, presented numerous MIT studies that all share the goal of improving health throughout life for people with trisomy 21.

David Orenstein | The Picower Institute for Learning and Memory
April 24, 2025

In recent decades the life expectancy of people with Down syndrome has surged past 60 years, so the focus of research at the Alana Down Syndrome Center at MIT has been to make sure people can enjoy the best health during that increasing timeframe.

“A person with Down syndrome can live a long and happy life,” said Rosalind Mott Firenze, scientific director of the center founded at MIT in 2019 with a gift from the Alana Foundation. “So the question is now how do we improve health and maximize ability through the years? It’s no longer about lifespan, but about healthspan.”

Firenze and three of the center’s Alana Fellows scientists spoke during a webinar, hosted on April 17th, where they described the center’s work toward that goal. An audience of 99 people signed up to hear the webinar titled “Building a Better Tomorrow for Down Syndrome Through Research and Technology,” with many viewers hailing from the Massachusetts Down Syndrome Congress, which co-sponsored the event.

The research they presented covered ways to potentially improve health from stages before birth to adulthood in areas such as brain function, heart development, and sleep quality.

Boosting brain waves

One of the center’s most important areas of research involves testing whether boosting the power of a particular frequency of brain activity (“gamma” brain waves of 40Hz) can improve brain development and function. The lab of the center’s Director Li-Huei Tsai, Picower Professor in The Picower Institute for Learning and Memory and the Department of Brain and Cognitive Sciences, uses light that flickers and sound that clicks 40 times a second to increase that rhythm in the brain. In early studies of people with Alzheimer’s disease, which is a major health risk for people with Down syndrome, the non-invasive approach has proved safe, and appears to improve memory while preventing brain cells from dying. It appears to work by promoting a healthy response among many types of brain cells.

Working with mice that genetically model Down syndrome, Alana Fellow Dong Shin Park has been using the sensory stimulation technology to study whether the healthy cellular response can affect brain development in a fetus while a mother is pregnant. In ongoing research, he said, he’s finding that exposing pregnant mice to the light and sound appears to improve fetal brain development and brain function in the pups after they are born.

In his research, Postdoctoral Associate Md. Rezaul Islam worked with 40Hz sensory stimulation and Down syndrome model mice at a much later stage in life, in adulthood. Together with former Tsai Lab member Brennan Jackson, he found that when the mice were exposed to the light and sound, their memory improved. The underlying reason seemed to be an increase not only in new connections among their brain cells, but also in the generation of new brain cells. The research, currently online as a preprint, is set to be published in a peer-reviewed journal very soon.

Firenze said the Tsai lab has also begun to test the sensory stimulation in human adults with Down syndrome. In that testing, which is led by Dr. Diane Chan, it is proving safe and well tolerated, so the lab is hoping to do a year-long study with volunteers to see if the stimulation can delay or prevent the onset of Alzheimer’s disease.

Studying cells

Many Alana Center researchers are studying other aspects of the biology of cells in Down syndrome to improve healthspan. Leah Borden, an Alana Fellow in the lab of Biology Professor Laurie Boyer, is studying differences in heart development. Using advanced cultures of human heart tissues grown from trisomy 21 donors, she is finding that tissue tends to be stiffer than in cultures made from people without the third chromosome copy. The stiffness, she hypothesizes, might affect cellular function and migration during development, contributing to some of the heart defects that are common in the Down syndrome population.

Firenze pointed to several other advanced cell biology studies going on in the center. Researchers in the lab of Computer Science Professor Manolis Kellis, for instance, have used machine learning and single cell RNA sequencing to map the gene expression of more than 130,000 cells in the brains of people with or without Down syndrome to understand differences in their biology.

Researchers in the lab of Y. Eva Tan Professor Edward Boyden, meanwhile, are using advanced tissue imaging techniques to look into the anatomy of cells in mice, Firenze said. They are finding differences in the structures of key organelles called mitochondria that provide cells with energy.

And in 2022, Firenze recalled, Tsai’s lab published a study showing that brain cells in Down syndrome mice exhibited a genome-wide disruption in how genes are expressed, leading them to take on a more senescent, or aged-like, state.

Striving for better sleep

One other theme of the Alana Center’s research that Firenze highlighted focuses on ways to understand and improve sleep for people with Down syndrome. In mouse studies in Tsai’s lab, they’ve begun to measure sleep differences between model and neurotypical mice to understand more about the nature of sleep disruptions.

“Sleep is different and we need to address this because it’s a key factor in your health,” Firenze said.

Firenze also highlighted how the Alana Center has collaborated with MIT’s Deshpande Center for Technological Innovation to help advance a new device for treating sleep apnea in people with Down syndrome. Led by Mechanical Engineering Associate Professor Ellen Roche, the ZzAlign device improves on current technology by creating a custom-fit oral prosthesis accompanied by just a small tube to provide the needed air pressure to stabilize mouth muscles and prevent obstruction of the airway.

Through many examples of research projects aimed at improving brain and heart health and enhancing sleep, the webinar presented how MIT’s Alana Down Syndrome Center is working to advance the healthspan of people with Down syndrome.

MIT biologists discover a new type of control over RNA splicing

They identified proteins that influence splicing of about half of all human introns, allowing for more complex types of gene regulation.

Anne Trafton | MIT News
February 20, 2025

RNA splicing is a cellular process that is critical for gene expression. After genes are copied from DNA into messenger RNA, portions of the RNA that don’t code for proteins, called introns, are cut out and the coding portions are spliced back together.

This process is controlled by a large protein-RNA complex called the spliceosome. MIT biologists have now discovered a new layer of regulation that helps to determine which sites on the messenger RNA molecule the spliceosome will target.

The research team discovered that this type of regulation, which appears to influence the expression of about half of all human genes, is found throughout the animal kingdom, as well as in plants. The findings suggest that the control of RNA splicing, a process that is fundamental to gene expression, is more complex than previously known.

“Splicing in more complex organisms, like humans, is more complicated than it is in some model organisms like yeast, even though it’s a very conserved molecular process. There are bells and whistles on the human spliceosome that allow it to process specific introns more efficiently. One of the advantages of a system like this may be that it allows more complex types of gene regulation,” says Connor Kenny, an MIT graduate student and the lead author of the study.

Christopher Burge, the Uncas and Helen Whitaker Professor of Biology at MIT, is the senior author of the study, which appears today in Nature Communications.

Building proteins

RNA splicing, a process discovered in the late 1970s, allows cells to precisely control the content of the mRNA transcripts that carry the instructions for building proteins.

Each mRNA transcript contains coding regions, known as exons, and noncoding regions, known as introns. They also include sites that act as signals for where splicing should occur, allowing the cell to assemble the correct sequence for a desired protein. This process enables a single gene to produce multiple proteins; over evolutionary timescales, splicing can also change the size and content of genes and proteins, when different exons become included or excluded.

The spliceosome, which forms on introns, is composed of proteins and noncoding RNAs called small nuclear RNAs (snRNAs). In the first step of spliceosome assembly, an snRNA molecule known as U1 snRNA binds to the 5’ splice site at the beginning of the intron. Until now, it had been thought that the binding strength between the 5’ splice site and the U1 snRNA was the most important determinant of whether an intron would be spliced out of the mRNA transcript.

In the new study, the MIT team discovered that a family of proteins called LUC7 also helps to determine whether splicing will occur, but only for a subset of introns — in human cells, up to 50 percent.

Before this study, it was known that LUC7 proteins associate with U1 snRNA, but the exact function wasn’t clear. There are three different LUC7 proteins in human cells, and Kenny’s experiments revealed that two of these proteins interact specifically with one type of 5’ splice site, which the researchers call “right-handed.” A third human LUC7 protein interacts with a different type, which the researchers call “left-handed.”

The researchers found that about half of human introns contain a right- or left-handed site, while the other half do not appear to be controlled by interaction with LUC7 proteins. This type of control appears to add another layer of regulation that helps remove specific introns more efficiently, the researchers say.

“The paper shows that these two different 5’ splice site subclasses exist and can be regulated independently of one another,” Kenny says. “Some of these core splicing processes are actually more complex than we previously appreciated, which warrants more careful examination of what we believe to be true about these highly conserved molecular processes.”

“Complex splicing machinery”

Previous work has shown that mutation or deletion of one of the LUC7 proteins that bind to right-handed splice sites is linked to blood cancers, including about 10 percent of acute myeloid leukemias (AMLs). In this study, the researchers found that AMLs that lost a copy of the LUC7L2 gene have inefficient splicing of right-handed splice sites. These cancers also developed the same type of altered metabolism seen in earlier work.

“Understanding how the loss of this LUC7 protein in some AMLs alters splicing could help in the design of therapies that exploit these splicing differences to treat AML,” Burge says. “There are also small molecule drugs for other diseases such as spinal muscular atrophy that stabilize the interaction between U1 snRNA and specific 5’ splice sites. So the knowledge that particular LUC7 proteins influence these interactions at specific splice sites could aid in improving the specificity of this class of small molecules.”

Working with a lab led by Sascha Laubinger, a professor at Martin Luther University Halle-Wittenberg, the researchers found that introns in plants also have right- and left-handed 5’ splice sites that are regulated by Luc7 proteins.

The researchers’ analysis suggests that this type of splicing arose in a common ancestor of plants, animals, and fungi, but it was lost from fungi soon after they diverged from plants and animals.

“A lot of what we know about how splicing works and what are the core components actually comes from relatively old yeast genetics work,” Kenny says. “What we see is that humans and plants tend to have more complex splicing machinery, with additional components that can regulate different introns independently.”

The researchers now plan to further analyze the structures formed by the interactions of Luc7 proteins with mRNA and the rest of the spliceosome, which could help them figure out in more detail how different forms of Luc7 bind to different 5’ splice sites.

The research was funded by the U.S. National Institutes of Health and the German Research Foundation.