Computational Biology Archives - MIT Department of Biology

3 Questions with new faculty member Matthew G. Jones: Building predictive models to characterize tumor progression

The assistant professor hopes to decode molecular processes on the genetic, epigenetic, and microenvironment levels to anticipate how and when tumors evolve to resist treatment.

Lillian Eden | Department of Biology

February 20, 2026

Just as Darwin’s finches evolved in response to natural selection in order to endure, the cells that make up a cancerous tumor similarly counter selective pressures in order to survive, evolve, and spread. Tumors are, in fact, complex sets of cells with their own unique structure and ability to change.

Today, artificial intelligence and machine learning tools offer an unparalleled opportunity to illuminate the generalizable rules governing tumor progression on the genetic, epigenetic, metabolic, and microenvironmental levels.

Matthew G. Jones, an Assistant Professor in the Department of Biology at MIT, the Koch Institute for Integrative Cancer Research, and the Institute for Medical Engineering and Science, hopes to use computational approaches to build predictive models — to play a game of chess with cancer, making sense of a tumor’s ability to evolve and resist treatment with the ultimate goal of improving patient outcomes.

Q: What aspect of tumor progression are you hoping to explore and characterize?

A: A very common story with cancer is that patients will respond to a therapy at first, and then eventually that treatment will stop working. The reason this largely happens is that tumors have an incredible, and very challenging, ability to evolve: the ability to change their genetic makeup, protein signaling composition, and cellular dynamics. The tumor as a system also evolves at a structural level. Oftentimes, the reason why a patient succumbs to a tumor is because either the tumor has evolved to a state we can no longer control, or it evolves in an unpredictable manner.

In many ways, cancers can be thought of as, on the one hand, incredibly dysregulated and disorganized, and on the other hand, as having their own internal logic, which is constantly changing. The central thesis of my lab is that tumors follow stereotypical patterns in space and time, and we’re hoping to use computation and experimental technology to decode the molecular processes underlying these transformations.

We’re focused on one specific way tumors are evolving through a form of DNA amplification called extrachromosomal DNA. Excised from the chromosome, these ecDNAs are circularized and exist as their own separate pool of DNA particles in the nucleus.

Initially discovered in the 1960s, ecDNA were thought to be a rare event in cancer. However, as researchers began applying next-generation sequencing to large patient cohorts in the 2010s, it seemed like not only were these ecDNA amplifications conferring the ability of tumors to adapt to stresses, and therapies, faster, but that they were far more prevalent than initially thought.

We now know these ecDNA amplifications are apparent in about 25% of cancers, in the most aggressive cancers: brain, lung, and ovarian cancers. We have found that, for a variety of reasons, ecDNA amplifications are able to change the rule book by which tumors evolve in ways that allow them to accelerate to a more aggressive disease in very surprising ways.

Q: How are you planning to use machine learning and artificial intelligence to study ecDNA amplifications and tumor evolution?

A: There’s a mandate to translate what I’m doing in the lab to improve patients’ lives. I want to start with patient data to discover how various evolutionary pressures are driving disease and the mutations we observe.

One of the tools we use to study tumor evolution is single-cell lineage tracing technologies. Broadly, they allow us to study the lineages of individual cells. When we sample a particular cell, not only do we know what that cell looks like, but we can, ideally, pinpoint exactly when aggressive mutations appeared in the tumor’s history. That evolutionary history gives us a way of studying these dynamic processes that we otherwise wouldn’t be able to observe in real time and helps us make sense of how we might be able to intercept that evolution.

I hope we’re going to get better at stratifying patients who will respond to certain drugs, to anticipate and overcome drug resistance, and to identify new therapeutic targets.
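As a rough illustration of the lineage-tracing idea in the answer above, the sketch below takes a toy lineage tree and finds the most recent common ancestor of all sampled cells that carry a given mutation. Under the assumption that the mutation arose once and was never lost, that ancestor marks the latest point in the tree at which it could have appeared. The tree, the cell names, and the function `latest_origin` are hypothetical; real lineage-tracing analyses are considerably more involved.

```python
# Toy lineage tree stored as child -> parent; "root" is the founding cell.
# All names are made up for illustration.
PARENT = {
    "A": "root", "B": "root",
    "A1": "A", "A2": "A",
    "B1": "B", "B2": "B",
}

def path_to_root(cell, parent=PARENT):
    """Return the cells from `cell` up to the root, inclusive."""
    path = [cell]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def latest_origin(mutant_cells, parent=PARENT):
    """Most recent common ancestor of all cells carrying the mutation.

    Assumes the mutation arose exactly once and was never lost.
    """
    paths = [path_to_root(c, parent) for c in mutant_cells]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first cell, the first shared node is the deepest.
    for node in paths[0]:
        if node in shared:
            return node

print(latest_origin(["A1", "A2"]))  # A
print(latest_origin(["A1", "B2"]))  # root
```

In this toy tree, a mutation seen only in A1 and A2 is placed at their shared parent A, while one seen in both branches must date back to the root.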

Q: What excites you about joining this community, and what sorts of trainees are you hoping to recruit to your lab?

A: One of the things that I was really attracted to was the integration of excellence in both engineering and biological sciences. At the Koch Institute, every floor is structured to promote this interface between engineers and basic scientists, and beyond campus, we can connect with all the biomedical research enterprises in the Greater Boston Area.

Another thing that drew me to MIT was the fact that it places such a strong emphasis on education, training, and investing in student success. I’m a personal believer that what distinguishes academic research from industry research is that academic research is fundamentally a service job, in that we are training the next generation of scientists.

It was always a mission of mine to bring excellence to both computational and experimental technology disciplines. The types of trainees I’m hoping to recruit are those who are eager to collaborate and solve big problems that require both disciplines. The KI is uniquely set up for this type of hybrid lab: my dry lab is right next to my wet lab, and it’s a source of collaboration and connection, and that reflects the KI’s general vision.

New insights into a hidden process that protects cells from harmful mutations

To make up for the loss of an important gene's function, cells are known to ramp up activity of other genes with similar functions. New research from the Weissman Lab reveals insights into how cells coordinate this response.

Shafaq Zia | Whitehead Institute

February 12, 2026

Some genetic mutations that are expected to completely stop a gene from working surprisingly cause only mild or even no symptoms. Researchers in previous studies have discovered one reason why: cells can ramp up the activity of other genes that perform similar functions to make up for the loss of an important gene’s function. A new study, published Feb. 12 in the journal Science, from the lab of Whitehead Institute Member Jonathan Weissman now reveals insights into how cells can coordinate this compensation response.

Cells are constantly reading instructions stored in DNA. These instructions, called genes, tell them how to make the many proteins that carry out complex processes needed to sustain life. But first, they need to make a temporary copy of these genetic instructions called messenger RNA, or mRNA.

As part of normal maintenance, cells routinely break down these temporary messages. This process helps control gene activity — or how much protein is made from a given gene — and ensures that old or unnecessary messages don’t accumulate. Cells also destroy faulty mRNAs that contain errors. These messages, if used, could produce damaged proteins that clump together and interfere with normal cellular processes.

In 2019, external studies suggested that this cleanup could be serving as more than just a quality-control check. The researchers showed that when faulty mRNAs are broken down, this breakdown can signal cells to activate the compensation response. These works also suggested that cells decide which backup genes to turn up based on how closely these genes resemble the mRNA that’s being degraded.

But mRNA decay is a process that happens in the cytoplasm, outside the nucleus where DNA, and thereby genes, are stored. So, Mohamed El-Brolosy, a postdoc in the Weissman Lab and lead author of the study, and colleagues wondered how those two processes in different compartments of the cell could be connected. Understanding this mechanism in greater depth could enable development of therapeutics that trigger it in a targeted fashion.

The researchers started by investigating a specific gene known to trigger a compensation response: when its mRNA is destroyed, a closely related gene becomes more active. To find out which molecules within the cell aid this process, the researchers systematically switched other genes off, one at a time.

That’s when they found a protein called ILF3. When the gene encoding this protein was turned off, cells could no longer ramp up the activity of the backup gene following mRNA decay.

Upon further investigation, the researchers identified small RNA fragments — left behind when faulty mRNAs are destroyed — underlying this response. These fragments contain a special sequence that acts like an “address”. The team proposed that this address guides ILF3 to related backup genes that share the same sequence as the faulty mRNA.

In fact, when they introduced mutations in this sequence, the cells’ compensation response dropped, suggesting that the system relies on precise sequence matching to target the correct backup genes.

“That was very exciting for us,” says Weissman, who is also a professor of biology at Massachusetts Institute of Technology and an investigator at the Howard Hughes Medical Institute (HHMI). “It showed us that this isn’t a generic stress response. It’s a regulated system.”

The researchers’ findings point toward new therapeutic possibilities, where boosting the activity of a related gene could mitigate symptoms of certain genetic diseases. More broadly, their work characterizes a mysterious layer of gene regulation.

El-Brolosy, M. A., et al. (2026). Mechanisms linking cytoplasmic decay of translation-defective mRNA to transcriptional adaptation. Science, 391, eaea1272. https://doi.org/10.1126/science.aea1272

3 Questions with new faculty member Yunha Hwang: Using computation to study the world’s best single-celled chemists

The assistant professor utilizes microbial genomes to examine the language of biology. Her appointment reflects MIT’s commitment to exploring the intersection of genetics research and AI.

Lillian Eden | Department of Biology

December 15, 2025

Today, out of an estimated 1 trillion species on Earth, 99.999 percent are considered microbial — bacteria, archaea, viruses, and single-celled eukaryotes. For much of our planet’s history, microbes ruled the Earth, able to live and thrive in the most extreme of environments. Researchers have only just begun in the last few decades to contend with the diversity of microbes — it’s estimated that less than 1 percent of known genes have laboratory-validated functions. Computational approaches offer researchers the opportunity to strategically parse this truly astounding amount of information.

An environmental microbiologist and computer scientist by training, new MIT faculty member Yunha Hwang is interested in the novel biology revealed by the most diverse and prolific life form on Earth. In a shared faculty position as the Samuel A. Goldblith Career Development Professor in the Department of Biology, as well as an assistant professor at the Department of Electrical Engineering and Computer Science and the MIT Schwarzman College of Computing, Hwang is exploring the intersection of computation and biology.

Q: What drew you to research microbes in extreme environments, and what are the challenges in studying them?

A: Extreme environments are great places to look for interesting biology. I wanted to be an astronaut growing up, and the closest thing to astrobiology is examining extreme environments on Earth. And the only thing that lives in those extreme environments are microbes. During a sampling expedition that I took part in off the coast of Mexico, we discovered a colorful microbial mat about 2 kilometers underwater that flourished because the bacteria breathed sulfur instead of oxygen — but none of the microbes I was hoping to study would grow in the lab.

The biggest challenge in studying microbes is that a majority of them cannot be cultivated, which means that the only way to study their biology is through a method called metagenomics. My latest work is genomic language modeling. We’re hoping to develop a computational system so we can probe the organism as much as possible “in silico,” just using sequence data. A genomic language model is technically a large language model, except the language is DNA as opposed to human language. It’s trained in a similar way, just in biological language as opposed to English or French. If our objective is to learn the language of biology, we should leverage the diversity of microbial genomes. Even though we have a lot of data, and even as more samples become available, we’ve just scratched the surface of microbial diversity.
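To make the analogy with text models concrete, here is a minimal sketch of one common way to turn a DNA sequence into tokens: split it into overlapping k-mers and map each k-mer to an integer ID. This is an assumption for illustration only. The interview does not say how Hwang's models tokenize sequences, and the function names and the choice of k = 3 are hypothetical.

```python
from itertools import product

def kmer_vocab(k=3, alphabet="ACGT"):
    """Map every possible k-mer over the alphabet to an integer ID."""
    return {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}

def tokenize(seq, k=3):
    """Split a DNA string into overlapping k-mers (stride 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

vocab = kmer_vocab(3)            # 4**3 = 64 possible tokens
tokens = tokenize("ATGGCC")
ids = [vocab[t] for t in tokens]  # raises KeyError on letters outside ACGT

print(tokens)  # ['ATG', 'TGG', 'GGC', 'GCC']
```

A model trained on such token IDs sees a genome much as a text model sees a sentence, which is the sense in which the language is DNA rather than English or French.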

Q: Given how diverse microbes are and how little we understand about them, how can studying microbes in silico, using genomic language modeling, advance our understanding of the microbial genome?

A: A genome is many millions of letters. A human cannot possibly look at that and make sense of it. We can program a machine, though, to segment data into pieces that are useful. That’s sort of how bioinformatics works with a single genome. But if you’re looking at a gram of soil, which can contain thousands of unique genomes, that’s just too much data to work with — a human and a computer together are necessary in order to grapple with that data.

During my PhD and master’s degree, we were only just discovering new genomes and new lineages that were so different from anything that had been characterized or grown in the lab. These were things that we just called “microbial dark matter.” When there are a lot of uncharacterized things, that’s where machine learning can be really useful, because we’re just looking for patterns — but that’s not the end goal. What we hope to do is to map these patterns to evolutionary relationships between each genome, each microbe, and each instance of life.

Previously, we’ve been thinking about proteins as a standalone entity — that gets us to a decent degree of information because proteins are related by homology, and therefore things that are evolutionarily related might have a similar function.

What is known about microbiology is that proteins are encoded into genomes, and the context in which that protein is bounded — what regions come before and after — is evolutionarily conserved, especially if there is a functional coupling. This makes total sense because when you have three proteins that need to be expressed together because they form a unit, then you might want them located right next to each other.

What I want to do is incorporate more of that genomic context in the way that we search for and annotate proteins and understand protein function, so that we can go beyond sequence or structural similarity to add contextual information to how we understand proteins and hypothesize about their functions.
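One simple way to picture genomic context is to count how often two protein families sit next to each other across many genomes; pairs that are neighbors in most genomes are candidates for functional coupling. The sketch below does this for toy data. The genome names, family labels, and the function `neighbor_counts` are hypothetical, and this counting scheme is a stand-in for illustration, not the method used in Hwang's work.

```python
from collections import Counter

# Each genome is an ordered list of protein-family labels (toy data).
GENOMES = {
    "genome_1": ["famA", "famB", "famC", "famX"],
    "genome_2": ["famY", "famA", "famB", "famC"],
    "genome_3": ["famB", "famA", "famZ", "famC"],
}

def neighbor_counts(genomes):
    """Count the genomes in which each unordered pair of families is adjacent."""
    counts = Counter()
    for order in genomes.values():
        # A set, so a pair is counted at most once per genome.
        pairs = {frozenset(p) for p in zip(order, order[1:]) if p[0] != p[1]}
        counts.update(pairs)
    return counts

counts = neighbor_counts(GENOMES)
print(counts[frozenset({"famA", "famB"})])  # 3
print(counts[frozenset({"famB", "famC"})])  # 2
```

Here famA and famB are adjacent in all three toy genomes regardless of orientation, the kind of conserved neighborhood that could hint at proteins working as a unit.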

Q: How can your research be applied to harnessing the functional potential of microbes?

A: Microbes are possibly the world’s best chemists. Leveraging microbial metabolism and biochemistry will lead to more sustainable and more efficient methods for producing new materials, new therapeutics, and new types of polymers.

But it’s not just about efficiency — microbes are doing chemistry we don’t even know how to think about. Understanding how microbes work, and being able to understand their genomic makeup and their functional capacity, will also be really important as we think about how our world and climate are changing. A majority of carbon sequestration and nutrient cycling is undertaken by microbes; if we don’t understand how a given microbe is able to fix nitrogen or carbon, then we will face difficulties in modeling the nutrient fluxes of the Earth.

On the more therapeutic side, infectious diseases are a real and growing threat. Understanding how microbes behave in diverse environments relative to the rest of our microbiome is really important as we think about the future and combating microbial pathogens.

A new way to understand and predict gene splicing

The KATMAP model, developed by researchers in the Department of Biology, can predict alternative cell splicing, which allows cells to create endless diversity from the same sets of genetic blueprints.

Lillian Eden | Department of Biology

November 4, 2025

Although heart cells and skin cells contain identical instructions for creating proteins encoded in their DNA, they’re able to fill such disparate niches because molecular machinery can cut out and stitch together different segments of those instructions to create endlessly unique combinations.

The ingenuity of using the same genes in different ways is made possible by a process called splicing and is controlled by splicing factors; which splicing factors a cell employs determines what sets of instructions that cell produces, which, in turn, gives rise to proteins that allow cells to fulfill different functions.

In an open-access paper published today in Nature Biotechnology, researchers in the MIT Department of Biology outlined a framework for parsing the complex relationship between sequences and splicing regulation to investigate the regulatory activities of splicing factors, creating models that can be applied to interpret and predict splicing regulation across different cell types, and even different species. Called Knockdown Activity and Target Models from Additive regression Predictions, KATMAP draws on experimental data from disrupting the expression of a splicing factor and information on which sequences the splicing factor interacts with to predict its likely targets.

Aside from the benefits of a better understanding of gene regulation, splicing mutations — either in the gene that is spliced or in the splicing factor itself — can give rise to diseases such as cancer by altering how genes are expressed, leading to the creation or accumulation of faulty or mutated proteins. This information is critical for developing therapeutic treatments for those diseases. The researchers also demonstrated that KATMAP can potentially be used to predict whether synthetic nucleic acids, a promising treatment option for disorders including a subset of muscular atrophy and epilepsy disorders, affect splicing.

Perturbing splicing

In eukaryotic cells, including our own, splicing occurs after DNA is transcribed to produce an RNA copy of a gene, which contains both coding and non-coding regions of RNA. The noncoding intron regions are removed, and the coding exon segments are spliced back together to make a near-final blueprint, which can then be translated into a protein.

According to first author Michael P. McGurk, a postdoc in the lab of MIT Professor Christopher Burge, previous approaches could provide an average picture of regulation, but could not necessarily predict the regulation of splicing factors at particular exons in particular genes.

KATMAP draws on RNA sequencing data generated from perturbation experiments, which alter the expression level of a regulatory factor by either overexpressing it or knocking it down. After such a perturbation, the genes regulated by the splicing factor should show changed levels of splicing, which helps the model identify the splicing factor’s targets.

Cells, however, are complex, interconnected systems, where one small change can cause a cascade of effects. KATMAP is also able to distinguish direct targets from indirect, downstream impacts by incorporating known information about the sequence the splicing factor is likely to interact with, referred to as a binding site or binding motif.

“In our analyses, we identify predicted targets as exons that have binding sites for this particular factor in the regions where this model thinks they need to be to impact regulation,” McGurk says, while non-targets may be affected by perturbation but don’t have the likely appropriate binding sites nearby.
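The distinction between direct and indirect targets can be sketched in a few lines: an exon counts as a predicted direct target only if its splicing changed after the perturbation and a motif-like sequence lies in its flanking region. This is a deliberately crude stand-in, not KATMAP itself, which the article describes as a regression-based model that learns how motif similarity and binding-site position relate to regulation. The motif, the cutoff, the field names (including `delta_psi` for the change in exon inclusion), and the data below are all hypothetical.

```python
import re

MOTIF = re.compile(r"GCAUG")  # placeholder binding motif (RNA alphabet)
MIN_CHANGE = 0.1              # hypothetical cutoff on change in exon inclusion

# Toy exons: change in inclusion after knockdown, plus flanking sequence.
exons = [
    {"id": "exon1", "delta_psi": 0.30, "flank": "UUUGCAUGAAA"},
    {"id": "exon2", "delta_psi": 0.25, "flank": "UUUUUUUUAAA"},
    {"id": "exon3", "delta_psi": 0.02, "flank": "AAGCAUGCCCC"},
]

def classify(exon):
    """Label an exon by whether it changed and whether a motif is nearby."""
    changed = abs(exon["delta_psi"]) >= MIN_CHANGE
    has_site = MOTIF.search(exon["flank"]) is not None
    if changed and has_site:
        return "predicted direct target"
    if changed:
        return "changed, likely indirect"
    return "unchanged"

labels = {e["id"]: classify(e) for e in exons}
for exon_id, label in labels.items():
    print(exon_id, "->", label)
```

The second toy exon changes after the perturbation but has no motif nearby, so it is treated as a downstream effect rather than a direct target.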

This is especially helpful for splicing factors that aren’t as well-studied.

“One of our goals with KATMAP was to try to make the model general enough that it can learn what it needs to assume for particular factors, like how similar the binding site has to be to the known motif or how regulatory activity changes with the distance of the binding sites from the splice sites,” McGurk says.

Starting simple

Although predictive models can be very powerful at presenting possible hypotheses, many are considered “black boxes,” meaning the rationale that gives rise to their conclusions is unclear. KATMAP, on the other hand, is an interpretable model that enables researchers to quickly generate hypotheses and interpret splicing patterns in terms of regulatory factors while also understanding how the predictions were made.

“I don’t just want to predict things, I want to explain and understand,” McGurk says. “We set up the model to learn from existing information about splicing and binding, which gives us biologically interpretable parameters.”

The researchers did have to make some simplifying assumptions in order to develop the model. KATMAP considers only one splicing factor at a time, although it is possible for splicing factors to work in concert with one another. The RNA target sequence could also be folded in such a way that the factor wouldn’t be able to access a predicted binding site, so the site is present but not utilized.

“When you try to build up complete pictures of complex phenomena, it’s usually best to start simple,” McGurk says. “A model that only considers one splicing factor at a time is a good starting point.”

David McWaters, another postdoc in the Burge Lab and a co-author on the paper, conducted key experiments to test and validate that aspect of the KATMAP model.

Future directions

The Burge lab is collaborating with researchers at Dana-Farber Cancer Institute to apply KATMAP to the question of how splicing factors are altered in disease contexts, as well as with other researchers at MIT as part of an MIT HEALS grant to model splicing factor changes in stress responses. McGurk also hopes to extend the model to incorporate cooperative regulation for splicing factors that work together.

“We’re still in a very exploratory phase, but I would like to be able to apply these models to try to understand splicing regulation in disease or development. In terms of variation of splicing factors, they are related, and we need to understand both,” McGurk says.

Burge, the Uncas (1923) and Helen Whitaker Professor and senior author of the paper, will continue to work on generalizing this approach to build interpretable models for other aspects of gene regulation.

“We now have a tool that can learn the pattern of activity of a splicing factor from types of data that can be readily generated for any factor of interest,” says Burge, who is also an extramural member of the Koch Institute for Integrative Cancer Research and an associate member of the Broad Institute of MIT and Harvard. “As we build up more of these models, we’ll be better able to infer which splicing factors have altered activity in a disease state from transcriptomic data, to help understand which splicing factors are driving pathology.”

Education

Graduate: University of California, San Francisco, 2022
Undergraduate: Computer Science; University of California, Berkeley, 2017

Research Summary

From the moment that a tumor is born, it is evolving across several levels, including at the genetic, epigenetic, metabolic, and microenvironmental levels. The central goal of the Jones Lab is to develop innovative computational and technological approaches to uncover the mechanisms of tumor evolution, with the ultimate aim of identifying new therapeutic targets and creating predictive models to monitor tumor initiation and progression.

Currently, the lab’s research focuses on three interrelated goals: (1) investigating the molecular mechanisms underlying the spatiotemporal dynamics of copy-number alterations (particularly extrachromosomal DNA) in cancer populations; (2) developing new computational methods to trace cellular lineages; and (3) elucidating the principles by which tumors are organized over time. To pursue these aims, the lab integrates advances in computation and AI with cutting-edge multi-omic approaches (including single-cell, spatial, and long-read technologies), lineage tracing, and high-resolution imaging. Broadly, they expect that their studies will reveal generalizable rules governing tumor progression and treatment resistance, enable the predictive modeling of tumors, and inspire new approaches to intercept tumor progression.

Awards

Keynote Speaker at Cancer Genetics and Epigenetics Gordon Research Seminar, 2025
Cancer Grand Challenges Future Leaders Conference Best Talk Awardee, 2024
NCI K99/R00 Early-Career Pathway to Independence Award, 2024
UCSF Discovery Fellow, 2019

Locally produced proteins help mitochondria function

One of the ways that cells ensure proteins end up where they're needed is creating them at that location, through a process called localized translation. New research from the Weissman Lab has expanded our understanding of localized translation at mitochondria and sheds light on the organizational principles of genes and the proteins they encode.

Greta Friar | Whitehead Institute

August 27, 2025

Now, Weissman, who is also a professor of biology at the Massachusetts Institute of Technology and an HHMI Investigator, and Jingchuan Luo, a postdoc in his lab, have expanded our knowledge of localized translation at mitochondria, structures that generate energy for the cell. In a paper published in Cell on August 27, they share a new tool, LOCL-TL, for studying localized translation in close detail, and describe the discoveries it enabled about two classes of proteins that are locally translated at mitochondria.

The importance of localized translation at mitochondria relates to their unusual origin. Mitochondria were once bacteria that lived within our ancestors’ cells. Over time the bacteria lost their autonomy and became part of the larger cells, which included migrating most of their genes into the larger cell’s genome in the nucleus. Cells evolved processes to ensure that proteins needed by mitochondria that are encoded in genes in the larger cell’s genome get transported to the mitochondria. Mitochondria retain a few genes in their own genome, so production of proteins from the mitochondrial genome must be coordinated with production from the larger cell’s genome to avoid mismatched production of mitochondrial parts. Localized translation may help cells manage the interplay between mitochondrial and nuclear protein production, among other purposes.

How to detect local protein production

For a protein to be made, genetic code stored in DNA is read into RNA, and then the RNA is read or translated by a ribosome, a cellular machine that builds a protein according to the RNA code. Weissman’s lab previously developed a method to study localized translation by tagging ribosomes near a structure of interest, and then capturing the tagged ribosomes in action and observing the proteins they are making. This approach, called proximity-specific ribosome profiling, allows researchers to see what proteins are being made where in the cell. The challenge that Luo faced was how to tweak this method to capture only ribosomes at work near mitochondria.

Ribosomes work quickly, so a ribosome that gets tagged while making a protein at the mitochondria can move on to making other proteins elsewhere in the cell in a matter of minutes. The only way researchers can guarantee that the ribosomes they capture are still working on proteins made near the mitochondria is if the experiment happens very quickly.

Weissman and colleagues had previously solved this time sensitivity problem in yeast cells with a ribosome-tagging tool called BirA that is activated by the presence of the molecule biotin. BirA is fused to the cellular structure of interest, and tags ribosomes it can touch—but only once activated. Researchers keep the cell depleted of biotin until they are ready to capture the ribosomes, to limit the time when tagging occurs. However, this approach does not work with mitochondria in mammalian cells because they need biotin to function normally, so it cannot be depleted.

Luo and Weissman adapted the existing tool to respond to blue light instead of biotin. The new tool, LOV-BirA, is fused to the mitochondria’s outer membrane. Cells are kept in the dark until the researchers are ready. Then they expose the cells to blue light, activating LOV-BirA to tag ribosomes. They give it a few minutes and then quickly extract the ribosomes. This approach proved very accurate at capturing only ribosomes working at mitochondria.

The researchers then used a method originally developed by the Weissman lab to extract the sections of RNA inside of the ribosomes. This allows them to see exactly how far along in the process of making a protein the ribosome is when captured, which can reveal whether the entire protein is made at the mitochondria, or whether it is partly produced elsewhere and only gets completed at the mitochondria.

“One advantage of our tool is the granularity it provides,” Luo says. “Being able to see what section of the protein is locally translated helps us understand more about how localized translation is regulated, which can then allow us to understand its dysregulation in disease and to control localized translation in future studies.”

Two protein groups are made at mitochondria

Using these approaches, the researchers found that about twenty percent of the genes needed in mitochondria that are located in the main cellular genome are locally translated at mitochondria. These proteins can be divided into two distinct groups with different evolutionary histories and mechanisms for localized translation.

One group consists of relatively long proteins, each containing more than 400 amino acids or protein building blocks. These proteins tend to be of bacterial origin—present in the ancestor of mitochondria—and they are locally translated in both mammalian and yeast cells, suggesting that their localized translation has been maintained through a long evolutionary history.

Like many mitochondrial proteins encoded in the nucleus, these proteins contain a mitochondrial targeting sequence (MTS), a zip code that tells the cell where to bring them. The researchers discovered that most proteins containing an MTS also contain a nearby inhibitory sequence that prevents transportation until they are done being made. This group of locally translated proteins lacks the inhibitory sequence, so they are brought to the mitochondria during their production.

Production of these longer proteins begins anywhere in the cell, and then after approximately the first 250 amino acids are made, they get transported to the mitochondria. While the rest of the protein gets made, it is simultaneously fed into a channel that brings it inside the mitochondria. This ties up the channel for a long time, limiting import of other proteins, so cells can only afford to do this simultaneous production and import for select proteins. The researchers hypothesize that these bacterial-origin proteins are given priority as an ancient mechanism to ensure that they are accurately produced and placed within mitochondria.

The second locally translated group consists of short proteins, each fewer than 200 amino acids long. These proteins are more recently evolved, and correspondingly, the researchers found that the mechanism for their localized translation is not shared by yeast. Their mitochondrial recruitment happens at the RNA level. Two sequences within regulatory sections of each RNA molecule, which do not encode the final protein, instead direct the cell’s machinery to recruit the RNAs to the mitochondria.

The researchers searched for molecules that might be involved in this recruitment, and identified the RNA binding protein AKAP1, which exists at mitochondria. When they eliminated AKAP1, the short proteins were translated indiscriminately around the cell. This provided an opportunity to learn more about the effects of localized translation, by seeing what happens in its absence. When the short proteins were not locally translated, this led to the loss of various mitochondrial proteins, including those involved in oxidative phosphorylation, our cells’ main energy generation pathway.

In future research, Weissman and Luo will delve deeper into how localized translation affects mitochondrial function and dysfunction in disease. The researchers also intend to use LOCL-TL to study localized translation in other cellular processes, including in relation to embryonic development, neural plasticity, and disease.

“This approach should be broadly applicable to different cellular structures and cell types, providing many opportunities to understand how localized translation contributes to biological processes,” Weissman says. “We’re particularly interested in what we can learn about the roles it may play in diseases including neurodegeneration, cardiovascular diseases, and cancers.”

Luo et al. “Proximity-specific ribosome profiling reveals the logic of localized mitochondrial translation.” Cell, August 27, 2025. https://doi.org/10.1016/j.cell.2025.08.002

Mapping cells in time and space: a new tool reveals a detailed history of tumor growth

Weissman and colleagues have developed an advanced lineage tracing tool that not only captures an accurate family tree of cell divisions, but also combines that with spatial information: identifying where each cell ends up within a tissue.

Greta Friar | Whitehead Institute

July 24, 2025

All life is connected in a vast family tree. Every organism exists in relationship to its ancestors, descendants, and cousins, and the path between any two individuals can be traced. The same is true of cells within organisms: each of the trillions of cells in the human body is produced through successive divisions from a fertilized egg, and all of them can be related to one another through a cellular family tree. In simpler organisms such as the worm C. elegans, this cellular family tree has been fully mapped, but the cellular family tree of a human is many times larger and more complex.

In the past, Whitehead Institute Member Jonathan Weissman and other researchers have developed lineage tracing methods to track and reconstruct the family trees of cell divisions in model organisms in order to understand more about the relationships between cells and how they assemble into tissues, organs, and—in some cases—tumors. These methods could help to answer many questions about how organisms develop and diseases like cancer are initiated and progress.

Now, Weissman and colleagues have developed an advanced lineage tracing tool that not only captures an accurate family tree of cell divisions, but also combines that with spatial information: identifying where each cell ends up within a tissue. The researchers used their tool, PEtracer, to observe the growth of metastatic tumors in mice. Combining lineage tracing and spatial data provided the researchers with a detailed view of how elements intrinsic to the cancer cells and from their environments influenced tumor growth, as Weissman and postdocs in his lab Luke Koblan, Kathryn Yost, and Pu Zheng, and graduate student William Colgan share in a paper published in the journal Science on July 24.

“Developing this tool required combining diverse skillsets through the sort of ambitious interdisciplinary collaboration that’s only possible at a place like Whitehead Institute,” says Weissman, who is also a professor of biology at the Massachusetts Institute of Technology and an HHMI Investigator. “Luke came in with an expertise in genetic engineering, Pu in imaging, Katie in cancer biology, and William in computation, but the real key to their success was their ability to work together to build PEtracer.”

“Understanding how cells move in time and space is an important way to look at biology, and here we were able to see both of those things in high resolution. The idea is that by understanding both a cell’s past and where it ends up, you can see how different factors throughout its life influenced its behaviors. In this study we use these approaches to look at tumor growth, though in principle we can now begin to apply these tools to study other biology of interest like embryonic development,” Koblan says.

Designing a tool to track cells in space and time

PEtracer tracks cells’ lineages by repeatedly adding short, predetermined codes to the DNA of cells over time. Each piece of code, called a lineage tracing mark, is made up of 5 bases, the building blocks of DNA. These marks are inserted using a gene editing technology called prime editing, which directly rewrites stretches of DNA with minimal undesired byproducts. Over time, each cell acquires more lineage tracing marks, while also maintaining the marks of its ancestors. The researchers can then compare cells’ combinations of marks to figure out relationships and reconstruct the family tree.
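The comparison step described above can be illustrated with a toy sketch. This is not the PEtracer pipeline; it only shows the underlying idea that cells sharing more inherited marks are likely to be closer relatives. The cell names, the number of sites, and the mark sequences are hypothetical, and the sketch assumes that marks are never overwritten and that two cells rarely acquire the same mark at the same site independently.

```python
# Hypothetical toy data: each cell has four editable sites.
# None means the site is still unedited; a string is the 5-base mark installed there.
cells = {
    "A": ["ACGTA", "TTGCA", None, None],
    "B": ["ACGTA", "TTGCA", "GGATC", None],
    "C": ["ACGTA", None, None, "CATGC"],
    "D": ["ACGTA", None, "TACGG", "CATGC"],
}


def shared_marks(x, y):
    """Count the sites where both cells carry the same mark."""
    return sum(1 for a, b in zip(x, y) if a is not None and a == b)


def closest_relatives(cells):
    """For each cell, return the other cell that shares the most marks with it."""
    best = {}
    for name, marks in cells.items():
        scores = [
            (shared_marks(marks, other_marks), other)
            for other, other_marks in cells.items()
            if other != name
        ]
        # max() picks the highest shared-mark count (ties broken by cell name).
        best[name] = max(scores)[1]
    return best


relatives = closest_relatives(cells)
for cell, relative in relatives.items():
    print(f"{cell} is most closely related to {relative}")
```

In this toy data, all four cells share the first mark (a common ancestor), while A and B share a second mark and C and D share a different one, so the sketch pairs A with B and C with D. A full reconstruction would repeat this kind of grouping to build the whole tree.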

“We used computational modeling to design the tool from first principles, to make sure that it was highly accurate, and compatible with imaging technology. We ran many simulations to land on the optimal parameters for a new lineage tracing tool, and then engineered our system to fit those parameters,” Colgan says.

When the tissue—in this case, a tumor growing in the lung of a mouse—had sufficiently grown, the researchers collected these tissues and used advanced imaging approaches to look at each cell’s lineage relationship to other cells via the lineage tracing marks, along with its spatial position within the imaged tissue and its identity (as determined by the levels of different RNAs expressed in each cell). PEtracer is compatible with both imaging approaches and sequencing methods that capture genetic information from single cells.

“Making it possible to collect and analyze all of this data from the imaging was a large challenge,” Zheng says. “What’s particularly exciting to me is not just that we were able to collect terabytes of data, but that we designed the project to collect data that we knew we could use to answer important questions and drive biological discovery.”

Reconstructing the history of a tumor

Combining the lineage tracing, gene expression, and spatial data let the researchers understand how the tumor grew. They could tell how closely related neighboring cells are and compare their traits. Using this approach, the researchers found that the tumors they were analyzing were made up of four distinct modules, or neighborhoods, of cells.

The tumor cells closest to the lung, the most nutrient-dense region, were the most fit, meaning their lineage history indicated the highest rate of cell division over time. Fitness in cancer cells tends to correlate with how aggressively tumors will grow.

The cells at the “leading edge” of the tumor, the far side from the lung, were more diverse and not as fit. Below the leading edge was a low-oxygen neighborhood of cells that might once have been leading edge cells, now trapped in a less desirable spot. Between these cells and the lung-adjacent cells was the tumor core, a region with both living and dead cells as well as cellular debris.

The researchers found that cancer cells across the family tree were equally likely to end up in most of the regions, with the exception of the lung-adjacent region, where a few branches of the family tree dominated. This suggests that the cancer cells’ differing traits were heavily influenced by their environments, or the conditions in their local neighborhoods, rather than their family history. Further evidence of this point was that expression of certain fitness-related genes, such as Fgf1/Fgfbp1, correlated with a cell’s location rather than its ancestry. However, lung-adjacent cells also had inherited traits that gave them an edge, including expression of the fitness-related gene Cldn4, showing that family history influenced outcomes as well.

These findings demonstrate how cancer growth is influenced both by factors intrinsic to certain lineages of cancer cells and by environmental factors that shape the behavior of cancer cells exposed to them.

“By looking at so many dimensions of the tumor in concert, we could gain insights that would not have been possible with a more limited view,” Yost says. “Being able to characterize different populations of cells within a tumor will enable researchers to develop therapies that target the most aggressive populations more effectively.”

“Now that we’ve done the hard work of designing the tool, we’re excited to apply it to look at all sorts of questions in health and disease, in embryonic development, and across other model species, with an eye toward understanding important problems in human health,” Koblan says. “The data we collect will also be useful for training AI models of cellular behavior. We’re excited to share this technology with other researchers and see what we all can discover.”

Luke W. Koblan, Kathryn E. Yost, Pu Zheng, William N. Colgan, Matthew G. Jones, Dian Yang, Arhan Kumar, Jaspreet Sandhu, Alexandra Schnell, Dawei Sun, Can Ergen, Reuben A. Saunders, Xiaowei Zhuang, William E. Allen, Nir Yosef, Jonathan S. Weissman. “High-resolution spatial mapping of cell state and lineage dynamics in vivo with PEtracer.” Science, online July 24, 2025. https://doi.org/10.1126/science.adx3800

Education

PhD, 2024, Evolutionary and Organismic Biology, Harvard University
MS, 2018, Earth Systems, Stanford University
B.Sc, 2018, Computer Science, Stanford University

Research Summary

Microbial genomes encode the largest molecular, biochemical, and functional diversity on Earth. We focus on developing machine learning models and experimental approaches to discover and design novel biological functions. We integrate computation with expertise in evolution, ecology, and biochemistry to characterize and harness the functional potential of microbes.

Putting liver cells in context: new method combines imaging and sequencing to study gene function in living tissue

Researchers in the Weissman Lab have developed a powerful approach that simultaneously measures how genetic changes such as turning off individual genes affect both gene expression and cell structure in intact liver tissue, with the goal of discovering how genes control organ function and disease.

Whitehead Institute

June 12, 2025

However, capturing both the “visuals and sound” of biological data, such as gene expression and cell structure data, from the same cells requires researchers to develop new approaches. They also have to make sure that the data they capture accurately reflects what happens in living organisms, including how cells interact with each other and their environments.

Whitehead Institute and Harvard University researchers have taken on these challenges and developed Perturb-Multimodal (Perturb-Multi), a powerful new approach that simultaneously measures how genetic changes such as turning off individual genes affect both gene expression and cell structure in intact liver tissue. The method, described in Cell on June 12, aims to accelerate discovery of how genes control organ function and disease.

The research team, led by Whitehead Institute Member Jonathan Weissman and then-graduate student in his lab Reuben Saunders, along with Xiaowei Zhuang, the David B. Arnold Professor of Science at Harvard University, and then-postdoc in her lab Will Allen, created a system that can test hundreds of different genetic modifications within a single mouse liver while capturing multiple types of data from the same cells.

“Understanding how our organs work requires looking at many different aspects of cell biology at once,” Saunders says. “With Perturb-Multi, we can see how turning off specific genes changes not just what other genes are active, but also how proteins are distributed within cells, how cellular structures are organized, and where cells are located in the tissue. It’s like having multiple specialized microscopes all focused on the same experiment.”

“This approach accelerates discovery by both allowing us to test the functions of many different genes at once, and then for each gene, allowing us to measure many different functional outputs or cell properties at once—and we do that in intact tissue from animals,” says Zhuang, who is also an HHMI Investigator.

A more efficient approach to genetic studies

Traditional genetic studies in mice often turn off one gene in an animal, and then observe what changes in that gene’s absence to learn about what the gene does. The researchers designed their approach to turn off hundreds of different genes across a single liver, while still only turning off one gene per cell—using what is known as a mosaic approach. This allowed them to study the roles of hundreds of individual genes at once in a single animal. The researchers then collected diverse types of data from cells across the same liver to get a full picture of the consequences of turning off the genes.
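The logic of a mosaic screen can be sketched in a few lines. This is an illustration only, not the analysis used in the study: the gene labels, the readout values, and the presence of a "control" group of unperturbed cells are all assumptions made for the example. Each cell is tagged with the single gene that was turned off in it, and cells are grouped by that tag so every gene's effect is compared against cells from the same liver.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-cell records from one liver: (gene turned off, measured readout).
# "control" marks cells assumed to carry no perturbation.
cell_records = [
    ("control", 1.0), ("control", 1.2), ("control", 0.8),
    ("geneX", 2.0), ("geneX", 2.4),
    ("geneY", 1.0), ("geneY", 1.1),
]


def effect_vs_control(records, control="control"):
    """Return each gene's mean readout minus the mean readout of control cells."""
    groups = defaultdict(list)
    for gene, value in records:
        groups[gene].append(value)
    baseline = mean(groups[control])
    return {
        gene: mean(values) - baseline
        for gene, values in groups.items()
        if gene != control
    }


effects = effect_vs_control(cell_records)
for gene, delta in effects.items():
    print(f"{gene}: change relative to control = {delta:+.2f}")
```

Because every group comes from the same animal, the differences between groups are not confounded by animal-to-animal variation, which is the advantage the mosaic design is meant to provide.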

“Each cell serves as its own experiment, and because all the cells are in the same animal, we eliminate the variability that comes from comparing different mice,” Saunders says. “Every cell experiences the same physiological conditions, diet, and environment, making our comparisons much more precise.”

“The challenge we faced was that tissues, to perform their functions, rely on thousands of genes, expressed in many different cells, working together. Each gene, in turn, can control many aspects of a cell’s function. Testing these hundreds of genes in mice using current methods would be extremely slow and expensive—near impossible in practice,” Allen says.

Revealing new biology through combined measurements

The team applied Perturb-Multi to study genetic controls of liver physiology and function. Their study led to discoveries in three important aspects of liver biology: fat accumulation in liver cells—a precursor to liver disease; stress responses; and hepatocyte zonation (how liver cells specialize, assuming different traits and functions, based on their location within the liver).

“Overcoming the inherent complexity of biology in living animals required developing new tools that bridge multiple disciplines – including, in this case, genomics, imaging, and AI,” Allen says.

One striking finding emerged from studying genes that, when disrupted, cause fat accumulation in liver cells. The imaging data revealed that four different genes all led to similar fat droplet accumulation, but the sequencing data showed they did so through three completely different mechanisms.

“Without combining imaging and sequencing, we would have missed this complexity entirely,” Saunders says. “The imaging told us which genes affect fat accumulation, while the sequencing revealed whether this was due to increased fat production, cellular stress, or other pathways. This kind of mechanistic insight could be crucial for developing targeted therapies for fatty liver disease.”

The researchers also discovered new regulators of liver cell zonation. Unexpectedly, the newly discovered regulators include genes involved in modifying the extracellular matrix—the scaffolding between cells. “We found that cells can change their specialized functions without physically moving to a different zone,” Saunders says. “This suggests that liver cell identity is more flexible than previously thought.”

Technical innovation enables new science

Developing Perturb-Multi required solving several technical challenges. The team created new methods for preserving the content of interest in cells—RNA and proteins—during tissue processing, for collecting many types of imaging data and single-cell gene expression data from tissue samples that have been fixed with a preservative, and for integrating multiple types of data from the same cells.

The two components of Perturb-Multi—the imaging and sequencing assays—together, applied to the same tissue, provide insights that are unattainable through either assay alone.

“Each component had to work perfectly while not interfering with the others,” says Weissman, who is also a professor of biology at the Massachusetts Institute of Technology and an HHMI Investigator. “The technical development took considerable effort, but the payoff is a system that can reveal biology we simply couldn’t see before.”

Expanding to new organs and other contexts

The researchers plan to expand Perturb-Multi to other organs, including the brain, and to study how genetic changes affect organ function under different conditions like disease states or dietary changes.

“We’re also excited about using the data we generate to train machine learning models,” adds Saunders. “With enough examples of how genetic changes affect cells, we could eventually predict the effects of mutations without having to test them experimentally—a ‘virtual cell’ that could accelerate both research and drug development.”

“Perturbation data are critical for training such AI models and the paucity of existing perturbation data represents a major hindrance in such ‘virtual cell’ efforts,” Zhuang says. “We hope Perturb-Multi will fill this gap by accelerating the collection of perturbation data.”

The approach is designed to be scalable, with the potential for genome-wide studies that test thousands of genes simultaneously. As sequencing and imaging technologies continue to improve, the researchers anticipate that Perturb-Multi will become even more powerful and accessible to the broader research community.

“Our goal is to keep scaling up. We plan to do genome-wide perturbations, study different physiological conditions, and look at different organs,” says Weissman. “That we can now collect so many types of data from so many cells, at speed, is going to be critical for building AI models like virtual cells, and I think it’s going to help us answer previously unsolvable questions about health and disease.”

Notes

Reuben A. Saunders, William E. Allen, Xingjie Pan, Jaspreet Sandhu, Jiaqi Lu, Thomas K. Lau, Karina Smolyar, Zuri A. Sullivan, Catherine Dulac, Jonathan S. Weissman, Xiaowei Zhuang. “Perturb-Multimodal: a Platform for Pooled Genetic Screens with Sequencing and Imaging in Intact Mammalian Tissue.” Cell, June 12, 2025. DOI: 10.1016/j.cell.2025.05.022.

Taking the pulse of sex differences in the heart

Work led by Talukdar and Page Lab postdoc Lukáš Chmátal shows that there are differences in how healthy male and female heart cells—specifically, cardiomyocytes, the muscle cells responsible for making the heart beat—generate energy.

Greta Friar | Whitehead Institute

February 18, 2025

Heart disease is the number one killer of men and women, but it often presents differently depending on sex. There are sex differences in the incidence, outcomes, and age of onset of different types of heart problems. Some of these differences can be explained by social factors (for example, women experience less well-recognized symptoms when having heart attacks, and so may take longer to be diagnosed and treated), but others are likely influenced by underlying differences in biology. Whitehead Institute Member David Page and colleagues have now identified some of these underlying biological differences in healthy male and female hearts, which may contribute to the observed differences in disease.

“My sense is that clinicians tend to think that sex differences in heart disease are due to differences in behavior,” says Harvard-MIT MD-PhD student Maya Talukdar, a graduate student in Page’s lab. “Behavioral factors do contribute, but even when you control for them, you still see sex differences. This implies that there are more basic physiological differences driving them.”

Page, who is also an HHMI Investigator and a professor of biology at the Massachusetts Institute of Technology, and members of his lab study the underlying biology of sex differences in health and disease, and recently they have turned their attention to the heart. In a paper published on February 17 in the women’s health edition of the journal Circulation, work led by Talukdar and Page lab postdoc Lukáš Chmátal shows that there are differences in how healthy male and female heart cells—specifically, cardiomyocytes, the muscle cells responsible for making the heart beat—generate energy.

“The heart is a hard-working pump, and heart failure often involves an energy crisis in which the heart can’t summon enough energy to pump blood fast enough to meet the body’s needs,” says Page. “What is intriguing about our current findings and their relationship to heart disease is that we’ve discovered sex differences in the generation of energy in cardiomyocytes, and this likely sets up males and females differently for an encounter with heart failure.”

Page and colleagues began their work by looking for sex differences in healthy hearts because they hypothesize that these impact sex differences in heart disease. Differences in baseline biology in the healthy state often affect outcomes when challenged by disease; for example, people with one copy of the sickle cell trait are more resistant to malaria, certain versions of the HLA gene are linked to slower progression of HIV, and variants of certain genes may protect against developing dementia.

Identifying baseline traits in the heart and figuring out how they interact with heart disease could not only reveal more about heart disease, but could also lead to new therapeutic strategies. If one group has a trait that naturally protects them against heart disease, then researchers can potentially develop medical therapies that induce or recreate that protective feature in others. In such a manner, Page and colleagues hope that their work to identify baseline sex differences could ultimately contribute to advances in prevention and treatment of heart disease.

The new work takes the first step by identifying relevant baseline sex differences. The researchers combined their expertise in sex differences with heart expertise provided by co-authors Christine Seidman, a Harvard Medical School professor and director of the Cardiovascular Genetics Center at Brigham and Women’s Hospital; Harvard Medical School Professor Jonathan Seidman; and Zoltan Arany, a professor and director of the Cardiovascular Metabolism Program at the University of Pennsylvania.

Along with providing heart expertise, the Seidmans and Arany provided data collected from healthy hearts. Gaining access to healthy heart tissue is difficult, and so the researchers felt fortunate to be able to perform new analyses on existing datasets that had not previously been looked at in the context of sex differences. The researchers also used data from the publicly available Genotype-Tissue Expression Project. Collectively, the datasets provided information on bulk and single cell gene expression, as well as metabolomics, of heart tissue—and in particular, of cardiomyocytes.

The researchers searched these datasets for differences between male and female hearts, and found evidence that female cardiomyocytes have higher activity of the primary pathway for energy generation than male cardiomyocytes. Fatty acid oxidation (FAO) is the pathway that produces most of the energy that powers the heart, in the form of the energy molecule ATP. The researchers found that many genes involved in FAO have higher expression levels in female cardiomyocytes. Metabolomic data reinforced these findings by showing that female hearts had greater flux of free fatty acids, the molecules used in FAO, and that female hearts used more free fatty acids than did males in the generation of ATP.

Altogether, these findings show that there are fundamental differences in how female and male hearts generate energy to pump blood. Further experiments are needed to explore whether these differences contribute to the sex differences seen in heart disease. The researchers suspect that an association is likely, because energy production is essential to heart function and failure.

In the meantime, Page and his lab members continue to investigate the biology underlying sex differences in tissues and organs throughout the body.

“We have a lot to learn about the molecular origins of sex differences in health and disease,” Chmátal says. “What’s exciting to me is that the knowledge that comes from these basic science discoveries could lead to treatments that benefit men and women, as well as to policy changes that take sex differences into account when determining how doctors are trained and patients are diagnosed and treated.”
