3 Questions with new faculty member Matthew G. Jones: Building predictive models to characterize tumor progression

The assistant professor hopes to decode molecular processes on the genetic, epigenetic, and microenvironment levels to anticipate how and when tumors evolve to resist treatment.

Lillian Eden | Department of Biology
February 20, 2026

Just as Darwin’s finches evolved in response to natural selection in order to endure, the cells that make up a cancerous tumor similarly counter selective pressures to survive, evolve, and spread. Tumors are, in fact, complex communities of cells with their own unique structure and ability to change.

Today, artificial intelligence and machine learning tools offer an unparalleled opportunity to illuminate the generalizable rules governing tumor progression on the genetic, epigenetic, metabolic, and microenvironmental levels. 

Matthew G. Jones, an assistant professor in the Department of Biology at MIT, the Koch Institute for Integrative Cancer Research, and the Institute for Medical Engineering and Science, hopes to use computational approaches to build predictive models — to play a game of chess with cancer, making sense of a tumor’s ability to evolve and resist treatment with the ultimate goal of improving patient outcomes. 

Q: What aspect of tumor progression are you hoping to explore and characterize? 

A: A very common story with cancer is that patients will respond to a therapy at first, and then eventually that treatment will stop working. This happens largely because tumors have an incredible, and very challenging, ability to evolve: the ability to change their genetic makeup, protein signaling composition, and cellular dynamics. The tumor as a system also evolves at a structural level. Oftentimes, a patient succumbs to a tumor because either the tumor has evolved to a state we can no longer control, or it evolves in an unpredictable manner. 

In many ways, cancers can be thought of as, on the one hand, incredibly dysregulated and disorganized, and on the other hand, as having their own internal logic, which is constantly changing. The central thesis of my lab is that tumors follow stereotypical patterns in space and time, and we’re hoping to use computation and experimental technology to decode the molecular processes underlying these transformations.  

We’re focused on one specific way tumors evolve: a form of DNA amplification called extrachromosomal DNA (ecDNA). Excised from the chromosome, these ecDNAs are circularized and exist as their own separate pool of DNA particles in the nucleus. 

Initially discovered in the 1960s, ecDNAs were thought to be a rare event in cancer. However, as researchers began applying next-generation sequencing to large patient cohorts in the 2010s, it became clear not only that these ecDNA amplifications were allowing tumors to adapt to stresses, and therapies, faster, but that they were far more prevalent than initially thought.

We now know these ecDNA amplifications appear in about 25 percent of cancers, including some of the most aggressive: brain, lung, and ovarian cancers. We have found that, for a variety of reasons, ecDNA amplifications are able to change the rule book by which tumors evolve, allowing them to accelerate to more aggressive disease in very surprising ways. 

Q: How are you planning to use machine learning and artificial intelligence to study ecDNA amplifications and tumor evolution? 

A: There’s a mandate to translate what I’m doing in the lab to improve patients’ lives. I want to start with patient data to discover how various evolutionary pressures are driving disease and the mutations we observe. 

One of the tools we use to study tumor evolution is single-cell lineage tracing technologies. Broadly, they allow us to study the lineages of individual cells. When we sample a particular cell, not only do we know what that cell looks like, but we can, ideally, pinpoint exactly when aggressive mutations appeared in the tumor’s history. That evolutionary history gives us a way of studying these dynamic processes that we otherwise wouldn’t be able to observe in real time and helps us make sense of how we might be able to intercept that evolution. 
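
The core idea behind lineage tracing, that heritable marks shared across cells reveal shared ancestry and the order in which events occurred, can be sketched with a toy example. The cell names, marks, and simple counting heuristic below are all hypothetical illustrations, not the lab's actual methods or data.

```python
# Toy sketch of the logic behind single-cell lineage tracing: each cell
# carries the heritable marks ("mutations") accumulated along its lineage,
# so marks shared by more cells generally arose earlier in the tumor's
# history. All cell names and marks here are hypothetical.

cells = {
    "cell_A": {"m1"},
    "cell_B": {"m1", "m2"},
    "cell_C": {"m1", "m2", "m3"},  # m3 arose latest on this branch
    "cell_D": {"m1", "m4"},        # m4 marks a separate branch
}

def mutation_order(cells):
    """Rank mutations by how many cells carry them: under a simple
    no-loss assumption, earlier mutations are inherited by more
    descendants, so higher counts imply earlier appearance."""
    counts = {}
    for marks in cells.values():
        for m in marks:
            counts[m] = counts.get(m, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)

order = mutation_order(cells)
print(order)  # "m1" comes first (carried by all four cells), then "m2"
```

Real lineage-tracing analyses reconstruct full trees from engineered barcodes rather than ranking mutations by count, but the same principle, that shared heritable marks imply shared history, is what lets researchers place events in a tumor's past.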

I hope we’re going to get better at stratifying patients who will respond to certain drugs, to anticipate and overcome drug resistance, and to identify new therapeutic targets.

Q: What excites you about joining this community, and what sorts of trainees are you hoping to recruit to your lab? 

A: One of the things that I was really attracted to was the integration of excellence in both engineering and biological sciences. At the Koch Institute, every floor is structured to promote this interface between engineers and basic scientists, and beyond campus, we can connect with all the biomedical research enterprises in the Greater Boston Area. 

Another thing that drew me to MIT was the fact that it places such a strong emphasis on education, training, and investing in student success. I’m a personal believer that what distinguishes academic research from industry research is that academic research is fundamentally a service job, in that we are training the next generation of scientists. 

It was always a mission of mine to bring excellence to both computational and experimental technology disciplines. The types of trainees I’m hoping to recruit are those who are eager to collaborate and solve big problems that require both disciplines. The KI is uniquely set up for this type of hybrid lab: my dry lab is right next to my wet lab, and it’s a source of collaboration and connection, and that reflects the KI’s general vision. 

New insights into a hidden process that protects cells from harmful mutations

To make up for the loss of an important gene's function, cells are known to ramp up activity of other genes with similar functions. New research from the Weissman Lab reveals insights into how cells coordinate this response.

Shafaq Zia | Whitehead Institute
February 12, 2026

Some genetic mutations that are expected to completely stop a gene from working surprisingly cause only mild or even no symptoms. Researchers in previous studies have discovered one reason why: cells can ramp up the activity of other genes that perform similar functions to make up for the loss of an important gene’s function. A new study, published Feb. 12 in the journal Science, from the lab of Whitehead Institute Member Jonathan Weissman now reveals insights into how cells can coordinate this compensation response.

Cells are constantly reading instructions stored in DNA. These instructions, called genes, tell them how to make the many proteins that carry out complex processes needed to sustain life. But first, they need to make a temporary copy of these genetic instructions called messenger RNA, or mRNA.

As part of normal maintenance, cells routinely break down these temporary messages. This process helps control gene activity — or how much protein is made from a given gene — and ensures that old or unnecessary messages don’t accumulate. Cells also destroy faulty mRNAs that contain errors. These messages, if used, could produce damaged proteins that clump together and interfere with normal cellular processes.

In 2019, studies from other groups suggested that this cleanup could serve as more than just a quality-control check. The researchers showed that when faulty mRNAs are broken down, this breakdown can signal cells to activate the compensation response. These studies also suggested that cells decide which backup genes to turn up based on how closely those genes resemble the mRNA being degraded.

But mRNA decay is a process that happens in the cytoplasm, outside the nucleus where DNA, and thereby genes, are stored. So, Mohamed El-Brolosy, a postdoc in the Weissman Lab and lead author of the study, and colleagues wondered how those two processes in different compartments of the cell could be connected. Understanding this mechanism in greater depth could enable the development of therapeutics that trigger it in a targeted fashion.

The researchers started by investigating a specific gene that scientists know triggers a compensation response when its mRNA is destroyed, causing a closely related gene to become more active. To find out which molecules within the cell aid this process, the researchers systematically switched other genes off, one at a time.

That’s when they found a protein called ILF3. When the gene encoding this protein was turned off, cells could no longer ramp up the activity of the backup gene following mRNA decay.

Upon further investigation, the researchers identified small RNA fragments — left behind when faulty mRNAs are destroyed — as underlying this response. These fragments contain a special sequence that acts like an “address.” The team proposed that this address guides ILF3 to related backup genes that share the same sequence as the faulty mRNA.

In fact, when they introduced mutations in this sequence, the cells’ compensation response dropped, suggesting that the system relies on precise sequence matching to target the correct backup genes.
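
The sequence-matching logic can be sketched in a few lines. The fragment and gene sequences below are invented for illustration, and the real recognition step is mediated by ILF3 binding rather than a literal substring search.

```python
# Toy sketch of sequence-based targeting: a fragment of the degraded mRNA
# carries an "address" sequence, and only backup genes sharing that
# sequence are turned up. All sequences here are hypothetical.

fragment_address = "GGAUCC"  # the shared "address" motif

backup_candidates = {
    "gene_X": "AAGGAUCCUU",  # contains the address: targeted
    "gene_Y": "CCCUUUAAA",   # no match: left alone
}

targeted = [gene for gene, seq in backup_candidates.items()
            if fragment_address in seq]
print(targeted)  # ['gene_X']
```

In this framing, mutating the address breaks the match and no backup gene is targeted, which mirrors the drop in compensation the researchers observed.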

“That was very exciting for us,” says Weissman, who is also a professor of biology at Massachusetts Institute of Technology and an investigator at the Howard Hughes Medical Institute (HHMI). “It showed us that this isn’t a generic stress response. It’s a regulated system.”

The researchers’ findings point toward new therapeutic possibilities, where boosting the activity of a related gene could mitigate symptoms of certain genetic diseases. More broadly, their work characterizes a mysterious layer of gene regulation.

El-Brolosy, M. A., et al. (2026). Mechanisms linking cytoplasmic decay of translation-defective mRNA to transcriptional adaptation. Science, 391, eaea1272. https://doi.org/10.1126/science.aea1272

How a unique class of neurons may set the table for brain development

A new MIT study from the Nedivi Lab finds that somatostatin-expressing neurons follow a unique trajectory when forming connections in the brain’s visual cortex that may help establish the conditions needed for sensory experience to refine circuits.

David Orenstein | The Picower Institute for Learning and Memory
January 14, 2026

The way the brain develops can shape us throughout our lives, so neuroscientists are intensely curious about how it happens. A new study by researchers in The Picower Institute for Learning and Memory at MIT, focused on visual cortex development in mice, reveals that an important class of neurons follows a set of rules that, while surprising, might just create the right conditions for circuit optimization.

During early brain development, multiple types of neurons emerge in the visual cortex (where the brain processes vision). Many are “excitatory,” driving the activity of brain circuits, and others are “inhibitory,” meaning they control that activity. Just like a car needs not only an engine and a gas pedal, but also a steering wheel and brakes, a healthy balance between excitation and inhibition is required for proper brain function. During a “critical period” of development in the visual cortex, soon after the eyes first open, excitatory and inhibitory neurons forge and edit millions of connections, or synapses, to adapt nascent circuits to the incoming flood of visual experience. Over many days, in other words, the brain optimizes its attunement to the world.

In the new study in The Journal of Neuroscience, a team led by MIT research scientist Josiah Boivin and Professor Elly Nedivi visually tracked somatostatin (SST)-expressing inhibitory neurons forging synapses with excitatory cells along their sprawling dendrite branches, illustrating the action before, during and after the critical period with unprecedented resolution. Several of the rules the SST cells appeared to follow were unexpected—for instance, unlike other cell types, their activity did not depend on visual input—but now that the scientists know these neurons’ unique trajectory, they have a new idea about how it may enable sensory activity to influence development: SST cells might help usher in the critical period by establishing the baseline level of inhibition needed to ensure that only certain types of sensory input will trigger circuit refinement.

“Why would you need part of the circuit that’s not really sensitive to experience? It could be that it’s setting things up for the experience-dependent components to do their thing,” said Nedivi, William R. and Linda R. Young Professor in The Picower Institute and MIT’s Departments of Biology and Brain and Cognitive Sciences.

Boivin added: “We don’t yet know whether SST neurons play a causal role in the opening of the critical period, but they are certainly in the right place at the right time to sculpt cortical circuitry at a crucial developmental stage.”

A unique trajectory

To visualize SST-to-excitatory synapse development, Nedivi and Boivin’s team used a genetic technique that pairs expression of synaptic proteins with fluorescent molecules to resolve the appearance of the “boutons” SST cells use to reach out to excitatory neurons. They then performed a technique called eMAP, developed by Kwanghun Chung’s lab in the Picower Institute, that expands and clears brain tissue to increase magnification, allowing super-resolution visualization of the actual synapses those boutons ultimately formed with excitatory cells along their dendrites. Co-author and postdoc Bettina Schmerl helped lead the eMAP work.

These new techniques revealed that SST bouton appearance, and then synapse formation, surged dramatically as the eyes opened and the critical period got underway. But while excitatory neurons during this timeframe are still maturing, first in the deepest layers of the cortex and later in its more superficial layers, the SST boutons blanketed all layers simultaneously, meaning that, perhaps counterintuitively, they sought to establish their inhibitory influence regardless of the maturation stage of their intended partners.

Many studies have shown that eye opening and the onset of visual experience set in motion the development and elaboration of excitatory cells and another major inhibitory neuron type (parvalbumin-expressing cells). Raising mice in the dark for different lengths of time, for instance, can distinctly alter what happens with these cells. Not so for the SST neurons. The new study showed that varying lengths of darkness had no effect on the trajectory of SST bouton and synapse appearance; it remained invariant, suggesting it is pre-ordained by a genetic program or an age-related molecular signal, rather than by experience.

Moreover, after the initial frenzy of synapse formation during development, many synapses are then edited, or pruned away, so that only the ones needed for appropriate sensory responses endure. Again, the SST boutons and synapses proved to be exempt from these redactions. Though the pace of new SST synapse formation slowed at the peak of the critical period, the net number of synapses never declined and even continued increasing into adulthood.

“While a lot of people think that the only difference between inhibition and excitation is their valence, this demonstrates that inhibition works by a totally different set of rules,” Nedivi said.

In all, while other cell types were tailoring their synaptic populations to incoming experience, the SST neurons appeared to provide an early but steady inhibitory influence across all layers of the cortex. After excitatory synapses have been pruned back by the time of adulthood, the continued upward trickle of SST inhibition may contribute to the increase in the inhibition to excitation ratio that still allows the adult brain to learn, but not as dramatically or as flexibly as during early childhood.

A platform for future studies

In addition to shedding light on typical brain development, Nedivi said, the study’s techniques can enable side-by-side comparisons in mouse models of neurodevelopmental disorders such as autism or epilepsy where aberrations of excitation and inhibition balance are implicated.

Future studies using the techniques can also look at how different cell types connect with each other in brain regions other than the visual cortex, she added.

Boivin, who will soon open his own lab as a faculty member at Amherst College, said he is eager to apply the work in new ways.

“I’m excited to continue investigating inhibitory synapse formation on genetically defined cell types in my future lab,” Boivin said. “I plan to focus on the development of limbic brain regions that regulate behaviors relevant to adolescent mental health.”

In addition to Nedivi, Boivin and Schmerl, the paper’s other authors are Kendyll Martin, and Chia-Fang Lee.

Funding for the study came from the National Institutes of Health, the Office of Naval Research and the Freedom Together Foundation.

New chemical method makes it easier to select desirable traits in crops

Whitehead Institute Member Mary Gehring and colleagues offer a new method for generating large-scale genetic changes without irradiation.

Mackenzie White | Whitehead Institute
January 8, 2026

Crops increasingly need to thrive in a broader range of conditions, including drought, salinity, and heat. Traditional plant breeding can select for desirable traits, but is limited by the genetic variation that already exists in plants. In many crops, domestication and long-term selection have narrowed genetic diversity, constraining efforts to develop new varieties.

To work around these limits, researchers have developed ways to introduce helpful traits, such as drought or salt tolerance, into plants through mutation breeding. This deliberately introduces random genetic changes into plants. Then researchers screen the genetically altered plants to see which have acquired useful traits. One widely used approach relies on radiation to generate structural variants—large-scale DNA changes that can affect multiple genes at once. However, irradiation introduces logistical and regulatory hurdles that restrict who can use it and which crops can be studied.

In a paper published in PLOS Genetics on December 18, Whitehead Institute Member Mary Gehring and colleagues offer a new method for generating large-scale genetic changes without irradiation.

Lead author Lindsey Bechen, the Gehring lab manager; Gehring; former postdoc P.R.V. Satyaki (now a faculty member at the University of Toronto); and their colleagues developed the approach by exposing germinating seeds to etoposide, a chemotherapy drug, during early growth.

The drug interferes with an enzyme that helps manage DNA structure during cell division. When cells attempt to repair the resulting breaks in their DNA, errors in the repair process can produce large-scale rearrangements in the genome. Seeds collected from treated plants carry these changes in a heritable form.

The process relies on standard laboratory tools: seeds are germinated on growth medium containing the drug, then transferred to soil to complete their life cycle.

“I was surprised at how efficient it was,” says Gehring, who is also a professor of biology at MIT and an HHMI Investigator. “The diversity of new traits that you could see just by looking at the plants in the first generation was extensive.”

The researchers demonstrated the method in Arabidopsis thaliana, a model plant widely used in genetic studies. Roughly two-thirds of treated plant lines showed visible differences, including changes in leaf shape, plant size, pigmentation, and fertility. Genetic analyses linked these traits to deletions, duplications, and rearrangements of DNA segments.

In several cases, the team linked specific plant traits to individual genetic changes. A dwarf plant with thick stems and unusual leaves carried a large change that disrupted a gene involved in leaf development. Another plant, marked by green-and-white mottled leaves, carried a deletion in the gene IMMUTANS—the same gene identified in radiation-induced mutants described more than 60 years ago.

Beyond Arabidopsis, Gehring’s lab is applying the technique to pigeon pea, a drought-tolerant legume and an important source of dietary protein in parts of Asia and Africa. Pigeon pea is an underutilized crop with the potential to become a staple—if its lack of genetic diversity, caused by a historical cultivation bottleneck, can be overcome. Often referred to as orphan crops, species like pigeon pea receive limited research attention and often lack the genetic variation needed for breeding improved varieties.

“All of the traits that we might want to see in pigeon pea are not present in the existing population,” says Gehring. “The idea is to do a large-scale mutation experiment to increase genetic diversity.”

The team, which includes Gehring lab postdoc Sonia Boor, is now screening treated pigeon pea lines for salt tolerance, a trait that shapes where crops can be grown and how they perform in saline soils. Although pigeon pea takes longer to grow than Arabidopsis, the researchers have reached the second generation and identified several lines that show promising responses under saline conditions.

The researchers’ chemical approach may also be beneficial for crops that are difficult to modify using gene-editing tools such as CRISPR. Although CRISPR enables precise genetic changes, it often relies on genetic transformation, a technically challenging step for many plant species.

“A lot of species that one works with, either in agriculture or horticulture, are not amenable to genetic transformation,” says Gehring.

The new method complements existing genetic tools rather than replacing them. By providing a more accessible alternative to irradiation, chemical mutation could expand the availability of large-scale genetic changes and novel plant varieties.

Looking ahead, Gehring’s lab plans to develop comprehensive collections of Arabidopsis mutants carrying well-characterized structural variants. Such resources could help researchers better understand how large-scale changes in genome structure influence plant development and performance, informing future efforts to study and enhance crops.

Bechen, L. L., Ahsan, N., Bahrainwala, A., Gehring, M., & Satyaki, P. R. (2025). A simple method to efficiently generate structural variation in plants. PLOS Genetics, 21(12). https://doi.org/10.1371/journal.pgen.1011977

High-fat diets make liver cells more likely to become cancerous

New research from the Yilmaz Lab suggests liver cells exposed to too much fat revert to an immature state that is more susceptible to cancer-causing mutations.

Anne Trafton | MIT News
December 22, 2025

One of the biggest risk factors for developing liver cancer is a high-fat diet. A new study from MIT reveals how a fatty diet rewires liver cells and makes them more prone to becoming cancerous.

The researchers found that in response to a high-fat diet, mature hepatocytes in the liver revert to an immature, stem-cell-like state. This helps them to survive the stressful conditions created by the high-fat diet, but in the long term, it makes them more likely to become cancerous.

“If cells are forced to deal with a stressor, such as a high-fat diet, over and over again, they will do things that will help them survive, but at the risk of increased susceptibility to tumorigenesis,” says Alex K. Shalek, director of the Institute for Medical Engineering and Science (IMES), the J. W. Kieckhefer Professor in IMES and the Department of Chemistry, and a member of the Koch Institute for Integrative Cancer Research at MIT, the Ragon Institute of MGH, MIT, and Harvard, and the Broad Institute of MIT and Harvard.

The researchers also identified several transcription factors that appear to control this reversion, which they believe could make good targets for drugs to help prevent tumor development in high-risk patients.

Shalek; Ömer Yilmaz, an MIT associate professor of biology and a member of the Koch Institute; and Wolfram Goessling, co-director of the Harvard-MIT Program in Health Sciences and Technology, are the senior authors of the study, which appears today in Cell. MIT graduate student Constantine Tzouanas, former MIT postdoc Jessica Shay, and Massachusetts General Brigham postdoc Marc Sherman are the co-first authors of the paper.

Cell reversion

A high-fat diet can lead to inflammation and buildup of fat in the liver, a condition known as steatotic liver disease. This disease, which can also be caused by a wide variety of long-term metabolic stresses such as high alcohol consumption, may lead to liver cirrhosis, liver failure, and eventually cancer.

In the new study, the researchers wanted to figure out just what happens in cells of the liver when exposed to a high-fat diet — in particular, which genes get turned on or off as the liver responds to this long-term stress.

To do that, the researchers fed mice a high-fat diet and performed single-cell RNA-sequencing of their liver cells at key timepoints as liver disease progressed. This allowed them to monitor gene expression changes that occurred as the mice advanced through liver inflammation, to tissue scarring and eventually cancer.

In the early stages of this progression, the researchers found that the high-fat diet prompted hepatocytes, the most abundant cell type in the liver, to turn on genes that help them survive the stressful environment. These include genes that make them more resistant to apoptosis and more likely to proliferate.

At the same time, those cells began to turn off some of the genes that are critical for normal hepatocyte function, including metabolic enzymes and secreted proteins.

“This really looks like a trade-off, prioritizing what’s good for the individual cell to stay alive in a stressful environment, at the expense of what the collective tissue should be doing,” Tzouanas says.

Some of these changes happened right away, while others, including a decline in metabolic enzyme production, shifted more gradually over a longer period. Nearly all of the mice on a high-fat diet ended up developing liver cancer by the end of the study.

When cells are in a more immature state, it appears that they are more likely to become cancerous if a mutation occurs later on, the researchers say.

“These cells have already turned on the same genes that they’re going to need to become cancerous. They’ve already shifted away from the mature identity that would otherwise drag down their ability to proliferate,” Tzouanas says. “Once a cell picks up the wrong mutation, then it’s really off to the races and they’ve already gotten a head start on some of those hallmarks of cancer.”

The researchers also identified several genes that appear to orchestrate the changes that revert hepatocytes to an immature state. While this study was going on, a drug targeting one of these genes (thyroid hormone receptor) was approved to treat a severe form of steatotic liver disease called MASH fibrosis. And, a drug activating an enzyme that they identified (HMGCS2) is now in clinical trials to treat steatotic liver disease.

Another possible target that the new study revealed is a transcription factor called SOX4, which is normally only active during fetal development and in a small number of adult tissues (but not the liver).

Cancer progression

After the researchers identified these changes in mice, they sought to discover if something similar might be happening in human patients with liver disease. To do that, they analyzed data from liver tissue samples removed from patients at different stages of the disease. They also looked at tissue from people who had liver disease but had not yet developed cancer.

Those studies revealed a similar pattern to what the researchers had seen in mice: The expression of genes needed for normal liver function decreased over time, while genes associated with immature states went up. Additionally, the researchers found that they could accurately predict patients’ survival outcomes based on an analysis of their gene expression patterns.

“Patients who had higher expression of these pro-cell-survival genes that are turned on with high-fat diet survived for less time after tumors developed,” Tzouanas says. “And if a patient has lower expression of genes that support the functions that the liver normally performs, they also survive for less time.”

While the mice in this study developed cancer within a year or so, the researchers estimate that in humans, the process likely extends over a longer span, possibly around 20 years. That will vary between individuals depending on their diet and other risk factors such as alcohol consumption or viral infections, which can also promote liver cells’ reversion to an immature state.

The researchers now plan to investigate whether any of the changes that occur in response to a high-fat diet can be reversed by going back to a normal diet, or by taking weight-loss drugs such as GLP-1 agonists. They also hope to study whether any of the transcription factors they identified could make good targets for drugs that could help prevent diseased liver tissue from becoming cancerous.

“We now have all these new molecular targets and a better understanding of what is underlying the biology, which could give us new angles to improve outcomes for patients,” Shalek says.

The research was funded, in part, by a Fannie and John Hertz Foundation Fellowship, a National Science Foundation Graduate Research Fellowship, the National Institutes of Health, and the MIT Stem Cell Initiative through Foundation MIT.

3 Questions with new faculty member Yunha Hwang: Using computation to study the world’s best single-celled chemists

The assistant professor utilizes microbial genomes to examine the language of biology. Her appointment reflects MIT’s commitment to exploring the intersection of genetics research and AI.

Lillian Eden | Department of Biology
December 15, 2025

Today, out of an estimated 1 trillion species on Earth, 99.999 percent are considered microbial — bacteria, archaea, viruses, and single-celled eukaryotes. For much of our planet’s history, microbes ruled the Earth, able to live and thrive in the most extreme of environments. Only in the last few decades have researchers begun to contend with the diversity of microbes — it’s estimated that less than 1 percent of known genes have laboratory-validated functions. Computational approaches offer researchers the opportunity to strategically parse this truly astounding amount of information.

An environmental microbiologist and computer scientist by training, new MIT faculty member Yunha Hwang is interested in the novel biology revealed by the most diverse and prolific life form on Earth. In a shared faculty position as the Samuel A. Goldblith Career Development Professor in the Department of Biology, as well as an assistant professor in the Department of Electrical Engineering and Computer Science and the MIT Schwarzman College of Computing, Hwang is exploring the intersection of computation and biology.

Q: What drew you to research microbes in extreme environments, and what are the challenges in studying them?

A: Extreme environments are great places to look for interesting biology. I wanted to be an astronaut growing up, and the closest thing to astrobiology is examining extreme environments on Earth. And the only things that live in those extreme environments are microbes. During a sampling expedition that I took part in off the coast of Mexico, we discovered a colorful microbial mat about 2 kilometers underwater that flourished because the bacteria breathed sulfur instead of oxygen — but none of the microbes I was hoping to study would grow in the lab.

The biggest challenge in studying microbes is that a majority of them cannot be cultivated, which means that the only way to study their biology is through a method called metagenomics. My latest work is genomic language modeling. We’re hoping to develop a computational system so we can probe the organism as much as possible “in silico,” just using sequence data. A genomic language model is technically a large language model, except the language is DNA as opposed to human language. It’s trained in a similar way, just in biological language as opposed to English or French. If our objective is to learn the language of biology, we should leverage the diversity of microbial genomes. Even though we have a lot of data, and even as more samples become available, we’ve just scratched the surface of microbial diversity.
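
The parallel between DNA and natural language can be made concrete with a toy tokenizer. The k-mer scheme and the sequence below are illustrative assumptions, not the tokenization used by any particular published genomic language model.

```python
# Minimal sketch of the genomic-language-model framing: treat DNA as text,
# tokenize it (here into overlapping k-mers), and form next-token
# prediction pairs, just as language models do with words.

def kmer_tokenize(seq, k=3):
    """Split a DNA sequence into overlapping k-mer tokens."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

tokens = kmer_tokenize("ATGCGTAC")
print(tokens)  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']

# A training example, exactly as in natural-language modeling:
# given the context tokens, predict the next token.
context, target = tokens[:-1], tokens[-1]
```

Training then proceeds as it would for English or French: the model sees billions of such context-target pairs drawn from microbial genomes and learns the statistical regularities of the "language."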

Q: Given how diverse microbes are and how little we understand about them, how can studying microbes in silico, using genomic language modeling, advance our understanding of the microbial genome?

A: A genome is many millions of letters. A human cannot possibly look at that and make sense of it. We can program a machine, though, to segment data into pieces that are useful. That’s sort of how bioinformatics works with a single genome. But if you’re looking at a gram of soil, which can contain thousands of unique genomes, that’s just too much data to work with — a human and a computer together are necessary in order to grapple with that data.

During my PhD and master’s degree, we were only just discovering new genomes and new lineages that were so different from anything that had been characterized or grown in the lab. These were things that we just called “microbial dark matter.” When there are a lot of uncharacterized things, that’s where machine learning can be really useful, because we’re just looking for patterns — but that’s not the end goal. What we hope to do is to map these patterns to evolutionary relationships between each genome, each microbe, and each instance of life.

Previously, we’ve been thinking about proteins as standalone entities — that gets us to a decent degree of information because proteins are related by homology, and therefore things that are evolutionarily related might have a similar function.

What is known about microbiology is that proteins are encoded in genomes, and the context in which a protein is embedded — what regions come before and after — is evolutionarily conserved, especially if there is a functional coupling. This makes total sense because when you have three proteins that need to be expressed together because they form a unit, then you might want them located right next to each other.

What I want to do is incorporate more of that genomic context in the way that we search for and annotate proteins and understand protein function, so that we can go beyond sequence or structural similarity to add contextual information to how we understand proteins and hypothesize about their functions.

Q: How can your research be applied to harnessing the functional potential of microbes?

A: Microbes are possibly the world’s best chemists. Leveraging microbial metabolism and biochemistry will lead to more sustainable and more efficient methods for producing new materials, new therapeutics, and new types of polymers.

But it’s not just about efficiency — microbes are doing chemistry we don’t even know how to think about. Understanding how microbes work, and being able to understand their genomic makeup and their functional capacity, will also be really important as we think about how our world and climate are changing. A majority of carbon sequestration and nutrient cycling is undertaken by microbes; if we don’t understand how a given microbe is able to fix nitrogen or carbon, then we will face difficulties in modeling the nutrient fluxes of the Earth.

On the more therapeutic side, infectious diseases are a real and growing threat. Understanding how microbes behave in diverse environments relative to the rest of our microbiome is really important as we think about the future and combating microbial pathogens.

RNA editing study finds many ways for neurons to diversify

When MIT neurobiologists including Troy Littleton tracked how more than 200 motor neurons in fruit flies each edited their RNA, they cataloged hundreds of target sites and widely varying editing rates. Scores of edits altered proteins involved in neural communication and function.

David Orenstein | The Picower Institute for Learning and Memory
November 20, 2025

All starting from the same DNA, neurons ultimately take on individual characteristics in the brain and body. Differences in which genes they transcribe into RNA help determine which type of neuron they become, and from there, a new MIT study shows, individual cells edit a selection of sites in those RNA transcripts, each at their own widely varying rates.

The new study surveyed the whole landscape of RNA editing in more than 200 individual cells commonly used as models of fundamental neural biology: tonic and phasic motor neurons of the fruit fly. One of the main findings is that most sites were edited at rates between the “all or nothing” extremes many scientists have assumed based on more limited studies in mammals, said senior author Troy Littleton, Menicon Professor in the Departments of Biology and Brain and Cognitive Sciences. The resulting dataset and analyses published in eLife set the table for discoveries about how RNA editing affects neural function and what enzymes implement those edits.

“We have this ‘alphabet’ now for RNA editing in these neurons,” Littleton said. “We know which genes are edited in these neurons so we can go in and begin to ask questions as to what is that editing doing to the neuron at the most interesting targets.”

Andres Crane, who earned his PhD in Littleton’s lab based on this work, is the study’s lead author.

From a genome of about 15,000 genes, Littleton and Crane’s team found, the neurons made hundreds of edits in transcripts from hundreds of genes. For example, the team documented “canonical” edits at 316 sites in 210 genes. Canonical means that the edits were made by the well-studied enzyme ADAR, which is also found in mammals, including humans. Of the 316 edits, 175 occurred in regions that encode the contents of proteins; analysis suggested that 60 of these are likely to significantly alter the encoded amino acids. But the team also found 141 more editing sites in regions that don’t code for proteins but instead affect their production, which means those edits could alter protein levels rather than protein contents.

The team also found many “non-canonical” edits that ADAR didn’t make. That’s important, Littleton said, because that information could aid in discovering more enzymes involved in RNA editing, potentially across species. That, in turn, could expand the possibilities for future genetic therapies.

“In the future, if we can begin to understand in flies what the enzymes are that make these other non-canonical edits, it would give us broader coverage for thinking about doing things like repairing human genomes where a mutation has broken a protein of interest,” Littleton said.

Moreover, by looking specifically at fly larvae, the team found many edits that were specific to juveniles vs. adults, suggesting potential significance during development. And because they looked at full gene transcripts of individual neurons, the team was also able to find editing targets that had not been cataloged before.

Widely varying rates

Some of the most heavily edited RNAs were from genes that make critical contributions to neural circuit communication, such as neurotransmitter release and the ion channels that cells form to regulate the flow of ions underlying their electrical properties. The study identified 27 sites in 18 genes that were edited more than 90 percent of the time.

Yet neurons sometimes varied quite widely in whether they would edit a site, which suggests that even neurons of the same type can still take on significant degrees of individuality.

“Some neurons displayed ~100 percent editing at certain sites, while others displayed no editing for the same target,” the team wrote in eLife. “Such dramatic differences in editing rate at specific target sites is likely to contribute to the heterogeneous features observed within the same neuronal population.”

On average, any given site was edited about two-thirds of the time, and most sites were edited at rates well between the all-or-nothing extremes.

“The vast majority of editing events we found were somewhere between 20% and 70%,” Littleton said. “We were seeing mixed ratios of edited and unedited transcripts within a single cell.”

Also, the more a gene was expressed, the less editing it experienced, suggesting that ADAR can only partially keep pace with its editing opportunities.

Potential impacts on function

One of the key questions the data enables scientists to ask is what impact RNA edits have on the function of the cells. In a 2023 study, Littleton’s lab began to tackle this question by looking at just two edits they found in the most heavily edited gene: Complexin. Complexin’s protein product restrains release of the neurotransmitter glutamate, making it a key regulator of neural circuit communication. They found that by mixing and matching edits, neurons produced up to eight different versions of the protein with significant effects on their glutamate release and synaptic electrical current. But in the new study, the team reports 13 more edits in Complexin that are yet to be studied.

Littleton said he’s intrigued by another key protein, called Arc1, that the study shows experienced a non-canonical edit. Arc is a vitally important gene in “synaptic plasticity,” which is the property neurons have of adjusting the strength or presence of their “synapse” circuit connections in response to nervous system activity. Such neural nimbleness is hypothesized to be the basis of how the brain can responsively encode new information in learning and memory. Notably, Arc1 editing fails to occur in fruit flies that model Alzheimer’s disease.

Littleton said the lab is now working hard to understand how the RNA edits they’ve documented affect function in the fly motor neurons.

In addition to Crane and Littleton, the study’s other authors are Michiko Inouye and Suresh Jetti.

The National Institutes of Health, The Freedom Together Foundation and The Picower Institute for Learning and Memory provided support for the study.

Research:

Andrés B Crane, Michiko O Inouye, Suresh K Jetti, J Troy Littleton (2025) A stochastic RNA editing process targets a select number of sites in individual Drosophila glutamatergic motoneurons. eLife 14:RP108282.
https://doi.org/10.7554/eLife.108282.2

Alternate proteins from the same gene contribute differently to health and rare disease

Whitehead Institute Member Iain Cheeseman, graduate student Jimmy Ly, and colleagues propose that researchers and clinicians may be able to get more information from patients’ genomes by looking at them in a different way.

Greta Friar | Whitehead Institute
November 7, 2025

In a paper published in Molecular Cell on November 7, Whitehead Institute Member Iain Cheeseman, graduate student Jimmy Ly, and colleagues propose that researchers and clinicians may be able to get more information from patients’ genomes by looking at them in a different way.

The common wisdom is that each gene codes for one protein. Someone studying whether a patient has a mutation or version of a gene that contributes to their disease will therefore look for mutations that affect the “known” protein product of that gene. However, Cheeseman and others are finding that the majority of genes code for more than one protein. That means that a mutation that may seem insignificant because it does not appear to affect the known protein could nonetheless alter a different protein made by the same gene. Now, Cheeseman and Ly have shown that mutations affecting one or multiple proteins from the same gene can contribute differently to disease.

In their paper, the researchers first share what they have learned about how cells make use of the ability to generate different versions of proteins from the same gene. Then, they examine how mutations that affect these proteins contribute to disease. Through a collaboration with co-author Mark Fleming, the pathologist-in-chief at Boston Children’s Hospital, they provide two case studies of patients with atypical presentations of a rare anemia linked to mutations that selectively affect only one of two proteins produced by the gene implicated in the disease.

“We hope this work demonstrates the importance of considering whether a gene of interest makes multiple versions of a protein, and what the role of each version is in health and disease,” Ly says. “This information could lead to better understanding of the biology of disease, better diagnostics, and perhaps one day to tailored therapies to treat these diseases.”

Rethinking how cells use genes

Cells have several ways to make different versions of a protein, but the variation that Cheeseman and Ly study happens during protein production from genetic code. Cellular machines build each protein according to the instructions within a genetic sequence that begins at a “start codon” and ends at a “stop codon.” However, some genetic sequences contain more than one start codon, many that are hiding in plain sight. If the cellular machinery skips the first start codon and detects a second one, it may build a shorter version of the protein. In other cases, the machinery may detect a section that closely resembles a start codon at a point earlier in the sequence than its typical starting place, and build a longer version of the protein.

These events may sound like mistakes: the cell’s machinery accidentally creating the wrong version of the correct protein. To the contrary, protein production from these alternate starting places is an important feature of cell biology that exists across species. When Ly traced when certain genes evolved to produce multiple proteins, he found that this is a common, robust process that has been preserved throughout evolutionary history for millions of years.

Ly shows that one function this serves is to send versions of a protein to different parts of the cell. Many proteins contain zip code-like sequences that tell the cell’s machinery where to deliver them so the proteins can do their jobs. Ly found many examples in which longer and shorter versions of the same protein contained different zip codes and ended up in different places within the cell.

In particular, Ly found many cases in which one version of a protein ended up in mitochondria, structures that provide energy to cells, while another version ended up elsewhere. Because of the mitochondria’s role in the essential process of energy production, mutations to mitochondrial genes are often implicated in disease.

Ly wondered what would happen when a disease-causing mutation eliminates one version of a protein but leaves the other intact, causing the protein to only reach one of its two intended destinations. He looked through a database containing genetic information from people with rare diseases to see if such cases existed, and found that they did. In fact, there may be tens of thousands of such cases. However, without access to the people, Ly had no way of knowing what the consequences of this were in terms of symptoms and severity of disease.

Meanwhile, Cheeseman had begun working with Boston Children’s Hospital to foster collaborations between Whitehead Institute and the hospital’s researchers and clinicians to accelerate the pathway from research discovery to clinical application. Through these efforts, Cheeseman and Ly met Fleming.

One group of Fleming’s patients have a type of anemia called SIFD—Sideroblastic Anemia with B-Cell Immunodeficiency, Periodic Fevers, and Developmental Delay—that is caused by mutations to the TRNT1 gene. TRNT1 is one of the genes Ly had identified as producing a mitochondrial version of its protein and another version that ends up elsewhere: in the nucleus.

Fleming shared anonymized patient data with Ly, and Ly found two cases of interest in the genetic data. Most of the patients had mutations that impaired both versions of the protein, but one patient had a mutation that eliminated only the mitochondrial version of the protein, while another patient had a mutation that eliminated only the nuclear version.

When Ly shared his results, Fleming revealed that both of those patients had very atypical presentations of SIFD, supporting Ly’s hypothesis that mutations affecting different versions of a protein would have different consequences. The patient who only had the mitochondrial version was anemic but developmentally normal. The patient missing the mitochondrial version of the protein did not have developmental delays or chronic anemia but did have other immune symptoms, and was not correctly diagnosed until his fifties. There are likely other factors contributing to each patient’s exact presentation of the disease, but Ly’s work begins to unravel the mystery of their atypical symptoms.

Cheeseman and Ly want to make more clinicians aware of the prevalence of genes coding for more than one protein, so they know to check for mutations affecting any of the protein versions that could contribute to disease. For example, several TRNT1 mutations that only eliminate the shorter version of the protein are not flagged as disease-causing by current assessment tools. Cheeseman lab researchers including Ly and graduate student Matteo Di Bernardo are now developing a new assessment tool for clinicians, called SwissIsoform, that will identify relevant mutations that affect specific protein versions, including mutations that would otherwise be missed.

“Jimmy and Iain’s work will globally support genetic disease variant interpretation and help with connecting genetic differences to variation in disease symptoms,” Fleming says. “In fact, we have recently identified two other patients with mutations affecting only the mitochondrial versions of two other proteins, who similarly have milder symptoms than patients with mutations that affect both versions.”

Long term, the researchers hope that their discoveries could aid in understanding the molecular basis of disease and in developing new gene therapies: once researchers understand what has gone wrong within a cell to cause disease, they are better equipped to devise a solution. More immediately, the researchers hope that their work will make a difference by providing better information to clinicians and people with rare diseases.

“As a basic researcher who doesn’t typically interact with patients, there’s something very satisfying about knowing that the work you are doing is helping specific people,” Cheeseman says. “As my lab transitions to this new focus, I’ve heard many stories from people trying to navigate a rare disease and just get answers, and that has been really motivating to us, as we work to provide new insights into the disease biology.”

Jimmy Ly, Matteo Di Bernardo, Yi Fei Tao, Ekaterina Khalizeva, Christopher J. Giuliano, Sebastian Lourido, Mark D. Fleming, Iain M. Cheeseman. “Alternative start codon selection shapes mitochondrial function and rare human diseases.” Molecular Cell, November 7, 2025. DOI: https://doi.org/10.1016/j.molcel.2025.10.013

Q&A: Picower researchers including MIT Biology faculty Sara Prescott join effort to investigate the ‘Biology of Adversity’

Assistant Professor Sara Prescott and Research Affiliate Ravikiran Raju are key collaborators in a new Broad Institute research project to better understand physiological and medical effects of acute and chronic life stressors.

David Orenstein | The Picower Institute for Learning and Memory
November 3, 2025

Adverse experiences such as abuse and violence or poverty and deprivation have always been understood to be harmful, but the tools to understand how they may cause specific medical conditions and outcomes have only emerged recently. Technologies such as RNA or chromatin sequencing, for instance, can help scientists observe how stressors change gene expression, which can help establish mechanistic biological explanations for why people who’ve suffered adversity also experience higher risks of conditions such as stroke or Alzheimer’s disease.

Advancing scientific understanding of the physiological connections between adversity and disease can help pharmaceutical developers, physicians, and public officials develop meaningful interventions. The Broad Institute has launched a new research program, the “Biology of Adversity” project, led by researcher Jason Buenrostro. As leading collaborators in the effort, Picower Institute investigator Sara Prescott, assistant professor of biology, and Tsai Lab research affiliate Ravikiran Raju, a pediatrician at Boston Children’s Hospital, plan research projects in their Picower Institute labs to better elucidate how life stress leads to medical distress.

How can biology and neuroscience studies help people who’ve experienced adversity?

Prescott: Adversity comes in many flavors. But across different types of adversity, there is a common theme that it leads to psychological and emotional distress. If you were to ask a random person on the street, they’d probably tell you that distress is simply a feeling that exists only in the mind, rather than a biological process. But this is not true. We now appreciate that stress has predictable effects on the body, and there are severe long-term health consequences of experiencing chronic stress. Unfortunately, it’s been difficult to argue based on epidemiological data that stress itself (rather than other lifestyle factors like diet, smoking or access to health care services) is causally linked to poor health outcomes. This is confounded by the fact that we haven’t had good ways to empirically measure people’s levels of adversity and stress. This is part of what we want to address at the Biology of Adversity Project.

From a scientific perspective, there is still much to be understood about stress and the biological processes that lead to stress-associated diseases. And so that’s hopefully where efforts like the Biology of Adversity Project are going to come in. We can use scientific practices to come up with better guidelines for ways to track levels of stress, develop diagnostics, and then, hopefully, one day this will turn into actionable interventions. It’s not a random process of things going awry. There are going to be biological programs that are engaged in predictable ways. And we’re trying to understand, what exactly are these neural or biological programs? How many different types of programs are there? And how do each of those programs actually work down to the cellular and molecular level?

Raju: Efforts to combat adversity and stress have largely remained in the social space to date. But what we know from a growing body of epidemiological literature is that social stressors can have profound biological impact. They cause increases in mental health disorders, physical disorders like cancer, stroke, and heart disease. Individuals who experience chronic and high levels of stress are dying sooner. I think there is an imperative to understand what these forces are doing to our biology and how they’re dysregulating our physiology. Armed with that information, we can start to be more mechanistic and evidence-based in our promotion of resilience. What are the pathways that are made vulnerable when individuals are stressed? How do we rescue those deficiencies, whether it be through existing practices or novel interventions? A lot of the research we’re doing here at Picower is focusing on pathways that could be targeted and leveraged using specific micronutrients or specific small molecules that help promote resilience and prevent the onset of premature illness in individuals who are stress exposed.

What is the Biology of Adversity Project and how are each of you involved?

Prescott: My lab studies the autonomic nervous system, and we’re involved in the project’s animal studies. We think of stress as an adaptive response to prepare the body for an impending threat. When people experience stress, what happens? You engage a fight or flight response—you sweat, start to breathe harder, your heart rate goes up, your pupils dilate. This is protective in acute settings, but can become very maladaptive when these systems are activated for too long or in inappropriate settings, like when someone is having a panic attack. We predict that a lot of the long-term health consequences associated with adversity could relate to dysregulated autonomic stress responses.

And so that’s where our lab’s tools come in. We have good ways in animals to measure their heart rate and breathing in response to stress. We also have a wide range of genetic tools to specifically target different neural pathways in the periphery, possibly blocking stress pathways at the source. With these tools, we can explore what role those circuits have in long-term changes in these animals with greater precision than what was possible in the past.

Raju: My involvement came through my work on the Environmental and Social Determinants of Child Mental Health Conference in 2023, which I co-hosted with Li-Huei Tsai. I think this conference made the scientific community in Boston more aware that this was something of deep interest to researchers at Picower and MIT. In the creation of the Biology of Adversity Project, the center director, Jason Buenrostro, was surveying the landscape of folks who were studying stress and adversity and were passionate about it, and he connected with us because of that symposium. Since then, I’ve been engaged in really exciting conversations with him and an exciting group of collaborators, including Sara Prescott. And so I’m really excited that a few of our projects are being showcased as flagship projects. We are currently using animal models of early life stress to try and build preclinical models to deepen our understanding of how stress dysregulates physiology. We’re developing pipelines for trying to think about promoting resilience through targeted interventions, using those preclinical models.

What research questions do you each plan to tackle?

Prescott: Broadly, we’re interested in the body-brain connection and how this relates to stress. How do different cues from within the body—like diet, or taking a deep breath–promote or regulate stress levels? These are interesting questions about how sensory inputs from the body feed into stress circuits in the brain. We’re also interested in the other direction—understanding how stress causes changes to peripheral organs, for example, by engaging the sympathetic nervous system. It’s well understood that sympathetic neurons are responsible for making you sweat and your heart race, but do they do other things as well? For example, the field is starting to appreciate that these same neurons regulate the immune system, and can signal to stem cells to promote or suppress tissue repair. These are important pathways to understand, as they could explain some of the links between chronic stress (where sympathetic neurons are over-activated) and increased rates of diseases like cancer. It also may have therapeutic applications down the road. I’m incredibly excited for the opportunity to work with people like Ravi, and others in the project, to apply our expertise in physiology and autonomic signaling towards this immensely important problem. I’m hoping that through this work we can move to an era where we can, from a societal perspective, understand how much our stress levels are damaging our body, be able to track that, and then find better ways to prevent the damage that’s happening.

Raju: We are leveraging three key mouse models of environmental perturbations in this work: environmental enrichment, social isolation, and resource deprivation. In studying enrichment, we are trying to better study the factors that promote resilience to stress. In our previous work on resilience, for example, we identified a transcription factor that’s specifically recruited to help ensure that neurons are resilient to the onset of Alzheimer’s pathology. So we’ve leveraged these enrichment models to study that mechanism and are able to then think of how that pathway might be leveraged in stress-exposed individuals. We are also using models of stress, specifically social isolation and resource deprivation. The idea here is that because mice are social mammals and rely on resources and social interactions and social networks in order to thrive, we can modulate these in a species-relevant way, and then study the pathways that are dysregulated. This will allow us to define vulnerable pathways in these preclinical models, and then assess if those same pathways are dysregulated in humans that are experiencing analogous environmental conditions. Armed with the right model, we can then determine how to reverse the physiological derangements induced by environmental stressors.

A new way to understand and predict gene splicing

The KATMAP model, developed by researchers in the Department of Biology, can predict alternative cell splicing, which allows cells to create endless diversity from the same sets of genetic blueprints.

Lillian Eden | Department of Biology
November 4, 2025

Although heart cells and skin cells contain identical instructions for creating proteins encoded in their DNA, they’re able to fill such disparate niches because molecular machinery can cut out and stitch together different segments of those instructions to create endlessly unique combinations.

The ingenuity of using the same genes in different ways is made possible by a process called splicing and is controlled by splicing factors; which splicing factors a cell employs determines what sets of instructions that cell produces, which, in turn, gives rise to proteins that allow cells to fulfill different functions.

In an open-access paper published today in Nature Biotechnology, researchers in the MIT Department of Biology outlined a framework for parsing the complex relationship between sequences and splicing regulation to investigate the regulatory activities of splicing factors, creating models that can be applied to interpret and predict splicing regulation across different cell types, and even different species. Called Knockdown Activity and Target Models from Additive regression Predictions, KATMAP draws on experimental data from disrupting the expression of a splicing factor and information on which sequences the splicing factor interacts with to predict its likely targets.

Aside from the benefits of a better understanding of gene regulation, splicing mutations — either in the gene that is spliced or in the splicing factor itself — can give rise to diseases such as cancer by altering how genes are expressed, leading to the creation or accumulation of faulty or mutated proteins. This information is critical for developing therapeutic treatments for those diseases. The researchers also demonstrated that KATMAP can potentially be used to predict whether synthetic nucleic acids, a promising treatment option for conditions including certain muscular atrophy and epilepsy disorders, affect splicing.

Perturbing splicing 

In eukaryotic cells, including our own, splicing occurs after DNA is transcribed to produce an RNA copy of a gene, which contains both coding and non-coding regions of RNA. The noncoding intron regions are removed, and the coding exon segments are spliced back together to make a near-final blueprint, which can then be translated into a protein.

According to first author Michael P. McGurk, a postdoc in the lab of MIT Professor Christopher Burge, previous approaches could provide an average picture of regulation, but could not necessarily predict the regulation of splicing factors at particular exons in particular genes.

KATMAP draws on RNA sequencing data generated from perturbation experiments, which alter the expression level of a regulatory factor by either overexpressing it or knocking down its levels. The consequences of overexpression or knockdown are that the genes regulated by the splicing factor should exhibit different levels of splicing after perturbation, which helps the model identify the splicing factor’s targets.

Cells, however, are complex, interconnected systems, where one small change can cause a cascade of effects. KATMAP is also able to distinguish direct targets from indirect, downstream impacts by incorporating known information about the sequence the splicing factor is likely to interact with, referred to as a binding site or binding motif.

“In our analyses, we identify predicted targets as exons that have binding sites for this particular factor in the regions where this model thinks they need to be to impact regulation,” McGurk says. Non-targets, by contrast, may be affected by the perturbation but lack appropriately placed binding sites nearby.
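The direct-versus-indirect distinction can be caricatured as follows. KATMAP does this probabilistically with learned parameters; the hard thresholds, motif, sequences, and ΔPSI values below are all invented for illustration:

```python
import re

def classify_exons(exons: dict, motif: str, min_shift: float = 0.1) -> dict:
    """Toy classifier: an exon that responds to perturbation counts as a
    predicted *direct* target only if the factor's binding motif occurs in
    its flanking sequence; responsive exons without a plausible site are
    treated as *indirect*, downstream effects."""
    pattern = re.compile(motif)
    results = {}
    for name, (flank_seq, delta_psi) in exons.items():
        responds = abs(delta_psi) > min_shift       # splicing changed after perturbation
        has_site = bool(pattern.search(flank_seq))  # motif present near the exon
        if responds and has_site:
            results[name] = "direct"
        elif responds:
            results[name] = "indirect"
        else:
            results[name] = "unaffected"
    return results

# Invented data: (flanking sequence, change in percent spliced in after knockdown)
exons = {
    "exonA": ("UUUGCAUGUU", 0.35),   # responds, motif present  -> direct
    "exonB": ("AAACCCAAAU", -0.25),  # responds, no motif       -> indirect
    "exonC": ("GCAUGAAAAA", 0.02),   # motif present, no change -> unaffected
}
print(classify_exons(exons, "GCAUG"))
```

The real model additionally weighs how closely a site matches the motif and how far it sits from the splice sites, as McGurk describes below, rather than applying a fixed cutoff.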

This is especially helpful for splicing factors that aren’t as well-studied.

“One of our goals with KATMAP was to try to make the model general enough that it can learn what it needs to assume for particular factors, like how similar the binding site has to be to the known motif or how regulatory activity changes with the distance of the binding sites from the splice sites,” McGurk says.

Starting simple

Although predictive models can be very powerful at presenting possible hypotheses, many are considered “black boxes,” meaning the rationale that gives rise to their conclusions is unclear. KATMAP, on the other hand, is an interpretable model that enables researchers to quickly generate hypotheses and interpret splicing patterns in terms of regulatory factors while also understanding how the predictions were made.

“I don’t just want to predict things, I want to explain and understand,” McGurk says. “We set up the model to learn from existing information about splicing and binding, which gives us biologically interpretable parameters.”

The researchers did have to make some simplifying assumptions in order to develop the model. KATMAP considers only one splicing factor at a time, although it is possible for splicing factors to work in concert with one another. The RNA target sequence could also be folded in such a way that the factor wouldn’t be able to access a predicted binding site, so the site is present but not utilized.

“When you try to build up complete pictures of complex phenomena, it’s usually best to start simple,” McGurk says. “A model that only considers one splicing factor at a time is a good starting point.”

David McWaters, another postdoc in the Burge Lab and a co-author on the paper, conducted key experiments to test and validate that aspect of the KATMAP model.

Future directions

The Burge lab is collaborating with researchers at Dana-Farber Cancer Institute to apply KATMAP to the question of how splicing factors are altered in disease contexts, as well as with other researchers at MIT as part of an MIT HEALS grant to model splicing factor changes in stress responses. McGurk also hopes to extend the model to incorporate cooperative regulation for splicing factors that work together.

“We’re still in a very exploratory phase, but I would like to be able to apply these models to try to understand splicing regulation in disease or development. In terms of variation of splicing factors, they are related, and we need to understand both,” McGurk says.

Burge, the Uncas (1923) and Helen Whitaker Professor and senior author of the paper, will continue to work on generalizing this approach to build interpretable models for other aspects of gene regulation.

“We now have a tool that can learn the pattern of activity of a splicing factor from types of data that can be readily generated for any factor of interest,” says Burge, who is also an extra-mural member of the Koch Institute for Integrative Cancer Research and an associate member of the Broad Institute of MIT and Harvard. “As we build up more of these models, we’ll be better able to infer which splicing factors have altered activity in a disease state from transcriptomic data, to help understand which splicing factors are driving pathology.”