A new lens on autism’s sex bias

A perspective from the lab of Whitehead Institute Member David Page, published in Nature Genetics, proposes a genetic explanation for the female protective effect and suggests that biological differences between males and females contribute to autism’s strong sex bias.

Shafaq Zia | Whitehead Institute
March 30, 2026

Autism has a significant and enduring sex bias, with roughly four boys diagnosed for every girl. For many years, experts have believed this disparity arises primarily from diagnostic inequities because much of autism research — and the screening tools that grew out of it — has historically focused on boys, effectively setting a male standard for what autism “looks like.” As a result, girls and women are more likely to be overlooked, misdiagnosed, or diagnosed much later in life.

This disparity has also shaped the science around autism. When fewer females with the condition are identified, fewer are included in research studies, creating a feedback loop where scientific understanding of autism in females remains limited. Because of this underrepresentation of females, it has been difficult for scientists to disentangle how much of the sex bias in autism reflects social inequities versus underlying biological differences between the sexes.

While the search for biological explanations has largely lagged behind, one leading theory, known as the “female protective effect,” proposes that females may be biologically buffered against developing autism in a way males aren’t.

The idea can be traced back to studies showing that females diagnosed with autism tend to carry a higher number of genetic mutations or “hits” than males with the condition, meaning that they require a higher load of the same genetic mutations for autism to manifest. But, until now, there’s been little clarity on the exact biological mechanism behind this apparent resilience.

Now, a perspective from the lab of Whitehead Institute Member David Page, published March 30 in Nature Genetics, proposes a genetic explanation for the female protective effect and suggests that biological differences between males and females contribute to autism’s strong sex bias.

The work is one of many projects from the Page lab uncovering the biological underpinnings of sex bias in everything from heart health and autoimmune disease to certain cancers.

“The fact that we see sex biases in disease all across the body gives credence to the notion that the sex bias in autism isn’t simply emerging from diagnostic inequities and gendered expectations of what the condition looks like,” says Page, who is also a professor of biology at Massachusetts Institute of Technology and an investigator at the Howard Hughes Medical Institute (HHMI).

The researchers propose that this protective effect extends beyond autism, and could help explain why 17 other congenital and developmental disorders predominantly affect males. By characterizing the biological factors that make one sex more or less likely to develop certain health conditions, scientists see an opportunity to improve how these conditions are diagnosed and how people receive care.

Page and Harvard-MIT MD-PhD student Maya Talukdar trace the female protective effect to the X chromosome. Talukdar is a graduate student in Page’s lab and the lead author of the perspective.

Most females have two X chromosomes (XX) while most males have one X and one Y chromosome (XY). Sex chromosomes can dial up and down the expression of thousands of genes on the other 22 pairs of chromosomes in a cell, impacting cell function across the entire body.

Historically, scientists believed that the second X chromosome in females was largely inactive. But in recent years, research out of the Page lab has shown that the so-called “inactive X,” also called Xi, plays a crucial role in regulating gene expression on the active X chromosome and the rest of the chromosomes.

In this perspective, the researchers point to a subset of genes that are expressed from both the active and inactive X chromosome — often known as genes that “escape” X chromosome inactivation. Many of these genes are dosage-sensitive regulators of key cellular processes. These processes influence thousands of other genes across the genome, including many linked to autism.

Because females have an extra copy of these regulatory genes expressed from Xi, Page and Talukdar propose that they may be better able to buffer the effects of autism-associated mutations than males.

The female protective effect beyond autism

This mechanism, the researchers say, extends beyond autism to a range of congenital and developmental diseases with a male bias.

“Many of the other congenital or developmental conditions we’re pointing to aren’t subject to diagnostic inequities in the way autism is,” says Talukdar. “This strengthens the idea that the female protective effect is emerging from genetic differences in males and females.”

One example is pyloric stenosis, which, like autism, affects four boys for every girl. Infants with the condition experience severe vomiting due to thickening of the pyloric sphincter, the passage between the stomach and small intestine. As with autism, girls with pyloric stenosis appear to require more genetic “hits” in order to develop the condition.

The researchers’ new framework of looking at Xi to understand sex differences in disease could impact treatment and care not just for conditions that predominantly affect males, but also for those that are more common in women, such as autoimmune diseases.

“Our biology isn’t one-size-fits-all,” Talukdar says. “Sex differences clearly play a huge role in health, and it’s so important that we understand them.”

Maya Talukdar, David C. Page, “The inactive X chromosome as a female protector in autism and beyond,” Nature Genetics, 2026; https://doi.org/10.1038/s41588-026-02534-w 

Study reveals “two-factor authentication” system that controls microRNA destruction

A new study led by researchers in the Bartel Lab and Germany’s Max Planck Institute of Biochemistry reveals how cells selectively eliminate certain microRNAs, which tune which genes are active and when, through an unexpectedly intricate molecular recognition system.

Mackenzie White | Whitehead Institute
March 17, 2026

Cells rely on tiny molecules called microRNAs to tune which genes are active and when. Cells must carefully control the lifespan of microRNAs to prevent widespread disruption to gene regulation.

A new study led by researchers at Whitehead Institute and Germany’s Max Planck Institute of Biochemistry reveals how cells selectively eliminate certain microRNAs through an unexpectedly intricate molecular recognition system. The work, published on March 18 in Nature, shows that the process requires two separate RNA signals, similar to how many digital systems require two forms of identity verification before granting access.

The findings explain how cells use this “two-factor authentication” system to ensure that only intended microRNAs are destroyed, leaving the rest of the gene regulation machinery in operation.

MicroRNAs are short strands of RNA that help control gene expression. Working together with a protein called Argonaute, they bind to specific messenger RNAs—the molecules that carry genetic instructions from DNA to the cell’s protein-making machinery—and trigger their destruction. In this way, microRNAs can reduce the production of specific proteins.

While scientists recognized that microRNAs could be destroyed through a pathway known as target-directed microRNA degradation, or TDMD, the details of how cells recognized which microRNAs to eliminate remained unclear.

“We knew there was a pathway that could target microRNAs for degradation, but the biochemical mechanism behind it wasn’t understood,” says David Bartel, Whitehead Institute Member and co-senior author of the study.

Earlier work from Bartel’s lab and others had identified a key player in this pathway: the ZSWIM8 E3 ubiquitin ligase. E3 ubiquitin ligases are involved in the cell’s recycling system and attach a small molecular tag called ubiquitin to other proteins, marking them for destruction.

The researchers first showed that the ZSWIM8 E3 ligase specifically binds and tags Argonaute, the protein that holds microRNAs and helps regulate genes. The researchers’ next challenge was to understand how this machinery recognized only Argonaute complexes carrying specific microRNAs that should be degraded.

The answer turned out to be surprisingly sophisticated.

Using a combination of biochemistry and cryo-electron microscopy—an imaging technique that reveals molecular structures at near-atomic resolution—the researchers discovered that the degradation system relies on a dual-RNA recognition process. First, Argonaute must carry a specific microRNA. Second, another RNA molecule called a “trigger RNA” must bind to that microRNA in a particular way.

The degradation machinery activates only when both signals are present.

This dual requirement ensures exquisite specificity. Each cell contains over a hundred thousand Argonaute–microRNA complexes regulating many genes, and destroying them indiscriminately would disrupt essential biological processes.

“The vast majority of Argonaute molecules in the cell are doing useful work regulating gene expression,” says Bartel, who is also a professor of biology at MIT and an HHMI Investigator. “You only want to degrade the ones carrying a particular microRNA and bound to the right trigger RNA. Without that specificity, the cell would lose its microRNAs and the essential regulation that they provide.”

The structural images revealed complex molecular interactions. The ZSWIM8 ligase detects multiple structural changes that occur when the two RNAs bind together within the Argonaute protein.

“When we saw the structure, everything clicked,” says Elena Slobodyanyuk, a graduate student in Bartel’s lab and co-first author of the study. “You could see how the pairing of the trigger RNA with the microRNA reshapes the Argonaute complex in a way that the ligase can recognize.”

Beyond explaining how TDMD works, the findings may impact how scientists think about the regulation of RNA molecules more broadly.

“A lot of E3 ligases recognize their targets through simpler signals,” says Jakob Farnung, co-first author and researcher in the Department of Molecular Machines and Signaling at the Max Planck Institute of Biochemistry. “It was like opening a treasure chest where every detail revealed something new and mesmerizing.”

MicroRNAs typically persist in cells for much longer time periods than most messenger RNAs, but some degrade far more quickly, and the TDMD pathway appears to account for many of these unusually short-lived microRNAs.

The researchers are now investigating whether other RNAs can trigger similar degradation pathways and whether additional microRNAs are regulated through variations of the mechanism shown in this study.

“This opens up a whole new way of thinking about how RNA molecules can control protein degradation,” says Brenda Schulman, study co-senior author and Director of the Department of Molecular Machines and Signaling at the Max Planck Institute of Biochemistry. “Here, the recognition was far more elaborate than expected. There’s likely much more left to discover.”

Uncovering the details of this intricate regulatory system required interdisciplinary collaboration, combining expertise in RNA biochemistry, structural biology, and ubiquitin enzymology to solve this long-standing molecular puzzle.

“This was a project that required the strengths of two labs working at the forefront of their fields,” says Schulman, who is also an alum of Whitehead Institute. “It was an incredible team effort.”

Paper: Jakob Farnung, Elena Slobodyanyuk, Peter Y. Wang, Lianne W. Blodgett, Daniel H. Lin, Susanne von Gronau, Brenda A. Schulman & David P. Bartel. “The E3 ubiquitin ligase mechanism specifying targeted microRNA degradation.” Nature (2026). https://doi.org/10.1038/s41586-026-10232-0

CryoPRISM: A new tool for observing cellular machinery in a more natural environment

The method allows researchers to observe biomolecular complexes in a quick, accurate, and budget-friendly way, providing new insights into bacterial protein synthesis.

Ekaterina Khalizeva | Department of Biology
March 20, 2026

The blobfish, once considered the ugliest animal in the world, has since had quite the redemption arc. Years after it was first discovered, scientists realized that the deep-sea creature appeared so unnervingly blobby only because it went through an extreme change in pressure when it was brought up to the surface. In its natural environment, 4,000 feet underwater, the fish looks perfectly handsome.

Structural biologists, whose goal is to deduce a molecule’s structure and function within a cell, face the risk of making a similar mistake. If biomolecular complexes are extracted from the cell, better-quality images can be obtained, but the molecules may not look natural. On the other hand, studying molecules without disrupting their environment at all is technically challenging, like filming deep underwater.

A new method, called purification-free ribosome imaging from subcellular mixtures (cryoPRISM), offers an appealing compromise. Developed by graduate students Mira May and Gabriela López-Pérez in the Davis lab in the MIT Department of Biology and recently published in PNAS, the technique allows biologists to visualize molecular complexes without taking them too far out of their natural context.

CryoPRISM captures molecular structures in cells that have just been broken open. This comes as close to preserving the natural interactions between molecules as possible, short of the extremely resource-intensive in-cell structural imaging, according to associate professor of biology Joey Davis, the faculty lead of the study.

“We think that the cryoPRISM method is a sweet spot where we preserve much of the native cellular contacts, but still have the resolution that lets us actually see molecular details,” Davis says. “Even in the extremely well-trodden system of translation in E. coli, which people have worked on for over 50 years, we are still finding new states that had just escaped people’s attention.”

A negative control that was not so negative

The development of cryoPRISM, like many discoveries in science, resulted from an unexpected observation that Mira May, the co-first author of the study, made while working on a different project.

Like all living organisms, bacteria rely on a process called translation to manufacture the proteins that carry out essential functions within the cell, from copying DNA to digesting nutrients. A key machine involved in translation is the ribosome — a biomolecular complex that assembles proteins based on instructions encoded by another molecule called mRNA. To regulate its activity, cells employ additional proteins that can change the shape of the ribosome, thus guiding its function.

May sought to identify new players in ribosomal regulation using cryoEM, by rapidly freezing lots of purified molecules and collecting thousands of 2D images to reconstruct their 3D structures. May was trying to pull ribosomes out of cells to visualize them together with their regulators. For her experiments, she designed a negative control containing unpurified bacterial lysate — a mixture of everything spilled from burst cells.

May expected to get noisy, low-quality images from this sample. Instead, to her surprise, she saw intact ribosomes together with their natural interacting partners.

In just a few days, this technique experimentally validated data that would have taken months to acquire using other approaches.

“As I found more and more ribosomal states, this project became a method, not just a one-off finding,” May recalls.

Discovering new biology in a saturated field

Once May and her colleagues were confident that cryoPRISM could detect known ribosomal states, they began searching for ones that had previously escaped detection.

“It’s not just that we can recapitulate things that have been previously observed, but we can actually also discover novel ribosomal biology,” May says.

One of the novel states May identified has important implications for our understanding of the evolution of translation regulation.

During active translation, bacterial ribosomes are accompanied by a group of helper proteins called elongation factors. These factors bring in the materials for protein synthesis, like tRNAs and amino acids.

When cells encounter unfavorable conditions, such as colder temperatures, they reduce translation, which means that many ribosomes are out of work. These idle, hibernating ribosomes stop decoding mRNA, and the interface where they usually interact with helper molecules gets blocked by a hibernation factor called RaiA. This protein helps idle ribosomes avoid reactivation, like a sleeping mask that prevents a person from being woken up by light.

May observed the idle ribosomal state in her data, which on its own did not surprise her, since this state had been described before. What surprised her was that some inactive ribosomes were interacting not only with RaiA, but also with an elongation factor called EF-G, which in bacteria was previously believed to interact only with active ribosomes.

A similar phenomenon has been seen before in more complex organisms, but observing it in a microbe suggests that its evolutionary origin may be older than previously thought.

“It fits an emerging model in the field, that elongation factors might bind to hibernating ribosomes to protect both the ribosome and themselves from degradation during periods of stress,” May explains. “Think of it like short-term storage.”

An unstressed cell might quickly eliminate unneeded inactive ribosomes, but because any stressor that puts ribosomes to sleep could be temporary, the cell may prefer to hold off on destroying them. That way, the ribosomes can be quickly reactivated if conditions improve.

The future of cryoPRISM

May has already teamed up with other MIT researchers to use cryoPRISM to visualize ribosomes in cells that are notoriously difficult to work with, including pathogenic organisms, which can be challenging to culture at the scale required for particle purification, and red blood cells isolated from patients, which cannot be cultured at all.

Besides its immediate application for translation research, cryoPRISM is a stepping stone toward the broader goal of structural biology: studying biomolecules in their natural environment.

To truly learn about deep-sea fish, scientists need to look at them in the deep sea; and to learn about cellular machines, scientists need to look at them in cells. According to Davis, cryoPRISM perfectly fits into the “theme of structural biology moving closer and closer to cellular context.”

Whitehead Institute Member Jonathan Weissman joins global Cancer Grand Challenges team

Whitehead Institute Member Jonathan Weissman has been named to a newly funded Cancer Grand Challenges team that will tackle one of the most elusive frontiers in cancer biology: the “dark proteome.”

Mackenzie White | Whitehead Institute
March 4, 2026

The interdisciplinary team, ILLUMINE, will receive up to $25 million over approximately five years through Cancer Grand Challenges to investigate proteins expressed by cancer cells that don’t correspond exactly to known genes. These include proteins produced from previously unrecognized regions of the genome, proteins created from offset start sites of known genes, and proteins with altered amino acid sequences that cannot be explained by known DNA mutations. The origins and functions of this dark proteome remain largely unknown.

Cancer Grand Challenges is a global research initiative co-founded in 2020 by Cancer Research UK and the National Cancer Institute (part of the National Institutes of Health) in the United States. The initiative supports a global community of diverse, world-class research teams to come together, think differently, and take on some of cancer’s toughest challenges.

The ILLUMINE team is led by Reuven Agami of the Netherlands Cancer Institute and brings together clinicians, advocates, and scientists across eight institutions in four countries. The team is funded by Cancer Research UK, the National Cancer Institute, the Cancer Research Institute, and KiKa (Children Cancer Free Foundation) through Cancer Grand Challenges. It is one of five new teams announced this year, representing a total investment of $125 million.

Weissman, also a professor of biology at MIT and an investigator of the Howard Hughes Medical Institute, studies how proteins are produced and folded inside cells, and how disruptions in these processes contribute to disease. His laboratory developed ribosome profiling, a technique that reveals which parts of the genome are actively being translated into proteins inside cells.

This method is directly relevant to the dark proteome challenge. If cancer cells generate proteins from unexpected regions of the genome, understanding when and how those proteins are made is critical. Weissman’s lab continues to refine tools that measure protein production at scale, helping to illuminate these hidden products and their potential role in cancer.

By comprehensively identifying and characterizing the dark proteome, the ILLUMINE team aims to uncover novel, potentially universal tumor antigens — cancer cell molecules that are recognizable by the immune system — and develop innovative immunotherapies for hard-to-treat cancers.

Collectively, the newly funded teams unite researchers from nine countries and 34 institutions, bringing together more than 40 investigators to address long-standing barriers in cancer research.

Dr. David Scott, Director of Cancer Grand Challenges, said of the initiative: “Together, we’re creating opportunities for bold team science that could redefine what’s possible for people affected by cancer.”

3 Questions with new faculty member Matthew G. Jones: Building predictive models to characterize tumor progression

The assistant professor hopes to decode molecular processes on the genetic, epigenetic, and microenvironment levels to anticipate how and when tumors evolve to resist treatment.

Lillian Eden | Department of Biology
March 10, 2026

Just as Darwin’s finches evolved in response to natural selection in order to endure, the cells that make up a cancerous tumor similarly counter selective pressures in order to survive, evolve, and spread. Tumors are, in fact, complex sets of cells with their own unique structure and ability to change. 

Today, artificial intelligence and machine learning tools offer an unparalleled opportunity to illuminate the generalizable rules governing tumor progression on the genetic, epigenetic, metabolic, and microenvironmental levels.

Matthew G. Jones, an Assistant Professor in the Department of Biology at MIT, the Koch Institute for Integrative Cancer Research, and the Institute for Medical Engineering and Science, hopes to use computational approaches to build predictive models — to play a game of chess with cancer, making sense of a tumor’s ability to evolve and resist treatment with the ultimate goal of improving patient outcomes. 

Q: What aspect of tumor progression are you hoping to explore and characterize? 

A: A very common story with cancer is that patients will respond to a therapy at first, and then eventually that treatment will stop working. The reason this largely happens is that tumors have an incredible, and very challenging, ability to evolve: the ability to change their genetic makeup, protein signaling composition, and cellular dynamics. The tumor as a system also evolves at a structural level. Oftentimes, the reason a patient succumbs to a tumor is that either the tumor has evolved to a state we can no longer control, or it evolves in an unpredictable manner.

In many ways, cancers can be thought of as, on the one hand, incredibly dysregulated and disorganized, and on the other hand, as having their own internal logic, which is constantly changing. The central thesis of my lab is that tumors follow stereotypical patterns in space and time, and we’re hoping to use computation and experimental technology to decode the molecular processes underlying these transformations.  

We’re focused on one specific way tumors evolve: through a form of DNA amplification called extrachromosomal DNA, or ecDNA. Excised from the chromosome, these ecDNAs are circularized and exist as their own separate pool of DNA particles in the nucleus.

Initially discovered in the 1960s, ecDNA were thought to be a rare event in cancer. However, as researchers began applying next-generation sequencing to large patient cohorts in the 2010s, it seemed like not only were these ecDNA amplifications conferring the ability of tumors to adapt to stresses, and therapies, faster, but that they were far more prevalent than initially thought.

We now know these ecDNA amplifications are apparent in about 25% of cancers, particularly in the most aggressive ones, such as brain, lung, and ovarian cancers. We have found that, for a variety of reasons, ecDNA amplifications are able to change the rule book by which tumors evolve, allowing them to accelerate to a more aggressive disease in very surprising ways.

Q: How are you planning to use machine learning and artificial intelligence to study ecDNA amplifications and tumor evolution? 

A: There’s a mandate to translate what I’m doing in the lab to improve patients’ lives. I want to start with patient data to discover how various evolutionary pressures are driving disease and the mutations we observe. 

Among the tools we use to study tumor evolution are single-cell lineage tracing technologies. Broadly, they allow us to study the lineages of individual cells. When we sample a particular cell, not only do we know what that cell looks like, but we can, ideally, pinpoint exactly when aggressive mutations appeared in the tumor’s history. That evolutionary history gives us a way of studying these dynamic processes that we otherwise wouldn’t be able to observe in real time and helps us make sense of how we might be able to intercept that evolution.

I hope we’re going to get better at stratifying patients who will respond to certain drugs, anticipating and overcoming drug resistance, and identifying new therapeutic targets.

Q: What excites you about joining this community, and what sorts of trainees are you hoping to recruit to your lab? 

A: One of the things that I was really attracted to was the integration of excellence in both engineering and biological sciences. At the Koch Institute, every floor is structured to promote this interface between engineers and basic scientists, and beyond campus, we can connect with all the biomedical research enterprises in the Greater Boston Area. 

Another thing that drew me to MIT was the fact that it places such a strong emphasis on education, training, and investing in student success. I’m a personal believer that what distinguishes academic research from industry research is that academic research is fundamentally a service job, in that we are training the next generation of scientists. 

It was always a mission of mine to bring excellence to both computational and experimental technology disciplines. The types of trainees I’m hoping to recruit are those who are eager to collaborate and solve big problems that require both disciplines. The KI is uniquely set up for this type of hybrid lab: my dry lab is right next to my wet lab, and it’s a source of collaboration and connection, and that reflects the KI’s general vision. 

New insights into a hidden process that protects cells from harmful mutations

To make up for the loss of an important gene's function, cells are known to ramp up activity of other genes with similar functions. New research from the Weissman Lab reveals insights into how cells coordinate this response.

Shafaq Zia | Whitehead Institute
February 12, 2026

Some genetic mutations that are expected to completely stop a gene from working surprisingly cause only mild or even no symptoms. Researchers in previous studies have discovered one reason why: cells can ramp up the activity of other genes that perform similar functions to make up for the loss of an important gene’s function. A new study, published Feb. 12 in the journal Science, from the lab of Whitehead Institute Member Jonathan Weissman now reveals insights into how cells can coordinate this compensation response.

Cells are constantly reading instructions stored in DNA. These instructions, called genes, tell them how to make the many proteins that carry out complex processes needed to sustain life. But first, they need to make a temporary copy of these genetic instructions called messenger RNA, or mRNA.

As part of normal maintenance, cells routinely break down these temporary messages. This process helps control gene activity — or how much protein is made from a given gene — and ensures that old or unnecessary messages don’t accumulate. Cells also destroy faulty mRNAs that contain errors. These messages, if used, could produce damaged proteins that clump together and interfere with normal cellular processes.

In 2019, external studies suggested that this cleanup could be serving as more than just a quality-control check. Those researchers showed that when faulty mRNAs are broken down, this breakdown can signal cells to activate the compensation response. These studies also suggested that cells decide which backup genes to turn up based on how closely these genes resemble the mRNA that’s being degraded.

But mRNA decay is a process that happens in the cytoplasm, outside the nucleus where DNA, and thereby genes, are stored. So, Mohamed El-Brolosy, a postdoc in the Weissman Lab and lead author of the study, and colleagues wondered how those two processes in different compartments of the cell could be connected. Understanding this mechanism in greater depth could enable development of therapeutics that trigger it in a targeted fashion.

The researchers started by investigating a specific gene known to trigger a compensation response: when its mRNA is destroyed, a closely related gene becomes more active. To find out which molecules within the cell aid this process, the researchers systematically switched other genes off, one at a time.

That’s when they found a protein called ILF3. When the gene encoding this protein was turned off, cells could no longer ramp up the activity of the backup gene following mRNA decay.

Upon further investigation, the researchers identified small RNA fragments, left behind when faulty mRNAs are destroyed, that underlie this response. These fragments contain a special sequence that acts like an “address.” The team proposed that this address guides ILF3 to related backup genes that share the same sequence as the faulty mRNA.

In fact, when they introduced mutations in this sequence, the cells’ compensation response dropped, suggesting that the system relies on precise sequence matching to target the correct backup genes.

“That was very exciting for us,” says Weissman, who is also a professor of biology at Massachusetts Institute of Technology and an investigator at the Howard Hughes Medical Institute (HHMI). “It showed us that this isn’t a generic stress response. It’s a regulated system.”

The researchers’ findings point toward new therapeutic possibilities, where boosting the activity of a related gene could mitigate symptoms of certain genetic diseases. More broadly, their work characterizes a mysterious layer of gene regulation.

El-Brolosy, M. A., et al. (2026). Mechanisms linking cytoplasmic decay of translation-defective mRNA to transcriptional adaptation. Science, 391, eaea1272. https://doi.org/10.1126/science.aea1272

3 Questions with new faculty member Yunha Hwang: Using computation to study the world’s best single-celled chemists

The assistant professor utilizes microbial genomes to examine the language of biology. Her appointment reflects MIT’s commitment to exploring the intersection of genetics research and AI.

Lillian Eden | Department of Biology
December 15, 2025

Today, out of an estimated 1 trillion species on Earth, 99.999 percent are considered microbial — bacteria, archaea, viruses, and single-celled eukaryotes. For much of our planet’s history, microbes ruled the Earth, able to live and thrive in the most extreme of environments. Researchers have only just begun in the last few decades to contend with the diversity of microbes — it’s estimated that less than 1 percent of known genes have laboratory-validated functions. Computational approaches offer researchers the opportunity to strategically parse this truly astounding amount of information.

An environmental microbiologist and computer scientist by training, new MIT faculty member Yunha Hwang is interested in the novel biology revealed by the most diverse and prolific life form on Earth. In a shared faculty position as the Samuel A. Goldblith Career Development Professor in the Department of Biology, as well as an assistant professor in the Department of Electrical Engineering and Computer Science and the MIT Schwarzman College of Computing, Hwang is exploring the intersection of computation and biology.

Q: What drew you to research microbes in extreme environments, and what are the challenges in studying them?

A: Extreme environments are great places to look for interesting biology. I wanted to be an astronaut growing up, and the closest thing to astrobiology is examining extreme environments on Earth. And the only thing that lives in those extreme environments are microbes. During a sampling expedition that I took part in off the coast of Mexico, we discovered a colorful microbial mat about 2 kilometers underwater that flourished because the bacteria breathed sulfur instead of oxygen — but none of the microbes I was hoping to study would grow in the lab.

The biggest challenge in studying microbes is that a majority of them cannot be cultivated, which means that the only way to study their biology is through a method called metagenomics. My latest work is genomic language modeling. We’re hoping to develop a computational system so we can probe the organism as much as possible “in silico,” just using sequence data. A genomic language model is technically a large language model, except the language is DNA as opposed to human language. It’s trained in a similar way, just in biological language as opposed to English or French. If our objective is to learn the language of biology, we should leverage the diversity of microbial genomes. Even though we have a lot of data, and even as more samples become available, we’ve just scratched the surface of microbial diversity.

Q: Given how diverse microbes are and how little we understand about them, how can studying microbes in silico, using genomic language modeling, advance our understanding of the microbial genome?

A: A genome is many millions of letters. A human cannot possibly look at that and make sense of it. We can program a machine, though, to segment data into pieces that are useful. That’s sort of how bioinformatics works with a single genome. But if you’re looking at a gram of soil, which can contain thousands of unique genomes, that’s just too much data to work with — a human and a computer together are necessary in order to grapple with that data.

During my PhD and master’s degree, we were only just discovering new genomes and new lineages that were so different from anything that had been characterized or grown in the lab. These were things that we just called “microbial dark matter.” When there are a lot of uncharacterized things, that’s where machine learning can be really useful, because we’re just looking for patterns — but that’s not the end goal. What we hope to do is to map these patterns to evolutionary relationships between each genome, each microbe, and each instance of life.

Previously, we’ve been thinking about proteins as a standalone entity — that gets us to a decent degree of information because proteins are related by homology, and therefore things that are evolutionarily related might have a similar function.

What is known about microbiology is that proteins are encoded into genomes, and the context in which that protein is bounded — what regions come before and after — is evolutionarily conserved, especially if there is a functional coupling. This makes total sense because when you have three proteins that need to be expressed together because they form a unit, then you might want them located right next to each other.

What I want to do is incorporate more of that genomic context in the way that we search for and annotate proteins and understand protein function, so that we can go beyond sequence or structural similarity to add contextual information to how we understand proteins and hypothesize about their functions.

Q: How can your research be applied to harnessing the functional potential of microbes?

A: Microbes are possibly the world’s best chemists. Leveraging microbial metabolism and biochemistry will lead to more sustainable and more efficient methods for producing new materials, new therapeutics, and new types of polymers.

But it’s not just about efficiency — microbes are doing chemistry we don’t even know how to think about. Understanding how microbes work, and being able to understand their genomic makeup and their functional capacity, will also be really important as we think about how our world and climate are changing. A majority of carbon sequestration and nutrient cycling is undertaken by microbes; if we don’t understand how a given microbe is able to fix nitrogen or carbon, then we will face difficulties in modeling the nutrient fluxes of the Earth.

On the more therapeutic side, infectious diseases are a real and growing threat. Understanding how microbes behave in diverse environments relative to the rest of our microbiome is really important as we think about the future and combating microbial pathogens.

A new way to understand and predict gene splicing

The KATMAP model, developed by researchers in the Department of Biology, can predict alternative cell splicing, which allows cells to create endless diversity from the same sets of genetic blueprints.

Lillian Eden | Department of Biology
November 4, 2025

Although heart cells and skin cells contain identical instructions for creating proteins encoded in their DNA, they’re able to fill such disparate niches because molecular machinery can cut out and stitch together different segments of those instructions to create endlessly unique combinations.

The ingenuity of using the same genes in different ways is made possible by a process called splicing and is controlled by splicing factors; which splicing factors a cell employs determines what sets of instructions that cell produces, which, in turn, gives rise to proteins that allow cells to fulfill different functions.

In an open-access paper published today in Nature Biotechnology, researchers in the MIT Department of Biology outlined a framework for parsing the complex relationship between sequences and splicing regulation to investigate the regulatory activities of splicing factors, creating models that can be applied to interpret and predict splicing regulation across different cell types, and even different species. Called Knockdown Activity and Target Models from Additive regression Predictions, KATMAP draws on experimental data from disrupting the expression of a splicing factor and information on which sequences the splicing factor interacts with to predict its likely targets.

Aside from the benefits of a better understanding of gene regulation, splicing mutations — either in the gene that is spliced or in the splicing factor itself — can give rise to diseases such as cancer by altering how genes are expressed, leading to the creation or accumulation of faulty or mutated proteins. This information is critical for developing therapeutic treatments for those diseases. The researchers also demonstrated that KATMAP can potentially be used to predict whether synthetic nucleic acids, a promising treatment option for disorders including a subset of muscular atrophy and epilepsy disorders, affect splicing.

Perturbing splicing

In eukaryotic cells, including our own, splicing occurs after DNA is transcribed to produce an RNA copy of a gene, which contains both coding and noncoding regions of RNA. The noncoding intron regions are removed, and the coding exon segments are spliced back together to make a near-final blueprint, which can then be translated into a protein.
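As a rough illustration of the cut-and-join step described above, the sketch below treats a transcript as a string and keeps only the segments listed as exons. The sequence, the coordinates, and the function name `splice` are made up for illustration; in a real cell the spliceosome recognizes splice-site signals rather than fixed coordinates.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a transcript, dropping the introns between them.

    `exons` is a list of (start, end) pairs, 0-based and end-exclusive,
    listed in transcript order. All values used here are hypothetical.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: uppercase = exon, lowercase = intron (illustrative only).
pre_mrna = "AUGGCC" + "guaaguuuag" + "UUCGAA" + "guacgcag" + "UGA"
all_exons = [(0, 6), (16, 22), (30, 33)]

# Keeping every exon gives one message.
mature = splice(pre_mrna, all_exons)

# Skipping the middle exon gives a different message from the same gene.
skipped = splice(pre_mrna, [all_exons[0], all_exons[2]])
```

Choosing which exons to keep is the part that splicing factors influence, which is why the same gene can yield different products in different cell types.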

According to first author Michael P. McGurk, a postdoc in the lab of MIT Professor Christopher Burge, previous approaches could provide an average picture of regulation, but could not necessarily predict the regulation of splicing factors at particular exons in particular genes.

KATMAP draws on RNA sequencing data generated from perturbation experiments, which alter the expression level of a regulatory factor by either overexpressing it or knocking down its levels. The consequences of overexpression or knockdown are that the genes regulated by the splicing factor should exhibit different levels of splicing after perturbation, which helps the model identify the splicing factor’s targets.

Cells, however, are complex, interconnected systems, where one small change can cause a cascade of effects. KATMAP is also able to distinguish direct targets from indirect, downstream impacts by incorporating known information about the sequence the splicing factor is likely to interact with, referred to as a binding site or binding motif.

“In our analyses, we identify predicted targets as exons that have binding sites for this particular factor in the regions where this model thinks they need to be to impact regulation,” McGurk says, while non-targets may be affected by perturbation but don’t have the likely appropriate binding sites nearby.
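The distinction between targets and non-targets can be sketched in a few lines. This is not KATMAP itself, whose name points to an additive regression approach with learned parameters; it is a deliberately simplified stand-in that uses a hard threshold and an exact motif match. The motif, the threshold, the exon records, and every name below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Exon:
    name: str
    flank: str        # RNA sequence around the exon (hypothetical)
    delta_psi: float  # change in exon inclusion after knockdown (hypothetical)


def classify(exons, motif, min_change=0.1):
    """Split exons into direct targets, indirect effects, and unaffected.

    An exon counts as 'direct' only if its splicing changed AND the
    factor's motif is present nearby; a change without a motif is
    treated as an indirect, downstream effect.
    """
    result = {"direct": [], "indirect": [], "unaffected": []}
    for exon in exons:
        changed = abs(exon.delta_psi) >= min_change
        has_site = motif in exon.flank
        if changed and has_site:
            result["direct"].append(exon.name)
        elif changed:
            result["indirect"].append(exon.name)
        else:
            result["unaffected"].append(exon.name)
    return result


exons = [
    Exon("exon_A", "CCUGCAUGUUAC", -0.35),  # changed, motif present
    Exon("exon_B", "CCUUUAAGGUAC", 0.28),   # changed, no motif
    Exon("exon_C", "AAUGCAUGCCGA", 0.02),   # motif present, no change
]
calls = classify(exons, motif="UGCAUG")
```

In this toy version the threshold and the exact-match rule are fixed by hand, whereas McGurk describes a model that learns how closely a site must resemble the known motif and how position affects regulatory activity.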

This is especially helpful for splicing factors that aren’t as well-studied.

“One of our goals with KATMAP was to try to make the model general enough that it can learn what it needs to assume for particular factors, like how similar the binding site has to be to the known motif or how regulatory activity changes with the distance of the binding sites from the splice sites,” McGurk says.

Starting simple

Although predictive models can be very powerful at presenting possible hypotheses, many are considered “black boxes,” meaning the rationale that gives rise to their conclusions is unclear. KATMAP, on the other hand, is an interpretable model that enables researchers to quickly generate hypotheses and interpret splicing patterns in terms of regulatory factors while also understanding how the predictions were made.

“I don’t just want to predict things, I want to explain and understand,” McGurk says. “We set up the model to learn from existing information about splicing and binding, which gives us biologically interpretable parameters.”

The researchers did have to make some simplifying assumptions in order to develop the model. KATMAP considers only one splicing factor at a time, although it is possible for splicing factors to work in concert with one another. The RNA target sequence could also be folded in such a way that the factor wouldn’t be able to access a predicted binding site, so the site is present but not utilized.

“When you try to build up complete pictures of complex phenomena, it’s usually best to start simple,” McGurk says. “A model that only considers one splicing factor at a time is a good starting point.”

David McWaters, another postdoc in the Burge Lab and a co-author on the paper, conducted key experiments to test and validate that aspect of the KATMAP model.

Future directions

The Burge lab is collaborating with researchers at Dana-Farber Cancer Institute to apply KATMAP to the question of how splicing factors are altered in disease contexts, as well as with other researchers at MIT as part of an MIT HEALS grant to model splicing factor changes in stress responses. McGurk also hopes to extend the model to incorporate cooperative regulation for splicing factors that work together.

“We’re still in a very exploratory phase, but I would like to be able to apply these models to try to understand splicing regulation in disease or development. In terms of variation of splicing factors, they are related, and we need to understand both,” McGurk says.

Burge, the Uncas (1923) and Helen Whitaker Professor and senior author of the paper, will continue to work on generalizing this approach to build interpretable models for other aspects of gene regulation.

“We now have a tool that can learn the pattern of activity of a splicing factor from types of data that can be readily generated for any factor of interest,” says Burge, who is also an extramural member of the Koch Institute for Integrative Cancer Research and an associate member of the Broad Institute of MIT and Harvard. “As we build up more of these models, we’ll be better able to infer which splicing factors have altered activity in a disease state from transcriptomic data, to help understand which splicing factors are driving pathology.”

Matthew G. Jones

Education

  • Graduate: University of California, San Francisco, 2022
  • Undergraduate: Computer Science; University of California, Berkeley, 2017

Research Summary

From the moment that a tumor is born, it is evolving across several levels, including at the genetic, epigenetic, metabolic, and microenvironmental levels. The central goal of the Jones Lab is to develop innovative computational and technological approaches to uncover the mechanisms of tumor evolution, with the ultimate aim of identifying new therapeutic targets and creating predictive models to monitor tumor initiation and progression.

Currently, the lab’s research focuses on three interrelated goals: (1) investigating the molecular mechanisms underlying the spatiotemporal dynamics of copy-number alterations (particularly extrachromosomal DNA) in cancer populations; (2) developing new computational methods to trace cellular lineages; and (3) elucidating the principles by which tumors are organized over time. To pursue these aims, the lab integrates advances in computation and AI with cutting-edge multi-omic approaches (including single-cell, spatial, and long-read technologies), lineage tracing, and high-resolution imaging. Broadly, they expect that their studies will reveal generalizable rules governing tumor progression and treatment resistance, enable the predictive modeling of tumors, and inspire new approaches to intercept tumor progression.

Awards

  • Keynote Speaker at Cancer Genetics and Epigenetics Gordon Research Seminar, 2025
  • Cancer Grand Challenges Future Leaders Conference Best Talk Awardee, 2024
  • NCI K99/R00 Early-Career Pathway to Independence Award, 2024
  • UCSF Discovery Fellow, 2019

Locally produced proteins help mitochondria function

One of the ways that cells ensure proteins end up where they're needed is creating them at that location, through a process called localized translation. New research from the Weissman Lab has expanded our understanding of localized translation at mitochondria and sheds light on the organizational principles of genes and the proteins they encode.

Greta Friar | Whitehead Institute
August 27, 2025

Now, Weissman, who is also a professor of biology at the Massachusetts Institute of Technology and an HHMI Investigator, and postdoc in his lab Jingchuan Luo have expanded our knowledge of localized translation at mitochondria, structures that generate energy for the cell. In a paper published in Cell on August 27, they share a new tool, LOCL-TL, for studying localized translation in close detail, and describe the discoveries it enabled about two classes of proteins that are locally translated at mitochondria.

The importance of localized translation at mitochondria relates to their unusual origin. Mitochondria were once bacteria that lived within our ancestors’ cells. Over time the bacteria lost their autonomy and became part of the larger cells, which included migrating most of their genes into the larger cell’s genome in the nucleus. Cells evolved processes to ensure that proteins needed by mitochondria that are encoded in genes in the larger cell’s genome get transported to the mitochondria. Mitochondria retain a few genes in their own genome, so protein production from the mitochondrial genome and from the larger cell’s genome must be coordinated to avoid mismatched production of mitochondrial parts. Localized translation may help cells to manage the interplay between mitochondrial and nuclear protein production, among other purposes.

How to detect local protein production

For a protein to be made, genetic code stored in DNA is read into RNA, and then the RNA is read or translated by a ribosome, a cellular machine that builds a protein according to the RNA code. Weissman’s lab previously developed a method to study localized translation by tagging ribosomes near a structure of interest, and then capturing the tagged ribosomes in action and observing the proteins they are making. This approach, called proximity-specific ribosome profiling, allows researchers to see what proteins are being made where in the cell. The challenge that Luo faced was how to tweak this method to capture only ribosomes at work near mitochondria.

Ribosomes work quickly, so a ribosome that gets tagged while making a protein at the mitochondria can move on to making other proteins elsewhere in the cell in a matter of minutes. The only way researchers can guarantee that the ribosomes they capture are still working on proteins made near the mitochondria is if the experiment happens very quickly.

Weissman and colleagues had previously solved this time sensitivity problem in yeast cells with a ribosome-tagging tool called BirA that is activated by the presence of the molecule biotin. BirA is fused to the cellular structure of interest, and tags ribosomes it can touch—but only once activated. Researchers keep the cell depleted of biotin until they are ready to capture the ribosomes, to limit the time when tagging occurs. However, this approach does not work with mitochondria in mammalian cells because they need biotin to function normally, so it cannot be depleted.

Luo and Weissman adapted the existing tool to respond to blue light instead of biotin. The new tool, LOV-BirA, is fused to the mitochondria’s outer membrane. Cells are kept in the dark until the researchers are ready. Then they expose the cells to blue light, activating LOV-BirA to tag ribosomes. They give it a few minutes and then quickly extract the ribosomes. This approach proved very accurate at capturing only ribosomes working at mitochondria.

The researchers then used a method originally developed by the Weissman lab to extract the sections of RNA inside of the ribosomes. This allows them to see exactly how far along in the process of making a protein the ribosome is when captured, which can reveal whether the entire protein is made at the mitochondria, or whether it is partly produced elsewhere and only gets completed at the mitochondria.

“One advantage of our tool is the granularity it provides,” Luo says. “Being able to see what section of the protein is locally translated helps us understand more about how localized translation is regulated, which can then allow us to understand its dysregulation in disease and to control localized translation in future studies.”

Two protein groups are made at mitochondria

Using these approaches, the researchers found that about twenty percent of the mitochondrial proteins encoded in the main cellular genome are locally translated at mitochondria. These proteins can be divided into two distinct groups with different evolutionary histories and mechanisms for localized translation.

One group consists of relatively long proteins, each containing more than 400 amino acids or protein building blocks. These proteins tend to be of bacterial origin—present in the ancestor of mitochondria—and they are locally translated in both mammalian and yeast cells, suggesting that their localized translation has been maintained through a long evolutionary history.

Like many mitochondrial proteins encoded in the nucleus, these proteins contain a mitochondrial targeting sequence (MTS), a zip code that tells the cell where to bring them. The researchers discovered that most proteins containing an MTS also contain a nearby inhibitory sequence that prevents transportation until they are done being made. This group of locally translated proteins lacks the inhibitory sequence, so they are brought to the mitochondria during their production.

Production of these longer proteins begins anywhere in the cell, and then after approximately the first 250 amino acids are made, they get transported to the mitochondria. While the rest of the protein gets made, it is simultaneously fed into a channel that brings it inside the mitochondria. This ties up the channel for a long time, limiting import of other proteins, so cells can only afford to do this simultaneous production and import for select proteins. The researchers hypothesize that these bacterial-origin proteins are given priority as an ancient mechanism to ensure that they are accurately produced and placed within mitochondria.

The second locally translated group consists of short proteins, each fewer than 200 amino acids long. These proteins are more recently evolved, and correspondingly, the researchers found that the mechanism for their localized translation is not shared by yeast. Their mitochondrial recruitment happens at the RNA level. Two sequences within regulatory sections of each RNA molecule, which do not encode the final protein, instead signal the cell’s machinery to recruit the RNAs to the mitochondria.

The researchers searched for molecules that might be involved in this recruitment, and identified the RNA binding protein AKAP1, which exists at mitochondria. When they eliminated AKAP1, the short proteins were translated indiscriminately around the cell. This provided an opportunity to learn more about the effects of localized translation, by seeing what happens in its absence. When the short proteins were not locally translated, this led to the loss of various mitochondrial proteins, including those involved in oxidative phosphorylation, our cells’ main energy generation pathway.

In future research, Weissman and Luo will delve deeper into how localized translation affects mitochondrial function and dysfunction in disease. The researchers also intend to use LOCL-TL to study localized translation in other cellular processes, including in relation to embryonic development, neural plasticity, and disease.

“This approach should be broadly applicable to different cellular structures and cell types, providing many opportunities to understand how localized translation contributes to biological processes,” Weissman says. “We’re particularly interested in what we can learn about the roles it may play in diseases including neurodegeneration, cardiovascular diseases, and cancers.”

Luo et al. “Proximity-specific ribosome profiling reveals the logic of localized mitochondrial translation.” Cell, August 27, 2025. https://doi.org/10.1016/j.cell.2025.08.002