Machine-learning model helps determine protein structures

New technique reveals many possible conformations that a protein may take.

Anne Trafton | MIT News Office
February 4, 2021

Cryo-electron microscopy (cryo-EM) allows scientists to produce high-resolution, three-dimensional images of tiny molecules such as proteins. This technique works best for imaging proteins that exist in only one conformation, but MIT researchers have now developed a machine-learning algorithm that helps them identify multiple possible structures that a protein can take.

Unlike AI techniques that aim to predict protein structure from sequence data alone, cryo-EM determines protein structure experimentally. It produces hundreds of thousands, or even millions, of two-dimensional images of protein samples frozen in a thin layer of ice. Computer algorithms then piece together these images, taken from different angles, into a three-dimensional representation of the protein in a process termed reconstruction.
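To a first approximation, each cryo-EM image can be thought of as a noisy 2D projection of the molecule's 3D density along the viewing direction, and reconstruction tries to invert that relationship. The pure-Python sketch below illustrates only the forward step, under simplifying assumptions: a toy 4 x 4 x 4 voxel "molecule", viewing directions restricted to the grid axes, and Gaussian noise standing in for everything else a real microscope adds. The helper names `project` and `add_noise` are hypothetical.

```python
import random

def project(volume, axis):
    """Sum a cubic 3D density (nested lists indexed [x][y][z]) along one axis to give a 2D image."""
    n = len(volume)
    image = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            for z in range(n):
                v = volume[x][y][z]
                if axis == 0:
                    image[y][z] += v
                elif axis == 1:
                    image[x][z] += v
                else:
                    image[x][y] += v
    return image

def add_noise(image, sigma, rng):
    """Add Gaussian noise as a crude stand-in for the low signal-to-noise ratio of raw images."""
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]

# Hypothetical toy "molecule": an L-shaped cluster of dense voxels in a 4x4x4 box.
N = 4
volume = [[[0.0] * N for _ in range(N)] for _ in range(N)]
for x, y, z in [(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0), (0, 2, 0)]:
    volume[x][y][z] = 1.0

top_view = project(volume, axis=2)   # viewing down the z axis
side_view = project(volume, axis=0)  # viewing down the x axis
noisy = add_noise(top_view, sigma=0.5, rng=random.Random(0))
```

Different orientations of the same object yield different 2D images; reconstruction algorithms work backward from many such images, whose orientations are not known in advance, to the 3D density.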

In a Nature Methods paper, the MIT researchers report new AI-based software for reconstructing multiple structures and motions of the imaged protein, a major goal in the protein science community. Instead of using the traditional representation of protein structure as electron-scattering intensities on a 3D lattice, which is impractical for modeling multiple structures, the researchers introduced a new neural network architecture that can efficiently generate the full ensemble of structures in a single model.
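One way to picture "the full ensemble of structures in a single model" is a network that takes a 3D coordinate together with a low-dimensional latent variable and returns the density at that point, so that each latent value corresponds to one conformation. The sketch below is a hypothetical toy rather than the architecture from the paper: a tiny two-layer perceptron with random, untrained weights, written in pure Python, with illustrative names (`DensityNet`, `latent_dim`, `hidden`). It shows only the input/output structure, not training from images.

```python
import math
import random

class DensityNet:
    """Hypothetical toy coordinate network: (x, y, z, latent...) -> non-negative density."""

    def __init__(self, latent_dim=1, hidden=16, seed=0):
        rng = random.Random(seed)
        n_in = 3 + latent_dim
        # Untrained random weights; a real model would fit these to the 2D images.
        self.w1 = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.b2 = 0.0

    def density(self, coord, latent):
        inputs = list(coord) + list(latent)
        # Hidden layer with ReLU activations.
        h = [max(0.0, sum(w * v for w, v in zip(row, inputs)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sum(w * v for w, v in zip(self.w2, h)) + self.b2
        # Softplus keeps the predicted density non-negative.
        return math.log1p(math.exp(out))

    def volume(self, latent, n=3):
        """Evaluate the network on an n x n x n grid spanning [-1, 1]^3."""
        axis = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
        return [self.density((x, y, z), latent) for x in axis for y in axis for z in axis]

net = DensityNet()
state_a = net.volume(latent=[-1.0])  # one hypothetical conformation
state_b = net.volume(latent=[1.0])   # another, from the same weights
```

Because the conformation is just another input, evaluating the same network at different latent values produces different volumes, which is what lets one model stand in for a whole ensemble instead of storing one lattice per structure.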

“With the broad representation power of neural networks, we can extract structural information from noisy images and visualize detailed movements of macromolecular machines,” says Ellen Zhong, an MIT graduate student and the lead author of the paper.

Using their software, the researchers discovered protein motions in imaging datasets from which only a single static 3D structure had originally been identified. They also visualized large-scale flexible motions of the spliceosome, a large RNA-protein complex that coordinates the splicing of the protein-coding sequences of transcribed RNA.

“Our idea was to try to use machine-learning techniques to better capture the underlying structural heterogeneity, and to allow us to inspect the variety of structural states that are present in a sample,” says Joseph Davis, the Whitehead Career Development Assistant Professor in MIT’s Department of Biology.

Davis and Bonnie Berger, the Simons Professor of Mathematics at MIT and head of the Computation and Biology group at the Computer Science and Artificial Intelligence Laboratory, are the senior authors of the study, which appears today in Nature Methods. MIT postdoc Tristan Bepler is also an author of the paper.

Visualizing a multistep process

The researchers demonstrated the utility of their new approach by analyzing structures that form during the process of assembling ribosomes — the cell organelles responsible for reading messenger RNA and translating it into proteins. Davis began studying the structure of ribosomes while a postdoc at the Scripps Research Institute. Ribosomes have two major subunits, each of which contains many individual proteins that are assembled in a multistep process.

To study the steps of ribosome assembly in detail, Davis stalled the process at different points and then took electron microscope images of the resulting structures. At some points, blocking assembly resulted in accumulation of just a single structure, suggesting that there is only one way for that step to occur. However, blocking other points resulted in many different structures, suggesting that the assembly could occur in a variety of ways.

Because some of these experiments generated so many different protein structures, traditional cryo-EM reconstruction tools did not work well to determine what those structures were.

“In general, it’s an extremely challenging problem to try to figure out how many states you have when you have a mixture of particles,” Davis says.

After starting his lab at MIT in 2017, he teamed up with Berger to use machine learning to develop a model that can use the two-dimensional images produced by cryo-EM to generate all of the three-dimensional structures found in the original sample.

In the new Nature Methods study, the researchers demonstrated the power of the technique by using it to identify a new ribosomal state that hadn’t been seen before. Previous studies had suggested that as a ribosome is assembled, large structural elements, which are akin to the foundation for a building, form first. Only after this foundation is formed are the “active sites” of the ribosome, which read messenger RNA and synthesize proteins, added to the structure.

In the new study, however, the researchers found that in a very small subset of ribosomes, about 1 percent, a structure that is normally added at the end actually appears before assembly of the foundation. To account for that, Davis hypothesizes that it might be too energetically expensive for cells to ensure that every single ribosome is assembled in the correct order.

“The cells are likely evolved to find a balance between what they can tolerate, which is maybe a small percentage of these types of potentially deleterious structures, and what it would cost to completely remove them from the assembly pathway,” he says.

Viral proteins

The researchers are now using this technique to study the coronavirus spike protein, the viral protein that binds to receptors on human cells and allows the virus to enter them. The spike protein has three subunits, each with a receptor binding domain (RBD) that can point either up or down.

“For me, watching the pandemic unfold over the past year has emphasized how important front-line antiviral drugs will be in battling similar viruses, which are likely to emerge in the future. As we start to think about how one might develop small molecule compounds to force all of the RBDs into the ‘down’ state so that they can’t interact with human cells, understanding exactly what the ‘up’ state looks like and how much conformational flexibility there is will be informative for drug design. We hope our new technique can reveal these sorts of structural details,” Davis says.

The research was funded by the National Science Foundation Graduate Research Fellowship Program, the National Institutes of Health, and the MIT Jameel Clinic for Machine Learning and Health. This work was supported by the MIT Satori computation cluster hosted at the MGHPCC.

New gene regulation model provides insight into brain development

A well-known protein family binds to many more RNA sequences than previously thought to help neurons grow.

Raleigh McElvery
August 17, 2020

In every cell, RNA-binding proteins (RBPs) help tune gene expression and control biological processes by binding to RNA sequences. Researchers often assume that individual RBPs latch tightly to just one RNA sequence. For instance, an essential family of RBPs, the Rbfox family, was thought to bind one particular RNA sequence alone. However, it’s becoming increasingly clear that this idea greatly oversimplifies Rbfox’s vital role in development.

Members of the Rbfox family are among the best-studied RBPs and have been implicated in mammalian brain, heart, and muscle development since their discovery 25 years ago. They influence how RNA transcripts are “spliced” together to form a final RNA product, and have been associated with disorders like autism and epilepsy. But this family of RBPs is compelling for another reason as well: until recently, it was considered a classic example of predictable binding.

More often than not, it seemed, Rbfox proteins bound to a very specific sequence, or motif, of nucleotide bases, “GCAUG.” Occasionally, binding analyses hinted that Rbfox proteins might attach to other RNA sequences as well, but these findings were usually discarded. Now, a team of biologists from MIT has found that Rbfox proteins actually bind less tightly — but no less frequently — to a handful of other RNA nucleotide sequences besides GCAUG. These so-called “secondary motifs” could be key to normal brain development, and help neurons grow and assume specific roles.
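Locating candidate primary sites is, at its simplest, a string search. The sketch below reports every position where GCAUG occurs in an RNA sequence; the function name `find_motif` and the input fragment are made up for illustration, and exact matching of this kind would miss the weaker secondary motifs described here.

```python
def find_motif(rna, motif="GCAUG"):
    """Return 0-based start positions of every (possibly overlapping) occurrence of motif."""
    rna, motif = rna.upper(), motif.upper()
    return [i for i in range(len(rna) - len(motif) + 1) if rna[i:i + len(motif)] == motif]

# Hypothetical RNA fragment, for illustration only.
transcript = "AUGCAUGCUUGCAUGAAGCAUGC"
sites = find_motif(transcript)
print(sites)  # [2, 10, 17]
```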

“Previously, possible binding of Rbfox proteins to atypical sites had been largely ignored,” says Christopher Burge, professor of biology and the study’s senior author. “But we’ve helped demonstrate that these secondary motifs form their own separate class of binding sites with important physiological functions.”

Graduate student Bridget Begg is the first author of the study, published on August 17 in Nature Structural & Molecular Biology.

“Two-wave” regulation

After the discovery that GCAUG was the primary RNA binding site for mammalian Rbfox proteins, researchers characterized Rbfox binding in living cells using a technique called CLIP (crosslinking-immunoprecipitation). However, CLIP has several limitations. For example, it can indicate where a protein is bound, but not how much protein is bound there. It’s also hampered by some technical biases, including substantial rates of false negatives and false positives.

To address these shortcomings, the Burge lab developed two complementary techniques to better quantify protein binding, this time in a test tube: RBNS (RNA Bind-n-Seq), and later, nsRBNS (RNA Bind-n-Seq with natural sequences), both of which incubate an RBP of interest with a synthetic RNA library. First author Begg performed nsRBNS using a library of naturally occurring mammalian RNA sequences and identified a variety of intermediate-affinity secondary motifs that were bound in the absence of GCAUG. She then compared her own data with publicly available CLIP results to examine the “aberrant” binding that had often been discarded, demonstrating that signals for these motifs existed across many CLIP datasets.
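Binding assays of the RBNS kind are commonly summarized by asking which short sequences (k-mers) are over-represented among RNAs pulled down with the protein relative to the input library. The sketch below computes a smoothed enrichment ratio of that kind on a few hypothetical reads; the function names, the add-one pseudocount, and the reads themselves are illustrative assumptions, not the Burge lab's actual analysis pipeline.

```python
from collections import Counter

def count_kmers(reads, k):
    """Count every overlapping k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def enrichment(bound_reads, input_reads, k=5, pseudocount=1.0):
    """Smoothed frequency of each bound k-mer divided by its smoothed frequency in the input."""
    bound = count_kmers(bound_reads, k)
    background = count_kmers(input_reads, k)
    n_possible = 4 ** k  # distinct k-mers over the alphabet A, C, G, U
    bound_total = sum(bound.values()) + pseudocount * n_possible
    input_total = sum(background.values()) + pseudocount * n_possible
    return {
        kmer: ((n + pseudocount) / bound_total) / ((background[kmer] + pseudocount) / input_total)
        for kmer, n in bound.items()
    }

# Hypothetical reads: every protein-bound read carries GCAUG with different flanks.
bound_reads = ["AAGCAUGAA", "CUGCAUGUC", "UCGCAUGGA", "GAGCAUGCU"]
input_reads = ["ACGUACGUAC", "UUUAAACCCG", "GGGUUUCCCA"]
scores = enrichment(bound_reads, input_reads)
top = max(scores, key=scores.get)
print(top, round(scores[top], 2))
```

The pseudocount keeps k-mers that never appear in the input from receiving infinite scores; with these toy reads, GCAUG, present in every bound read, comes out on top.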

To probe the biological role of these motifs, Begg performed reporter assays to show that the motifs could regulate Rbfox’s RNA splicing behavior. Subsequently, computational analyses by Begg and co-author Marvin Jens using mouse neuronal data established a handful of secondary motifs that appeared to be involved in neuronal differentiation and cellular diversification.

Based on analyses of these key secondary motifs, Begg and colleagues devised a “two-wave” model. Early in development, they believe, Rbfox proteins bind predominantly to high-affinity RNA sequences like GCAUG, in order to tune gene expression. Later on, as the Rbfox concentration increases, those primary motifs become fully occupied and Rbfox additionally binds to the secondary motifs. This results in a second wave of Rbfox-regulated RNA splicing that affects a different set of genes.
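The concentration dependence behind the two-wave model can be illustrated with a simple binding equilibrium. For one-to-one binding, the fraction of sites occupied at free protein concentration C is C / (C + K_d), where K_d is the dissociation constant (a lower K_d means tighter binding). The sketch below plugs in hypothetical K_d values, since the article reports none, for a high-affinity GCAUG-like site and an intermediate-affinity secondary site.

```python
def occupancy(conc, kd):
    """Fraction of sites bound at equilibrium, assuming simple one-to-one binding."""
    return conc / (conc + kd)

# Hypothetical dissociation constants in nM, chosen only to illustrate the trend.
KD_PRIMARY = 1.0      # high-affinity, GCAUG-like site
KD_SECONDARY = 100.0  # intermediate-affinity secondary motif

for conc in [1.0, 10.0, 100.0, 1000.0]:
    p = occupancy(conc, KD_PRIMARY)
    s = occupancy(conc, KD_SECONDARY)
    print(f"{conc:>7.0f} nM  primary {p:.2f}  secondary {s:.2f}")
```

With these made-up constants, at 10 nM the primary site is about 91 percent occupied and the secondary site about 9 percent; by 1,000 nM the secondary site also reaches about 91 percent, a second wave of binding that appears only once the primary sites are nearly saturated.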

Begg theorizes that the first wave of Rbfox proteins binds GCAUG sequences early in development, and she showed that these binding events regulate genes involved in nerve growth, such as those governing cytoskeleton and membrane organization. The second wave appears to help neurons establish electrical and chemical signaling. In other cases, secondary motifs might help neurons specialize into different subtypes with different jobs.

John Conboy, a molecular biologist at Lawrence Berkeley National Lab and an expert in Rbfox binding, says the Burge lab’s two-wave model clearly shows how a single RBP can bind different RNA sequences — regulating splicing of distinct gene sets and influencing key processes during brain development. “This quantitative analysis of RNA-protein interactions, in a field that is often semi-quantitative at best, contributes fascinating new insights into the role of RNA splicing in cell type specification,” he says.

A binding spectrum

The researchers suspect that this two-wave model is not unique to Rbfox. “This is probably happening with many different RBPs that regulate development and other dynamic processes,” Burge says. “In the future, considering secondary motifs will help us to better understand developmental disorders and diseases, which can occur when RBPs are over- or under-expressed.”

Begg adds that secondary motifs should be incorporated into computer models that predict gene expression, in order to probe cellular behavior. “I think it’s very exciting that these more finely-tuned developmental processes, like neuronal differentiation, could be regulated by secondary motifs,” she says.

Both Begg and Burge agree it’s time to consider the entire spectrum of Rbfox binding, which is highly influenced by factors like protein concentration, binding strength, and timing. According to Begg, “Rbfox regulation is actually more complex than we sometimes give it credit for.”

Citation:
“Concentration-dependent splicing is enabled by Rbfox motifs of intermediate affinity”
Nature Structural & Molecular Biology, online August 17, 2020, DOI: 10.1038/s41594-020-0475-8
Bridget E. Begg, Marvin Jens, Peter Y. Wang, Christine M. Minor, and Christopher B. Burge

Top illustration: Some RNA-binding proteins like Rbfox (gold ellipses) help tune gene expression and control biological processes by latching onto more RNA sequences (black and gold lines) as their concentration increases (teal shading). Credit: Bridget Begg
Bringing RNA into genomics

ENCODE consortium identifies RNA sequences that are involved in regulating gene expression.

Anne Trafton | MIT News Office
July 29, 2020

The human genome contains about 20,000 protein-coding genes, but the coding parts of our genes account for only about 2 percent of the entire genome. For the past two decades, scientists have been trying to find out what the other 98 percent is doing.

A research consortium known as ENCODE (Encyclopedia of DNA Elements) has made significant progress toward that goal, identifying many genome locations that bind to regulatory proteins, helping to control which genes get turned on or off. In a new study that is also part of ENCODE, researchers have now identified many additional sites that code for RNA molecules that are likely to influence gene expression.

These RNA sequences do not get translated into proteins, but act in a variety of ways to control how much protein is made from protein-coding genes. The research team, which includes scientists from MIT and several other institutions, made use of RNA-binding proteins to help them locate and assign possible functions to tens of thousands of sequences of the genome.

“This is the first large-scale functional genomic analysis of RNA-binding proteins with multiple different techniques,” says Christopher Burge, an MIT professor of biology. “With the technologies for studying RNA-binding proteins now approaching the level of those that have been available for studying DNA-binding proteins, we hope to bring RNA function more fully into the genomic world.”

Burge is one of the senior authors of the study, along with Xiang-Dong Fu and Gene Yeo of the University of California at San Diego, Eric Lecuyer of the University of Montreal, and Brenton Graveley of UConn Health.

The lead authors of the study, which appears today in Nature, are Peter Freese, a recent MIT PhD recipient in Computational and Systems Biology; Eric Van Nostrand, Gabriel Pratt, and Rui Xiao of UCSD; Xiaofeng Wang of the University of Montreal; and Xintao Wei of UConn Health.

RNA regulation

Much of the ENCODE project has thus far relied on detecting regulatory sequences of DNA using a technique called ChIP-seq. This technique allows researchers to identify DNA sites that are bound to DNA-binding proteins such as transcription factors, helping to determine the functions of those DNA sequences.

However, Burge points out, this technique won’t detect genomic elements that must be copied into RNA before getting involved in gene regulation. Instead, the RNA team relied on a technique known as eCLIP, which uses ultraviolet light to cross-link RNA molecules with RNA-binding proteins (RBPs) inside cells. Researchers then isolate specific RBPs using antibodies and sequence the RNAs they were bound to.

RBPs have many different functions: some are splicing factors, which help cut out noncoding sections (introns) of precursor messenger RNA, while others terminate transcription, enhance protein translation, break down RNA after translation, or guide RNA to a specific location in the cell. Determining the RNA sequences that are bound to RBPs can help to reveal information about the function of those RNA molecules.

“RBP binding sites are candidate functional elements in the transcriptome,” Burge says. “However, not all sites of binding have a function, so then you need to complement that with other types of assays to assess function.”

The researchers performed eCLIP on about 150 RBPs and integrated those results with data from another set of experiments in which they knocked down the expression of about 260 RBPs, one at a time, in human cells. They then measured the effects of this knockdown on the RNA molecules that interact with the protein.

Using a technique developed by Burge’s lab, the researchers were also able to narrow down more precisely where the RBPs bind to RNA. This technique, known as RNA Bind-N-Seq, reveals very short sequences, sometimes containing structural motifs such as bulges or hairpins, that RBPs bind to.

Overall, the researchers were able to study about 350 of the 1,500 known human RBPs, using one or more of these techniques per protein. RNA splicing factors often have different activity depending on where they bind in a transcript, for example activating splicing when they bind at one end of an intron and repressing it when they bind at the other end. Combining the data from these techniques allowed the researchers to produce an “atlas” of maps describing how each RBP’s activity depends on its binding location.

“Why they activate in one location and repress when they bind to another location is a longstanding puzzle,” Burge says. “But having this set of maps may help researchers to figure out what protein features are associated with each pattern of activity.”

Additionally, Lecuyer’s group at the University of Montreal used green fluorescent protein to tag more than 300 RBPs and pinpoint their locations within cells, such as the nucleus, the cytoplasm, or the mitochondria. This location information can also help scientists to learn more about the functions of each RBP and the RNA it binds to.

“The strength of this manuscript is in the generation of a comprehensive and multilayered dataset that can be used by the biomedical community to develop therapies targeted to specific sites on the genome using genome-editing strategies, or on the transcriptome using antisense oligonucleotides or agents that mediate RNA interference,” says Gil Ast, a professor of human molecular genetics and biochemistry at Tel Aviv University, who was not involved in the research.

Linking RNA and disease

Many research labs around the world are now using these data in an effort to uncover links between some of the RNA sequences identified and human diseases. For many diseases, researchers have identified genetic variants called single nucleotide polymorphisms (SNPs) that are more common in people with a particular disease.

“If those occur in a protein-coding region, you can predict the effects on protein structure and function, which is done all the time. But if they occur in a noncoding region, it’s harder to figure out what they may be doing,” Burge says. “If they hit a noncoding region that we identified as binding to an RBP, and disrupt the RBP’s motif, then we could predict that the SNP may alter the splicing or stability of the gene.”
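The reasoning in this quote can be expressed as a simple check: does a variant remove a motif occurrence present in the reference sequence? The sketch below assumes a single-nucleotide substitution at a 0-based position, a hypothetical reference sequence, and exact-match motifs, using the Rbfox motif GCAUG purely as an example; the helper names are illustrative, and a real analysis would also draw on binding data such as eCLIP and consider weaker motif variants.

```python
def motif_positions(seq, motif):
    """Set of 0-based start positions of all occurrences of motif in seq."""
    return {i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)}

def apply_snp(seq, pos, alt):
    """Return seq with the base at 0-based position pos replaced by alt."""
    if not 0 <= pos < len(seq):
        raise ValueError("SNP position outside sequence")
    return seq[:pos] + alt + seq[pos + 1:]

def disrupted_sites(ref, pos, alt, motif="GCAUG"):
    """Motif occurrences present in the reference allele but lost in the alternate allele."""
    alt_seq = apply_snp(ref, pos, alt)
    return sorted(motif_positions(ref, motif) - motif_positions(alt_seq, motif))

ref = "UUAGCAUGUUAA"  # hypothetical noncoding sequence with one GCAUG at position 3
print(disrupted_sites(ref, pos=5, alt="G"))  # [3]  (SNP inside the motif)
print(disrupted_sites(ref, pos=0, alt="C"))  # []   (SNP outside the motif)
```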

Burge and his colleagues now plan to use their RNA-based techniques to generate data on additional RNA-binding proteins.

“This work provides a resource that the human genetics community can use to help identify genetic variants that function at the RNA level,” he says.

The research was funded by the National Human Genome Research Institute ENCODE Project, as well as a grant from the Fonds de Recherche du Québec-Santé.

These muscle cells are guideposts to help regenerative flatworms grow back their eyes
Eva Frederick | Whitehead Institute
June 25, 2020

If anything happens to the eyes of the tiny, freshwater-dwelling planarian Schmidtea mediterranea, they can grow them back within just a few days. How they do this is a scientific conundrum — one that Peter Reddien’s lab at Whitehead Institute has been studying for years.

The lab’s latest project offers some insight: in a paper published in Science on June 25, researchers in Reddien’s lab have identified a new type of cell that likely serves as a guidepost to help route axons from the eyes to the brain as the worms complete the difficult task of regrowing their neural circuitry.

Schmidtea mediterranea’s eyes are composed of light-capturing photoreceptor neurons connected to the brain with long, spindly processes called axons. The worms use their eyes to sense light and navigate their environment.

The worms, which are popular models for research into regeneration, can regrow pretty much any part of their body; eyes are an interesting part to study because regenerating the visual system requires the worms to rewire their neurons to connect them to the brain.

When neural systems develop in embryos, the first nerve fibers, called pioneer axons, snake their way through tissue to form the circuitry needed to perceive and interpret external stimuli. The axons are helped along their way by specialized cells called guidepost cells. These special cells are positioned at choice points — places where the axon’s path could fork in different directions.

In many organisms, these guidepost cells aren’t a priority anymore once development is finished, and typically are not renewed through adulthood. That’s one reason why, when humans experience brain or nerve damage, the injury is usually permanent.

“This is a fundamental mystery of regeneration that we hadn’t even been thinking about,” says Reddien, the senior author of the paper who is also a professor of biology at Massachusetts Institute of Technology and an investigator with the Howard Hughes Medical Institute. “How can an adult animal regenerate a functional nervous system when the original development of the nervous system typically involves a number of cues that are thought to be transient?”

Then, in 2018, Reddien Lab scientist Lucila Scimone found something surprising in adult planarians: groups of mysterious cells that looked like they might play a role in guiding growing axons. She’d noticed this group of cells because they co-expressed two genes not often seen together and some were conspicuously close to the eyes.

“I was captivated by these cells,” she says. They appeared in very small numbers (a normal worm might have around 5; a large one might have up to 10) in every planarian she examined. They were divided into two distinct groups: some around the flatworms’ eyes, and others spaced out along the path to the brain center. When she traced the path of existing axons leading from the planarians’ eyes to their brain, they coincided with the positions of these cells without exception.

When the researchers characterized the cells, they found that they did not express any of the genes that are hallmarks of photoreceptor neurons; instead, they had markers often found in muscle tissue. “That was very striking, because muscle cells — that’s not what they do in most animals,” Scimone says.

In other organisms, guidepost cells are often neurons or glia. It would be unusual for muscle cells to serve as guideposts; but past work in the Reddien Lab had shown that planarian muscle cells played other special roles, such as secreting the extracellular matrix. The researchers now wondered whether they could add the role of guidepost to the long list of planarian muscle cell functions.

To test their hypothesis, the researchers designed a series of experiments. “We developed an eye transplantation method where you can take an eye from an animal and transplant it into another animal,” says Reddien Lab postdoc Kutay Deniz Atabay. “When you do this, the axonal projections from that eye will basically, if positioned appropriately, correctly wire themselves into the brain, producing a functional state.”

The researchers also created genetically engineered planarians that had the muscle cells, but no eyes, and then transplanted eyes onto their eyeless heads. Sure enough, the neurons grew as normal, snaking towards the cells and then adjusting their trajectories after encountering them.

Without the cells, it was a different story. When the researchers transplanted eyes to distant parts of planarians’ bodies without a population of these muscle cells, the photoreceptor neurons did not connect to the brain center. Likewise, when they transplanted eyes into planarians that had been modified to not have these muscle cells, their photoreceptor neurons still grew — but they did not wire properly to reach the brain.

These findings combined suggested that the cells were fully independent of the visual system — they did not form because of eyes or photoreceptor neurons, but likely established themselves before the neurons grew — which provided more evidence for the guidepost role.

The guidepost-like activity of these cells then raised the question: how do the cells themselves know where to be? “We found that there’s a pattern of signaling molecules in muscle that is setting where these cells should be,” Reddien says. “If we perturb the global positional information of the system, these cells get placed in the wrong positions, and then axons go to the wrong positions — so we think there’s a positional information framework that places the cells during regeneration, and that allows them to work as guideposts in the correct locations.”

At this point, the researchers don’t know exactly how the cells are able to communicate with growing axons to serve as guideposts. They could be releasing some sort of signaling molecule that attracts the axons, or they could be communicating by using transmembrane proteins.

“That will be an exciting direction for the future,” Reddien says. “We have now identified the transcriptome for the cells, which means we know all the genes that these cells express. That provides us with an intriguing list of genes that can be probed functionally, to try to see which ones are mediating the functions of these cells.”

This study is a step forward in a body of work that aims to expand the capabilities of regenerative medicine. “Imagine a scenario where someone experiences a spinal cord injury or an eye injury or stroke that leads to the loss of a neural circuit,” says Atabay. “The reason we can’t fully cure these cases today is that we lack fundamental information regarding how these systems can regenerate. Looking at regenerative organisms provides a lot of insights. From this case, we see that regenerating the lost system may not be enough; you may also need to regenerate systems that are properly patterning that system.”

***

Scimone, M. L. et al. “Muscle and neuronal guidepost-like cells facilitate planarian visual system regeneration.” Science, June 25, 2020.

Jonathan Weissman

Education

  • PhD, 1993, MIT
  • AB, 1988, Physics, Harvard

Research Summary

We study how cells ensure that proteins fold into their correct shape, as well as the role of protein misfolding in disease and normal physiology. We also build innovative tools for broadly exploring organizational principles of biological systems. These include ribosome profiling, which globally monitors protein translation; CRISPRi/a, for controlling the expression of human genes and rewiring the epigenome; and lineage tracing tools, to record the history of cells.

Awards

  • Ira Herskowitz Award, Genetics Society of America, 2020
  • European Molecular Biology Organization, Member, 2017
  • National Academy of Sciences Award for Scientific Discovery, 2015
  • American Academy of Microbiology, Fellow, 2010
  • National Academy of Sciences, Member, 2009
  • Raymond and Beverly Sackler International Prize in Biophysics, Tel Aviv University, 2008
  • Protein Society Irving Sigal Young Investigator’s Award, 2004
  • Howard Hughes Medical Institute, Assistant Investigator, 2000
  • Searle Scholars Program Fellowship, 1997
  • David and Lucile Packard Fellowship, 1996

Genetic study takes research on sex differences to new heights

Differences in male and female gene expression, including those contributing to height differences, found throughout the body in humans and other mammals.

Greta Friar | Whitehead Institute
July 19, 2019

Throughout the animal kingdom, males and females frequently exhibit sexual dimorphism: differences in characteristic traits that often make it easy to tell them apart. In mammals, one of the most common sex-biased traits is size, with males typically being larger than females. This is true in humans: Men are, on average, taller than women. However, biological differences among males and females aren’t limited to physical traits like height. They’re also common in disease. For example, women are much more likely to develop autoimmune diseases, while men are more likely to develop cardiovascular diseases.

In spite of the widespread nature of these sex biases, and their significant implications for medical research and treatment, little is known about the underlying biology that causes sex differences in characteristic traits or disease. In order to address this gap in understanding, Whitehead Institute Director David Page has transformed the focus of his lab in recent years from studying the X and Y sex chromosomes to working to understand the broader biology of sex differences throughout the body. In a paper published in Science, Page, a professor of biology at MIT and a Howard Hughes Medical Institute investigator; Sahin Naqvi, first author and former MIT graduate student (now a postdoc at Stanford University); and colleagues present the results of a wide-ranging investigation into sex biases in gene expression, revealing differences in the levels at which particular genes are expressed in males versus females.

The researchers’ findings span 12 tissue types in five species of mammals, including humans, and led to the discovery that a combination of sex-biased genes accounts for approximately 12 percent of the average height difference between men and women. This finding demonstrates a functional role for sex-biased gene expression in contributing to sex differences. The researchers also found that the majority of sex biases in gene expression are not shared between mammalian species, suggesting that — in some cases — sex-biased gene expression that can contribute to disease may differ between humans and the animals used as models in medical research.

Having the same gene expressed at different levels in each sex is one way to perpetuate sex differences in traits in spite of the genetic similarity of males and females within a species — since with the exception of the 46th chromosome (the Y in males or the second X in females), the sexes share the same pool of genes. For example, if a tall parent passes on a gene associated with an increase in height to both a son and a daughter, but the gene has male-biased expression, then that gene will be more highly expressed in the son, and so may contribute more height to the son than the daughter.

The researchers searched for sex-biased genes in tissues across the body in humans, macaques, mice, rats, and dogs, and they found hundreds of examples in every tissue. They used height for their first demonstration of the contribution of sex-biased gene expression to sex differences in traits because height is an easy-to-measure and heavily studied trait in quantitative genetics.

“Discovering contributions of sex-biased gene expression to height is exciting because identifying the determinants of height is a classic, century-old problem, and yet by looking at sex differences in this new way we were able to provide new insights,” Page says. “My hope is that we and other researchers can repeat this model to similarly gain new insights into diseases that show sex bias.”

Because height is so well studied, the researchers had access to public data on the identity of hundreds of genes that affect height. Naqvi decided to see how many of those height genes appeared in the researchers’ new dataset of sex-biased genes, and whether the genes’ sex biases corresponded to the expected effects on height. He found that sex-biased gene expression contributed approximately 1.6 centimeters to the average height difference between men and women, or 12 percent of the overall observed difference.
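The two figures in this paragraph also imply the size of the overall average height gap between men and women, written here as Δh. This is a back-of-the-envelope consistency check using only the numbers quoted above. It is not a value reported by the researchers:

```latex
0.12\,\Delta h \approx 1.6\ \text{cm}
\quad\Longrightarrow\quad
\Delta h \approx \frac{1.6\ \text{cm}}{0.12} \approx 13\ \text{cm}
```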

The scope of the researchers’ findings goes beyond height, however. Their database contains thousands of sex-biased genes. Slightly less than a quarter of the sex-biased genes that they catalogued appear to have evolved that sex bias in an early mammalian ancestor, and to have maintained that sex bias today in at least four of the five species studied. The majority of the genes appear to have evolved their sex biases more recently, and are specific to either one species or a certain lineage, such as rodents or primates.
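The "at least four of the five species" criterion can be written as a small classification rule. This is a simplified sketch, not the researchers' evolutionary analysis. It assumes that each gene's sex bias has already been called separately in each species. It also adds a requirement of its own: the bias must point in the same direction (male or female) in those species. The gene names and bias calls are hypothetical:

```python
from __future__ import annotations

from collections import Counter

# The five species surveyed in the study.
SPECIES = ("human", "macaque", "mouse", "rat", "dog")


def classify_sex_bias(bias_by_species: dict[str, str], min_shared: int = 4) -> str:
    """Label a gene 'ancestral' if the same bias direction ('male' or 'female')
    appears in at least `min_shared` species, otherwise 'lineage-specific'."""
    unknown = set(bias_by_species) - set(SPECIES)
    if unknown:
        raise ValueError(f"unexpected species: {sorted(unknown)}")
    direction_counts = Counter(bias_by_species.values())
    top = direction_counts.most_common(1)  # empty list if the gene is biased nowhere
    if top and top[0][1] >= min_shared:
        return "ancestral"
    return "lineage-specific"


# Hypothetical genes: species -> direction of sex-biased expression.
genes = {
    "GENE_A": {"human": "male", "macaque": "male", "mouse": "male", "rat": "male", "dog": "male"},
    "GENE_B": {"mouse": "female", "rat": "female"},  # rodent-only bias
    "GENE_C": {"human": "male", "macaque": "male"},  # primate-only bias
    "GENE_D": {"human": "female", "macaque": "female", "mouse": "female", "dog": "female"},
}

for name, biases in genes.items():
    print(name, classify_sex_bias(biases))
```

Genes B and C fall into the lineage-specific group, which the article describes as the majority.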

Whether or not a sex-biased gene is shared across species is a particularly important consideration for medical and pharmaceutical research using animal models. For example, previous research identified certain genetic variants that increase the risk of Type 2 diabetes specifically in women; however, the same variants increase the risk of Type 2 diabetes in both male and female mice. Therefore, mice would not be a good model to study the genetic basis of this sex difference in humans. Even when the animal appears to have the same sex difference in disease as humans, the specific sex-biased genes involved might be different. Based on their finding that most sex bias is not shared between species, Page and colleagues urge researchers to use caution when picking an animal model to study sex differences at the level of gene expression.

“We’re not saying to avoid animal models in sex-differences research, only not to take for granted that the sex-biased gene expression behind a trait or disease observed in an animal will be the same as that in humans. Now that researchers have species and tissue-specific data available to them, we hope they will use it to inform their interpretation of results from animal models,” Naqvi says.

The researchers have also begun to explore what exactly causes sex-biased expression of genes not found on the sex chromosomes. Naqvi discovered a mechanism by which sex-biased expression may be enabled: sex-biased transcription factors, proteins that help to regulate gene expression. Transcription factors bind to specific DNA sequences called motifs, and he found that certain sex-biased genes had the motif for a sex-biased transcription factor in their promoter regions, the sections of DNA that turn on gene expression. This means that, for example, a male-biased transcription factor selectively binds to the promoter regions of male-biased genes and so increases their expression, and likewise for female-biased transcription factors and female-biased genes. What regulates the transcription factors themselves remains a question for further study, although all sex differences are ultimately controlled by either the sex chromosomes or sex hormones.
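The promoter-motif comparison can be pictured as a string scan. The sketch checks each promoter for a motif on either DNA strand, then compares how often the motif appears in a set of sex-biased genes with how often it appears in all other genes. The motif, gene names, and sequences are hypothetical placeholders. Real analyses typically score motifs with position weight matrices and apply statistical tests rather than exact string matches:

```python
from __future__ import annotations

# Hypothetical motif, used for illustration only.
MOTIF = "AGGTCA"


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_motif(promoter: str, motif: str) -> bool:
    """True if the (uppercase) motif appears on either DNA strand of the promoter."""
    promoter = promoter.upper()
    return motif in promoter or reverse_complement(motif) in promoter


def motif_fractions(promoters: dict[str, str], biased: set[str], motif: str) -> tuple[float, float]:
    """Fraction of biased genes vs. all other genes whose promoter carries the motif."""
    def fraction(hits: list[bool]) -> float:
        return sum(hits) / len(hits) if hits else 0.0

    biased_hits = [has_motif(seq, motif) for gene, seq in promoters.items() if gene in biased]
    other_hits = [has_motif(seq, motif) for gene, seq in promoters.items() if gene not in biased]
    return fraction(biased_hits), fraction(other_hits)


promoters = {  # toy promoter sequences
    "GENE_M1": "ccgAGGTCAttgca",  # motif on the given strand
    "GENE_M2": "ttTGACCTggaa",    # motif on the opposite strand
    "GENE_M3": "acgtacgtacgt",
    "GENE_N1": "gggcccgggccc",
    "GENE_N2": "atatatAGGTCAat",
}
male_biased = {"GENE_M1", "GENE_M2", "GENE_M3"}

print(motif_fractions(promoters, male_biased, MOTIF))
```

Checking the reverse complement matters because a transcription factor can recognize its site on either strand of the double helix.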

The researchers see the collective findings of this paper as a foundation for future sex-differences research.

“We’re beginning to build the infrastructure for a systematic understanding of sex biases throughout the body,” Page says. “We hope these datasets are used for further research, and we hope this work gives people a greater appreciation of the need for, and value of, research into the molecular differences in male and female biology.”

This work was supported by Biogen, Whitehead Institute, National Institutes of Health, Howard Hughes Medical Institute, and generous gifts from Brit and Alexander d’Arbeloff and Arthur W. and Carol Tobin Brill.

Junk DNA makes a comeback

Third-year graduate student Emma Kowal is searching DNA for sequences that regulate gene expression.

Saima Sidik
July 8, 2019

“I went into science because of a certain obsession with the romance of it,” says Emma Kowal, a third-year graduate student in Chris Burge’s lab in the MIT Department of Biology. “I loved the idea of the scientist as an adventurer exploring the frontiers of knowledge and the universe. And I haven’t let go of that yet.”

Kowal has always been an avid science fiction reader, and now she’s living out a real-life scientific odyssey. The quest she’s taken on for her PhD research involves an understudied type of DNA sequence called an intron, and the roles that introns might play in regulating gene expression.

Introns lie between the DNA sequences that cells use for protein production, and are initially incorporated into the messenger RNA, or mRNA, that cells produce as an intermediate step in synthesizing proteins from DNA. But before they complete protein synthesis, cells remove introns from mRNA through a process called splicing, which has led many people to view introns as junk DNA with splicing acting like a garbage disposal.
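At the sequence level, splicing can be pictured as cutting intron intervals out of an RNA string and joining the remaining exons. This toy sketch ignores the biochemistry: it does not model splice-site recognition or the spliceosome, and it assumes the intron coordinates are already known. The sequence is made up, although the intron follows the common GU...AG boundary convention:

```python
from __future__ import annotations


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove intron intervals from a pre-mRNA string and join the exons.

    Intervals use 0-based, half-open [start, end) coordinates and must not overlap.
    """
    exons = []
    cursor = 0  # start of the exon currently being collected
    for start, end in sorted(introns):
        if start < cursor or end < start or end > len(pre_mrna):
            raise ValueError(f"invalid intron interval ({start}, {end})")
        exons.append(pre_mrna[cursor:start])
        cursor = end
    exons.append(pre_mrna[cursor:])  # final exon
    return "".join(exons)


# Made-up transcript: exon 1 + intron (GU...AG) + exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCUAACAG", "UUCUGA"
pre = exon1 + intron + exon2
mature = splice(pre, [(len(exon1), len(exon1) + len(intron))])
print(mature)
```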

“Introns appeal to me as the underdog genomic region,” Kowal says. Although they’re often seen as unimportant, introns are ubiquitous and plentiful, collectively making up 24% of the human genome. All eukaryotes have them, and, on average, each human gene contains eight. Many researchers, including Kowal, think that introns have been underestimated, and that they may play an important role in regulating gene expression.

Introns are only the latest chapter in Kowal’s RNA story. She began her research career as a Harvard University undergraduate student working in the Szostak Lab at Massachusetts General Hospital, where she studied how RNA catalyzed the evolution of cells on the early Earth. Although studying primordial life was intellectually stimulating, Kowal wanted to work on something more applied, so she joined the Church Lab in the Harvard Department of Genetics. There she developed methods for purifying and imaging enigmatic RNA-containing lipid compartments called extracellular vesicles, which cells release into their surrounding environments, possibly to communicate with one another.

For the sequel to her bachelor’s degree, Kowal chose to attend MIT Biology because she’d heard that, “at MIT, everyone is one standard deviation nerdier, on average, than they are at other schools.” In this sense, she has not been disappointed. Kowal calls the energy at MIT “unparalleled,” and she says, “people are jazzed about what they’re doing, and the whole campus reflects that.”

In some ways, these reflections are physical. Much of the artwork around MIT pays homage to major scientific discoveries, and Kowal says this reverence for science is one factor that attracted her to MIT. From the mural of DNA in the Biology Department to the golden neurons that descend alongside the staircase in the McGovern Institute for Brain Research, it’s as if the community is saying, “look at how awesome the universe is!” as Kowal puts it.

In other ways, this energy is reflected in the people she converses with daily. “I really like the students here,” Kowal says. “Everyone is enthusiastic, but also down to earth.” When she’s not exploring the realms of science, Kowal sometimes has more fanciful adventures with the Dungeons and Dragons group that she’s formed with some of her classmates.

Kowal didn’t necessarily intend to continue working on RNA at MIT Biology, but when she heard about Chris Burge’s lab, which focuses on RNA and the proteins that mediate its production and stability, she felt a call to action.

The Burge Lab combines high-throughput experimental techniques with bioinformatics, and Kowal wants to develop expertise in both of these fields. “If you’re skilled in generating and analyzing big data sets, you can ask questions that other people can’t,” she says. The Burge Lab seemed like the perfect setting for her PhD.

Burge asks his students to begin their degrees with a month-long reading period during which they sift through the literature to find a topic that they want to study. “You’re not allowed to pick up a pipette or do any analysis during your reading period,” Kowal says. “You just read and discuss your ideas and let things percolate.” As she read, Kowal came across a number of studies that discussed the influence that introns have on gene expression levels.

Over and over, scientists have noticed that cells produce more protein from genes that contain introns than when those same introns are removed. Intron-mediated enhancement (IME), as this effect is called, is a “stunningly broad phenomenon,” Kowal says, and scientists have observed it in a wide range of organisms, from yeast to plants to humans.

Splicing machinery, which removes introns from mRNA, likely plays a role in IME. This machinery binds mRNA as it’s being produced from DNA, then interacts with, and influences, the RNA production machinery. However, researchers have created mutant introns that can’t be recognized by splicing machinery, and sometimes these introns still enhance gene expression, so splicing isn’t the only factor that drives IME. Moreover, replacing one intron with another of the same size containing a different DNA sequence can change its effect, implying that the exact DNA sequences within introns may dictate their effects on gene expression. Kowal is intrigued by this last point, and wants to find these intronic sequences and figure out which have the largest effects on gene expression and why.

“This is an old mystery that’s ripe for new tools,” Kowal says. Over the last decade, researchers have begun using a technique called RNAseq to count the copies of mRNA made from each gene in a population of cells. Instead of replacing an intron with a single alternative DNA sequence, Kowal plans to replace it with a myriad of random DNA sequences, then use RNAseq to count how many mRNA copies cells make from versions of the gene carrying each of these random introns.
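Counting mRNA copies per random intron is the core bookkeeping step in a screen like this. The sketch below rests on several hypothetical assumptions, none of which describe Kowal's actual system:

- Because the intron itself is spliced out, each random intron is assumed to be linked to a short barcode that survives in the mature mRNA and appears at the start of each read.
- The DNA library is assumed to be sequenced as well. Dividing by each variant's DNA share then separates high expression from mere abundance in the library.

The barcodes, read layout, and pseudocount are illustrative only:

```python
from __future__ import annotations

from collections import Counter


def count_barcodes(reads: list[str], barcode_len: int) -> Counter:
    """Tally reads by their leading barcode; reads shorter than the barcode are skipped."""
    return Counter(read[:barcode_len] for read in reads if len(read) >= barcode_len)


def expression_ratios(rna_reads: list[str], dna_reads: list[str],
                      barcode_len: int = 4, pseudocount: float = 0.5) -> dict[str, float]:
    """RNA share / DNA share for each intron variant seen in the DNA library.

    Dividing by the DNA share corrects for variants that are simply more
    abundant in the library rather than more highly expressed.
    """
    rna = count_barcodes(rna_reads, barcode_len)
    dna = count_barcodes(dna_reads, barcode_len)
    rna_total, dna_total = sum(rna.values()), sum(dna.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("need at least one RNA read and one DNA read")
    return {
        barcode: ((rna[barcode] + pseudocount) / rna_total)
        / ((dna_count + pseudocount) / dna_total)
        for barcode, dna_count in dna.items()
    }


# Toy data: two variants equally common in the library, one expressed 3x more.
dna_reads = ["ACGTGGCC"] * 10 + ["TTGAGGCC"] * 10
rna_reads = ["ACGTAATT"] * 30 + ["TTGAAATT"] * 10
ratios = expression_ratios(rna_reads, dna_reads)
print(ratios)
```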

Preparing to test these random sequences has been an odyssey in and of itself, and Kowal has spent the last year building the system that she’ll use. First, she needed to decide which intron to replace. She chose one from a gene called UbC. Removing this intron reduces UbC expression ten-fold.

Besides contributing strongly to IME, the UbC intron is a great candidate for Kowal’s experiment because it lies in a regulatory region of the UbC mRNA that precedes the portion that’s translated into protein. This let her replace the UbC protein coding region with a fluorescent protein that she’ll use to visualize how much protein cells make when they encode each random intron sequence.

Kowal has spent the last year meticulously incorporating a library of random introns into this synthetic version of the UbC gene. She anticipates being able to introduce them into cells soon, to see which random introns result in the highest levels of mRNA and protein production. Thanks to RNAseq, she’ll be able to monitor how much each random intron contributes to mRNA expression. Because she can measure how brightly the fluorescent protein glows, she can correlate these mRNA levels with protein levels. From this, she’ll learn which intron sequences enhance gene expression most strongly, and she’ll also know whether these introns lead to higher levels of mRNA production, or if the same amount of mRNA is made into more protein. This distinction will offer her clues about the mechanism that introns use to enhance gene expression.
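Distinguishing "more mRNA" from "more protein made from the same mRNA" amounts to splitting a fold change into two parts. The sketch assumes each variant has an RNAseq-based mRNA level and a fluorescence-based protein level, both compared with a reference intron. On a log scale, the protein fold change is exactly the sum of the mRNA fold change and the protein-per-mRNA fold change. The measurements and the 0.5 log2 cutoff are hypothetical:

```python
from __future__ import annotations

import math


def decompose_enhancement(mrna: float, protein: float,
                          ref_mrna: float, ref_protein: float) -> tuple[float, float]:
    """Split a variant's protein log2 fold change (vs. a reference intron) into
    an mRNA-level part and a protein-per-mRNA part; the two parts sum to the total."""
    if min(mrna, protein, ref_mrna, ref_protein) <= 0:
        raise ValueError("levels must be positive")
    mrna_lfc = math.log2(mrna / ref_mrna)
    per_mrna_lfc = math.log2((protein / mrna) / (ref_protein / ref_mrna))
    return mrna_lfc, per_mrna_lfc


def main_route(mrna_lfc: float, per_mrna_lfc: float, min_effect: float = 0.5) -> str:
    """Name the larger contribution; `min_effect` (log2 units) is an arbitrary cutoff."""
    if max(abs(mrna_lfc), abs(per_mrna_lfc)) < min_effect:
        return "no clear effect"
    return "mRNA level" if abs(mrna_lfc) >= abs(per_mrna_lfc) else "protein per mRNA"


# Hypothetical (mRNA, fluorescence) measurements per intron variant.
reference = (100.0, 1000.0)
variants = {"variant_1": (400.0, 4000.0), "variant_2": (100.0, 4000.0)}

for name, (mrna, protein) in variants.items():
    parts = decompose_enhancement(mrna, protein, *reference)
    print(name, parts, main_route(*parts))
```

Both hypothetical variants raise protein four-fold. The first does so through more mRNA; the second makes more protein from the same amount of mRNA.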

Once Kowal knows which intron sequences promote gene expression most effectively, she’ll take advantage of the Burge lab’s bioinformatics expertise to analyze the distribution of these sequences throughout genomes and predict how they affect global gene expression. Kowal suspects certain intron sequences are bound by proteins that mediate mRNA production and stability, and she thinks her work will identify these protein-intron pairs.

Kowal balances her scientific adventures with outdoor adventures. Specifically, she’s recently fallen in love with rock climbing. “Climbing is a great counterpart to science because it’s something you can chip away at, and then there’s this huge satisfaction when you finally achieve a climb,” she says. “And also, between climbing and pipetting, I have really strong fingers.”

As for her love of science fiction, Kowal hopes to one day pen a science-based adventure of her own, but not before she’s made her mark as a scientist, either as a professor or in industry. “It makes sense for me to focus most of my energy on science right now,” she says. “But after I’ve led a spectacular, adventurous life in science, maybe I’ll use my reflections to write a novel.”

Posted 7.8.19

A chemical approach to imaging cells from the inside

Researchers develop a new microscopy system for creating maps of cells, using chemical reactions to encode spatial information.

Karen Zusi | Broad Institute
June 14, 2019

The following press release was issued today by the Broad Institute of MIT and Harvard.

A team of researchers at the McGovern Institute and Broad Institute of MIT and Harvard has developed a new technique for mapping cells. The approach, called DNA microscopy, shows how biomolecules such as DNA and RNA are organized in cells and tissues, revealing spatial and molecular information that is not easily accessible through other microscopy methods. DNA microscopy also does not require specialized equipment, enabling large numbers of samples to be processed simultaneously.

“DNA microscopy is an entirely new way of visualizing cells that captures both spatial and genetic information simultaneously from a single specimen,” says first author Joshua Weinstein, a postdoctoral associate at the Broad Institute. “It will allow us to see how genetically unique cells — those comprising the immune system, cancer, or the gut, for instance — interact with one another and give rise to complex multicellular life.”

The new technique is described in Cell. Aviv Regev, core institute member and director of the Klarman Cell Observatory at the Broad Institute and professor of biology at MIT, and Feng Zhang, core institute member of the Broad Institute, investigator at the McGovern Institute for Brain Research at MIT, and the James and Patricia Poitras Professor of Neuroscience at MIT, are co-authors. Regev and Zhang are also Howard Hughes Medical Institute Investigators.

The evolution of biological imaging

In recent decades, researchers have developed tools to collect molecular information from tissue samples, data that cannot be captured by either light or electron microscopes. However, attempts to couple this molecular information with spatial data — to see how it is naturally arranged in a sample — are often machinery-intensive, with limited scalability.

DNA microscopy takes a new approach to combining molecular information with spatial data, using DNA itself as a tool.

To visualize a tissue sample, researchers first add small synthetic DNA tags, which latch on to molecules of genetic material inside cells. The tags are then replicated, diffusing in “clouds” across cells and chemically reacting with each other, further combining and creating more unique DNA labels. The labeled biomolecules are collected, sequenced, and computationally decoded to reconstruct their relative positions and a physical image of the sample.

The interactions between these DNA tags enable researchers to calculate the locations of the different molecules — somewhat analogous to cell phone towers triangulating the locations of different cell phones in their vicinity. Because the process only requires standard lab tools, it is efficient and scalable.
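The cell-tower comparison can be made concrete with classic trilateration: a position is recovered from its distances to three points whose locations are known. This sketch illustrates the analogy only. DNA microscopy has no fixed "towers" at known positions. Instead, it computationally infers the relative positions of all tagged molecules from the products of tag-to-tag reactions. The coordinates below are made up:

```python
from __future__ import annotations

import math


def trilaterate(anchors: list[tuple[float, float]],
                distances: list[float]) -> tuple[float, float]:
    """Estimate an (x, y) position from its distances to three non-collinear anchors.

    Subtracting anchor 0's circle equation from those of anchors 1 and 2 cancels
    the x**2 and y**2 terms, leaving a 2x2 linear system solved by Cramer's rule.
    """
    (x0, y0), (x1, y1), (x2, y2) = anchors
    d0, d1, d2 = distances
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 - x0**2 + y1**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 - x0**2 + y2**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("anchors must not be collinear")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det


# Three "towers" at known positions and a "phone" actually located at (3, 4).
towers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true_position = (3.0, 4.0)
dists = [math.dist(t, true_position) for t in towers]
print(trilaterate(towers, dists))
```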

In this study, the authors demonstrate the ability to molecularly map the locations of individual human cancer cells in a sample by tagging RNA molecules. DNA microscopy could be used to map any group of molecules that will interact with the synthetic DNA tags, including cellular genomes, RNA, or proteins with DNA-labeled antibodies, according to the team.

“DNA microscopy gives us microscopic information without a microscope-defined coordinate system,” says Weinstein. “We’ve used DNA in a way that’s mathematically similar to photons in light microscopy. This allows us to visualize biology as cells see it and not as the human eye does. We’re excited to use this tool in expanding our understanding of genetic and molecular complexity.”

Funding for this study was provided by the Simons Foundation, Klarman Cell Observatory, NIH (R01HG009276, 1R01-HG009761, 1R01-MH110049, and 1DP1-HL141201), New York Stem Cell Foundation, Paul G. Allen Family Foundation, Vallee Foundation, the Poitras Center for Affective Disorders Research at MIT, the Hock E. Tan and K. Lisa Yang Center for Autism Research at MIT, J. and P. Poitras, and R. Metcalfe.

The authors have applied for a patent on this technology.

Pulin Li

Education

  • PhD, 2012, Chemical Biology, Harvard University
  • BS, 2006, Life Sciences, Peking University

Research Summary

We are curious about how circuits of interacting genes in individual cells enable multicellular functions, such as self-organizing into structured tissues. To address this question, we analyze genetic circuits in natural systems, combining quantitative measurements and mathematical modeling. In parallel, we test the sufficiency of the circuits and understand their design principles by multi-scale reconstitution, from genes to circuits to multicellular behavior, using synthetic biology and bioengineering tools. Together, we aim to provide both a quantitative understanding of embryonic development and new ways to engineer tissues.

Awards

  • New Innovator Award, National Institutes of Health Common Fund’s High-Risk, High-Reward Research Program, 2021
  • R.R. Bensley Award in Cell Biology, American Association for Anatomy, 2021
  • Santa Cruz Developmental Biology Young Investigator Award, 2016
  • NIH Pathway to Independence Award K99/R00 (NICHD), 2016
  • American Cancer Society Postdoctoral Fellowship, 2015

In search of nature’s winning recipe

Graduate student Darren Parker aims to understand the ratio of ingredients that constitutes the optimal cell.

Raleigh McElvery
May 31, 2019

Fifth-year graduate student Darren Parker is as much a baker as he is a biologist — at least metaphorically speaking. He’s on a mission to understand the ratio of ingredients required to concoct nature’s winning recipe for the optimal cell. Researchers have a solid understanding of which components are essential for cellular function, but they have yet to determine whether it’s critical for cells to generate exactly the right amount of protein.

“In that way, my graduate work is actually pretty simple,” Parker says. “I just want to know if changing the amounts of a specific ingredient has an effect on the overall product.”

The oldest of four brothers, Parker grew up in a suburb just outside of Chicago. When he enrolled at the University of Illinois Urbana-Champaign for his undergraduate studies in 2009, he was considering a major in environmental science. “My high school biology classes were mostly rote memorization,” he explains, “so the molecular aspects just didn’t resonate with me. I was more interested in studying life on a larger scale.”

After his first year, he entered the Integrated Biology program, which essentially “encompassed all biology that wasn’t molecular biology.” He was still required to take an introductory molecular biology course, though, as part of the major. But this time around, something clicked.

He remembers performing his first genetic knockdown experiment, decreasing the level of dopamine receptors in roundworms and witnessing the behavioral ramifications in real time. “I finally had a handle on the molecular concepts enough to really get what was going on,” he says.

He attributed his newfound appreciation for the basic mechanisms underlying life to his fortified chemistry skills. At the beginning of his third year, he officially declared a biochemistry major, and joined a lab in the Department of Chemistry studying nucleic acid enzymes.

Parker’s job was to sift through trillions of short DNA strands, selecting only those that could act like enzymes and cut RNA. He would then home in on the nucleotide sequences within those strands that were best suited to carry out the reaction. After a year and a half, he’d successfully identified a few DNA sequences that could cut RNA molecules with a distinct chemistry. After that, he was excited to try “studying life” as opposed to synthetic reactions.

Mid-way through his fourth year, he joined a biology lab in the College of Medicine probing alternative splicing in liver and heart development. It was a new group with only a few members, and Parker had more experience as an undergraduate than some of the first-year graduate students, so he hit the ground running. His last-minute switch to biochemistry meant he had five years of studies instead of the usual four — totaling six full semesters (and several summers) in lab.

After identifying a key splicing protein required for the liver to fully mature in mice and humans, Parker became even more fascinated by molecular biology and determined to pursue a career in science with bigger picture applications.

At the urging of his advisor, Parker sent in his graduate school applications. He was primarily interested in microbiology and infectious disease research — although he had no prior experience working in bacteria, only a longstanding interest in the intersection of science and society. He ultimately chose MIT Biology because of the breadth of labs. He could join a microbiology lab, or pursue an entirely different path, all within the same department. Gene-Wei Li’s lab seemed like “the perfect mix.”

“Gene was asking questions in molecular biology from the unique perspective of a physicist, looking at biological questions in a way I had never even considered before,” he says. “Gene had also just joined the department and wasn’t tied to a specific field or model organism yet, so I had the chance to build my own projects from the bottom up; I wasn’t just slotting in somewhere.”

Best of all, the Li lab was all about drilling down into the mechanics of protein production in order to understand the cell as a whole: the bigger-picture perspective that Parker was longing for.

Parker began by exploring ways to modify high-throughput RNA sequencing. He aimed to make this popular method cheaper and more scalable, in hopes of knocking out many individual genes in E. coli to test the genome-wide effects. He then pivoted his project and applied his new technique to study the effects of reducing essential genes in B. subtilis, another model bacterium. The enzymes from these experiments that interested him most were the aminoacyl-tRNA synthetases.

tRNAs, or transfer RNAs, carry amino acids to the ribosome so that the cell can produce proteins. This process requires the help of enzymes — tRNA synthetases — to “charge” the tRNAs with an amino acid. Only then can the ribosome transfer the amino acid from the tRNAs to the growing chain of amino acids that eventually forms the protein. Like an inquisitive baker, Parker wanted to know what would happen if he added more or less tRNA synthetase to the recipe of a bacterial cell.
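For readers who want the chemistry behind "charging": aminoacyl-tRNA synthetases generally work in two steps. They first activate the amino acid with ATP, then transfer it to the matching tRNA. This is the standard textbook scheme, not a result from Parker's experiments:

```latex
\begin{aligned}
\text{amino acid} + \text{ATP} &\longrightarrow \text{aminoacyl-AMP} + \text{PP}_i \\
\text{aminoacyl-AMP} + \text{tRNA} &\longrightarrow \text{aminoacyl-tRNA} + \text{AMP}
\end{aligned}
```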

His results would make Goldilocks proud. Over the past few years, he’s shown that too much or too little tRNA synthetase prevents the cell from growing at a normal rate. The amount must be just right.

“It turns out that what’s most important to the cell is maintaining the ratios of those very conserved ingredients,” Parker says. “The cell will actively use less of those ingredients if the synthetase is limiting, and this leads to a much slower growing cell. Adding too much tRNA synthetase is just a waste because the cell already has as much as it needs to sustain translation.”

This same family of tRNA synthetase proteins, he adds, is also implicated in some neurological diseases in humans, which gives him further impetus to study them.

At this point, Parker has taste-tested his fair share of biological areas, and he’s found his niche. “It was a long process,” he says. “That was probably best, though, because it gave me more time to explore.”

Once he graduates, he plans to go into industry, perhaps continuing to tweak the list of ingredients in order to engineer cells to do new things.

“The next time you’re in the kitchen and you want to add more or less of your favorite ingredient,” he urges, “just think about how the cell might feel if you did so with your favorite gene.”

Photo credit: Raleigh McElvery
Posted 5.30.19