New technique reveals how gene transcription is coordinated in cells

By capturing short-lived RNA molecules, scientists can map relationships between genes and the regulatory elements that control them.

Anne Trafton | MIT News
June 5, 2024

The human genome contains about 23,000 genes, but only a fraction of those genes are turned on inside a cell at any given time. The complex network of regulatory elements that controls gene expression includes regions of the genome called enhancers, which are often located far from the genes that they regulate.

This distance can make it difficult to map the complex interactions between genes and enhancers. To overcome that, MIT researchers have invented a new technique that allows them to observe the timing of gene and enhancer activation in a cell. When a gene is turned on around the same time as a particular enhancer, it strongly suggests the enhancer is controlling that gene.

Learning more about which enhancers control which genes, in different types of cells, could help researchers identify potential drug targets for genetic disorders. Genomic studies have identified mutations in many non-protein-coding regions that are linked to a variety of diseases. Could these be unknown enhancers?

“When people start using genetic technology to identify regions of chromosomes that have disease information, most of those sites don’t correspond to genes. We suspect they correspond to these enhancers, which can be quite distant from a promoter, so it’s very important to be able to identify these enhancers,” says Phillip Sharp, an MIT Institute Professor Emeritus and member of MIT’s Koch Institute for Integrative Cancer Research.

Sharp is the senior author of the new study, which appears today in Nature. MIT Research Assistant D.B. Jay Mahat is the lead author of the paper.

Hunting for eRNA

Less than 2 percent of the human genome consists of protein-coding genes. The rest of the genome includes many elements that control when and how those genes are expressed. Enhancers, which are thought to turn genes on by transiently forming a complex that brings them into physical contact with gene promoter regions, were discovered about 45 years ago.

More recently, in 2010, researchers discovered that these enhancers are transcribed into RNA molecules, known as enhancer RNA or eRNA. Scientists suspect that this transcription occurs when the enhancers are actively interacting with their target genes. This raised the possibility that measuring eRNA transcription levels could help researchers determine when an enhancer is active, as well as which genes it’s targeting.

“That information is extraordinarily important in understanding how development occurs, and in understanding how cancers change their regulatory programs and activate processes that lead to de-differentiation and metastatic growth,” Mahat says.

However, this kind of mapping has proven difficult to perform because eRNA is produced in very small quantities and does not last long in the cell. Additionally, eRNA lacks a modification known as a poly-A tail, which is the “hook” that most techniques use to pull RNA out of a cell.

One way to capture eRNA is to add a nucleotide to cells that halts transcription when incorporated into RNA. These nucleotides also contain a tag called biotin that can be used to fish the RNA out of a cell. However, this existing technique works only on large pools of cells and doesn’t give information about individual cells.

While brainstorming ideas for new ways to capture eRNA, Mahat and Sharp considered using click chemistry, a technique that can be used to join two molecules together if they are each tagged with “click handles” that can react together.

The researchers designed nucleotides labeled with one click handle, and once these nucleotides are incorporated into growing eRNA strands, the strands can be fished out with a tag containing the complementary handle. This allowed the researchers to capture eRNA and then purify, amplify, and sequence it. Some RNA is lost at each step, but Mahat estimates that they can successfully pull out about 10 percent of the eRNA from a given cell.

Using this technique, the researchers obtained a snapshot of the enhancers and genes that are being actively transcribed at a given time in a cell.

“You want to be able to determine, in every cell, the activation of transcription from regulatory elements and from their corresponding gene. And this has to be done in a single cell because that’s where you can detect synchrony or asynchrony between regulatory elements and genes,” Mahat says.

Timing of gene expression

Demonstrating their technique in mouse embryonic stem cells, the researchers found that they could calculate approximately when a particular region starts to be transcribed, based on the length of the RNA strand and the speed of the polymerase (the enzyme responsible for transcription) — that is, how far the polymerase transcribes per second. This allowed them to determine which genes and enhancers were being transcribed around the same time.
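The timing estimate described above comes down to dividing the length of a nascent RNA strand by the speed of the polymerase. The following is a minimal sketch in Python, assuming a constant polymerase speed. The function names, the 40 nucleotides per second value, the transcript lengths, and the 30-second window are all illustrative placeholders rather than figures or methods from the study.

```python
def seconds_since_initiation(transcript_length_nt: float, speed_nt_per_s: float) -> float:
    """Estimate how long ago transcription started, assuming constant polymerase speed."""
    if speed_nt_per_s <= 0:
        raise ValueError("polymerase speed must be positive")
    return transcript_length_nt / speed_nt_per_s


def co_transcribed(length_a_nt: float, length_b_nt: float,
                   speed_nt_per_s: float, window_s: float) -> bool:
    """Return True if two nascent transcripts from the same cell started within window_s seconds."""
    t_a = seconds_since_initiation(length_a_nt, speed_nt_per_s)
    t_b = seconds_since_initiation(length_b_nt, speed_nt_per_s)
    return abs(t_a - t_b) <= window_s


# Assumed, illustrative values (not taken from the study).
ASSUMED_SPEED_NT_PER_S = 40.0
gene_elapsed = seconds_since_initiation(2000, ASSUMED_SPEED_NT_PER_S)      # 50.0 seconds
enhancer_elapsed = seconds_since_initiation(1200, ASSUMED_SPEED_NT_PER_S)  # 30.0 seconds

print(gene_elapsed, enhancer_elapsed)
print(co_transcribed(2000, 1200, ASSUMED_SPEED_NT_PER_S, window_s=30.0))   # True
```

In this toy case the gene and the enhancer started transcribing about 20 seconds apart, so they fall inside the assumed window and would be flagged as a candidate pair.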

The researchers used this approach to determine the timing of the expression of cell cycle genes in more detail than has previously been possible. They were also able to confirm several sets of known gene-enhancer pairs and generated a list of about 50,000 possible enhancer-gene pairs that they can now try to verify.

Learning which enhancers control which genes would prove valuable in developing new treatments for diseases with a genetic basis. Last year, the U.S. Food and Drug Administration approved the first gene therapy treatment for sickle cell anemia, which works by interfering with an enhancer, resulting in activation of a fetal globin gene and reducing the production of sickled blood cells.

The MIT team is now applying this approach to other types of cells, with a focus on autoimmune diseases. Working with researchers at Boston Children’s Hospital, they are exploring immune cell mutations that have been linked to lupus, many of which are found in non-coding regions of the genome.

“It’s not clear which genes are affected by these mutations, so we are beginning to tease apart the genes these putative enhancers might be regulating, and in what cell types these enhancers are active,” Mahat says. “This is a tool for creating gene-to-enhancer maps, which are fundamental in understanding the biology, and also a foundation for understanding disease.”

The findings of this study also offer evidence for a theory that Sharp has recently developed, along with MIT professors Richard Young and Arup Chakraborty, that gene transcription is controlled by membraneless droplets known as condensates. These condensates are made of large clusters of enzymes and RNA, which Sharp suggests may include eRNA produced at enhancer sites.

“We picture that the communication between an enhancer and a promoter is a condensate-type, transient structure, and RNA is part of that. This is an important piece of work in building the understanding of how RNAs from enhancers could be active,” he says.

The research was funded by the National Cancer Institute, the National Institutes of Health, and the Emerald Foundation Postdoctoral Transition Award.

“Rosetta Stone” of cell signaling could expedite precision cancer medicine

An atlas of human protein kinases enables scientists to map cell signaling pathways with unprecedented speed and detail.

Megan Scudellari | Koch Institute
June 3, 2024

A newly complete database of human protein kinases and their preferred binding sites provides a powerful new platform to investigate cell signaling pathways.

Culminating 25 years of research, MIT, Harvard University, and Yale University scientists and collaborators have unveiled a comprehensive atlas of human tyrosine kinases — enzymes that regulate a wide variety of cellular activities — and their binding sites.

The addition of tyrosine kinases to a previously published dataset from the same group now completes a free, publicly available atlas of all human kinases and their specific binding sites on proteins, which together orchestrate fundamental cell processes such as growth, cell division, and metabolism.

Now, researchers can use data from mass spectrometry, a common laboratory technique, to identify the kinases involved in normal and dysregulated cell signaling in human tissue, such as during inflammation or cancer progression.

“I am most excited about being able to apply this to individual patients’ tumors and learn about the signaling states of cancer and heterogeneity of that signaling,” says Michael Yaffe, who is the David H. Koch Professor of Science at MIT, the director of the MIT Center for Precision Cancer Medicine, a member of MIT’s Koch Institute for Integrative Cancer Research, and a senior author of the new study. “This could reveal new druggable targets or novel combination therapies.”

The study, published in Nature, is the product of a long-standing collaboration with senior authors Lewis Cantley at Harvard Medical School and Dana-Farber Cancer Institute, Benjamin Turk at Yale School of Medicine, and Jared Johnson at Weill Cornell Medical College.

The paper’s lead authors are Tomer Yaron-Barir at Columbia University Irving Medical Center, and MIT’s Brian Joughin, with contributions from Konstantin Krismer, Mina Takegami, and Pau Creixell.

Kinase kingdom

Human cells are governed by a network of diverse protein kinases that alter the properties of other proteins by attaching chemical tags called phosphate groups. Phosphate groups are small but powerful: When attached to proteins, they can turn proteins on or off, or even dramatically change their function. Identifying which of the almost 400 human kinases phosphorylate a specific protein at a particular site on the protein was traditionally a lengthy, laborious process.

Beginning in the mid-1990s, the Cantley laboratory developed a method using a library of small peptides to identify the optimal amino acid sequence (called a motif, similar to a scannable barcode) that a kinase targets on its substrate proteins for the addition of a phosphate group. Over the ensuing years, Yaffe, Turk, and Johnson, all of whom spent time as postdocs in the Cantley lab, made seminal advancements in the technique, increasing its throughput, accuracy, and utility.

Johnson led a massive experimental effort exposing batches of kinases to these peptide libraries and observed which kinases phosphorylated which subsets of peptides. In a corresponding Nature paper published in January 2023, the team mapped more than 300 serine/threonine kinases, the other main type of protein kinase, to their motifs. In the current paper, they complete the human “kinome” by successfully mapping 93 tyrosine kinases to their corresponding motifs.

Next, by creating and using advanced computational tools, Yaron-Barir, Krismer, Joughin, Takegami, and Yaffe tested whether the results were predictive of real proteins, and whether the results might reveal unknown signaling events in normal and cancer cells. When applied to phosphoproteomic data from mass spectrometry, which reveals phosphorylation patterns in cells, their atlas accurately predicted tyrosine kinase activity in previously studied cell signaling pathways.

For example, using recently published phosphoproteomic data of human lung cancer cells treated with two targeted drugs, the atlas identified that treatment with erlotinib, a known inhibitor of the protein EGFR, downregulated sites matching a motif for EGFR. Treatment with afatinib, a known HER2 inhibitor, downregulated sites matching the HER2 motif. Unexpectedly, afatinib treatment also upregulated the motif for the tyrosine kinase MET, a finding that helps explain patient data linking MET activity to afatinib drug resistance.

Actionable results

There are two key ways researchers can use the new atlas. First, for a protein of interest that is being phosphorylated, the atlas can be used to narrow down hundreds of kinases to a short list of candidates likely to be involved. “The predictions that come from using this will still need to be validated experimentally, but it’s a huge step forward in making clear predictions that can be tested,” says Yaffe.
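One way to picture this narrowing step is as motif scoring: each kinase's preference is stored as a table of per-position amino acid weights, and a phosphorylation site is scored against every table. The sketch below is a toy illustration in Python. The kinase names, the weights, and the log-sum scoring rule are all invented for the example and are not taken from the atlas or its software.

```python
import math

# Hypothetical motifs: offset from the phosphorylated tyrosine -> {amino acid: relative preference}.
# Values above 1 mean favoured, below 1 mean disfavoured; unlisted residues count as neutral (1.0).
TOY_MOTIFS = {
    "KINASE_A": {-1: {"E": 3.0, "D": 2.5}, 1: {"I": 2.0}, 3: {"P": 0.3}},
    "KINASE_B": {-1: {"K": 2.0}, 1: {"E": 2.5}, 3: {"P": 4.0}},
    "KINASE_C": {-2: {"R": 1.5}},
}


def score_site(peptide: str, center: int, motif: dict) -> float:
    """Sum log2 preferences over the motif positions that fall inside the peptide."""
    total = 0.0
    for offset, prefs in motif.items():
        idx = center + offset
        if 0 <= idx < len(peptide):
            total += math.log2(prefs.get(peptide[idx], 1.0))
    return total


def rank_kinases(peptide: str, center: int, motifs: dict = TOY_MOTIFS) -> list:
    """Return (kinase, score) pairs sorted from best to worst motif match."""
    scores = {name: score_site(peptide, center, m) for name, m in motifs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy site: the phosphorylated tyrosine (Y) sits at index 3.
for name, score in rank_kinases("AGEYIAV", center=3):
    print(f"{name}: {score:.2f}")
```

The top of the ranked list is the short list of candidates. As Yaffe notes, such predictions would still need experimental validation.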

Second, the atlas makes phosphoproteomic data more useful and actionable. In the past, researchers might gather phosphoproteomic data from a tissue sample, but it was difficult to know what that data was saying or how to best use it to guide next steps in research. Now, that data can be used to predict which kinases are upregulated or downregulated and therefore which cellular signaling pathways are active or not.
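To make the idea concrete, one simple way to ask whether a kinase looks more or less active is to compare how the sites matching its motif change against how all other sites change. The Python sketch below does this with a plain difference of means. The data, the kinase names, and the scoring rule are hypothetical and stand in for the more rigorous statistics a real analysis would need; this is not the method used in the study.

```python
from statistics import mean

# Hypothetical input: (site id, log2 fold change of treated vs. control,
# set of kinases whose motif the site matches).
SITES = [
    ("site1", -2.1, {"KINASE_A"}),
    ("site2", -1.7, {"KINASE_A", "KINASE_B"}),
    ("site3", 0.2, {"KINASE_B"}),
    ("site4", 1.5, {"KINASE_C"}),
    ("site5", 1.1, {"KINASE_C"}),
    ("site6", 0.0, set()),
]


def motif_activity_shift(sites, kinase):
    """Mean log2 fold change of motif-matching sites minus the mean of all other sites.

    Returns None when either group is empty, since no comparison is possible.
    """
    matched = [lfc for _, lfc, kinases in sites if kinase in kinases]
    others = [lfc for _, lfc, kinases in sites if kinase not in kinases]
    if not matched or not others:
        return None
    return mean(matched) - mean(others)


for kinase in ("KINASE_A", "KINASE_B", "KINASE_C"):
    print(kinase, round(motif_activity_shift(SITES, kinase), 2))
```

A clearly negative shift suggests that sites matching the kinase's motif were downregulated, and a clearly positive shift suggests upregulation.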

“We now have a new tool to interpret those large datasets, a Rosetta Stone for phosphoproteomics,” says Yaffe. “It is going to be particularly helpful for turning this type of disease data into actionable items.”

In the context of cancer, phosphoproteomic data from a patient’s tumor biopsy could be used to help doctors quickly identify which kinases and cell signaling pathways are involved in cancer expansion or drug resistance, then use that knowledge to target those pathways with appropriate drug therapy or combination therapy.

Yaffe’s lab and their colleagues at the National Institutes of Health are now using the atlas to seek out new insights into difficult cancers, including appendiceal cancer and neuroendocrine tumors. While many cancers have been shown to have a strong genetic component, such as the genes BRCA1 and BRCA2 in breast cancer, other cancers are not associated with any known genetic cause. “We’re using this atlas to interrogate these tumors that don’t seem to have a clear genetic driver to see if we can identify kinases that are driving cancer progression,” he says.

Biological insights

In addition to completing the human kinase atlas, the team made two biological discoveries in their recent study. First, they identified three main classes of phosphorylation motifs, or barcodes, for tyrosine kinases. The first class consists of motifs that map to multiple kinases, suggesting that numerous signaling pathways converge to phosphorylate a protein bearing that motif. The second class consists of motifs with a one-to-one match between motif and kinase, in which only a specific kinase will activate a protein with that motif. This came as a partial surprise, as some in the field had thought tyrosine kinases to have minimal specificity.

The final class includes motifs for which there is no clear match to one of the 78 classical tyrosine kinases. This class includes motifs that match to 15 atypical tyrosine kinases known to also phosphorylate serine or threonine residues. “This means that there’s a subset of kinases that we didn’t recognize that are actually playing an important role,” says Yaffe. It also indicates there may be other mechanisms besides motifs alone that affect how a kinase interacts with a protein.

The team also discovered that tyrosine kinase motifs are tightly conserved between humans and the worm species C. elegans, despite the species being separated by more than 600 million years of evolution. In other words, a worm kinase and its human homologue are phosphorylating essentially the same motif. That sequence preservation suggests that tyrosine kinases are critical to signaling pathways in all multicellular organisms, and that any small change would be harmful to an organism.

The research was funded by the Charles and Marjorie Holloway Foundation, the MIT Center for Precision Cancer Medicine, the Koch Institute Frontier Research Program via L. Scott Ritterbush, the Leukemia and Lymphoma Society, the National Institutes of Health, Cancer Research UK, the Brain Tumour Charity, and the Koch Institute Support (core) grant from the National Cancer Institute.