Revising the textbook on introns

Whitehead Institute researchers uncover a group of introns in yeast that possess surprising stability and function.

Nicole Davis | Whitehead Institute
January 16, 2019

A research team from Whitehead Institute has uncovered a surprising and previously unrecognized role for introns, the parts of genes that lack the instructions for making proteins and are typically cut away and rapidly destroyed. Through studies of baker’s yeast, the researchers identified a highly unusual group of introns that linger and accumulate, in their fully intact form, long after they have been freed from their neighboring sequences, which are called exons. Importantly, these persistent introns play a role in regulating yeast growth, particularly under stressful conditions.

The researchers, whose work appears online in the journal Nature, suggest that some introns also might accumulate and carry out functions in other organisms.

“This is the first time anyone has found a biological role for full-length, excised introns,” says senior author David Bartel, a member of the Whitehead Institute. “Our findings challenge the view of these introns as simply byproducts of gene expression, destined for rapid degradation.”

Imagine the DNA that makes up your genes as the raw footage of a movie. The exons are the scenes used in the final cut, whereas the introns are the outtakes — shots that are removed, or spliced out, and therefore not represented in the finished product.

Despite their second-class status, introns are known to play a variety of important roles. Yet these activities are primarily confined to the period prior to splicing — that is, before introns are separated from their nearby exons. After splicing, some introns can be whittled down and retained for other uses — part of a group of so-called “non-coding RNAs.” But by and large, introns have been thought to be relegated to the genome’s cutting room floor.

Bartel and his Whitehead Institute colleagues, including world-renowned yeast expert Gerald Fink, now add an astonishing new dimension to this view: Full-length introns — that is, those that have been cut out but remain otherwise intact — can persist and carry out useful biological functions. As reported in their Nature paper, the team discovered that these extraordinary introns are regulated by and function within the essential TORC1 growth signaling network, forming a previously unknown branch of this network that controls cell growth during periods of stress.

“Our initial reaction was: ‘This is really weird,’” recalls first author Jeffrey Morgan, a former graduate student in Bartel’s lab who is now a postdoc in Jared Rutter’s lab at the University of Utah. “We came across genes where the introns were much more abundant than the exons, which is the exact opposite of what you’d expect.”

The researchers identified a total of 34 of these unusually stable introns, representing 11 percent of all introns in baker’s yeast, also known as Saccharomyces cerevisiae. Surprisingly, very few criteria determine which introns will become stable. For example, neither the genetic sequences of the introns nor the regions that surround them are of any significance. The only defining — and necessary — feature, the team found, is a structural one, and involves the precise shape the introns adopt as they are being excised from their neighboring exons. Excised introns typically form a lasso-shaped structure, known as a lariat. The length of the lasso’s handle appears to dictate whether an intron will be stabilized or not.

Remarkably, both yeast and introns have been studied for several decades. Yet until now, these unique introns went undetected. One reason, Bartel and his colleagues believe, is the conditions under which yeast are typically grown. Often, researchers study yeast that are growing very rapidly — so-called log-phase growth. That is because abnormalities are often easiest to detect when cells are multiplying quickly.

“Biologists have focused heavily on log-phase for very good reasons, but in the wild, yeast are very rarely in that condition, whether it’s because of limited nutrients or other stresses,” says Bartel, who is also professor of biology at MIT and a Howard Hughes Medical Institute investigator.

He and his colleagues decided to grow yeast under more stressful circumstances, and that is what ultimately led them to their discovery. Although their experiments were confined to yeast, the researchers believe it is possible other organisms may harbor this long-overlooked class of introns — and that similar approaches using less-often-studied conditions could help illuminate them.

“Right now, we can say it is happening in yeast, but we’d be surprised if this is the only organism in which it is happening,” Bartel says.

The research was supported by the National Institutes of Health and the Howard Hughes Medical Institute.

Biologists discover an unusual hallmark of aging in neurons

Snippets of RNA that accumulate in brain cells could interfere with normal function.

Anne Trafton | MIT News Office
November 27, 2018

As we age, neurons in our brains can become damaged by free radicals. MIT biologists have now discovered that this type of damage, known as oxidative stress, produces an unusual pileup of short snippets of RNA in some neurons.

This RNA buildup, which the researchers believe may be a marker of neurodegenerative diseases, can reduce protein production. The researchers observed this phenomenon in both mouse and human brains, especially in a part of the brain called the striatum — a site involved in diseases such as Parkinson’s and Huntington’s.

“The brain is very metabolically active, and over time, that causes oxidative damage, but it affects some neurons more than others,” says Christopher Burge, an MIT professor of biology. “This phenomenon appears to be a previously unrecognized consequence of oxidative stress, which impacts hundreds of genes and may influence translation and RNA regulation globally.”

Burge and Myriam Heiman, the Latham Family Career Development Associate Professor of Brain and Cognitive Sciences, are the senior authors of the paper, which appears in the Nov. 27 issue of Cell Reports. Peter Sudmant, a former MIT postdoc, is the lead author of the paper, and postdoc Hyeseung Lee and former postdoc Daniel Dominguez are also authors.

A mysterious finding

For this study, the researchers used a technique developed by Heiman that allows them to isolate and sequence messenger RNA from specific types of cells. Messenger RNA carries protein-building instructions to cell organelles called ribosomes, which read the mRNA and translate the instructions into proteins by stringing together amino acids in the correct sequence.

Heiman’s technique involves tagging ribosomes from a specific type of cells with green fluorescent protein, so that when a tissue sample is analyzed, researchers can use the fluorescent tag to isolate and sequence RNA from only those cells. This allows them to determine which proteins are being produced by different types of cells.

“This is particularly useful in the nervous system where you’ve got different types of neurons and glia closely intertwined together, if you want to isolate the mRNAs from one particular cell type,” Burge says.

In separate groups of mice, the researchers tagged ribosomes from either D1 or D2 spiny projection neurons, which make up 95 percent of the neurons found in the striatum. They labeled these cells in younger mice (6 weeks old) and 2-year-old mice, which are roughly equivalent to humans in their 70s or 80s.

The researchers had planned to look for gene expression differences between those two cell types, and to explore how they were affected by age. “These two types of neurons are implicated in several neurodegenerative diseases that are aging-related, so it is important to understand how normal aging changes their cellular and molecular properties,” says Heiman, who is a member of MIT’s Picower Institute for Learning and Memory and the Broad Institute of MIT and Harvard.

To the researchers’ surprise, a mysterious result emerged — in D1 neurons from aged mice (but not neurons from young mice or D2 neurons from aged mice), they found hundreds of genes that expressed only a short fragment of the original mRNA sequence. These snippets, known as 3’ untranslated regions (UTRs), were stuck to ribosomes, preventing the ribosomes from assembling normal proteins. “While these RNAs have been observed before, the magnitude and age-associated cell-type specificity was really unprecedented,” says Sudmant.

The 3’ UTR snippets appeared to originate from about 400 genes with a wide variety of functions. Meanwhile, many other genes were totally unaffected.

“There are some genes that are completely normal, even in aged D1 neurons. There’s a gene-specific aspect to this phenomenon that is quite interesting and mysterious,” Burge says.

The findings led the researchers to explore a possible role for oxidative stress in this 3’ UTR accumulation. Neurons burn a great deal of energy, which can produce free radicals as byproducts. Unlike many other cell types, neurons do not get replaced, so they are believed to be susceptible to accumulated damage from these radicals over time.

The MIT team found that the activation of oxidative stress response pathways was higher in D1 neurons than in D2 neurons, suggesting that D1 neurons are indeed undergoing more oxidative damage. The researchers propose a model for the production of isolated 3’ UTRs involving an enzyme called ABCE1, which normally separates ribosomes from mRNA after translation is finished. This enzyme contains iron-sulfur clusters that can be damaged by free radicals, making it less effective at removing ribosomes, which then get stuck on the mRNA. This leads to cleavage of the RNA by a mechanism that operates upstream of stalled ribosomes.

“Sending neural signals takes a lot of energy,” Burge says. “Over time, that causes oxidative damage, and in our model one of the proteins that eventually gets damaged is ABCE1, and that triggers the production of 3’ UTRs.”

RNA buildup

The researchers also found the same accumulation in most parts of the human brain, including the frontal cortex, which is very metabolically active. They did not see it in most other types of human tissue, with the exception of liver tissue, which is exposed to high levels of potentially toxic molecules.

In human brain tissue, the researchers found that the amount of 3’ UTRs gradually increased with age, which fits their proposed model of gradual damage by oxidative stress. The researchers’ findings and model suggest that the production of these 3’ UTRs involves the destruction of normal mRNAs, reducing the amount of protein produced from the affected genes. This buildup of 3’ UTRs with ribosomes stuck to them can also block ribosomes from producing other proteins.

It remains to be seen exactly what effect this would have on those neurons, Burge says, but it is possible that this kind of cellular damage could combine with genetic and environmental factors to produce a general decline in cognitive ability or even neurodegenerative conditions such as Parkinson’s disease. In future studies, the researchers hope to further explore the causes and consequences of the accumulation of 3’ UTRs.

The research was funded by the National Institutes of Health and the JPB Foundation.

Decoding patterns and meaning in biological data

Senior Anna Sappington found her perfect balance of “innovative computer science and innovative biology” as a member of the team mapping every cell in the human body.

Raleigh McElvery
December 5, 2018

When Anna Sappington was six years old, her parents gave her a black and white composition notebook. Together, they began jotting down observations to identify the patterns in their wooded backyard near the Chesapeake Bay. How would the harsh winters or the early springs affect the blooming trees? How many bluebirds nested each season and how many eggs would they lay? When would the cicada population cycle peak? Her father, the environmental scientist, taught her to sift through data to uncover the trends. Her mother, the journalist, gave her the words to describe her findings.

But it wasn’t until Sappington competed in the Intel International Science and Engineering Fair her junior year of high school that she probed one tiny niche of the natural world more keenly than she ever had before: the physiology of the water flea. Specifically, she investigated the developmental changes that these minute creatures experienced after being exposed to the antimicrobial compound triclosan, present in many soaps and toothpastes. She was surprised to learn that it required only a low concentration of triclosan (0.5 ppm) to cause developmental defects.

She’d been familiar with the concept of DNA since middle school, but her fellow science fair finalists were delving beyond their observations and into the letters of the genetic code. This gave her a new impetus: to understand how triclosan worked at the level of the genome and epigenome to engender the physical deformities she observed under the microscope. She just needed the proper tools, so she made some calls.

Environmental geneticist and water flea aficionado John Colbourne took an interest, and invited her to his lab at the University of Birmingham in the U.K. the following summer so she could learn basic lab techniques. Although her friends and classmates didn’t quite get why she needed to travel to an entirely different country to study an organism they’d never heard of, as she puts it, she had burning scientific questions that needed answers.

“That was the experience that really turned me on to genomics,” says Sappington, now a senior and 6-7 (Computer Science and Molecular Biology) major. “I was finally getting the tools to dig through large amounts of data, using code to find patterns and meaning. I wanted to keep asking ‘why?’ and ‘how?’ all the way down to the molecular level.”

The summer before her freshman year of college, Sappington asked these questions in humans for the first time as an intern at the National Human Genome Research Institute (NHGRI). There, she helped create a computational pipeline to identify the genomic changes associated with heightened risk of cardiovascular disease.

She enrolled at MIT the following fall, because she wanted to be around people from every scientific subfield imaginable. When she arrived, the joint major in computer science and biology was still relatively new.

“While a few of the required classes did meld the two, many of them offered training in each separately,” she says. “That approach really appealed to me because I was hoping to develop both skill sets independently. I wanted to learn code and write algorithms that could be applied to any field, and I also loved understanding the biological mechanisms behind different diseases and viruses.”

Before she’d even officially declared her major, Sappington was already running experiments in Sangeeta Bhatia’s lab. There, at the Koch Institute, she studied the effects of HPV infection on gene expression in liver cells. Sappington’s main role was data analysis, striving to determine which genes were amplified in response to disease.

Despite their obvious differences, Sappington found her two areas of study to be more similar than she had initially anticipated. In her Introduction to Algorithms class, she leveraged an arsenal of algorithms with certain outputs, conditions, and run times to decode her problem sets. In Organic Chemistry, she deployed a list of foundational reactions to solve synthesis questions on her exams. “In each case, you have to combine your understanding of these fundamental rules and come up with a creative solution to decipher an unknown,” she says.

One year later, Sappington moved to Aviv Regev’s lab at the Broad Institute. There, she learned computational techniques for decoding protein interaction networks. After a year, she began working on an international project called the Human Cell Atlas as a member of a collaboration between the Regev and Sanes labs.

“The overarching mission is to create a reference map of all human cells,” Sappington explains. “We want to add a layer of functional understanding on top of what we know about the genome, to understand how different cell types differ and how they interact to impact disease. This kind of endeavor has never been undertaken on such a large scale before, so it’s incredibly exciting.”

Even within a single cell type — say, retinal cells — there are about six main cell categories, each of which splinters into as many as 40 subtypes with distinct molecular profiles and roles.

Beyond the biological challenges that go along with trying to distinguish all these cell types, there are numerous computational hurdles as well. Sappington enjoys these the most — grappling with how best to analyze the gene expression of a single cell separated from its tissue of origin.

“Since you’re only working with single cells rather than entire groups of cells from a tissue, the data that you get are much more sparse,” she says. “You have to sequence a lot of individual cells and build up lots of statistical power before you can be confident that a given cell is expressing specific genes. Coming up with models to determine what constitutes a cell type — and map cell types between time points or between species — are broad problems in computer science that we’re now applying to this very specific type of data.”
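The reasoning about sparse single-cell data can be illustrated with a small calculation. The sketch below is a deliberately simplified model and not the Human Cell Atlas method: it assumes each sequenced cell independently detects an expressed gene with a fixed probability (the hypothetical parameter `p_detect`), and asks how many cells of one type must be sequenced before the gene is seen at least once with a chosen confidence. The function names and the example numbers are illustrative assumptions.

```python
import math


def prob_detected(p_detect: float, n_cells: int) -> float:
    """Probability that a gene is detected in at least one of n_cells,
    assuming independent detection with probability p_detect per cell."""
    if not 0.0 < p_detect < 1.0:
        raise ValueError("p_detect must be strictly between 0 and 1")
    return 1.0 - (1.0 - p_detect) ** n_cells


def cells_needed(p_detect: float, confidence: float) -> int:
    """Smallest number of cells for which prob_detected reaches confidence."""
    if not 0.0 < p_detect < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_detect))


# Hypothetical example: a gene captured in only 10 percent of the cells
# that actually express it.
n = cells_needed(p_detect=0.1, confidence=0.95)
print(n, round(prob_detected(0.1, n), 3))
```

Under these assumptions, the lower the per-cell detection rate, the more cells are needed, which is one way to see why sparse data demands sequencing many individual cells.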

Although she’s been at the Broad since her sophomore year, Sappington has supplemented her MIT research experiences with summer studies elsewhere: another stint at the NHGRI and an Amgen Scholars fellowship in Japan. She’s especially excited because her first co-authored paper will soon be published. As she puts it, she’s finally found her ideal balance of “innovative computer science and innovative biology.”

But Sappington’s time at MIT has been defined by more than just lab work. She is the co-president of the Biology Undergraduate Student Association, which serves as a liaison between the Department of Biology and the wider community. She’s also a member of MedLinks, a volunteer at the Massachusetts General Hospital Department of Radiology, former managing director of TechX, and a performer for several campus dance troupes. In 2018, Sappington earned the prestigious Barry Goldwater Scholarship Award, alongside fellow 6-7 major Meena Chakraborty.

She was recently awarded the Marshall Scholarship, which will fund her master’s degrees in machine learning at University College London and oncology at the University of Cambridge beginning in the fall of 2019. After two years, she plans to start her MD-PhD. That way, she can become a practicing physician without having to give up her computer science research.

Her advice to prospective students: “When you get to MIT, just explore. Try different academic disciplines, different extracurriculars, and talk to as many people as you can. The campus is full of passionate individuals in every field imaginable, whether that’s computer science or political science.”

Computer model offers more control over protein design

New approach generates a wider variety of protein sequences optimized to bind to drug targets.

Anne Trafton | MIT News Office
October 15, 2018

Designing synthetic proteins that can act as drugs for cancer or other diseases can be a tedious process: It generally involves creating a library of millions of proteins, then screening the library to find proteins that bind the correct target.

MIT biologists have now come up with a more refined approach in which they use computer modeling to predict how different protein sequences will interact with the target. This strategy generates a larger number of candidates and also offers greater control over a variety of protein traits, says Amy Keating, a professor of biology, a member of the Koch Institute, and the leader of the research team.

“Our method gives you a much bigger playing field where you can select solutions that are very different from one another and are going to have different strengths and liabilities,” she says. “Our hope is that we can provide a broader range of possible solutions to increase the throughput of those initial hits into useful, functional molecules.”

In a paper appearing in the Proceedings of the National Academy of Sciences the week of Oct. 15, Keating and her colleagues used this approach to generate several peptides that can target different members of a protein family called Bcl-2, which help to drive cancer growth.

Recent PhD recipients Justin Jenson and Vincent Xue are the lead authors of the paper. Other authors are postdoc Tirtha Mandal, former lab technician Lindsey Stretz, and former postdoc Lothar Reich.

Modeling interactions

Protein drugs, also called biopharmaceuticals, are a rapidly growing class of drugs that hold promise for treating a wide range of diseases. The usual method for identifying such drugs is to screen millions of proteins, either randomly chosen or selected by creating variants of protein sequences already shown to be promising candidates. This involves engineering viruses or yeast to produce each of the proteins, then exposing them to the target to see which ones bind the best.

“That is the standard approach: Either completely randomly, or with some prior knowledge, design a library of proteins, and then go fishing in the library to pull out the most promising members,” Keating says.

While that method works well, it usually produces proteins that are optimized for only a single trait: how well they bind to the target. It does not allow for any control over other features that could be useful, such as traits that contribute to a protein’s ability to get into cells or its tendency to provoke an immune response.

“There’s no obvious way to do that kind of thing — specify a positively charged peptide, for example — using the brute force library screening,” Keating says.

Another desirable feature is the ability to identify proteins that bind tightly to their target but not to similar targets, which helps to ensure that drugs do not have unintended side effects. The standard approach does allow researchers to do this, but the experiments become more cumbersome, Keating says.

The new strategy involves first creating a computer model that can relate peptide sequences to their binding affinity for the target protein. To create this model, the researchers first chose about 10,000 peptides, each 23 amino acids in length and helical in structure, and tested their binding to three different members of the Bcl-2 family. They intentionally chose some sequences they already knew would bind well, plus others they knew would not, so the model could incorporate data about a range of binding abilities.

From this set of data, the model can produce a “landscape” of how each peptide sequence interacts with each target. The researchers can then use the model to predict how other sequences will interact with the targets, and generate peptides that meet the desired criteria.

Using this model, the researchers produced 36 peptides that were predicted to tightly bind one family member but not the other two. All of the candidates performed extremely well when the researchers tested them experimentally, so they tried a more difficult problem: identifying proteins that bind to two of the members but not the third. Many of these proteins were also successful.
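The selection step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the researchers' model: the article does not say what form the model takes, so the sketch assumes a simple additive score in which each residue at each position contributes independently. The target names "A", "B", and "C", the weights, the cutoffs, and the four-residue peptides are all hypothetical placeholders (the peptides in the study were 23 amino acids long).

```python
from typing import Dict, List, Tuple

# Hypothetical per-target weights: (position, residue) -> score contribution.
# In practice such values would be fit to measured binding data.
Weights = Dict[Tuple[int, str], float]


def predict_score(peptide: str, weights: Weights) -> float:
    """Additive model: sum the contribution of each residue at each position."""
    return sum(weights.get((i, aa), 0.0) for i, aa in enumerate(peptide))


def select_specific(
    peptides: List[str],
    models: Dict[str, Weights],
    target: str,
    bind_cutoff: float,
    avoid_cutoff: float,
) -> List[str]:
    """Keep peptides predicted to bind `target` and none of the other targets."""
    selected = []
    for peptide in peptides:
        scores = {name: predict_score(peptide, w) for name, w in models.items()}
        binds_target = scores[target] >= bind_cutoff
        avoids_others = all(
            score <= avoid_cutoff for name, score in scores.items() if name != target
        )
        if binds_target and avoids_others:
            selected.append(peptide)
    return selected


# Toy models for three related targets (placeholder names and numbers).
models = {
    "A": {(0, "L"): 2.0, (1, "D"): 1.0},
    "B": {(0, "L"): 2.0, (2, "K"): 1.5},
    "C": {(3, "F"): 2.5},
}
candidates = ["LDAA", "LDKA", "AAAF", "LAKF"]

print(select_specific(candidates, models, "A", bind_cutoff=2.5, avoid_cutoff=2.0))
```

Because the scores come from a model rather than from a fresh screen, the same candidates can be re-filtered for a different target, or for two targets at once, by changing only the selection criteria.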

“This approach represents a shift from posing a very specific problem and then designing an experiment to solve it, to investing some work up front to generate this landscape of how sequence is related to function, capturing the landscape in a model, and then being able to explore it at will for multiple properties,” Keating says.

Sagar Khare, an associate professor of chemistry and chemical biology at Rutgers University, says the new approach is impressive in its ability to discriminate between closely related protein targets.

“Selectivity of drugs is critical for minimizing off-target effects, and often selectivity is very difficult to encode because there are so many similar-looking molecular competitors that will also bind the drug apart from the intended target. This work shows how to encode this selectivity in the design itself,” says Khare, who was not involved in the research. “Applications in the development of therapeutic peptides will almost certainly ensue.”

Selective drugs

Members of the Bcl-2 protein family play an important role in regulating programmed cell death. Dysregulation of these proteins can inhibit cell death, helping tumors to grow unchecked, so many drug companies have been working on developing drugs that target this protein family. For such drugs to be effective, it may be important for them to target just one of the proteins, because disrupting all of them could cause harmful side effects in healthy cells.

“In many cases, cancer cells seem to be using just one or two members of the family to promote cell survival,” Keating says. “In general, it is acknowledged that having a panel of selective agents would be much better than a crude tool that just knocked them all out.”

The researchers have filed for patents on the peptides they identified in this study, and they hope that the peptides will be further tested as possible drugs. Keating’s lab is now working on applying this new modeling approach to other protein targets. This kind of modeling could be useful not only for developing potential drugs, but also for generating proteins for use in agricultural or energy applications, she says.

The research was funded by the National Institute of General Medical Sciences, National Science Foundation Graduate Fellowships, and the National Institutes of Health.

Jarrett Smith receives Hanna Gray Fellowship from HHMI

Greta Friar | Whitehead Institute
September 12, 2018

Cambridge, Mass. — Jarrett Smith, a postdoctoral researcher in David Bartel’s lab at the Whitehead Institute, has been announced as a recipient of the Howard Hughes Medical Institute (HHMI)’s 2018 Hanna Gray Fellowship. The fellowship supports outstanding early career scientists from groups underrepresented in the life sciences. Each of this year’s fifteen awardees will be given up to $1.4 million in funding over the course of their postdoctoral program and the beginning of a tenure-track faculty position.

“This program will help us retain the most diverse talent in science,” said HHMI President Erin O’Shea. “We feel it’s critically important in academia to have exceptional people from all walks of life, all cultures, and all backgrounds – people who can inspire the next generation of scientists.”

For Smith, who began his postdoctoral training in the Bartel lab in January, finding out he got the fellowship was a defining moment.

“I’m grateful for the support that the fellowship will provide during the formative years of my career,” Smith says. “This kind of opportunity gives you the confidence to set ambitious research goals and find out what you can accomplish.”

In the Bartel lab, Smith studies how cells respond to stress. When a cell is exposed to environmental stressors such as heat, UV radiation, or viral infection, proteins and RNAs in the cell may clump together into dense aggregates called stress granules. Several diseases are associated with altered stress granule formation, but the exact function of stress granules and their potential role in disease are unknown. Smith is investigating changes in the cell linked to their formation. His findings could shed light on a potential role for stress granules in cancer, viral infection, and neurodegenerative disease.

Growing up, Smith was always interested in science but no one in his family had ever received a PhD, making biology research feel like an unlikely career path for him. Nevertheless, he followed his passion, which led him to a PhD program at the Johns Hopkins University School of Medicine. Despite his strong academic performance, Smith began graduate school with doubts about his ability to become a scientist. His mentors were incredible teachers but their self-assuredness could be intimidating.

“They were absolutely my role models, but I didn’t think of them as having gone through what I was going through. In the first few years, I felt like I had a lot of catching up to do,” Smith said.

Smith says he was frequently inspired and guided by his graduate school mentor, Geraldine Seydoux. Under her tutelage he became more confident in his abilities.

“I try to pick mentors who are the kind of scientist I aspire to be,” Smith said.

With that tenet in mind, he set his sights on David Bartel’s lab for his postdoctoral research. He had heard that Bartel was a great mentor and knew the Bartel lab had expertise in all of the research techniques he wanted to learn. Since arriving at Whitehead Institute, Smith says he has experienced support not only from Bartel, but from the entire lab as well.

“Jarrett’s graduate experience with P granules in nematodes brings much appreciated expertise to our lab, and we are all excited about what he will discover here on stress-granule function,” Bartel says. “Receiving this fellowship is a well-deserved honor, and I am very happy for him.”

Smith noted that he is deeply grateful for the community he’s found at Whitehead Institute. However, he also noted that throughout his scientific career he has typically been the only black person in the room. One of the joys of applying for the fellowship was meeting the rest of the candidates, a diverse and impressive group of scientists, he says. He looks forward to seeing the other fellows again at meetings hosted by the HHMI.

“I’ve never really had a scientific role model that shared those experiences or that I could identify with in that way,” Smith says, but he hopes that future aspiring scientists won’t have to go through the same experience. His brother-in-law recently began an undergraduate major in biology. Smith enjoys being there to answer his questions about school work or life as a researcher.

“I’d never ask him if he thinks of me as a role model,” Smith says, laughing. “But I’m glad that I have the chance to help people who—like I did—might question whether they could be successful in the sciences.” With the support of the fellowship and his lab, and an exciting research question he is eager to tackle, Smith has never been more certain that he belongs right where he is.

The cartographer of cells

Aviv Regev helped pioneer single-cell genomics. Now she’s cochairing a massive effort to map the trillions of cells in the human body. Biology will never be the same.

Sam Apple | MIT Technology Review
August 23, 2018

Last October, Aviv Regev spoke to a gathering of international scientists at Israel’s Weizmann Institute of Science. For Regev, a computational and systems biologist at the Broad Institute of MIT and Harvard, the gathering was also a homecoming of sorts. Regev earned her PhD from nearby Tel Aviv University in 2002. Now, 15 years later, she was back to discuss one of the most ambitious projects in the history of biology.

The project, the Human Cell Atlas, aims to create a reference map that categorizes all the approximately 37 trillion cells that make up a human. The Human Cell Atlas is often compared to the Human Genome Project, the monumental scientific collaboration that gave us a complete readout of human DNA, or what might be considered the unabridged cookbook for human life. In a sense, the atlas is a continuation of that project’s work. But while the same DNA cookbook is found in every cell, each cell type reads only some of the recipes—that is, it expresses only certain genes, following their DNA instructions to produce the proteins that carry out a cell’s activities. The promise of the Human Cell Atlas is to reveal which specific genes are expressed in every cell type, and where the cells expressing those genes can be found.

Speaking to her colleagues at the meeting in Israel, Regev, who is cochairing the Human Cell Atlas Organizing Committee with Sarah Teichmann of the Wellcome Trust Sanger Institute, displayed the no-nonsense demeanor you might expect of someone at the helm of a massive scientific undertaking. The project had been under way for a year, and Regev, an MIT biology professor who is also chair of the faculty of the Broad and director of its Klarman Cell Observatory and Cell Circuits Program, was reviewing a newly published white paper detailing how the Human Cell Atlas is expected to change the way we diagnose, monitor, and treat disease.

As Regev made her way through the white paper, the possibilities began to seem almost endless. At the most basic level, as a reference map detailing the genes expressed by each different type of healthy cell, the Human Cell Atlas will make it easier to identify how gene expression and signaling go awry in the case of disease. The same map could also help drug developers avoid toxic side effects: researchers targeting a gene that’s harmful in one part of the body would know if the same gene is playing a vital role in another. And because the atlas is expected to reveal many new types of cells, it could also add much more sensitivity to a type of standard blood test, which simply counts different subsets of immune cells. Likewise, looking at individual intestinal cells might provide new insights into the specific cells responsible for inflammation and food allergies. And a better understanding of types of neurons could have far-reaching implications for brain science.

The final product, Regev says, will amount to nothing less than a “periodic table of our cells,” a tool that is designed not to answer one specific question but to make countless new discoveries possible. Eric Lander, the founding director and president of the Broad Institute and a member of the Human Cell Atlas Organizing Committee, likens it to genomics. “People thought at the beginning they might use genomics for this application or that application,” he says. “Nothing has failed to be transformed by genomics, and nothing will fail to be transformed by having a cell atlas.”

Cellular circuits

Regev’s interest in cells began at Tel Aviv University, where she was one of just 15 or so entering students in a highly selective program that gave them the freedom to take high-level courses in any subject. “You could go your first day as a freshman and decide to take a graduate class in political science,” she says.

Regev took a genetics class her first semester and got hooked on the computational challenge of finding order in the complex, interconnected networks of proteins and genes within each cell. She pursued that topic for her doctoral work, characterizing living systems in a mathematical language that had been designed to describe computer processes. As she finished her doctorate in 2002, she was accepted into a program at Harvard’s Bauer Center for Genomics Research that allowed her to start her own lab without first training as a postdoc.

Not long after, Lander, who’d begun his own career as a mathematician after studying algebraic coding theory and combinatorial mathematics at Oxford, was searching for star talent for the newly created Broad Institute, whose mission is to use genomics to study human disease and help advance its treatment. He first met Regev at a lunch at the Bauer Center during which the fellows took turns speaking about their research for five to 10 minutes. “By the time we got all the way around the table I had written down ‘Hire Aviv Regev,’” he recalls.

Convinced by Lander to join the Broad after “many cups of tea” at Cafe Algiers in Harvard Square, Regev continued to apply computational approaches to study the mind-bogglingly complicated machinery of the cell. A single cell is made up of millions of molecules that are in constant conversation as they work together to do all the things cells need to do: divide, grow, repair internal damage, and, in the case of immune cells, signal other cells about threats. Inside the nucleus, the DNA is transcribed into RNA. That in turn gives rise to proteins, the molecules that do the work inside a cell. Meanwhile, proteins on the surface of the cell are constantly receiving molecular messages from outside—glucose is available, an invader has arrived. These must be relayed back to proteins in the nucleus, which will respond by transcribing other DNA, giving rise to new proteins and still more signaling networks.

“It’s like a complex computer that is made of these many, many different parts that are interacting with each other and telling each other what to do,” says Regev. The protein signaling networks are like “circuits”—and you can think about the cell “almost like a wiring diagram,” she says. But using computational approaches to understand their activity first requires gathering an enormous amount of data, which Regev has long done through RNA sequencing. Unlike DNA sequencing, she says, it can tell her which genes are actually being expressed, so it offers a far more dynamic picture of a cell in action. But simply sequencing the RNA of the cells she’s studying can tell her only so much. To understand how the circuits change under different circumstances, Regev subjects cells to different stimuli, such as hormones or pathogens, to see how the resulting protein signals change.

Next comes what she calls “the modeling step”—creating algorithms that try to decipher the most likely sequence of molecular events following a stimulus. And just as someone might study a computer by cutting out circuits and seeing how that changes the machine’s operation, Regev tests her model by seeing if it can predict what will happen when she silences specific genes and then exposes the cells to the same stimulus.

In a 2009 study, Regev and her team examined how exposure to molecular components of pathogens like bacteria, viruses, or fungi affected the circuitry of the immune system’s dendritic cells. She turned to a technique known as RNA interference (she now uses CRISPR), which allowed her to systematically shut genes down. Then she looked at which genes were expressed to determine how the cells’ response changed in each case. Her team singled out 100 different genes that were involved in regulating the response to the pathogens—some of which weren’t previously known to be involved in immune function. The study, published in Science, generated headlines. But according to longtime colleague Dana Pe’er, now chair of computational and systems biology at the Sloan Kettering Institute at the Memorial Sloan Kettering Cancer Center and a member of the Human Cell Atlas Organizing Committee, what really sets Regev apart is the elegance of her work. Regev, says Pe’er, “has a rare, innate ability of seeing complex biology and simplifying it and formalizing it into beautiful, abstract, describable principles.”

From smoothies to fruit salad

There are lots of empty coffee mugs in Regev’s office at the Broad Institute, but very little in the way of decoration. She approaches her science with a businesslike efficiency. “There are many brilliant people,” says Lander. “She’s a brilliant person who can get things done.”

In the fast-changing arena of genomics (“2015 in my field is considered ancient history,” she says), she is known for making the most of the latest innovations—and for helping to spur the next ones. For years, she and others in the field struggled with a dirty secret of RNA sequencing: though its promise has always been precision—the power of knowing the exact code—the techniques produced results that were unspecific. Every cell has only a minuscule amount of RNA. For sequencing purposes, the RNA from millions of cells had to be pooled together. Bulk RNA sequencing left researchers with what she likens to a smoothie. Once it’s blended together, there’s no way to distinguish all the fruits—or in this case, the RNA from individual cells—that went into it. What researchers needed was something more like a fruit salad, a way to separate all the blueberries, raspberries, and blackberries.

In 2011, working with Broad Institute colleague Joshua Levin, PhD ’92, and postdocs Alex Shalek, now at MIT’s Institute for Medical Engineering and Science, and Rahul Satija, now at the New York Genome Center, Regev managed to obtain enough RNA from a single cell to sequence it. To test the method, they sequenced 18 individual dendritic cells from the bone marrow of a mouse. The cells were all obtained in the same way and were expected to be the same type. But to the researchers’ amazement, they were expressing different genes and could be classified into two distinct subtypes. It was like finding out the smoothie you’d been drinking for years had ingredients you’d never known about.

Regev and her colleagues weren’t the only ones figuring out how to sequence a single cell with such sensitivity, nor were they the very first to succeed. Other labs were making similar advances at approximately the same time, each using its own technology and algorithms. And they all faced the same problem: isolating and extracting enough RNA from individual cells was time consuming and expensive. Regev and her colleagues had spent many thousands of dollars to sequence only 18 cells. If the body was full of rare, undiscovered cells, it was going to take an extraordinarily long time to find them.

Skip ahead seven years and the cost of single-cell RNA sequencing is down to only pennies per cell. A critical breakthrough was Drop-Seq, a new technology developed by researchers at Harvard and the Broad Institute, including Regev and members of her lab. The device embeds individual cells into distinct oil droplets with a tiny “bar-coded” bead. When the cell is broken apart for sequencing, some of its RNA attaches to the bead in its droplet. This allows researchers to analyze thousands of cells at once without getting their genetic material mixed up.
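The bar-coding idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Drop-Seq software: the 12-base barcode length, the read layout (barcode first, RNA fragment after), and the function name `group_reads_by_cell` are all hypothetical.

```python
from collections import defaultdict

# Assumption for illustration: the first 12 bases of each read identify the bead
# (and therefore the cell); the rest is the captured RNA fragment.
BARCODE_LENGTH = 12

def group_reads_by_cell(reads, barcode_length=BARCODE_LENGTH):
    """Map each cell barcode to the list of RNA fragments tagged with it."""
    cells = defaultdict(list)
    for read in reads:
        if len(read) <= barcode_length:
            continue  # too short to hold a barcode plus any RNA sequence
        barcode, fragment = read[:barcode_length], read[barcode_length:]
        cells[barcode].append(fragment)
    return dict(cells)

# Reads from many cells arrive pooled together; the barcode sorts them out again.
pooled_reads = [
    "AAAACCCCGGGG" + "ATGGCC",
    "TTTTGGGGCCCC" + "GGCTAA",
    "AAAACCCCGGGG" + "CCATGA",
]
by_cell = group_reads_by_cell(pooled_reads)
for barcode, fragments in sorted(by_cell.items()):
    print(barcode, len(fragments))
```

Because every fragment carries the tag of the droplet it came from, the material can be sequenced in one pooled run and still be assigned back to individual cells afterward.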

Cell theory 2.0

When cell theory was first proposed by German scientists some 180 years ago, it was hard to fathom that our tissues are built from “individual elementary units,” as Theodor Schwann, one of the two scientists credited with the theory, described cells. But it soon became a central tenet of biology, and over the decades and centuries, cells began to give up their secrets. Microscopes improved; new staining and sorting techniques became available. With each advance, new distinctions became possible. Muscle cells could be distinguished from neurons, and then categorized again as smooth or skeletal muscle cells. Cells, it became clear, were all fundamentally similar but came in different forms that had different properties.

By the 21st century, 200 to 300 major cell types had been identified. And while biologists have long recognized that the true number of cell types must be higher, the extent of their diversity is only now coming into full focus, thanks in large part to single-cell RNA sequencing. Regev says that the immune system alone can now be divided into more than 200 cell types and that even our retinas have 100 or more distinct types of neurons. She and her colleagues have discovered several of them.

The idea that knowing so much more about our cells could lead to medical breakthroughs is no longer hypothetical. By sequencing the RNA of individual cancer cells in recent years—“Every cell is an experiment now,” she says—she has found remarkable differences between the cells of a single tumor, even when they have the same mutations. (Last year that work led to Memorial Sloan Kettering’s Paul Marks Prize for Cancer Research.) She found that while some cancers are thought to develop resistance to therapy, a subset of melanoma cells were resistant from the start. And she discovered that two types of brain cancer, oligodendroglioma and astrocytoma, harbor the same cancer stem cells, which could have important implications for how they’re treated.

The excitement in the field has become tangible as more new cell types have been found. And yet Regev realized that if the aim was comprehensive knowledge, the approach needed to be coordinated. If each lab were to rely on its own techniques, it would be hard to standardize the computational tools and the resulting data. The new studies were producing “very nice glimmers of light,” Regev says—“a thing here, a thing there.” But she wanted to make sure those findings could be connected.

Regev has also been busily mapping cells from the immune system, brain, gut, and elsewhere. She is not alone. Other labs have started their own mapping projects, each tackling a different part of the body. Last year researchers at the University of Washington attempted to classify every cell type in the microscopic worm C. elegans. “Every single field in biology is saying, ‘Of course we have to look at single-cell resolution,’” says Lander. “How did we ever imagine we were going to solve a problem without single-cell resolution?”

Regev began to advocate creating something more unified: a map that would allow researchers to chart gene expression and cell types across the entire body. Sarah Teichmann had been thinking along the same lines. When she reached out to Regev in late 2015 about the possibility of joining forces, Regev immediately said yes.

A Google Maps for our cells

The Human Cell Atlas is a collaboration among hundreds of biologists, technologists, and software engineers across the globe. Results from single-cell RNA sequencing will be combined with other data points to provide a comprehensive catalogue of all human cells.

But the many researchers involved won’t simply be compiling spreadsheets listing different cell types. The atlas will also reveal where the cells are located in the body, how many there are, what forms they can take, even the developmental history of different cell types as they differentiated from stem cells. And all of this will be made accessible through a data coordination platform and a rich visual interface that Regev compares to Google Maps. It will allow users to zoom in to the molecular level of our cells, but zooming out to the level of tissues and organs will be important too. As a 2017 overview of the Human Cell Atlas by the project’s organizing committee noted, an atlas “is a map that aims to show the relationships among its elements.” Just as corresponding coastlines seen in an atlas of Earth offer visual evidence of continental drift, compiling all the data about our cells in one place could reveal relationships among cells, tissues, and organs, including some that are entirely unexpected. And just as the periodic table made it possible to predict the existence of elements yet to be observed, the Human Cell Atlas, Regev says, could help us predict the existence of cells that haven’t been found.

The plan is not to sequence all 37 trillion cells but to sample from every part of the body. As Regev talks about the project, her enthusiasm evident, she digs up a slide to demonstrate how effective sampling can be. The slide, first only an empty frame of white, begins to fill in, pixel by pixel, with specks of blue and yellow. Soon, even though many of the pixels haven’t yet been filled, the image on the screen is unmistakable: it is Van Gogh’s Starry Night. Likewise, Regev explains, the Human Cell Atlas can give a complete picture even if not every single cell has been sequenced.
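The sampling argument can be illustrated with a small Python sketch. The population below is invented for the example (the cell-type names and proportions are assumptions, not Human Cell Atlas data), and `estimate_composition` is a hypothetical helper; the point is only that a modest random sample recovers the overall makeup of a much larger population.

```python
import random
from collections import Counter

def estimate_composition(population, sample_size, seed=0):
    """Estimate the fraction of each cell type from a random sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)
    counts = Counter(sample)
    return {cell_type: n / sample_size for cell_type, n in counts.items()}

# Made-up population of 100,000 cells with known proportions (60% / 30% / 10%).
population = (
    ["T cell"] * 60_000
    + ["B cell"] * 30_000
    + ["dendritic cell"] * 10_000
)

# Look at only 2 percent of the cells.
estimate = estimate_composition(population, sample_size=2_000)
for cell_type, fraction in sorted(estimate.items()):
    print(f"{cell_type}: {fraction:.3f}")
```

The estimated fractions land close to the true ones even though 98 percent of the cells were never examined. Very rare cell types are the hard case: the rarer the type, the larger the sample needed to observe it at all.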

To do the sequencing, Regev and Teichmann have welcomed and recruited experts in each different tissue type. Though expected to take years, the project is moving ahead rapidly with such backers as NIH, the EU, the Wellcome Trust, the Manton Foundation, and the Chan Zuckerberg Initiative, which pledged to spend $3 billion to battle disease over the next decade; this year alone it will fund 85 Human Cell Atlas grants. Early results are already pouring in. In March, Swedish researchers working on cells related to human development announced they had sequenced 250,000 individual cells. In May, a team at the Broad made a data set of more than 500,000 immune cells available on a preview site. The goal, Regev says, is for researchers everywhere to be able to use the open-source platform of the Human Cell Atlas to perform joint analyses.

Plenty of challenges remain before the atlas can become a reality. New visualization software must be developed. Sequencing and computational approaches will need to be standardized across a huge number of labs. Conceptual issues, such as what distinguishes one cell type from another, have to be worked through. But the community behind the Human Cell Atlas—including more than 800 individuals as of June—has no shortage of motivation.

One of Regev’s own recent studies, published in August in Nature, is perhaps the best example of how the project could change biology. In mapping cells of the lungs, Regev and Jay Rajagopal’s lab at Massachusetts General Hospital found a new, very rare cell type that primarily expresses a gene linked to cystic fibrosis. Regev now thinks that these rare cells probably play a key role in the disease. More surprising yet, researchers had previously thought that a different cell type was expressing the gene.

“Imagine if somebody wanted to do gene therapy,” Regev says. “You have to fix the gene, but you have to fix it in the right cell.” The Human Cell Atlas could help researchers identify the right cell and understand how the gene in question is regulated by that cell’s extraordinarily complicated molecular networks.

For Regev, the importance of the Human Cell Atlas goes beyond its promise to revolutionize biology and medicine. As she once put it, without an atlas of our cells, “we don’t really know what we’re made of.”

Researchers discover new type of lung cell, critical insights for cystic fibrosis

A comprehensive single-cell analysis of airway cells in mice, validated in human tissue, reveals molecular details critical to understanding lung disease.

Karen Zusi
August 1, 2018

Researchers have identified a rare cell type in airway tissue, previously uncharacterized in the scientific literature, that appears to play a key role in the biology of cystic fibrosis. Using new technologies that enable scientists to study gene expression in thousands of individual cells, the team comprehensively analyzed the airway in mice and validated the results in human tissue.

Led by researchers from the Broad Institute of MIT and Harvard and Massachusetts General Hospital (MGH), the molecular survey also characterized gene expression patterns for other new cell subtypes. The work expands scientific and clinical understanding of lung biology, with broad implications for all diseases of the airway — including asthma, chronic obstructive pulmonary disease, and bronchitis.

Jayaraj Rajagopal, a physician in the Pulmonary and Critical Care Unit at MGH, associate member at the Broad Institute, and a Howard Hughes Medical Institute (HHMI) faculty scholar, and Broad core institute member Aviv Regev, director of the Klarman Cell Observatory at the Broad Institute, professor of biology at MIT, and an HHMI investigator, supervised the research. Daniel Montoro, a graduate student in Rajagopal’s lab, and postdoctoral fellows Adam Haber and Moshe Biton in the Regev lab are co-first authors on the paper published today in Nature.

“We have the framework now for a new cellular narrative of lung disease,” said Rajagopal, who is also a professor at Harvard Medical School and a principal faculty member at the Harvard Stem Cell Institute. “We’ve uncovered a whole distribution of cell types that seem to be functionally relevant. What’s more, genes associated with complex lung diseases can now be linked to specific cells that we’ve characterized. The data are starting to change the way we think about lung diseases like cystic fibrosis and asthma.”

“With single-cell sequencing technology, and dedicated efforts to map cell types in different tissues, we’re making new discoveries — new cells that we didn’t know existed, cell subtypes that are rare or haven’t been noticed before, even in systems that have been studied for decades,” said Regev, who is also co-chair of the international Human Cell Atlas consortium. “And for some of these, understanding and characterizing them sheds new light immediately on what’s happening inside the tissue.”

Using single-cell RNA sequencing, the researchers analyzed tens of thousands of cells from the mouse airway, mapping the physical locations of cell types and creating a cellular “atlas” of the tissue. They also developed a new method called pulse-seq to monitor development of cell types from their progenitors in the mouse airway. The findings were validated in human tissue.

One extremely rare cell type, making up roughly one percent of the cell population in mice and humans, appeared radically different from other known cells in the dataset. The team dubbed this cell the “pulmonary ionocyte” because its gene expression pattern was similar to ionocytes — specialized cells that regulate ion transport and hydration in fish gills and frog skin.

Strikingly, at levels higher than any other cell type, these ionocytes expressed the gene CFTR — which, when mutated, causes cystic fibrosis in humans. CFTR is critical for airway function, and for decades researchers and clinicians have assumed that it is frequently expressed at low levels in ciliated cells, a common cell type spread throughout the entire airway.

But according to the new data, the majority of CFTR expression occurs in only a few cells, which researchers didn’t even know existed until now.

When the researchers disrupted a critical molecular process in pulmonary ionocytes in mice, they observed the onset of key features associated with cystic fibrosis — most notably, the formation of dense mucus. This finding underscores how important these cells are to airway-surface regulation.

“Cystic fibrosis is an amazingly well-studied disease, and we’re still discovering completely new biology that may alter the way we approach it,” said Rajagopal. “At first, we couldn’t believe that the majority of CFTR expression was located in these rare cells, but the graduate students and postdocs on this project really brought us along with their data.”

The results may also have implications for developing targeted cystic fibrosis therapies, according to the team. For example, a gene therapy that corrects for a mutation in CFTR would need to be delivered to the right cells, and a cell atlas of the tissue could provide a reference map to guide that process.

The study further highlighted where other disease-associated genes are expressed in the airway. For example, asthma development has been previously linked with a gene that encodes a sensor for rhinoviruses, and the data now indicate that this gene is expressed by ciliated cells. Another gene linked with asthma is expressed in tuft cells, which separated into at least two groups — one that senses chemicals in the airway and one that produces inflammation. The results suggest that a whole ensemble of cells may be responsible for different aspects of asthma.

Using the pulse-seq assay, the researchers tracked how the newly characterized cells and subtypes in the mouse airway develop. They demonstrated that mature cells in the airway arise from a common progenitor: the basal cells. The team also discovered a previously undescribed cellular structure in the tissue. These structures, which the researchers called “hillocks,” are unique zones of rapid cell turnover, and their function is not yet understood.

“The atlas that we’ve created is already starting to drastically re-shape our understanding of airway and lung biology,” said Regev. “And, for this and other organ systems being studied at the single-cell level, we’ll have to drape everything we know on top of this new cellular diversity to understand human health and disease.”

Funding for this study was provided in part by the Klarman Cell Observatory at the Broad Institute, Manton Foundation, HHMI, New York Stem Cell Foundation, Harvard Stem Cell Institute, Human Frontiers Science Program, and National Institutes of Health.

Paper(s) cited:

Montoro DT, Haber AL, Biton M et al. A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature. Online August 1, 2018. DOI: 10.1038/s41586-018-0393-7

Decoding RNA-protein interactions

Scientists leverage one-step, unbiased method to characterize the binding preferences of more than 70 human RNA-binding proteins.

Raleigh McElvery
June 7, 2018

Thanks to continued advances in genetic sequencing, scientists have identified virtually every A, T, C, and G nucleotide in our genetic code. But to fully understand how the human genome encodes us, we need to go one step further, mapping the function of each base. That is the goal of the Encyclopedia of DNA Elements (ENCODE) project, funded by the National Human Genome Research Institute and launched on the heels of the Human Genome Project in 2003. Although much has already been accomplished — mapping protein-DNA interactions and the inheritance of different epigenetic states — understanding the function of a DNA sequence also requires deciphering the purpose of the RNAs encoded by it, as well as which proteins bind to those RNAs.

Such RNA-binding proteins (RBPs) regulate gene expression by controlling various post-transcriptional processes — directing where the RNAs go in the cell, how stable they are, and which proteins will be synthesized. Yet these vital RNA-protein relationships remain difficult to catalog, since most of the necessary experiments are arduous to complete and difficult to interpret accurately.

In a new study, a team of MIT biologists and their collaborators describes the binding specificity of 78 human RBPs, using a one-step, unbiased method that efficiently and precisely determines the spectrum of RNA sequences and structures these proteins prefer. Their findings suggest that RBPs don’t just recognize specific RNA segments, but are often influenced by contextual features as well — like the folded structures of the RNA in question, or the nucleotides flanking the RNA-binding sequence.

“RNA is never naked in the cell because there are always proteins binding, guiding, and modifying it,” says Christopher Burge, director of the Computational and Systems Biology PhD Program, professor of biology and biological engineering, extramural member of the Koch Institute for Integrative Cancer Research, associate member of the Broad Institute of MIT and Harvard, and senior author of the study. “If you really want to understand post-transcriptional gene regulation, then you need to characterize those interactions. Here, we take advantage of deep sequencing to give a more nuanced picture of exactly what RNAs the proteins bind and where.”

MIT postdoc Daniel Dominguez, former graduate student Peter Freese, and current graduate student Maria Alexis are the lead authors of the study, which is part of the ENCODE project and appears in Molecular Cell on June 7.

A method for the madness

From the moment an RNA is born, it is coated by RBPs that control nearly every aspect of its lifecycle. RBPs generally contain a binding domain, a three-dimensional folded structure that can attach to a specific nucleotide sequence on the RNA called a motif. Because there are over 1,500 different RBPs encoded in the human genome, the biologists needed a way to systematically determine which of those proteins bound to which RNA motifs.

After considering a number of different approaches to analyze RNA-protein interactions both directly in the cell (in vivo) and isolated in a test tube (in vitro), the biologists settled on an in vitro method known as RNA Bind-n-Seq (RBNS), developed four years ago by former Burge lab postdoc and co-author Nicole Lambert.

Although Lambert had previously tested only a small subset of proteins, RBNS surpassed other approaches because it was a quantitative method that revealed both low- and high-affinity RNA-protein interactions, required only a single procedural step, and screened nearly every possible RNA motif. This new study improved the assay’s throughput, systematically exploring the binding specificities of more than 70 human RBPs at a high resolution.

“Even with that initial small sample, it was clear RBNS was the way to go, and over the last three-and-a-half years we’ve been gradually building on this approach,” Dominguez says. “Since a single RBP can select from billions of unique RNA molecules, our approach gives you a lot more power to detect all those possible targets, taking into account RNA secondary structure and contextual features. It’s an extremely deep and detailed assay.”

First, the researchers purified the human RBPs, mixing them with randomly generated synthetic RNAs roughly 20 nucleotides long, which represented virtually all the RNAs an RBP could bind to. Next, they extracted the RBPs along with their bound RNAs and sequenced them. With the help of their collaborators from the University of California at San Diego and University of Connecticut Health, the team conducted additional assays to glean what these RNA-protein interactions might look like in an actual cell, and infer the cellular function of the RBPs.
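One way to picture how sequencing the bound RNAs reveals a protein's preferences is to compare how often each short sequence (a k-mer) appears among the bound RNAs versus the starting pool. The Python sketch below is an assumption-laden illustration, not the study's analysis pipeline: the function names, the toy sequences, and the choice of k = 4 are all hypothetical.

```python
from collections import Counter

def kmer_frequencies(sequences, k):
    """Return the fraction of all k-mer occurrences accounted for by each k-mer."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def enrichment(bound, input_pool, k):
    """Ratio of each k-mer's frequency in bound RNAs to its frequency in the input pool.

    k-mers absent from the input pool are skipped to avoid dividing by zero.
    """
    bound_freq = kmer_frequencies(bound, k)
    input_freq = kmer_frequencies(input_pool, k)
    return {
        kmer: bound_freq[kmer] / input_freq[kmer]
        for kmer in bound_freq
        if kmer in input_freq
    }

# Toy data (invented): a small random pool and the RNAs pulled down with a protein.
input_pool = ["ACGUACGU", "GGGGCCCC", "UUUUAAAA", "GCAUGGCA"]
bound = ["GCAUGGCA", "AGCAUGUU"]

scores = enrichment(bound, input_pool, k=4)
for kmer, score in sorted(scores.items(), key=lambda item: -item[1])[:3]:
    print(kmer, round(score, 2))
```

A ratio well above 1 suggests the protein favors that k-mer. In a real experiment the pool would contain vastly more sequences, so that nearly every k-mer is represented in the input.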

The researchers expected most RBPs to bind to a unique RNA motif, but to their surprise they found the opposite: Many of the proteins, regardless of structural class, seemed to prefer similar short, unfolded nucleotide sequence motifs.

“Human cells express hundreds of thousands of distinct transcripts, so you might think that each RBP would bind a slightly different RNA sequence in order to distinguish between targets,” Alexis says. “In fact, one might assume that having distinct RBP motifs would ensure maximum flexibility. But, as it turns out, nature has built in substantial redundancy; multiple proteins seem to bind the same short, linear sequences.”

Redundant motifs with distinct targets and functions

This overlap in RBP binding preference suggested to the scientists that there must be some other indicator besides the sequence of the motif that signaled to RBPs which RNA to target. Those signals, it turned out, stemmed from the spacing of the motifs as well as which nucleotide bases flank the binding sites. For the less common RBPs that targeted non-linear RNA sequences, the precise way the RNA folded also seemed to influence binding specificity.

The obvious question, then, is: Why might RBPs have evolved to rely on contextual features instead of simply evolving distinct motifs?

Accessibility seems like one of the more plausible arguments. The researchers reasoned that linear RNA segments are physically easier to reach because they are not obstructed by other RNA strands, and they found that more accessible motifs are more likely to be bound. Another possibility is that having many proteins target the same motif creates some inter-protein competition. If one protein increases RNA stability and another decreases it, whichever binds the strongest will prevent the other from binding at all, enabling more pronounced changes in gene activity between cells or cell states. In other scenarios, proteins with similar functions that target the same motif could provide redundancy to ensure that regulation occurs in the cell.

“It’s definitely a difficult question, and one that we may never truly be able to answer,” Dominguez says. “As RBPs duplicated over evolutionary time, perhaps altering recognition of the contextual features around the RNA motif was easier than changing the entire RNA motif. And that would give new opportunities for RBPs to select different cellular targets.”

This study marks one of the first in vitro contributions to the ENCODE Project. While in vivo assays reveal information specific to the particular cell line or tissue in which they were conducted, RBNS will help define the basic rules of RNA-protein interactions — so fundamental they are likely to apply across many cell types and tissues.

The research was funded by the National Institutes of Health ENCODE Project, an NIH/NIGMS grant, the National Defense Science and Engineering Graduate Fellowship, Kirschstein National Research Service Award, Burroughs Wellcome Postdoctoral Fund, and an NIH Individual Postdoctoral Fellowship.