Aviv Regev helped pioneer single-cell genomics. Now she’s cochairing a massive effort to map the trillions of cells in the human body. Biology will never be the same.
Sam Apple | MIT Technology Review
August 23, 2018
Last October, Aviv Regev spoke to a gathering of international scientists at Israel’s Weizmann Institute of Science. For Regev, a computational and systems biologist at the Broad Institute of MIT and Harvard, the gathering was also a homecoming of sorts. Regev earned her PhD from nearby Tel Aviv University in 2002. Now, 15 years later, she was back to discuss one of the most ambitious projects in the history of biology.
The project, the Human Cell Atlas, aims to create a reference map that categorizes all the approximately 37 trillion cells that make up a human. The Human Cell Atlas is often compared to the Human Genome Project, the monumental scientific collaboration that gave us a complete readout of human DNA, or what might be considered the unabridged cookbook for human life. In a sense, the atlas is a continuation of that project’s work. But while the same DNA cookbook is found in every cell, each cell type reads only some of the recipes—that is, it expresses only certain genes, following their DNA instructions to produce the proteins that carry out a cell’s activities. The promise of the Human Cell Atlas is to reveal which specific genes are expressed in every cell type, and where the cells expressing those genes can be found.
Speaking to her colleagues at the meeting in Israel, Regev, who is cochairing the Human Cell Atlas Organizing Committee with Sarah Teichmann of the Wellcome Trust Sanger Institute, displayed the no-nonsense demeanor you might expect of someone at the helm of a massive scientific undertaking. The project had been under way for a year, and Regev, an MIT biology professor who is also chair of the faculty of the Broad and director of its Klarman Cell Observatory and Cell Circuits Program, was reviewing a newly published white paper detailing how the Human Cell Atlas is expected to change the way we diagnose, monitor, and treat disease.
As Regev made her way through the white paper, the possibilities began to seem almost endless. At the most basic level, as a reference map detailing the genes expressed by each different type of healthy cell, the Human Cell Atlas will make it easier to identify how gene expression and signaling go awry in the case of disease. The same map could also help drug developers avoid toxic side effects: researchers targeting a gene that’s harmful in one part of the body would know if the same gene is playing a vital role in another. And because the atlas is expected to reveal many new types of cells, it could also add much more sensitivity to a type of standard blood test, which simply counts different subsets of immune cells. Likewise, looking at individual intestinal cells might provide new insights into the specific cells responsible for inflammation and food allergies. And a better understanding of types of neurons could have far-reaching implications for brain science.
The final product, Regev says, will amount to nothing less than a “periodic table of our cells,” a tool that is designed not to answer one specific question but to make countless new discoveries possible. Eric Lander, the founding director and president of the Broad Institute and a member of the Human Cell Atlas Organizing Committee, likens it to genomics. “People thought at the beginning they might use genomics for this application or that application,” he says. “Nothing has failed to be transformed by genomics, and nothing will fail to be transformed by having a cell atlas.”
Cellular circuits
Regev’s interest in cells began at Tel Aviv University, where she was one of just 15 or so entering students in a highly selective program that gave them the freedom to take high-level courses in any subject. “You could go your first day as a freshman and decide to take a graduate class in political science,” she says.
Regev took a genetics class her first semester and got hooked on the computational challenge of finding order in the complex, interconnected networks of proteins and genes within each cell. She pursued that topic for her doctoral work, characterizing living systems in a mathematical language that had been designed to describe computer processes. As she finished her doctorate in 2002, she was accepted into a program at Harvard’s Bauer Center for Genomics Research that allowed her to start her own lab without first training as a postdoc.
Not long after, Lander, who’d begun his own career as a mathematician after studying algebraic coding theory and combinatorial mathematics at Oxford, was searching for star talent for the newly created Broad Institute, whose mission is to use genomics to study human disease and help advance its treatment. He first met Regev at a lunch at the Bauer Center during which the fellows took turns speaking about their research for five to 10 minutes. “By the time we got all the way around the table I had written down ‘Hire Aviv Regev,’” he recalls.
Convinced by Lander to join the Broad after “many cups of tea” at Cafe Algiers in Harvard Square, Regev continued to apply computational approaches to study the mind-bogglingly complicated machinery of the cell. A single cell is made up of millions of molecules that are in constant conversation as they work together to do all the things cells need to do: divide, grow, repair internal damage, and, in the case of immune cells, signal other cells about threats. Inside the nucleus, the DNA is transcribed into RNA. That in turn gives rise to proteins, the molecules that do the work inside a cell. Meanwhile, proteins on the surface of the cell are constantly receiving molecular messages from outside—glucose is available, an invader has arrived. These must be relayed back to proteins in the nucleus, which will respond by transcribing other DNA, giving rise to new proteins and still more signaling networks.
“It’s like a complex computer that is made of these many, many different parts that are interacting with each other and telling each other what to do,” says Regev. The protein signaling networks are like “circuits”—and you can think about the cell “almost like a wiring diagram,” she says. But using computational approaches to understand their activity first requires gathering an enormous amount of data, which Regev has long done through RNA sequencing. Unlike DNA sequencing, she says, it can tell her which genes are actually being expressed, so it offers a far more dynamic picture of a cell in action. But simply sequencing the RNA of the cells she’s studying can tell her only so much. To understand how the circuits change under different circumstances, Regev subjects cells to different stimuli, such as hormones or pathogens, to see how the resulting protein signals change.
Next comes what she calls “the modeling step”—creating algorithms that try to decipher the most likely sequence of molecular events following a stimulus. And just as someone might study a computer by cutting out circuits and seeing how that changes the machine’s operation, Regev tests her model by seeing if it can predict what will happen when she silences specific genes and then exposes the cells to the same stimulus.
In a 2009 study, Regev and her team examined how exposure to molecular components of pathogens like bacteria, viruses, or fungi affected the circuitry of the immune system’s dendritic cells. She turned to a technique known as RNA interference (she now uses CRISPR), which allowed her to systematically shut genes down. Then she looked at which genes were expressed to determine how the cells’ response changed in each case. Her team singled out 100 different genes that were involved in regulating the response to the pathogens—some of which weren’t previously known to be involved in immune function. The study, published in Science, generated headlines. But according to longtime colleague Dana Pe’er, now chair of computational and systems biology at the Sloan Kettering Institute at the Memorial Sloan Kettering Cancer Center and a member of the Human Cell Atlas Organizing Committee, what really sets Regev apart is the elegance of her work. Regev, says Pe’er, “has a rare, innate ability of seeing complex biology and simplifying it and formalizing it into beautiful, abstract, describable principles.”
From smoothies to fruit salad
There are lots of empty coffee mugs in Regev’s office at the Broad Institute, but very little in the way of decoration. She approaches her science with a businesslike efficiency. “There are many brilliant people,” says Lander. “She’s a brilliant person who can get things done.”
In the fast-changing arena of genomics (“2015 in my field is considered ancient history,” she says), she is known for making the most of the latest innovations—and for helping to spur the next ones. For years, she and others in the field struggled with a dirty secret of RNA sequencing: though its promise has always been precision—the power of knowing the exact code—the techniques produced results that were unspecific. Every cell has only a minuscule amount of RNA. For sequencing purposes, the RNA from millions of cells had to be pooled together. Bulk RNA sequencing left researchers with what she likens to a smoothie. Once it’s blended together, there’s no way to distinguish all the fruits—or in this case, the RNA from individual cells—that went into it. What researchers needed was something more like a fruit salad, a way to separate all the blueberries, raspberries, and blackberries.
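The smoothie problem can be shown with a toy calculation in Python. This is only an illustrative sketch: the expression values are invented, and real sequencing data is far noisier. It shows how a pooled average can hide the fact that a sample contains two very different groups of cells.

```python
from statistics import mean

# Hypothetical expression levels of one gene in six cells:
# three cells express it strongly, three not at all.
cells = [10.0, 10.0, 10.0, 0.0, 0.0, 0.0]

# Bulk sequencing (the "smoothie"): only the pooled average is visible.
bulk_signal = mean(cells)

# Single-cell sequencing (the "fruit salad"): each cell is kept separate,
# so the two groups can be told apart.
high = [c for c in cells if c > bulk_signal]
low = [c for c in cells if c <= bulk_signal]
```

The bulk value sits midway between the two groups and matches no actual cell, which is why pooled measurements can suggest a uniform population that does not exist.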
In 2011, working with Broad Institute colleague Joshua Levin, PhD ’92, and postdocs Alex Shalek, now at MIT’s Institute for Medical Engineering and Science, and Rahul Satija, now at the New York Genome Center, Regev managed to obtain enough RNA from a single cell to sequence it. To test the method, they sequenced 18 individual dendritic cells from the bone marrow of a mouse. The cells were all obtained in the same way and were expected to be the same type. But to the researchers’ amazement, they were expressing different genes and could be classified into two distinct subtypes. It was like finding out the smoothie you’d been drinking for years had ingredients you’d never known about.
Regev and her colleagues weren’t the only ones figuring out how to sequence a single cell with such sensitivity, nor were they the very first to succeed. Other labs were making similar advances at approximately the same time, each using its own technology and algorithms. And they all faced the same problem: isolating and extracting enough RNA from individual cells was time-consuming and expensive. Regev and her colleagues had spent many thousands of dollars to sequence only 18 cells. If the body was full of rare, undiscovered cells, it was going to take an extraordinarily long time to find them.
Skip ahead seven years and the cost of single-cell RNA sequencing is down to only pennies per cell. A critical breakthrough was Drop-Seq, a new technology developed by researchers at Harvard and the Broad Institute, including Regev and members of her lab. The device embeds individual cells into distinct oil droplets, each with a tiny “bar-coded” bead. When the cell is broken apart for sequencing, some of its RNA attaches to the bead in its droplet. This allows researchers to analyze thousands of cells at once without getting their genetic material mixed up.
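The bar-coding idea can be illustrated with a short Python sketch. It is not the Drop-Seq software: the read format (a barcode paired with a gene name), the barcodes themselves, and the function name `count_genes_per_cell` are all assumptions made for illustration. The point is that because every read carries the barcode of the bead it attached to, reads pooled from many cells can still be sorted back to their cell of origin.

```python
from collections import Counter, defaultdict

def count_genes_per_cell(reads):
    """Group pooled reads by bead barcode and count genes per cell.

    `reads` is an iterable of (barcode, gene) pairs. This format is a
    hypothetical simplification, not the output of a real sequencer.
    """
    counts = defaultdict(Counter)
    for barcode, gene in reads:
        counts[barcode][gene] += 1
    return {barcode: dict(genes) for barcode, genes in counts.items()}

# Reads from two cells, sequenced together in one pooled run.
pooled_reads = [
    ("AAAC", "GeneA"), ("TTGG", "GeneB"), ("AAAC", "GeneA"),
    ("TTGG", "GeneC"), ("AAAC", "GeneB"),
]
per_cell = count_genes_per_cell(pooled_reads)
```

Here `per_cell` maps each barcode to the gene counts of one cell, so the pooled run yields a separate expression profile per cell rather than a single blended one.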
Cell theory 2.0
When cell theory was first proposed by German scientists some 180 years ago, it was hard to fathom that our tissues are built from “individual elementary units,” as Theodor Schwann, one of the two scientists credited with the theory, described cells. But it soon became a central tenet of biology, and over the decades and centuries, cells began to give up their secrets. Microscopes improved; new staining and sorting techniques became available. With each advance, new distinctions became possible. Muscle cells could be distinguished from neurons, and then categorized again as smooth or skeletal muscle cells. Cells, it became clear, were all fundamentally similar but came in different forms that had different properties.
By the 21st century, 200 to 300 major cell types had been identified. And while biologists have long recognized that the true number of cell types must be higher, the extent of their diversity is only now coming into full focus, thanks in large part to single-cell RNA sequencing. Regev says that the immune system alone can now be divided into more than 200 cell types and that even our retinas have 100 or more distinct types of neurons. She and her colleagues have discovered several of them.
The idea that knowing so much more about our cells could lead to medical breakthroughs is no longer hypothetical. By sequencing the RNA of individual cancer cells in recent years—“Every cell is an experiment now,” she says—she has found remarkable differences between the cells of a single tumor, even when they have the same mutations. (Last year that work led to Memorial Sloan Kettering’s Paul Marks Prize for Cancer Research.) She found that while some cancers are thought to develop resistance to therapy, a subset of melanoma cells were resistant from the start. And she discovered that two types of brain cancer, oligodendroglioma and astrocytoma, harbor the same cancer stem cells, which could have important implications for how they’re treated.
The excitement in the field has become tangible as more new cell types have been found. And yet Regev realized that if the aim was comprehensive knowledge, the approach needed to be coordinated. If each lab were to rely on its own techniques, it would be hard to standardize the computational tools and the resulting data. The new studies were producing “very nice glimmers of light,” Regev says, “a thing here, a thing there.” But she wanted to make sure those findings could be connected.
Regev has also been busily mapping cells from the immune system, brain, gut, and elsewhere. She is not alone. Other labs have started their own mapping projects, each tackling a different part of the body. Last year researchers at the University of Washington attempted to classify every cell type in the microscopic worm C. elegans. “Every single field in biology is saying, ‘Of course we have to look at single-cell resolution,’” says Lander. “How did we ever imagine we were going to solve a problem without single-cell resolution?”
Regev began to advocate creating something more unified: a map that would allow researchers to chart gene expression and cell types across the entire body. Sarah Teichmann had been thinking along the same lines. When she reached out to Regev in late 2015 about the possibility of joining forces, Regev immediately said yes.
A Google Maps for our cells
The Human Cell Atlas is a collaboration among hundreds of biologists, technologists, and software engineers across the globe. Results from single-cell RNA sequencing will be combined with other data points to provide a comprehensive catalogue of all human cells.
But the many researchers involved won’t simply be compiling spreadsheets listing different cell types. The atlas will also reveal where the cells are located in the body, how many there are, what forms they can take, even the developmental history of different cell types as they differentiated from stem cells. And all of this will be made accessible through a data coordination platform and a rich visual interface that Regev compares to Google Maps. It will allow users to zoom in to the molecular level of our cells, but zooming out to the level of tissues and organs will be important too. As a 2017 overview of the Human Cell Atlas by the project’s organizing committee noted, an atlas “is a map that aims to show the relationships among its elements.” Just as corresponding coastlines seen in an atlas of Earth offer visual evidence of continental drift, compiling all the data about our cells in one place could reveal relationships among cells, tissues, and organs, including some that are entirely unexpected. And just as the periodic table made it possible to predict the existence of elements yet to be observed, the Human Cell Atlas, Regev says, could help us predict the existence of cells that haven’t been found.
The plan is not to sequence all 37 trillion cells but to sample from every part of the body. As Regev talks about the project, her enthusiasm evident, she digs up a slide to demonstrate how effective sampling can be. The slide, first only an empty frame of white, begins to fill in, pixel by pixel, with specks of blue and yellow. Soon, even though many of the pixels haven’t yet been filled, the image on the screen is unmistakable: it is Van Gogh’s Starry Night. Likewise, Regev explains, the Human Cell Atlas can give a complete picture even if not every single cell has been sequenced.
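The logic of the slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's sampling design: the population of 100,000 "pixels," the 70/30 color split, and the function name `estimate_fractions` are invented for the example. A random sample of 2 percent of the pixels is used to estimate the makeup of the whole image.

```python
import random

def estimate_fractions(population, sample_size, seed=0):
    """Estimate the share of each label from a random sample.

    The seed is fixed only so the example is repeatable.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return {label: sample.count(label) / sample_size for label in set(sample)}

# Hypothetical image: 70,000 blue pixels and 30,000 yellow ones.
population = ["blue"] * 70_000 + ["yellow"] * 30_000

# Look at only 2,000 of the 100,000 pixels.
estimate = estimate_fractions(population, sample_size=2_000)
```

With a sample this size, the estimated shares typically land within a few percentage points of the true 70/30 split. The same reasoning suggests why a well-chosen sample of cells can characterize a tissue, although very rare cell types still call for larger samples.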
To do the sequencing, Regev and Teichmann have welcomed and recruited experts in each different tissue type. Though expected to take years, the project is moving ahead rapidly with such backers as NIH, the EU, the Wellcome Trust, the Manton Foundation, and the Chan Zuckerberg Initiative, which pledged to spend $3 billion to battle disease over the next decade; this year alone it will fund 85 Human Cell Atlas grants. Early results are already pouring in. In March, Swedish researchers working on cells related to human development announced they had sequenced 250,000 individual cells. In May, a team at the Broad made a data set of more than 500,000 immune cells available on a preview site. The goal, Regev says, is for researchers everywhere to be able to use the open-source platform of the Human Cell Atlas to perform joint analyses.
Plenty of challenges remain before the atlas can become a reality. New visualization software must be developed. Sequencing and computational approaches will need to be standardized across a huge number of labs. Conceptual issues, such as what distinguishes one cell type from another, have to be worked through. But the community behind the Human Cell Atlas—including more than 800 individuals as of June—has no shortage of motivation.
One of Regev’s own recent studies, published in August in Nature, is perhaps the best example of how the project could change biology. In mapping cells of the lungs, Regev and Jay Rajagopal’s lab at Massachusetts General Hospital found a new, very rare cell type that primarily expresses a gene linked to cystic fibrosis. Regev now thinks that these rare cells probably play a key role in the disease. More surprising yet, researchers had previously thought that a different cell type was expressing the gene.
“Imagine if somebody wanted to do gene therapy,” Regev says. “You have to fix the gene, but you have to fix it in the right cell.” The Human Cell Atlas could help researchers identify the right cell and understand how the gene in question is regulated by that cell’s extraordinarily complicated molecular networks.
For Regev, the importance of the Human Cell Atlas goes beyond its promise to revolutionize biology and medicine. As she once put it, without an atlas of our cells, “we don’t really know what we’re made of.”