Focus on function helps identify the changes that made us human

It can be difficult to tell which of the many small genetic differences between us and chimps have been significant to our evolution. New research from Jonathan Weissman and colleagues narrowed in on the key differences in how humans and chimps rely on certain genes, including how humans became able to grow comparatively large brains.

Greta Friar | Whitehead Institute
June 22, 2023

Humans split away from our closest animal relatives, chimpanzees, and formed our own branch on the evolutionary tree about seven million years ago. In the time since—brief, from an evolutionary perspective—our ancestors evolved the traits that make us human, including a much bigger brain than chimpanzees and bodies that are better suited to walking on two feet. These physical differences are underpinned by subtle changes at the level of our DNA. However, it can be hard to tell which of the many small genetic differences between us and chimps have been significant to our evolution.

New research from Whitehead Institute Member Jonathan Weissman; University of California, San Francisco Assistant Professor Alex Pollen; Weissman lab postdoc Richard She; Pollen lab graduate student Tyler Fair; and colleagues uses cutting-edge tools developed in the Weissman lab to narrow in on the key differences in how humans and chimps rely on certain genes. Their findings, published in the journal Cell on June 20, may provide unique clues about how humans and chimps have evolved, including how humans became able to grow comparatively large brains.

Studying function rather than genetic code

Only a handful of genes are fundamentally different between humans and chimps; the rest of the two species’ genes are typically nearly identical. Differences between the species often come down to when and how cells use those nearly identical genes. However, only some of the many differences in gene use between the two species underlie big changes in physical traits. The researchers developed an approach to narrow in on these impactful differences.

Their approach, using stem cells derived from human and chimp skin samples, relies on a tool called CRISPR interference (CRISPRi) that Weissman’s lab developed. CRISPRi uses a modified version of the CRISPR/Cas9 gene editing system to effectively turn off individual genes. The researchers used CRISPRi to turn off each gene one at a time in a group of human stem cells and a group of chimp stem cells. Then they looked to see whether or not the cells multiplied at their normal rate. If the cells multiplied more slowly or stopped altogether, then the gene that had been turned off was considered essential: a gene that the cells need to be active (producing a protein product) in order to thrive. The researchers looked for instances in which a gene was essential in one species but not the other as a way of exploring if and how there were fundamental differences in the basic ways that human and chimp cells function.
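The comparison described above can be sketched in a few lines of code. This is only an illustration of the logic, not the researchers’ analysis: the gene names, the growth scores, and the 0.5 cutoff are all invented, and a real screen would use statistical scoring rather than a fixed threshold.

```python
# Hypothetical growth scores: fraction of the normal proliferation rate after
# a gene is turned off (1.0 = unaffected, 0.0 = no growth). All gene names
# and numbers below are invented for illustration.
ESSENTIAL_BELOW = 0.5  # assumed cutoff, not taken from the study


def is_essential(growth_score, cutoff=ESSENTIAL_BELOW):
    """A gene counts as essential if turning it off slows growth enough."""
    return growth_score < cutoff


def species_specific_essentials(human_scores, chimp_scores):
    """Return the genes that are essential in exactly one of the two species."""
    result = {"human_only": [], "chimp_only": []}
    for gene in sorted(human_scores.keys() & chimp_scores.keys()):
        human = is_essential(human_scores[gene])
        chimp = is_essential(chimp_scores[gene])
        if human and not chimp:
            result["human_only"].append(gene)
        elif chimp and not human:
            result["chimp_only"].append(gene)
    return result


human_scores = {"GENE_A": 0.95, "GENE_B": 0.20, "GENE_C": 0.90, "GENE_D": 0.30}
chimp_scores = {"GENE_A": 0.92, "GENE_B": 0.25, "GENE_C": 0.35, "GENE_D": 0.88}

hits = species_specific_essentials(human_scores, chimp_scores)
print(hits)
```

Genes that are essential in both species, or in neither, drop out of the result, which mirrors how the approach ignores differences that do not change how the cells behave.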

By looking for differences in how cells function with particular genes disabled, rather than looking at differences in the DNA sequence or expression of genes, the approach ignores differences that do not appear to impact cells. If a difference in gene use between species has a large, measurable effect at the level of the cell, this likely reflects a meaningful difference between the species at a larger physical scale, and so the genes identified in this way are likely to be relevant to the distinguishing features that have emerged over human and chimp evolution.

“The problem with looking at expression changes or changes in DNA sequences is that there are many of them and their functional importance is unclear,” says Weissman, who is also a professor of biology at the Massachusetts Institute of Technology and an Investigator with the Howard Hughes Medical Institute. “This approach looks at changes in how genes interact to perform key biological processes, and what we see by doing that is that, even on the short timescale of human evolution, there has been fundamental rewiring of cells.”

After the CRISPRi experiments were completed, She compiled a list of the genes that appeared to be essential in one species but not the other. Then he looked for patterns. Many of the 75 genes identified by the experiments clustered together in the same pathways, meaning the clusters were involved in the same biological processes. This is what the researchers hoped to see. Individual small changes in gene use may not have much of an effect, but when those changes accumulate in the same biological pathway or process, collectively they can cause a substantive change in the species. When the researchers’ approach identified genes that cluster in the same processes, this suggested to them that their approach had worked and that the genes were likely involved in human and chimp evolution.
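One common way to ask whether genes cluster in a pathway more than chance would predict is a hypergeometric enrichment test. The article does not say which statistic the researchers used, so the sketch below is an assumption about how such a check could look. The figure of 75 hits comes from the article; the genome size, pathway size, and number of hits in the pathway are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, pathway_size, hit_count, hits_in_pathway):
    """Chance of seeing at least `hits_in_pathway` pathway genes when
    `hit_count` genes are drawn at random from `total_genes`."""
    upper = min(pathway_size, hit_count)
    favorable = sum(
        comb(pathway_size, i) * comb(total_genes - pathway_size, hit_count - i)
        for i in range(hits_in_pathway, upper + 1)
    )
    return favorable / comb(total_genes, hit_count)


# 75 matches the number of genes reported in the article;
# every other number here is made up for illustration.
p = enrichment_p_value(
    total_genes=10000, pathway_size=50, hit_count=75, hits_in_pathway=6
)
print(f"chance of 6 or more pathway genes among 75 hits: {p:.2e}")
```

A very small value would suggest that the hits are concentrated in that pathway rather than scattered at random.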

“Isolating the genetic changes that made us human has been compared to searching for needles in a haystack because there are millions of genetic differences, and most are likely to have negligible effects on traits,” Pollen says. “However, we know that there are lots of small effect mutations that in aggregate may account for many species differences. This new approach allows us to study these aggregate effects, enabling us to weigh the impact of the haystack on cellular functions.”

Researchers think bigger brains may rely on genes regulating how quickly cells divide

One cluster on the list stood out to the researchers: a group of genes essential to chimps, but not to humans, that help to control the cell cycle, which regulates when and how cells decide to divide. Cell cycle regulation has long been hypothesized to play a role in the evolution of humans’ large brains. The hypothesis goes like this: Neural progenitors are the cells that will become neurons and other brain cells. Before becoming mature brain cells, neural progenitors divide multiple times to make more of themselves. The more divisions that the neural progenitors undergo, the more cells the brain will ultimately contain—and so, the bigger it will be. Researchers think that something changed during human evolution to allow neural progenitors to spend less time in a non-dividing phase of the cell cycle and transition more quickly towards division. This simple difference would lead to additional divisions, each of which could essentially double the final number of brain cells.
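The doubling in this hypothesis is simple arithmetic, shown here under an idealized assumption that every progenitor divides symmetrically in each round and no cells are lost. The starting number of cells and the number of rounds are arbitrary.

```python
def final_cell_count(starting_progenitors, divisions):
    """Cells produced if every cell divides in two at each round."""
    return starting_progenitors * 2 ** divisions


# Arbitrary illustrative numbers: compare 10 rounds of division with 11.
base = final_cell_count(1000, 10)
one_more_round = final_cell_count(1000, 11)
print(base, one_more_round, one_more_round / base)
```

Under these assumptions, a single extra round of division yields twice as many cells, whatever the starting number.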

Consistent with the popular hypothesis that human neural progenitors may undergo more divisions, resulting in a larger brain, the researchers found that several genes that help cells to transition more quickly through the cell cycle are essential in chimp neural progenitor cells but not in human cells. When chimp neural progenitor cells lose these genes, they linger in a non-dividing phase, but when human cells lose them, they keep cycling and dividing. These findings suggest that human neural progenitors may be better able to withstand stresses—such as the loss of cell cycle genes—that would limit the number of divisions the cells undergo, enabling humans to produce enough cells to build a larger brain.

“This hypothesis has been around for a long time, and I think our study is among the first to show that there is in fact a species difference in how the cell cycle is regulated in neural progenitors,” She says. “We had no idea going in which genes our approach would highlight, and it was really exciting when we saw that one of our strongest findings matched and expanded on this existing hypothesis.”

More subjects lead to more robust results

Research comparing chimps to humans often uses samples from only one or two individuals from each species, but this study used samples from six humans and six chimps. By making sure that the patterns they observed were consistent across multiple individuals of each species, the researchers could avoid mistaking the naturally occurring genetic variation between individuals as representative of the whole species. This allowed them to be confident that the differences they identified were truly differences between species.

The researchers also compared their findings for chimps and humans to orangutans, which split from the other species earlier in our shared evolutionary history. This allowed them to figure out where on the evolutionary tree a change in gene use most likely occurred. If a gene is essential in both chimps and orangutans, then it was likely essential in the shared ancestor of all three species; it’s more likely for a particular difference to have evolved once, in a common ancestor, than to have evolved independently multiple times. If the same gene is no longer essential in humans, then its role most likely shifted after humans split from chimps. Using this system, the researchers showed that the changes in cell cycle regulation occurred during human evolution, consistent with the proposal that they contributed to the expansion of the brain in humans.
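The reasoning with orangutans as an outgroup can be written as a small decision rule. This is a minimal sketch of the parsimony logic described above, assuming each gene is simply labeled essential or not essential in each species; the function name and return labels are illustrative.

```python
def infer_change(human, chimp, orangutan):
    """Place a change in gene essentiality on the tree by parsimony.

    Each argument is True if the gene is essential in that species.
    """
    if human == chimp:
        if chimp == orangutan:
            return "no change"
        # Human and chimp agree but the outgroup differs: one outgroup
        # alone cannot say on which side of the split the change happened.
        return "ambiguous: orangutan lineage or human-chimp ancestor"
    # Human and chimp differ, so the outgroup indicates the ancestral state.
    if chimp == orangutan:
        return "human lineage"
    return "chimp lineage"


# Essential in chimps and orangutans but not in humans.
print(infer_change(human=False, chimp=True, orangutan=True))
```

The last call corresponds to the pattern the article describes for the cell cycle genes: the shared state in chimps and orangutans is taken as ancestral, so the shift is assigned to the human lineage.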

The researchers hope that their work not only improves our understanding of human and chimp evolution, but also demonstrates the strength of the CRISPRi approach for studying human evolution and other areas of human biology. Researchers in the Weissman and Pollen labs are now using the approach to better understand human diseases—looking for the subtle differences in gene use that may underlie important traits such as whether someone is at risk of developing a disease, or how they will respond to a medication. The researchers anticipate that their approach will enable them to sort through many small genetic differences between people to narrow in on impactful ones underlying traits in health and disease, just as the approach enabled them to narrow in on the evolutionary changes that helped make us human.

Seychelle Vos and Hernandez Moura Silva named HHMI Freeman Hrabowski Scholars

The program supports early-career faculty who have strong potential to become leaders in their fields and to advance diversity, equity, and inclusion.

Lillian Eden | Department of Biology
May 9, 2023

Two faculty members from the MIT Department of Biology have been selected by the Howard Hughes Medical Institute (HHMI) for the inaugural cohort of HHMI Freeman Hrabowski Scholars.

Seychelle Vos, the Robert A. Swanson Career Development Professor of Life Sciences, and Hernandez Moura Silva, an assistant professor of biology and core member of the Ragon Institute of MGH, MIT and Harvard, are among 31 early-career faculty selected for their potential to become leaders in their research fields and to create diverse and inclusive lab environments in which everyone can thrive, according to a press release.

Freeman Hrabowski Scholars are appointed to a five-year term, renewable for a second five-year term after a successful progress evaluation. Each scholar will receive up to $8.6 million over 10 years, including full salary, benefits, a research budget, and scientific equipment. In addition, they will participate in professional development to advance their leadership and mentorship skills.

The Freeman Hrabowski Scholars Program represents a key component of HHMI’s diversity, equity, and inclusion goals. Over the next 20 years, HHMI expects to hire and support up to 150 Freeman Hrabowski Scholars — appointing roughly 30 scholars every other year for the next 10 years. The institute has committed up to $1.5 billion for the Freeman Hrabowski Scholars to be selected over the next decade. The program was named for Freeman A. Hrabowski III, president emeritus of the University of Maryland at Baltimore County, who played a major role in increasing the number of scientists, engineers, and physicians from backgrounds underrepresented in science in the United States.

Seychelle Vos

Seychelle Vos studies how DNA organization impacts gene expression at the atomic level, using cryogenic electron microscopy (cryo-EM), X-ray crystallography, biochemistry, and genetics. Human cells contain about 2 meters of DNA, which is packed so tightly that its entirety is contained within the nucleus, which is only a few microns across. Although DNA needs to be compacted, it also needs to be accessible to, and readable by, the cell’s molecular machinery.

Vos received a BS in genetics from the University of Georgia in 2008 and a PhD from University of California at Berkeley in 2013. During her postdoctoral research at the Max Planck Institute for Biophysical Chemistry in Germany, she determined how the molecular machine responsible for gene expression is regulated near gene promoters.

Vos joined MIT as an assistant professor of biology in fall 2019.

“I am very humbled and honored to have been named a HHMI Freeman Hrabowski Scholar,” Vos says. “It would not have been possible without the hard work of my lab and the help of my colleagues. It provides us with the support to achieve our ambitious research goals.”

Hernandez Moura Silva

Hernandez Moura Silva studies the role of immune cells in the maintenance and normal function of our bodies and tissues, beyond their role in battling infection. Specifically, he studies a type of immune cell called a macrophage and its role in the proper function of white adipose tissue, our fat. White adipose tissue in a healthy state is highly populated by macrophages, including very abundant ones known as “vasculature-associated adipose tissue macrophages,” which are located around the blood vessels. When the activity of these adipose macrophages is disrupted, the proper function of the white adipose tissue changes, which may ultimately link to disease. By understanding macrophage function in healthy tissues, Silva hopes to learn how to restore tissue homeostasis in disease.

Hernandez Moura Silva received a BS in biology in 2005 and an MSc in molecular biology in 2008 from the University of Brasília. He received his PhD in 2011 from the University of São Paulo Heart Institute. Silva pursued his postdoctoral work as the Bernard Levine Postdoctoral Fellow in immunology and immuno-metabolism at the New York University School of Medicine Skirball Institute of Biomolecular Medicine.

He joined MIT as an assistant professor of biology in 2022. He is also a core member of the Ragon Institute.

“For an immigrant coming from an underrepresented group, it’s a huge privilege to be granted this opportunity from HHMI that will empower me and my lab to shape the next generation of scientists and provide an environment where people can feel welcome and encouraged to do the science that they love and be successful,” Silva says. “It also aligns with MIT’s commitment to increase diversity and opportunity across the Institute and to become a place where all people can thrive.”

New peptide modulators of the pro-apoptotic protein BAK

Biophysical characteristics such as peptide binding affinity and kinetics do not determine cell death function

Lillian Eden | Department of Biology
May 9, 2023

Billions of times a day, every day of our lives, cells receive signals to initiate the process of cell death. This strategic cell death, also called apoptosis, is one of the tools multicellular organisms use to maintain tissues and regulate immune responses: damaged, old, or superfluous cells are given the green light to, as it were, turn out the lights for the last time.

Programmed cell death is both extremely powerful and extremely regulated: for example, the careful culling of cells between our digits during embryonic development reveals fingers and toes. When programmed cell death goes awry, however, it can have serious consequences. Cells left unchecked can divide unstoppably and aggressively, leading to cancer. Dysregulated apoptotic pathways have also been implicated in neurodegenerative diseases like Alzheimer’s, where unrestrained cell death may play a part in the severity of the disease.

MIT Professor H. Robert Horvitz ’68 shared a Nobel Prize in 2002 for his foundational research on the genetics of programmed cell death and organ development in the nematode, a microscopic roundworm. Horvitz discovered that ced-9, a key gene in programmed cell death in nematodes, was similar in structure and function to the human gene bcl-2.

Targeting members of the BCL-2 protein family has already shown promise in the fight against cancer. For example, the oral drug venetoclax, approved by the FDA in 2016, is a BCL-2 inhibitor used to treat certain types of leukemia.

In a study published online Jan. 26 in Structure, Fiona Aguilar PhD ‘22 (Keating lab) and collaborators focused on a member of the BCL-2 protein family called BAK. When it is active, BAK promotes mitochondrial outer membrane disruption, leading to cell death, and is therefore referred to as a pro-apoptotic protein. But precisely how BAK becomes activated – or inhibited – is unknown.

“A greater understanding of BAK activation is interesting both from a fundamental biochemical and biophysical perspective as well as from the more translational one of BAK as a potential therapeutic target,” says lead author Fiona Aguilar.

BAK exists in two different forms: an inactive monomer and an active oligomer. A few activators of BAK (BIM, truncated BID, and PUMA) have already been identified, and these proteins bind directly to BAK, leading to the model that binding of activators triggers changes in protein shape that allow BAK to transition from the inactive to the active form. To further explore this idea, Aguilar identified and characterized a number of other peptides that bind to and regulate BAK. To identify new peptide binders, the team used cell-surface display screening and computational protein design methods, including techniques developed by Keating lab alum Gevorg Grigoryan (dTERMen and TERMify) that use protein structural data to generate new protein sequences likely to bind a protein of interest.

In total, Aguilar et al. discovered 10 diverse new peptide binders of BAK that regulate its function.

Interestingly, some of the BAK-binding peptides inhibited activation rather than promoting it. Aguilar et al. found that inhibitors and activators of BAK shared many characteristics, including structure as well as binding affinity and kinetics: the strength with which binders attach to BAK and the rates at which they associate with and dissociate from it.

Newly identified activators had sequences that were dissimilar both from one another and from the previously known BAK activators BIM, truncated BID, and PUMA. Sequence similarity was not necessarily a good indicator of activation or inhibition. For example, an inhibitor and an activator differed by just two amino acids.

Aguilar and colleagues solved the crystal structures of two inhibitor-BAK complexes and one activator-BAK complex and found that the activator interacted with BAK with similar geometry as the two inhibitors. Also, the two inhibitors have only about 40% sequence identity, but bind very similarly to BAK.
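Figures such as “two amino acids apart” or “40% sequence identity” come from position-by-position comparisons of aligned sequences. The sketch below shows that calculation for two peptides already aligned to the same length. The peptide strings are invented placeholders, not sequences from the study, and published identity values can differ depending on how the alignment and the denominator are defined.

```python
def mismatches(seq_a, seq_b):
    """Count positions at which two pre-aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a, seq_b):
    """Share of aligned positions carrying the same amino acid."""
    return 100.0 * (len(seq_a) - mismatches(seq_a, seq_b)) / len(seq_a)


# Invented placeholder peptides, ten residues each.
peptide_1 = "ACDEFGHIKL"
peptide_2 = "ACDEYYYYYY"  # shares the first four residues with peptide_1
peptide_3 = "ACDEFGHIRM"  # differs from peptide_1 at the last two positions

print(percent_identity(peptide_1, peptide_2))
print(mismatches(peptide_1, peptide_3))
```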

Amy Keating, the senior author on the study, says, “Fiona was tireless in identifying new peptides, testing their interactions with BAK, determining their functions, and solving structures to look for differences between activators and inhibitors. We were surprised that peptides with such different behaviors shared such common interaction properties.”

Although the puzzle is not yet solved, Aguilar believes the “transition state” between inactive and active forms of BAK is key.

“We think of activators as peptides that preferentially bind to the BAK transition state, whereas inhibitors are those that preferentially bind to the monomeric state,” Aguilar says. “Overall, we should be thinking more about the transition state, what steps are necessary to reach the transition state, and how to target the transition state.”

This study also added two sequences in the human proteome – BNIP5 and PXT1 – to the repertoire of known BAK binders. Not much is known about these sequences, Aguilar says, but the fact that they activate BAK could indicate that they may play a role in apoptotic pathways that have not yet been determined.

“The finding is something that people in the field are pretty excited about,” Aguilar says.

Ultimately, work remains to establish what characteristics of the binders determine their function, and how binding to BAK triggers the conformational changes that activate or inhibit this complex protein.

“It’s still unclear what it is about these sequences that trigger the allosteric network leading to BAK activation, but at least for now we can rule out the hypothesis that binding mode, affinity, and kinetics fully determine how this occurs,” Aguilar says.

Aguilar suggests that it will also be interesting to explore how these peptides interact with BAX, another pro-apoptotic protein in the BCL-2 family that is both structurally and functionally similar to BAK.

Fiona Aguilar is lead author and Amy Keating is senior author; Bob Grant and graduate students Sebastian Swanson, Dia Ghose, and Bonnie Su contributed. Collaborators Stacey Yu and Kristopher Sarosiek, from the Harvard T.H. Chan School of Public Health, helped with cell-based experiments. The research was funded by a National Institute of General Medical Sciences award, the MIT School of Science Fellowship in Cancer Research award, the John W. Jarve (1978) Seed Fund for Science Innovation (MIT) award, an award from the National Cancer Institute, a National Institute of Diabetes and Digestive and Kidney Diseases award, and an Alex’s Lemonade Stand Foundation for Childhood Cancers award.

Biologists glean insight into repetitive protein sequences

A computational analysis reveals that many repetitive sequences are shared across proteins and are similar in species from bacteria to humans.

Anne Trafton | MIT News Office
September 13, 2022

About 70 percent of all human proteins include at least one sequence consisting of a single amino acid repeated many times, with a few other amino acids sprinkled in. These “low-complexity regions” (LCRs) are also found in most other organisms.

The proteins that contain these sequences have many different functions, but MIT biologists have now come up with a way to identify and study them as a unified group. Their technique allows them to analyze similarities and differences between LCRs from different species, and helps them to determine the functions of these sequences and the proteins in which they are found.

Using their technique, the researchers have analyzed all of the proteins found in eight different species, from bacteria to humans. They found that while LCRs can vary between proteins and species, they often share a similar role — helping the protein in which they’re found to join a larger-scale assembly such as the nucleolus, an organelle found in nearly all human cells.

“Instead of looking at specific LCRs and their functions, which might seem separate because they’re involved in different processes, our broader approach allows us to see similarities between their properties, suggesting that maybe the functions of LCRs aren’t so disparate after all,” says Byron Lee, an MIT graduate student.

The researchers also found some differences between LCRs of different species and showed that these species-specific LCR sequences correspond to species-specific functions, such as forming plant cell walls.

Lee and graduate student Nima Jaberi-Lashkari are the lead authors of the study, which appears today in eLife. Eliezer Calo, an assistant professor of biology at MIT, is the senior author of the paper.

Large-scale study

Previous research has revealed that LCRs are involved in a variety of cellular processes, including cell adhesion and DNA binding. These LCRs are often rich in a single amino acid such as alanine, lysine, or glutamic acid.

Finding these sequences and then studying their functions individually is a time-consuming process, so the MIT team decided to use bioinformatics — an approach that uses computational methods to analyze large sets of biological data — to evaluate them as a larger group.

“What we wanted to do is take a step back and instead of looking at individual LCRs, to try to take a look at all of them and to see if we could observe some patterns on a larger scale that might help us figure out what the ones that have assigned functions are doing, and also help us learn a bit about what the ones that don’t have assigned functions are doing,” Jaberi-Lashkari says.

To do that, the researchers used a technique called a dotplot matrix, which is a way to visually represent amino acid sequences, to generate images of each protein under study. They then used computational image processing methods to compare thousands of these matrices at the same time.
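A dotplot compares a sequence against itself and marks every pair of positions that match. The sketch below assumes the simplest definition, a mark wherever two residues are identical, and uses an invented toy sequence; the study’s actual pipeline and image processing steps are not reproduced here.

```python
def dotplot(sequence):
    """Self-comparison dot plot: cell [i][j] is 1 when residues i and j match."""
    return [[1 if a == b else 0 for b in sequence] for a in sequence]


def render(matrix):
    """Draw the matrix as text, with '#' for a match and '.' for a mismatch."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


# Invented toy sequence with a lysine-rich (K-rich) stretch.
toy_sequence = "MKKAKKQKKS"
matrix = dotplot(toy_sequence)
print(render(matrix))
```

In such a plot, a region dominated by one amino acid shows up as a dense square of marks along the diagonal, which is the kind of visual pattern that image processing methods can pick out.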

Using this technique, the researchers were able to categorize LCRs based on which amino acids were most frequently repeated in the LCR. They also grouped LCR-containing proteins by the number of copies of each LCR type found in the protein. Analyzing these traits helped the researchers to learn more about the functions of these LCRs.
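The two groupings described above, LCR type and copy number, can be illustrated with a short sketch. It assumes the LCRs of a protein have already been extracted as separate strings; the sequences are invented, and labeling an LCR by its single most frequent amino acid is a simplification of whatever criteria the researchers applied.

```python
from collections import Counter


def lcr_type(lcr_sequence):
    """Label an LCR by its most frequently repeated amino acid."""
    return Counter(lcr_sequence).most_common(1)[0][0]


def copy_numbers(lcr_sequences):
    """Count how many LCRs of each type a protein contains."""
    return Counter(lcr_type(seq) for seq in lcr_sequences)


# Invented LCRs from one hypothetical protein:
# three lysine-rich (K) regions and one glutamic acid-rich (E) region.
protein_lcrs = ["KKKAKKKSKK", "EEDEEEEEAE", "KKRKKKKGKK", "KKKKTKKKK"]
counts = copy_numbers(protein_lcrs)
print(counts)
```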

As one demonstration, the researchers picked out a human protein, known as RPA43, that has three lysine-rich LCRs. This protein is one of many subunits that make up an enzyme called RNA polymerase 1, which synthesizes ribosomal RNA. The researchers found that the copy number of lysine-rich LCRs is important for helping the protein integrate into the nucleolus, the organelle responsible for synthesizing ribosomes.

Biological assemblies

In a comparison of the proteins found in eight different species, the researchers found that some LCR types are highly conserved between species, meaning that the sequences have changed very little over evolutionary timescales. These sequences tend to be found in proteins and cell structures that are also highly conserved, such as the nucleolus.

“These sequences seem to be important for the assembly of certain parts of the nucleolus,” Lee says. “Some of the principles that are known to be important for higher order assembly seem to be at play because the copy number, which might control how many interactions a protein can make, is important for the protein to integrate into that compartment.”

The researchers also found differences between LCRs seen in two different types of proteins that are involved in nucleolus assembly. They discovered that a nucleolar protein known as TCOF contains many glutamic acid-rich LCRs that can help scaffold the formation of assemblies, while nucleolar proteins with only a few of these glutamic acid-rich LCRs could be recruited as clients (proteins that interact with the scaffold).

Another structure that appears to have many conserved LCRs is the nuclear speckle, which is found inside the cell nucleus. The researchers also found many similarities between LCRs that are involved in forming larger-scale assemblies such as the extracellular matrix, a network of molecules that provides structural support to cells in plants and animals.

The research team also found examples of structures with LCRs that seem to have diverged between species. For example, plants have distinctive LCR sequences in the proteins that they use to scaffold their cell walls, and these LCRs are not seen in other types of organisms.

The researchers now plan to expand their LCR analysis to additional species.

“There’s so much to explore, because we can expand this map to essentially any species,” Lee says. “That gives us the opportunity and the framework to identify new biological assemblies.”

The research was funded by the National Institute of General Medical Sciences, National Cancer Institute, the Ludwig Center at MIT, a National Institutes of Health Pre-Doctoral Training Grant, and the Pew Charitable Trusts.

Brandon (Brady) Weissbourd

Education

  • Graduate: PhD, 2016, Stanford University
  • Undergraduate: BA, 2009, Human Evolutionary Biology, Harvard University

Research Summary

We use the tiny, transparent jellyfish, Clytia hemisphaerica, to ask questions at the interface of nervous system evolution, development, regeneration, and function. Our foundation is in systems neuroscience, where we use genetic and optical techniques to examine how behavior arises from the activity of networks of neurons. Building from this work, we investigate how the Clytia nervous system is so robust, both to the constant integration of newborn neurons and following large-scale injury. Lastly, we use Clytia’s evolutionary position to study principles of nervous system evolution and make inferences about the ultimate origins of nervous systems.

Awards

  • Searle Scholar Award, 2024
  • Klingenstein-Simons Fellowship Award in Neuroscience, 2023
  • Pathway to Independence Award (K99/R00), National Institute of Neurological Disorders and Stroke, 2020
  • Life Sciences Research Foundation Fellow, 2017

New CRISPR-based map ties every human gene to its function

Jonathan Weissman and collaborators used their single-cell sequencing tool Perturb-seq on every expressed gene in the human genome, linking each to its job in the cell.

Eva Frederick | Whitehead Institute
June 9, 2022

The Human Genome Project was an ambitious initiative to sequence every piece of human DNA. The project drew together collaborators from research institutions around the world, including MIT’s Whitehead Institute for Biomedical Research, and was finally completed in 2003. Now, nearly two decades later, MIT Professor Jonathan Weissman and colleagues have gone beyond the sequence to present the first comprehensive functional map of genes that are expressed in human cells. The data from this project, published online June 9 in Cell, ties each gene to its job in the cell, and is the culmination of years of collaboration on the single-cell sequencing method Perturb-seq.

The data are available for other scientists to use. “It’s a big resource in the way the human genome is a big resource, in that you can go in and do discovery-based research,” says Weissman, who is also a member of the Whitehead Institute and an investigator with the Howard Hughes Medical Institute. “Rather than defining ahead of time what biology you’re going to be looking at, you have this map of the genotype-phenotype relationships and you can go in and screen the database without having to do any experiments.”

The screen allowed the researchers to delve into diverse biological questions. They used it to explore the cellular effects of genes with unknown functions, to investigate the response of mitochondria to stress, and to screen for genes that cause chromosomes to be lost or gained, a phenotype that has proved difficult to study in the past. “I think this dataset is going to enable all sorts of analyses that we haven’t even thought up yet by people who come from other parts of biology, and suddenly they just have this available to draw on,” says former Weissman Lab postdoc Tom Norman, a co-senior author of the paper.

Pioneering Perturb-seq

The project takes advantage of the Perturb-seq approach that makes it possible to follow the impact of turning on or off genes with unprecedented depth. This method was first published in 2016 by a group of researchers including Weissman and fellow MIT professor Aviv Regev, but could only be used on small sets of genes and at great expense.

The massive Perturb-seq map was made possible by foundational work from Joseph Replogle, an MD-PhD student in Weissman’s lab and co-first author of the present paper. Replogle, in collaboration with Norman, who now leads a lab at Memorial Sloan Kettering Cancer Center; Britt Adamson, an assistant professor in the Department of Molecular Biology at Princeton University; and a group at 10x Genomics, set out to create a new version of Perturb-seq that could be scaled up. The researchers published a proof-of-concept paper in Nature Biotechnology in 2020.

The Perturb-seq method uses CRISPR-Cas9 genome editing to introduce genetic changes into cells, and then uses single-cell RNA sequencing to capture information about the RNAs that are expressed resulting from a given genetic change. Because RNAs control all aspects of how cells behave, this method can help decode the many cellular effects of genetic changes.

Since their initial proof-of-concept paper, Weissman, Regev, and others have used this sequencing method on smaller scales. For example, the researchers used Perturb-seq in 2021 to explore how human and viral genes interact over the course of an infection with HCMV, a common herpesvirus.

In the new study, Replogle and collaborators including Reuben Saunders, a graduate student in Weissman’s lab and co-first author of the paper, scaled up the method to the entire genome. Using human blood cancer cell lines as well as noncancerous cells derived from the retina, he performed Perturb-seq across more than 2.5 million cells, and used the data to build a comprehensive map tying genotypes to phenotypes.

Delving into the data

Upon completing the screen, the researchers decided to put their new dataset to use and examine a few biological questions. “The advantage of Perturb-seq is it lets you get a big dataset in an unbiased way,” says Tom Norman. “No one knows entirely what the limits are of what you can get out of that kind of dataset. Now, the question is, what do you actually do with it?”

The first, most obvious application was to look into genes with unknown functions. Because the screen also read out phenotypes of many known genes, the researchers could use the data to compare unknown genes to known ones and look for similar transcriptional outcomes, which could suggest the gene products worked together as part of a larger complex.
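The comparison described above can be sketched as a similarity search. The following is a minimal illustration, not the authors' pipeline: the gene names and expression-change values are hypothetical, and Pearson correlation is an assumed choice of similarity measure.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def most_similar(query, signatures):
    """Rank known genes by how closely their knockdown signature matches the query."""
    scores = {gene: pearson(query, sig) for gene, sig in signatures.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical signatures: change in expression of four readout genes
# after knocking down each known gene.
known = {
    "KNOWN_A": [2.0, -1.0, 0.5, 0.0],
    "KNOWN_B": [-1.5, 2.0, 0.0, 1.0],
}
# Hypothetical signature for a gene of unknown function.
unknown = [1.8, -0.9, 0.6, 0.1]

ranking = most_similar(unknown, known)
```

In this toy case the unknown gene ranks closest to KNOWN_A, which would suggest, but not prove, that the two gene products act in the same pathway or complex.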

The mutation of one gene called C7orf26 in particular stood out. Researchers noticed that genes whose removal led to a similar phenotype were part of a protein complex called Integrator that played a role in creating small nuclear RNAs. The Integrator complex is made up of many smaller subunits — previous studies had suggested 14 individual proteins — and the researchers were able to confirm that C7orf26 made up a 15th component of the complex.

They also discovered that the 15 subunits worked together in smaller modules to perform specific functions within the Integrator complex. “Absent this thousand-foot-high view of the situation, it was not so clear that these different modules were so functionally distinct,” says Saunders.

Another perk of Perturb-seq is that because the assay focuses on single cells, the researchers could use the data to look at more complex phenotypes that become muddied when they are studied together with data from other cells. “We often take all the cells where ‘gene X’ is knocked down and average them together to look at how they changed,” Weissman says. “But sometimes when you knock down a gene, different cells that are losing that same gene behave differently, and that behavior may be missed by the average.”
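A small numerical sketch shows why averaging can hide this kind of behavior. The expression values below are invented for illustration: half of the knockdown cells shift up and half shift down, so the average looks unchanged while the cell-to-cell spread grows sharply.

```python
from statistics import mean, pstdev

# Hypothetical expression of one readout gene (relative to baseline) per cell.
control = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0]
knockdown = [2.0, -2.0, 1.9, -1.9, 2.1, -2.1]

# The bulk-style view: difference in average expression.
avg_shift = mean(knockdown) - mean(control)

# The single-cell view: how much more variable the knockdown cells are.
spread_ratio = pstdev(knockdown) / pstdev(control)
```

Here the average shift is zero, yet the knockdown cells are many times more variable than the controls, a signal that only a per-cell readout would reveal.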

The researchers found that a subset of genes whose removal led to different outcomes from cell to cell were responsible for chromosome segregation. Their removal was causing cells to lose a chromosome or pick up an extra one, a condition known as aneuploidy. “You couldn’t predict what the transcriptional response to losing this gene was because it depended on the secondary effect of what chromosome you gained or lost,” Weissman says. “We realized we could then turn this around and create this composite phenotype looking for signatures of chromosomes being gained and lost. In this way, we’ve done the first genome-wide screen for factors that are required for the correct segregation of DNA.”
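One way to picture such a composite phenotype is to average expression over all genes on each chromosome and compare each cell with controls. The sketch below is an assumption-laden illustration rather than the study's method: the gene names, chromosome assignments, expression values, and the gain/loss cutoffs are all hypothetical.

```python
def chromosome_scores(cell_expr, control_expr, gene_to_chrom):
    """Average expression ratio (cell / control) over the genes of each chromosome."""
    totals, counts = {}, {}
    for gene, value in cell_expr.items():
        chrom = gene_to_chrom[gene]
        ratio = value / control_expr[gene]
        totals[chrom] = totals.get(chrom, 0.0) + ratio
        counts[chrom] = counts.get(chrom, 0) + 1
    return {chrom: totals[chrom] / counts[chrom] for chrom in totals}

def call_aneuploidy(scores, gain=1.3, loss=0.7):
    """Flag chromosomes whose average ratio crosses the assumed cutoffs."""
    calls = {}
    for chrom, score in scores.items():
        if score >= gain:
            calls[chrom] = "gain"
        elif score <= loss:
            calls[chrom] = "loss"
    return calls

# Hypothetical annotation and expression values.
gene_to_chrom = {"g1": "chr1", "g2": "chr1", "g3": "chr2",
                 "g4": "chr2", "g5": "chr3", "g6": "chr3"}
control_expr = {gene: 10.0 for gene in gene_to_chrom}
cell_expr = {"g1": 10.5, "g2": 9.5, "g3": 15.0,
             "g4": 15.5, "g5": 10.0, "g6": 9.8}

calls = call_aneuploidy(chromosome_scores(cell_expr, control_expr, gene_to_chrom))
```

In this toy cell every gene on chr2 is expressed at roughly 1.5 times the control level, consistent with an extra copy of that chromosome, so only chr2 is flagged.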

“I think the aneuploidy study is the most interesting application of this data so far,” Norman says. “It captures a phenotype that you can only get using a single-cell readout. You can’t go after it any other way.”

The researchers also used their dataset to study how mitochondria responded to stress. Mitochondria, which evolved from free-living bacteria, carry 13 genes in their genomes. Within the nuclear DNA, around 1,000 genes are somehow related to mitochondrial function. “People have been interested for a long time in how nuclear and mitochondrial DNA are coordinated and regulated in different cellular conditions, especially when a cell is stressed,” Replogle says.

The researchers found that when they perturbed different mitochondria-related genes, the nuclear genome responded similarly to many different genetic changes. However, the mitochondrial genome responses were much more variable.

“There’s still an open question of why mitochondria still have their own DNA,” said Replogle. “A big-picture takeaway from our work is that one benefit of having a separate mitochondrial genome might be having localized or very specific genetic regulation in response to different stressors.”

“If you have one mitochondria that’s broken, and another one that is broken in a different way, those mitochondria could be responding differentially,” Weissman says.

In the future, the researchers hope to use Perturb-seq on different types of cells besides the cancer cell line they started in. They also hope to continue to explore their map of gene functions, and hope others will do the same. “This really is the culmination of many years of work by the authors and other collaborators, and I’m really pleased to see it continue to succeed and expand,” says Norman.

Tracing a cancer’s family tree to its roots reveals how tumors grow

Family trees of lung cancer cells reveal how cancer evolves from its earliest stages to an aggressive form capable of spreading throughout the body.

Greta Friar | Whitehead Institute
May 5, 2022

Over time, cancer cells can evolve to become resistant to treatment, more aggressive, and metastatic — capable of spreading to additional sites in the body and forming new tumors. The more of these traits that a cancer evolves, the more deadly it becomes. Researchers want to understand how cancers evolve these traits in order to prevent and treat deadly cancers, but by the time cancer is discovered in a patient, it has typically existed for years or even decades. The key evolutionary moments have come and gone unobserved.

MIT Professor Jonathan Weissman and collaborators have developed an approach to track cancer cells through the generations, allowing researchers to follow their evolutionary history. This lineage-tracing approach uses CRISPR technology to embed each cell with an inheritable and evolvable DNA barcode. Each time a cell divides, its barcode gets slightly modified. When the researchers eventually harvest the descendants of the original cells, they can compare the cells’ barcodes to reconstruct a family tree of every individual cell, just like an evolutionary tree of related species. Then researchers can use the cells’ relationships to reconstruct how and when the cells evolved important traits. Researchers have used similar approaches to follow the evolution of the virus that causes Covid-19, in order to track the origins of variants of concern.
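The logic of reading relatedness from barcodes can be sketched in a few lines. This is a simplified illustration, not the reconstruction algorithm the researchers used: it assumes each barcode is a list of sites, that 0 means unedited, that edits are permanent and inherited, and that cells sharing more identical edits share a more recent ancestor. All barcodes are hypothetical.

```python
def shared_edits(a, b):
    """Count sites where both barcodes carry the same (non-zero) edit."""
    return sum(1 for x, y in zip(a, b) if x != 0 and x == y)

def closest_relative(cell, barcodes):
    """Return the other cell that shares the most edits with the given cell."""
    others = [name for name in barcodes if name != cell]
    return max(others, key=lambda name: shared_edits(barcodes[cell], barcodes[name]))

# Hypothetical barcodes: every cell inherited edit 3 at the first site from
# the founder, then two sub-lineages picked up different later edits.
barcodes = {
    "cell_1": [3, 0, 7, 0],
    "cell_2": [3, 0, 7, 5],
    "cell_3": [3, 2, 0, 0],
    "cell_4": [3, 2, 0, 9],
}

relative_of_1 = closest_relative("cell_1", barcodes)
relative_of_4 = closest_relative("cell_4", barcodes)
```

Grouping cells by their closest relatives in this way, and repeating the process on the groups, is the basic intuition behind building a full family tree.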

Weissman and collaborators have used their lineage-tracing approach before to study how metastatic cancer spreads throughout the body. In their latest work, Weissman; Tyler Jacks, the Daniel K. Ludwig Scholar and David H. Koch Professor of Biology at MIT; and computer scientist Nir Yosef, associate professor at the University of California at Berkeley and the Weizmann Institute of Science, record their most comprehensive cancer cell history to date. The research, published today in Cell, tracks lung cancer cells from the very first activation of cancer-causing mutations. This detailed tumor history reveals new insights into how lung cancer progresses and metastasizes, demonstrating the wealth of understanding that lineage tracing can provide.

“This is a new way of looking at cancer evolution with much higher resolution,” says Weissman, who is a professor of biology at MIT, a member of the Whitehead Institute for Biomedical Research, and an investigator with the Howard Hughes Medical Institute. “Previously, the critical events that cause a tumor to become life-threatening have been opaque because they are lost in a tumor’s distant past, but this gives us a window into that history.”

In order to track cancer from its very beginning, the researchers developed an approach to simultaneously trigger cancer-causing mutations in cells and start recording the cells’ history. They engineered mice such that when their lung cells were exposed to a tailor-made virus, that exposure activated a cancer-causing mutation in the Kras gene and deactivated tumor suppressing gene Trp53 in the cells, as well as activating the lineage tracing technology. The mouse model, developed in Jacks’ lab, was also engineered so that lung cancer would develop in it very similarly to how it would in humans.

“In this model, cancer cells develop from normal cells and tumor progression occurs over an extended time in its native environment. This closely replicates what occurs in patients,” Jacks says. Indeed, the researchers’ findings closely align with data about disease progression in lung cancer patients.

The researchers let the cancer cells evolve for several months before harvesting them. They then used a computational approach developed in their previous work to reconstruct the cells’ family trees from their modified DNA barcodes. They also measured gene expression in the cells using RNA sequencing to characterize each individual cell’s state. With this information, they began to piece together how this type of lung cancer becomes aggressive and metastatic.

“Revealing the relationships between cells in a tumor is key to making sense of their gene expression profiles and gaining insight into the emergence of aggressive states,” says Yosef, who is a co-corresponding author on both the current work and the previous lineage tracing paper.

The results showed significant diversity between subpopulations of cells within the same tumor. In this model, cancer cells evolved primarily through inheritable changes to their gene expression, rather than through genetic mutations. Certain subpopulations had evolved to become more fit — better at growth and survival — and more aggressive, and over time they dominated the tumor. Genes that the researchers identified as commonly expressed in the fittest cells could be good candidates for possible therapeutic targets in future research. The researchers also discovered that metastases originated only from these groups of dominant cells, and only late in their evolution. This is different from what has been proposed for some other cancers, in which cells may gain the ability to metastasize early in their evolution. This insight could be important for cancer treatment; metastasis is often when cancers become deadly, and if researchers know which types of cancer develop the ability to metastasize in this stepwise manner, they can design interventions to stop the progression.

“In order to develop better therapies, it’s important to understand the fundamental principles that tumors adopt to develop,” says co-first author Dian Yang, a Damon Runyon Postdoctoral Fellow in Weissman’s lab. “In the future, we want to be able to look at the state of the cancer cells when a patient comes in, and be able to predict how that cancer’s going to evolve, what the risks are, and what is the best treatment to stop that evolution.”

The researchers also figured out important details of the evolutionary paths that cancer subpopulations take to become fit and aggressive. Cells evolve through different states, defined by key characteristics that the cell has at that point in time. In this cancer model the researchers found that early on, cells in a tumor quickly diversified, switching between many different states. However, once a subpopulation landed in a particularly fit and aggressive state, it stayed there, dominating the tumor from that stable state. Furthermore, the ultimately dominant cells seemed to follow one of two distinct paths through different cell states. Either of those paths could then lead to further progression that enabled cancers to enter aggressive “mesenchymal” cell states, which are linked to metastasis.

After the researchers thoroughly mapped the cancer cells’ evolutionary paths, they wondered how those paths would be affected if the cells experienced additional cancer-linked mutations, so they deactivated one of two additional tumor suppressors. One of these affected which state cells stabilized in, while the other led cells to follow a completely new evolutionary pathway to fitness.

The researchers hope that others will use their approach to study all kinds of questions about cancer evolution, and they already have a number of questions in mind for themselves. One goal is to study the evolution of therapeutic resistance, by seeing how cancers evolve in response to different treatments. Another is to study how cancer cells’ local environments shape their evolution.

“The strength of this approach is that it lets us study the evolution of cancers with fine-grained detail,” says co-first author Matthew Jones, a graduate student in the Weissman and Yosef labs. “Every time there is a shift from bulk to single-cell analysis in a technology or approach, it dramatically widens the scope of the biological insights we can attain, and I think we are seeing something like that here.”

An ‘oracle’ for predicting the evolution of gene regulation

Researchers created a mathematical framework to examine the genome and detect signatures of natural selection, deciphering the evolutionary past and future of non-coding DNA.

Raleigh McElvery
March 9, 2022

Despite the sheer number of genes that each human cell contains, these so-called “coding” DNA sequences comprise just 1% of our entire genome. The remaining 99% is made up of “non-coding” DNA — which, unlike coding DNA, does not carry the instructions to build proteins.

One vital function of this non-coding DNA, also called “regulatory” DNA, is to help turn genes on and off, controlling how much (if any) of a protein is made. Over time, as cells replicate their DNA to grow and divide, mutations often crop up in these non-coding regions — sometimes tweaking their function and changing the way they control gene expression. Many of these mutations are trivial, and some are even beneficial. Occasionally, though, they can be associated with increased risk of common diseases, such as type 2 diabetes, or more life-threatening ones, including cancer.

To better understand the repercussions of such mutations, researchers have been hard at work on mathematical maps that allow them to look at an organism’s genome, predict which genes will be expressed, and determine how that expression will affect the organism’s observable traits. These maps, called fitness landscapes, were conceptualized roughly a century ago to understand how genetic makeup influences one common measure of organismal fitness in particular: reproductive success. Early fitness landscapes were very simple, often focusing on a limited number of mutations. Much richer data sets are now available, but researchers still require additional tools to characterize and visualize such complex data. This ability would not only facilitate a better understanding of how individual genes have evolved over time, but would also help to predict what sequence and expression changes might occur in the future.

In a new study published on March 9 in Nature, a team of scientists has developed a framework for studying the fitness landscapes of regulatory DNA. They created a neural network model that, when trained on hundreds of millions of experimental measurements, was capable of predicting how changes to these non-coding sequences in yeast affected gene expression. They also devised a unique way of representing the landscapes in two dimensions, making it easy to understand the past and forecast the future evolution of non-coding sequences in organisms beyond yeast — and even design custom gene expression patterns for gene therapies and industrial applications.

“We now have an ‘oracle’ that can be queried to ask: What if we tried all possible mutations of this sequence? Or, what new sequence should we design to give us a desired expression?” says Aviv Regev, a professor of biology at MIT (on leave), core member of the Broad Institute of Harvard and MIT (on leave), head of Genentech Research and Early Development, and the study’s senior author. “Scientists can now use the model for their own evolutionary question or scenario, and for other problems like making sequences that control gene expression in desired ways. I am also excited about the possibilities for machine learning researchers interested in interpretability; they can ask their questions in reverse, to better understand the underlying biology.”

Prior to this study, many researchers had simply trained their models on known mutations (or slight variations thereof) that exist in nature. However, Regev’s team wanted to go a step further by creating their own unbiased models capable of predicting an organism’s fitness and gene expression based on any possible DNA sequence — even sequences they’d never seen before. This would also enable researchers to use such models to engineer cells for pharmaceutical purposes, including new treatments for cancer and autoimmune disorders.

To accomplish this goal, Eeshit Dhaval Vaishnav, a graduate student at MIT and co-first author, Carl de Boer, now an assistant professor at the University of British Columbia, and their colleagues created a neural network model to predict gene expression. They trained it on a dataset generated by inserting millions of totally random non-coding DNA sequences into yeast, and observing how each random sequence affected gene expression. They focused on a particular subset of non-coding DNA sequences called promoters, which serve as binding sites for proteins that can switch nearby genes on or off.
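Once a sequence-to-expression predictor exists, asking what every possible single mutation would do becomes a simple loop. The sketch below illustrates that querying step only. The function toy_expression_model is a hypothetical stand-in that counts "TATA" motifs; it is not the team's neural network, and the promoter sequence is invented.

```python
BASES = "ACGT"

def toy_expression_model(seq):
    """Hypothetical stand-in for a trained predictor: counts 'TATA' motifs."""
    return sum(1 for i in range(len(seq) - 3) if seq[i:i + 4] == "TATA")

def single_mutants(seq):
    """Yield (position, new_base, mutated_sequence) for every single-base change."""
    for i, ref in enumerate(seq):
        for alt in BASES:
            if alt != ref:
                yield i, alt, seq[:i] + alt + seq[i + 1:]

def mutation_effects(seq, model):
    """Predicted change in expression for each single-base mutation."""
    baseline = model(seq)
    return {(i, alt): model(mutant) - baseline
            for i, alt, mutant in single_mutants(seq)}

promoter = "GGTATAGG"  # invented 8-base sequence containing one motif
effects = mutation_effects(promoter, toy_expression_model)
disruptive = sorted(pos for (pos, alt), delta in effects.items() if delta < 0)
```

With a real trained model in place of the toy function, the same loop would score all 3 x L single mutants of a promoter of length L without any lab work.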

“This work highlights what possibilities open up when we design new kinds of experiments to generate the right data to train models,” Regev says. “In the broader sense, I believe these kinds of approaches will be important for many problems — like understanding genetic variants in regulatory regions that confer disease risk in the human genome, but also for predicting the impact of combinations of mutations, or designing new molecules.”

Regev, Vaishnav, de Boer, and their coauthors went on to test their model’s predictive abilities in a variety of ways, in order to show how it could help demystify the evolutionary past — and possible future — of certain promoters. “Creating an accurate model was certainly an accomplishment, but, to me, it was really just a starting point,” Vaishnav explains.

First, to determine whether their model could help with synthetic biology applications like producing antibiotics, enzymes, and food, the researchers practiced using it to design promoters that could generate desired expression levels for any gene of interest. They then scoured other scientific papers to identify fundamental evolutionary questions, in order to see if their model could help answer them. The team even went so far as to feed their model a real-world population data set from one existing study, which contained genetic information from yeast strains around the world. In doing so, they were able to delineate thousands of years of past selection pressures that sculpted the genomes of today’s yeast.

But, in order to create a powerful tool that could probe any genome, the researchers knew they’d need to find a way to forecast the evolution of non-coding sequences even without such a comprehensive population data set. To address this goal, Vaishnav and his colleagues devised a computational technique that allowed them to plot the predictions from their framework onto a two-dimensional graph. This helped them show, in a remarkably simple manner, how any non-coding DNA sequence would affect gene expression and fitness, without needing to conduct any time-consuming experiments at the lab bench.

“One of the unsolved problems in fitness landscapes was that we didn’t have an approach for visualizing them in a way that meaningfully captured the evolutionary properties of sequences,” Vaishnav explains. “I really wanted to find a way to fill that gap, and contribute to the longstanding vision of creating a complete fitness landscape.”

Martin Taylor, a professor of genetics at the University of Edinburgh’s Medical Research Council Human Genetics Unit who was not involved in the research, says the study shows that artificial intelligence can not only predict the effect of regulatory DNA changes, but also reveal the underlying principles that govern millions of years of evolution.

Despite the fact that the model was trained on just a fraction of yeast regulatory DNA in a few growth conditions, he’s impressed that it’s capable of making such useful predictions about the evolution of gene regulation in mammals.

“There are obvious near-term applications, such as the custom design of regulatory DNA for yeast in brewing, baking, and biotechnology,” he explains. “But extensions of this work could also help identify disease mutations in human regulatory DNA that are currently difficult to find and largely overlooked in the clinic. This work suggests there is a bright future for AI models of gene regulation trained on richer, more complex, and more diverse data sets.”

Even before the study was formally published, Vaishnav began receiving queries from other researchers hoping to use the model to devise non-coding DNA sequences for use in gene therapies.

“People have been studying regulatory evolution and fitness landscapes for decades now,” Vaishnav says. “I think our framework will go a long way in answering fundamental, open questions about the evolution and evolvability of gene regulatory DNA — and even help us design biological sequences for exciting new applications.”

New high-throughput method greatly expands view of how mutations impact cells

Broad scientists have developed a new approach for studying the functional effects of the millions of mutations associated with cancer and other diseases

Tom Ulrich | Broad Institute
January 27, 2022

There are millions of mutations and other genetic variations in cancer. However, understanding which of these mutations are impactful tumor “drivers” rather than innocuous “passengers,” and what each of the drivers does to the cancer cell, has been a challenging undertaking. Many studies rely on bespoke, time-consuming, gene-specific approaches that provide one-dimensional views into a given mutation’s broader functional impacts. Alternatively, computational predictions can provide functional insights, but those findings must then be confirmed through experiments.

Now, in a report published in Nature Biotechnology, a research team at the Broad Institute of MIT and Harvard has unveiled a massive-scale, high resolution method for functionally assessing large numbers of protein-coding mutations simultaneously, one that returns rich phenotypic information and which could potentially be used to study any mutation in any gene in cancer and perhaps other diseases. Their results, gained through proof-of-concept experiments with cancer cell lines, also show that individual mutations can have a spectrum of effects not only on their impacted genes but also on molecular pathways and cell state as a whole, and add nuance to the long-accepted practice of dividing cancer mutations into so-called “drivers” and “passengers.”

“When you look at the genetic data from patients’ tumors, you see that the majority of cancer-associated mutations are actually quite rare, which means we have few insights into what these mutations do,” said Jesse Boehm of the Broad’s Cancer Program, who was co-senior author of the study with Aviv Regev, a Broad core institute member now at Genentech, a member of the Roche Group. “For cancer precision medicine to become a reality, we need a firm understanding of the function of each mutation, but a major challenge has been defining an experimental approach that could be implemented in the lab at the scale required. This new method may be the tool we need.”

The new method, called single-cell expression-based variant impact phenotyping (sc-eVIP), builds on two earlier methods: Perturb-seq, an approach developed in 2016 by Regev and colleagues for manipulating genes and exploring the consequences of those manipulations using high-throughput single-cell RNA sequencing; and eVIP, a method also developed in 2016 by Boehm and colleagues for profiling cancer variants at low scale using RNA measurements. While Perturb-seq assays originally relied on CRISPR to introduce mutations into cells, the sc-eVIP team adopted an overexpression-based approach, engineering DNA-barcoded gene constructs for each mutation of interest and introducing them into pools of cells in such a way that the cells expressed the mutated genes at higher-than-normal levels.

By then recording each perturbed cell’s expression profile using single cell RNA sequencing, the team could both identify which mutation a given cell carried (based on the constructs’ unique barcodes) and examine the mutation’s broader impact on the cell’s overall expression state. This approach provides a highly detailed view of a mutation’s impact on a variety of molecular pathways and circuits, and does not need to be adapted for each new gene studied.

“In a sense, we’re using the cell as a biosensor,” said Oana Ursu, a postdoctoral fellow in the Regev lab, formerly within the Broad’s Klarman Cell Observatory and now at Genentech, and co-first author of the study with JT Neal, a senior group leader in the Broad’s Cancer Program. “By looking at the expression changes that take place when we overexpress a mutated gene, we can learn whether it has a meaningful impact. But also, we can compare and categorize variants based on the changes they trigger, and look for patterns in the biology they affect.”

“Most of the technologies developed for interpreting coding variants up to now have been very scalable, but have had relatively simple readouts like cell viability or maybe looked at a single trait. Their information content has been low, and it takes a lot of work to optimize them,” said Neal. “With sc-eVIP, we’ve engineered a comprehensive approach that’s high throughput and information-rich, which could be a real boon for large-scale variant-to-function studies.”

To test sc-eVIP’s potential, the team chose to study TP53 — the most commonly mutated gene in cancer — and KRAS — which encodes a key oncogene responsible for abnormal growth of many cancers. Neal, Ursu, and their collaborators generated constructs containing 200 known TP53 and KRAS mutations (including cancer-associated mutations and control mutations known to leave gene function unaffected) and introduced them into 300,000 lung cancer cells, and captured each cell’s individual expression profile. Based on those profiles, the team categorized each mutation as either “wildtype-like” (that is, effectively functionally indistinguishable from the unmutated gene) or “putatively impactful,” from there further defining mutations based on whether they reduced or enhanced the gene’s function.
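The categorization step can be pictured as measuring how far a variant's average expression profile sits from the unmutated gene's profile. The sketch below is an assumed simplification: the article does not describe the statistic the team used, so Euclidean distance, the threshold, and all expression values here are hypothetical.

```python
import math

def mean_profile(cells):
    """Average expression per gene across a list of per-cell profiles."""
    n = len(cells)
    return [sum(column) / n for column in zip(*cells)]

def distance(p, q):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def classify(variant_cells, wildtype_cells, threshold):
    """Label a variant by its distance from the wildtype mean profile."""
    d = distance(mean_profile(variant_cells), mean_profile(wildtype_cells))
    return "putatively impactful" if d > threshold else "wildtype-like"

# Hypothetical two-gene expression profiles, two cells per construct.
wildtype = [[1.0, 1.0], [1.2, 0.8]]
silent_control = [[1.1, 1.0], [1.0, 0.9]]
hotspot_variant = [[3.0, 0.2], [2.8, 0.1]]

threshold = 0.5  # assumed cutoff; in practice it could be set from control variants
label_control = classify(silent_control, wildtype, threshold)
label_hotspot = classify(hotspot_variant, wildtype, threshold)
```

The control mutations mentioned above would play the role of silent_control here, indicating how much distance arises from noise alone.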

The profiles also revealed each mutation’s broader impact on cell state, based on how the activity of a variety of pathways changed across single cells. For instance, the sc-eVIP data revealed KRAS mutations that fall along a continuum in how they impact cell state at the population level, from having no impact to influencing subtle shifts in cellular abundances to causing outright activation or repression of key pathways in a majority of cells. These findings suggest that different mutations within the same gene can influence cell state along a spectrum of impact.

“The cancer community has long embraced a binary conceptual framework of ‘driver’ mutations, ones that promote cancer development and progression, versus ‘passenger’ mutations, which are completely inert and just happened to arise along the way,” Boehm noted. “These initial findings suggest that biologically those categories are likely overly simplistic, that there’s actually a continuum of functional impact from inert to completely tumorigenic.”

While the team focused on cancer-associated genes and mutations for this study, they noted that sc-eVIP is gene-agnostic and highly scalable, and that using single cell RNA sequencing as a readout offers an efficient and generalizable approach to producing rich phenotypic data. They also calculated that it should be possible to thoroughly characterize most mutations with only 20 to a few hundred cells. Based on those numbers, it may be possible with sc-eVIP to generate a first-draft functional map of more than 2 million variants in approximately 200 known cancer genes with 71 million cells.
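A quick back-of-the-envelope check, using only the figures quoted in the article, shows how these numbers fit together. Because the article says "more than 2 million" variants, the per-variant figure below is an upper bound on the average.

```python
# Figures quoted in the article.
pilot_cells = 300_000
pilot_mutations = 200
draft_map_cells = 71_000_000
draft_map_variants = 2_000_000  # "more than 2 million", so treat as a minimum

# Average cells per mutation in the proof-of-concept experiment.
pilot_cells_per_mutation = pilot_cells / pilot_mutations

# Average cells available per variant in the proposed first-draft map.
draft_cells_per_variant = draft_map_cells / draft_map_variants
```

The proposed map works out to roughly 35 cells per variant, which falls inside the stated range of 20 to a few hundred cells, while the pilot experiment averaged about 1,500 cells per mutation.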

“If we can map where every cancer-associated variant fits on the continuum of impact in a variety of cancers and cell types,” Boehm said, “we’ll have a much better grasp of how the interplay of variants affects cell state, which in turn affects cancer development, growth, and response. Such knowledge would represent a true advance toward cancer precision medicine.”

Support for this study came from the National Cancer Institute, the National Human Genome Research Institute, the Mark Foundation for Cancer Research, the Howard Hughes Medical Institute, the Broadnext10 and Variant to Function programs and the Klarman Cell Observatory at the Broad Institute, and other sources.

Paper(s) cited:

Ursu O, Neal JT, et al. Massively parallel phenotyping of coding variants in cancer with Perturb-seq. Nature Biotechnology. Online January 20, 2022. DOI:10.1038/s41587-021-01160-7.

Blending machine learning and biology to predict cell fates and other changes
Greta Friar | Whitehead Institute
February 1, 2022

Imagine a ball thrown in the air: it curves up, then down, tracing an arc to a point on the ground some distance away. The path of the ball can be described with a simple mathematical equation, and if you know the equation, you can figure out where the ball is going to land. Biological systems tend to be harder to forecast, but Whitehead Institute Member Jonathan Weissman, postdoc in his lab Xiaojie Qiu, and collaborators at the University of Pittsburgh School of Medicine are working on making the path taken by cells as predictable as the arc of a ball. Rather than looking at how cells move through space, they are considering how cells change with time.

Weissman, Qiu, and collaborators Jianhua Xing, professor of computational and systems biology at the University of Pittsburgh School of Medicine, and Xing lab graduate student Yan Zhang have built a machine learning framework that can define the mathematical equations describing a cell’s trajectory from one state to another, such as its development from a stem cell into one of several different types of mature cell. The framework, called dynamo, can also be used to figure out the underlying mechanisms—the specific cocktail of gene activity—driving changes in the cell. Researchers could potentially use these insights to manipulate cells into taking one path instead of another, a common goal in biomedical research and regenerative medicine.  

The researchers describe dynamo in a paper published in the journal Cell on February 1. They explain the framework’s many analytical capabilities and use it to help understand mechanisms of human blood cell production, such as why one type of blood cell forms first (appears more rapidly than others).

“Our goal is to move towards a more quantitative version of single cell biology,” Qiu says. “We want to be able to map how a cell changes in relation to the interplay of regulatory genes as accurately as an astronomer can chart a planet’s movement in relation to gravity, and then we want to understand and be able to control those changes.”

How to map a cell’s future journey

Dynamo uses data from many individual cells to come up with its equations. The main information that it requires is how the expression of different genes in a cell changes from moment to moment. The researchers estimate this by looking at changes in the amount of RNA over time, because RNA is a measurable product of gene expression. In the same way that knowing the starting position and velocity of a ball is necessary to understand the arc it will follow, researchers use the starting levels of RNAs and how those RNA levels are changing to predict the path of the cell. However, calculating changes in the amount of RNA from single cell sequencing data is challenging, because sequencing only measures RNA once. Researchers must then use clues like the RNA being made at the time of sequencing and equations for RNA turnover to estimate how RNA levels were changing. Qiu and colleagues had to improve on previous methods in several ways in order to get clean enough measurements for dynamo to work. In particular, they used a recently developed experimental method that tags new RNA to distinguish it from old RNA, and combined this with sophisticated mathematical modeling, to overcome limitations of older estimation approaches.
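To make the idea of estimating change from a single snapshot concrete, here is a minimal sketch under a textbook one-step kinetic model, which is an assumption and not dynamo's actual estimator. In this model, RNA is made at a constant rate alpha and degraded at rate gamma, so total RNA r changes as dr/dt = alpha - gamma * r, and the newly tagged RNA after a labeling period t is n = (alpha / gamma) * (1 - exp(-gamma * t)). The degradation rate is assumed to be known, and all numbers are hypothetical.

```python
import math

def synthesis_rate(new_rna, gamma, label_time):
    """Solve n = (alpha / gamma) * (1 - exp(-gamma * t)) for alpha."""
    return gamma * new_rna / (1.0 - math.exp(-gamma * label_time))

def rna_velocity(total_rna, new_rna, gamma, label_time):
    """Rate of change of total RNA: dr/dt = alpha - gamma * r."""
    alpha = synthesis_rate(new_rna, gamma, label_time)
    return alpha - gamma * total_rna

# Hypothetical gene: true synthesis rate 2.0, degradation rate 0.5,
# labeled for one time unit.
alpha_true, gamma, label_time = 2.0, 0.5, 1.0
new_rna = (alpha_true / gamma) * (1.0 - math.exp(-gamma * label_time))

v_steady = rna_velocity(4.0, new_rna, gamma, label_time)  # r = alpha / gamma
v_rising = rna_velocity(2.0, new_rna, gamma, label_time)  # r below steady state
```

A cell whose total RNA sits at alpha / gamma has zero velocity for that gene, while a cell with less RNA than that is still ramping up expression.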

The researchers’ next challenge was to move from observing cells at discrete points in time to a continuous picture of how cells change. The difference is like switching from a map showing only landmarks to a map that shows the uninterrupted landscape, making it possible to trace the paths between landmarks. Led by Qiu and Zhang, the group used machine learning to learn continuous functions that describe how cells move through the space of possible gene expression states.

“There have been tremendous advances in methods for broadly profiling transcriptomes and other ‘omic’ information with single-cell resolution. The analytical tools for exploring these data, however, to date have been descriptive instead of predictive. With a continuous function, you can start to do things that weren’t possible with just accurately sampled cells at different states. For example, you can ask: if I changed one transcription factor, how is it going to change the expression of the other genes?” says Weissman, who is also a professor of biology at the Massachusetts Institute of Technology (MIT), a member of the Koch Institute for Integrative Cancer Research at MIT, and an investigator of the Howard Hughes Medical Institute.

Dynamo can visualize these functions by turning them into math-based maps. The terrain of each map is determined by factors like the relative expression of key genes. A cell’s starting place on the map is determined by its current gene expression dynamics. Once you know where the cell starts, you can trace the path from that spot to find out where the cell will end up.
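To make the idea of tracing a path across such a map concrete, here is a minimal Python sketch. It does not use dynamo or real data: the "map" is a hypothetical two-gene model in which each gene represses the other, and the path is followed with simple forward Euler steps. The function names, parameter values, and starting points are illustrative assumptions.

```python
def toggle_field(state, a=4.0, n=2):
    """Velocity of a two-gene state (x, y) where each gene represses the other."""
    x, y = state
    return (a / (1.0 + y ** n) - x, a / (1.0 + x ** n) - y)


def trace_path(field, start, dt=0.01, steps=5000):
    """Follow the vector field from `start` with forward Euler steps."""
    path = [tuple(start)]
    x, y = start
    for _ in range(steps):
        dx, dy = field((x, y))
        x, y = x + dt * dx, y + dt * dy
        path.append((x, y))
    return path


# Two cells that start close together but on opposite sides of the divide.
fate_a = trace_path(toggle_field, (1.5, 1.0))[-1]
fate_b = trace_path(toggle_field, (1.0, 1.5))[-1]
print("cell A ends at", fate_a)
print("cell B ends at", fate_b)
```

In this toy landscape, the cell that starts with slightly more of gene x settles into a state where x dominates, and the other cell settles into the opposite state, so the starting position alone determines the destination.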

The researchers confirmed dynamo’s cell fate predictions by testing it against cloned cells, which share the same genetics and ancestry. One of two nearly identical clones would be sequenced while the other clone went on to differentiate. Dynamo’s predictions for what would have happened to each sequenced cell matched what happened to its clone.

Moving from math to biological insight and non-trivial predictions

With a continuous function for a cell’s path over time determined, dynamo can then gain insights into the underlying biological mechanisms. Calculating derivatives of the function provides a wealth of information, for example by allowing researchers to determine the functional relationships between genes: whether and how they regulate each other. Calculating acceleration can show that a gene’s expression is growing or shrinking quickly even when its current level is low, and can be used to reveal which genes play key roles in determining a cell’s fate very early in the cell’s trajectory. The researchers tested their tools on blood cells, which have a large and branching differentiation tree. Together with blood cell expert Vijay Sankaran of Boston Children’s Hospital, the Dana-Farber Cancer Institute, Harvard Medical School, and the Broad Institute of MIT and Harvard, and Eric Lander of the Broad Institute, they found that dynamo accurately mapped blood cell differentiation and confirmed a recent finding that one type of blood cell, megakaryocytes, forms earlier than others. Dynamo also discovered the mechanism behind this early differentiation: the gene that drives megakaryocyte differentiation, FLI1, can self-activate, and because of this is present at relatively high levels early on in progenitor cells. This predisposes the progenitors to differentiate into megakaryocytes first.
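The derivative-based quantities mentioned above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not dynamo's implementation: the two genes A and B are hypothetical (A activates itself and represses B, loosely echoing the self-activation described for FLI1), the Jacobian is approximated with central differences, and acceleration is computed as the Jacobian applied to the velocity.

```python
def toy_field(state):
    """Velocities for two hypothetical genes: A activates itself and represses B."""
    a, b = state
    da = a ** 2 / (1.0 + a ** 2) - 0.5 * a + 0.1
    db = 1.0 / (1.0 + a ** 2) - b
    return [da, db]


def jacobian(field, state, eps=1e-5):
    """Central-difference Jacobian: J[i][j] = d(velocity of gene i) / d(level of gene j)."""
    n = len(state)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up = list(state)
        down = list(state)
        up[j] += eps
        down[j] -= eps
        f_up, f_down = field(up), field(down)
        for i in range(n):
            jac[i][j] = (f_up[i] - f_down[i]) / (2.0 * eps)
    return jac


def acceleration(field, state):
    """Acceleration along the trajectory: the Jacobian applied to the velocity."""
    jac = jacobian(field, state)
    vel = field(state)
    size = len(state)
    return [sum(jac[i][j] * vel[j] for j in range(size)) for i in range(size)]


state = [0.5, 0.5]
J = jacobian(toy_field, state)
acc = acceleration(toy_field, state)
print("Jacobian:", J)
print("acceleration:", acc)
```

In this toy example, a negative J[1][0] indicates that A represses B, while a positive J[0][0] indicates that A's self-activation outweighs its degradation at this state. The acceleration shows how quickly each gene's rate of change is itself changing.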

The researchers hope that dynamo will not only help them understand how cells transition from one state to another, but also guide them in controlling those transitions. To this end, dynamo includes tools to simulate how cells will change in response to different manipulations, and a method to find the most efficient path from one cell state to another. These tools provide a powerful framework for predicting how to optimally reprogram one cell type into another, a fundamental challenge in stem cell biology and regenerative medicine, as well as for generating hypotheses about how other genetic changes will alter cells’ fate. The possible applications range from reprogramming cells to anticipating how they respond to drugs.
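A simulated manipulation of this kind can be illustrated with a toy Python model. The sketch below is not dynamo's perturbation tooling: it assumes a hypothetical pair of genes, X and Y, that repress each other, and it mimics a knockdown by scaling down X's production rate (the `x_production_scale` parameter is invented for this example).

```python
def simulate_fate(x_production_scale=1.0, start=(1.5, 1.0), dt=0.01, steps=5000):
    """Integrate a two-gene mutual-repression model and return the final (x, y).

    x_production_scale < 1 mimics a knockdown of gene X (hypothetical parameter).
    """
    x, y = start
    for _ in range(steps):
        dx = x_production_scale * 4.0 / (1.0 + y ** 2) - x
        dy = 4.0 / (1.0 + x ** 2) - y
        # Forward Euler update of both genes from the same starting values.
        x, y = x + dt * dx, y + dt * dy
    return x, y


unperturbed = simulate_fate()
knockdown = simulate_fate(x_production_scale=0.25)
print("unperturbed:", unperturbed)
print("X knockdown:", knockdown)
```

Starting from the same state, the unperturbed cell ends up dominated by X, while the simulated knockdown redirects it to the Y-dominated state. This is the kind of question a predictive model allows researchers to ask before running an experiment.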

“If we devise a set of equations that can describe how genes within a cell regulate each other, we can computationally describe how to transform terminally differentiated cells into stem cells, or predict how a cancer cell may respond to various combinations of drugs that would be impractical to test experimentally,” Xing says.

Dynamo’s computational modeling can be used to predict the most likely path that a cell will follow when reprogramming one cell type to another, as well as the path that a cell will take after specific genetic manipulations. 

Dynamo moves beyond merely descriptive and statistical analyses of single cell sequencing data to derive a predictive theory of cell fate transitions. The dynamo toolset can provide deep insights into how cells change over time, hopefully making cells’ trajectories as predictable for researchers as the arc of a ball, and therefore also as easy to change as switching up a pitch.