A team of more than 400 researchers has looked at the genomewide binding of more than 100 regulatory elements in nearly 150 different cell lines through more than 20 different types of experiments, and in doing so, has assigned some form of biochemical function to 80 percent of the genome.
The results were published on Sept. 5, 2012, in more than 30 papers.
What they got for all that money and effort, National Human Genome Research Institute director Eric Green told reporters at a press conference announcing the work, is "the first comprehensive views of the functional landscape of the human genome."
The work is the logical extension of the Human Genome Project, which provided the first draft sequence of the entire human genome roughly a decade ago. "Even as the finish line of the human genome project came into view at the turn of the century," Green said, "those of us at the National Human Genome Research Institute, or NHGRI, recognized that just knowing the sequence of the human genome would not be enough."
While the sequence was enough to identify most protein-coding genes, "we understood precious little about the signals that turned genes on or off, or that controlled the amount of protein produced in different tissues. In short, we had more questions than answers about how the genome actually worked." ENCODE was designed to find some of those answers.
The findings reported in yesterday's data deluge are as multifaceted as one might expect from a project of that scale. And as might be expected from any scientific endeavor, there are still more questions than answers, though Green told reporters that "the questions we can now ask are more sophisticated and will give us better answers."
Still, at the press conference, two themes emerged.
One—an always-useful lesson in science and elsewhere—is that not everything incomprehensible is junk.
The phrase "junk DNA" has been bandied about for decades to describe DNA that did not appear to have any useful function. At one point, some researchers believed that the majority of human DNA might be such junk DNA.
The concept of junk DNA has always had its critics, and the amount of purportedly junk DNA has decreased as scientists have found more sophisticated ways to look for and at DNA function. (See BioWorld Today, June 14, 2007, and Sept. 19, 2011.)
And the new studies, Ewan Birney of the European Bioinformatics Institute said, provide further evidence that the concept of junk DNA is mostly inaccurate. Not completely so—Birney also said that some true junk or parasite DNA exists. "I find it hard to believe," he said, "that everything is really important."
But the studies published yesterday have assigned some sort of biochemical function to more than 80 percent of the bases in the human genome. Junk DNA, Birney said, is for the most part "an outdated metaphor."
The researchers also expressed hope that the ENCODE consortium, in doing what it was designed to do—namely, shed light on how gene expression is regulated—will increase the utility of sequencing data that preceded it.
Genomewide association studies (GWAS) based on such sequencing data have been successful at finding mutations in protein-coding regions that are behind rare diseases. But the method has had little success in finding so-called actionable variants for common diseases. The hope that GWAS would open up a plethora of drug targets has not been borne out to date, and many researchers now assume that common diseases are due to many different rare genetic variants instead of a few common ones—a scenario that does not lend itself easily to pharmacological exploitation.
But John Stamatoyannopoulos, of the University of Washington, said one conclusion from the ENCODE data is that "the GWAS studies that have been performed contain far more information than was previously believed, because hundreds of additional changes that were not thought to be important also appear to affect these gene-controlling switches."
By identifying large numbers of regulatory switches, the ENCODE project has made it realistic to associate risk variants in noncoding regions with the specific genes that they regulate. Ninety-five percent of the information from GWAS has so far pointed to such intergenic regions, Stamatoyannopoulos said, and the data created by the ENCODE project provide "a new type of lens" through which to view that information.
And, he added, the findings will lead to "a major paradigm shift in how we use the genome to understand the genetic causes of disease, which will open up new avenues for the development of diagnostics and therapies."