The ENCODE and GENCODE Projects: ENCyclopedia Of DNA Elements
The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (approximately 1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome (Science 2004 Oct 22; 306(5696):636-40).
GENCODE is a sub-project of ENCODE (PI: Roderic Guigo, IMIM, Barcelona). The overall goal of the proposal is to identify all protein-coding genes in the regions of the human genome selected within the ENCODE project. The U.C. Berkeley component of the project is focused on generating high-quality alignments of sequences orthologous to the ENCODE regions, both to improve gene predictions and to identify conserved functional elements.
Homology mapping for the ENCODE sequence freeze
In collaboration with Daryl Thomas at UCSC, we have been working on identifying regions orthologous to the ENCODE sequences in assemblies of vertebrate genomes. A comparison of the maps produced by the UCSC LiftOver tool and the Berkeley Mercator program is available at the UCSC ENCODE Ortholog site. The regions identified are being combined with comparative sequence data from NISC to produce the ENCODE sequence freezes, which contain orthologous sequence for each ENCODE region.
Berkeley ENCODE alignments
The ENCODE sequence freezes are being aligned at U.C. Berkeley using the MAVID alignment program.
1. Alignment of the September 2005 freeze. [Download]
2. Alignment of the October 2004 freeze (including cow and ratB) with contigs ordered. [Download]
3. Alignment of the Stanford re-ordering of the October 2004 freeze (missing ratB). [Download]
We are also working on comparisons of multiple alignments produced by MAVID, MLAGAN and TBA.
We have predicted genes in the human ENCODE regions using the SLAM program. These are being combined with other predictions for testing and validation as part of the GENCODE project.
1. SLAM predictions using mouse. [Download GFF]
2. SLAM predictions using rat. [Download GFF]
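For readers who want to inspect the downloaded predictions, the following is a minimal sketch of reading a GFF file in Python. It relies only on the general GFF layout (tab-separated columns for sequence name, source, feature, start, end, score, strand, frame and an optional attribute field, with 1-based inclusive coordinates). The feature label `CDS`, the sequence names and the sample lines are made up for illustration and are not taken from the SLAM output files.

```python
from collections import namedtuple

# One record per GFF feature line (the attributes column may be empty).
GffRecord = namedtuple(
    "GffRecord",
    "seqname source feature start end score strand frame attributes",
)


def parse_gff(lines):
    """Yield a GffRecord for each feature line, skipping comments and blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # not a well-formed feature line
        attributes = fields[8] if len(fields) > 8 else ""
        yield GffRecord(
            fields[0], fields[1], fields[2],
            int(fields[3]), int(fields[4]),
            fields[5], fields[6], fields[7], attributes,
        )


def feature_lengths(records, feature="CDS"):
    """Sum the lengths of one feature type per sequence.

    GFF coordinates are 1-based and inclusive, so a feature
    spanning start..end covers end - start + 1 bases.
    """
    totals = {}
    for rec in records:
        if rec.feature == feature:
            length = rec.end - rec.start + 1
            totals[rec.seqname] = totals.get(rec.seqname, 0) + length
    return totals


# Hypothetical sample lines, for illustration only.
SAMPLE = [
    "# made-up example data",
    "ENm001\tSLAM\tCDS\t100\t199\t.\t+\t0\tgene_1",
    "ENm001\tSLAM\tCDS\t300\t349\t.\t+\t2\tgene_1",
    "ENm002\tSLAM\tCDS\t10\t19\t.\t-\t0\tgene_2",
]

if __name__ == "__main__":
    print(feature_lengths(parse_gff(SAMPLE)))
```

To run this on a downloaded file, pass an open file handle to `parse_gff` in place of `SAMPLE`; the feature label may need adjusting to whatever the file actually uses.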
Mathias Drton, Nicholas Eriksson and Garmay Leung investigated ultraconserved elements in the ENCODE regions. Results appear in Chapter 22 of Algebraic Statistics for Computational Biology.
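As a rough illustration of what a search for such elements involves, the sketch below scans a multiple alignment for runs of columns in which every sequence carries the same base and no gap. The definition used here (perfect identity across all rows, with a minimum run length `min_len`) is an assumption made for the example and is not necessarily the criterion used in the chapter. The function name and the toy alignment are hypothetical.

```python
def conserved_runs(rows, min_len):
    """Return (start, end) column intervals, 0-based and half-open,
    where all alignment rows share the same non-gap character for
    at least min_len consecutive columns."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all alignment rows must have the same length")

    runs = []
    run_start = None
    for col in range(width):
        column = {r[col].upper() for r in rows}
        identical = len(column) == 1 and "-" not in column
        if identical:
            if run_start is None:
                run_start = col  # a new conserved run begins here
        else:
            if run_start is not None and col - run_start >= min_len:
                runs.append((run_start, col))
            run_start = None
    # Close a run that extends to the last column.
    if run_start is not None and width - run_start >= min_len:
        runs.append((run_start, width))
    return runs


# Toy alignment, for illustration only.
TOY = [
    "ACGTACGTAA",
    "ACGTTCGTAA",
    "ACGT-CGTAA",
]

if __name__ == "__main__":
    print(conserved_runs(TOY, 4))
```

On real alignments the rows would come from an alignment file, and the threshold would be far larger than in this toy case.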
Lior Pachter (co-PI GENCODE and group leader)
Nicolas Bray
Sourav Chatterji
Colin Dewey
Ariel Schwartz