Berkeley ENCODE resources
 
 

 

 
Overview

 
The ENCODE and GENCODE Projects: ENCyclopedia Of DNA Elements

The ENCyclopedia Of DNA Elements (ENCODE) Project aims to identify all functional elements in the human genome sequence. The pilot phase of the Project is focused on a specified 30 megabases (approximately 1%) of the human genome sequence and is organized as an international consortium of computational and laboratory-based scientists working to develop and apply high-throughput approaches for detecting all sequence elements that confer biological function. The results of this pilot phase will guide future efforts to analyze the entire human genome (Science 2004 Oct 22; 306(5696):636-40).

GENCODE is a sub-project of ENCODE (PI: Roderic Guigo, IMIM, Barcelona). Its overall goal is to identify all protein-coding genes in the regions of the human genome selected for the ENCODE project. The U.C. Berkeley component of the project focuses on generating high-quality alignments of sequences orthologous to the ENCODE regions, both to improve gene predictions and to identify conserved functional elements.

Homology mapping for the ENCODE sequence freeze

 
In collaboration with Daryl Thomas at UCSC, we have been working on identifying regions orthologous to the ENCODE sequences in assemblies of vertebrate genomes. A comparison of the maps produced by the UCSC LiftOver tool and the Berkeley Mercator program is available at the UCSC ENCODE Ortholog site. The regions identified are being combined with comparative sequence data from NISC to produce the ENCODE sequence freezes, which contain orthologous sequence for each ENCODE region.

Berkeley ENCODE alignments

 
The ENCODE sequence freezes are being aligned at U.C. Berkeley using the MAVID alignment program.

1. Alignment of the September 2005 freeze. [Download]
2. Alignment of the October 2004 freeze (including cow and ratB) with contigs ordered. [Download]
3. Alignment of the Stanford re-ordering of the October 2004 freeze (missing ratB). [Download]

We are also working on comparisons of multiple alignments produced by MAVID, MLAGAN and TBA.

Gene predictions

 
We have predicted genes in the human ENCODE regions using the SLAM program. These are being combined with other predictions for testing and validation as part of the GENCODE project.

1. SLAM predictions using mouse. [Download GFF]
2. SLAM predictions using rat. [Download GFF]
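
For readers who want to inspect the downloaded predictions, the following is a minimal Python sketch of reading a GFF file. It assumes the standard tab-separated GFF layout (seqname, source, feature, start, end, score, strand, frame, optional attributes); the exact feature names and attribute fields in the SLAM files are not described on this page, so the `CDS` feature name and the sample lines below are illustrative assumptions only.

```python
from collections import namedtuple

# One row of a GFF file; field names follow the standard GFF column order.
GffRecord = namedtuple(
    "GffRecord",
    "seqname source feature start end score strand frame attributes",
)


def parse_gff(lines):
    """Parse an iterable of GFF lines into a list of GffRecord tuples.

    Comment lines (starting with '#'), blank lines, and lines with fewer
    than eight tab-separated fields are skipped.
    """
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 8:
            continue
        seqname, source, feature, start, end, score, strand, frame = fields[:8]
        attributes = fields[8] if len(fields) > 8 else ""
        records.append(
            GffRecord(seqname, source, feature, int(start), int(end),
                      score, strand, frame, attributes)
        )
    return records


def total_feature_length(records, feature="CDS"):
    """Sum the lengths of all records of one feature type.

    GFF coordinates are 1-based and inclusive, hence the +1.
    """
    return sum(r.end - r.start + 1 for r in records if r.feature == feature)


# Hypothetical sample lines, not taken from the actual SLAM output.
sample = [
    "# hypothetical example",
    "ENm001\tSLAM\tCDS\t100\t199\t.\t+\t0\tgene_id \"g1\"",
    "ENm001\tSLAM\tCDS\t300\t349\t.\t+\t1\tgene_id \"g1\"",
    "ENm001\tSLAM\texon\t100\t199\t.\t+\t.\tgene_id \"g1\"",
]
records = parse_gff(sample)
cds_length = total_feature_length(records, "CDS")
```

To use it on a downloaded file, pass an open file handle in place of `sample`, for example `parse_gff(open(path))`, where `path` is wherever the GFF file was saved.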


Related projects

 
Mathias Drton, Nicholas Eriksson and Garmay Leung investigated ultra-conserved elements in the ENCODE regions. Their results appear in Chapter 22 of Algebraic Statistics for Computational Biology.
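
As a rough illustration of the idea, the Python sketch below scans a multiple alignment for maximal runs of columns in which every sequence carries the same non-gap character. This is only one simple working definition, assumed here for illustration; it is not necessarily the definition or method used in the chapter, and the function name, the `min_length` threshold, and the toy alignment are all hypothetical.

```python
def conserved_runs(aligned, min_length):
    """Return (start, end) column ranges, 0-based and end-exclusive, of
    maximal runs where all aligned sequences are identical and gap-free.

    `aligned` is a list of equal-length strings with '-' marking gaps.
    Only runs of at least `min_length` columns are reported.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs = []
    start = None
    # Iterate one position past the end so a run touching the last
    # column is closed and reported.
    for i in range(length + 1):
        identical = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if identical:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    return runs


# Toy alignment of three sequences (hypothetical data).
toy = [
    "ACGTACGT-A",
    "ACGTTCGT-A",
    "ACGTACGTGA",
]
runs_3 = conserved_runs(toy, 3)
runs_4 = conserved_runs(toy, 4)
```

In practice the threshold would be far larger than in this toy example, and the column ranges would need to be mapped back to genomic coordinates in each species.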

Participants

 
Lior Pachter (co-PI GENCODE and group leader)

Nicolas Bray
Sourav Chatterji
Colin Dewey
Ariel Schwartz