
Browse the SLAM gene predictions

Introduction

We have run the SLAM program on the human (NCBI Build 30, June 2002) and mouse (MGSC v3, February 2002) genomes. Orthologous regions from the two genomes as specified by a symmetric synteny map were used as input to SLAM. Because the map used was symmetric and nonoverlapping, the annotations produced by SLAM are also symmetric and nonoverlapping. For each gene prediction made in the human genome there is a corresponding gene prediction in the mouse genome with identical exon structure. In addition to predicting genes, SLAM also outputs regions that it considers to be conserved non-coding sequence (CNS). Here we present the annotations made by SLAM on the whole human and mouse genomes.

Methods

Results

Summary Statistics

Number of syntenic segments: 342
Number of syntenic pieces: 10,613
Number of predicted genes: 29,283
Number of predicted exons: 178,750
Number of predicted conserved non-coding sequences (CNS): 511,895
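
A few ratios follow directly from these counts. The sketch below is only illustrative arithmetic: it assumes that all five counts come from the same run and can be divided by one another, which this page does not state explicitly. The variable names are made up for the example.

```python
# Counts copied from the Summary Statistics above.
segments = 342
pieces = 10_613
genes = 29_283
exons = 178_750
cns = 511_895

# Derived averages (assumption: the counts are directly comparable).
pieces_per_segment = pieces / segments
exons_per_gene = exons / genes
cns_per_gene = cns / genes

print(f"syntenic pieces per segment: {pieces_per_segment:.1f}")
print(f"predicted exons per gene:    {exons_per_gene:.1f}")
print(f"predicted CNS per gene:      {cns_per_gene:.1f}")
```

Under that assumption there are roughly 31 syntenic pieces per segment, about 6.1 predicted exons per gene, and about 17.5 predicted CNS per gene.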

Discussion

The de novo SLAM predictions are orthologous in the sense that they are symmetric: there is a bijective correspondence between human and mouse gene predictions and their structures. This symmetry increases confidence in the predictions, because a human gene prediction must have consistent ORFs, splice sites, and exon lengths in mouse, and vice versa. At the exon level, SLAM covers 79.8% of the RefSeq human exons and 77.5% of the exons in the ENSEMBL human gene set. These numbers are only slightly lower than the Genscan and Twinscan coverage of the same gene sets. The shortfall arises because orthologous predictions are not possible where there have been local rearrangements smaller than 300 kb, or where the synteny map is wrong (a region mapped either to the wrong place or to a paralogous region). As expected, SLAM is specific, predicting fewer coding exons than the other programs.
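
Exon-level coverage figures such as those above can be computed by interval overlap. The sketch below is one possible way to do it, not the procedure used for this page: it assumes that a reference exon counts as "covered" when it shares at least one base with a predicted exon, that coordinates are half-open integer (start, end) pairs on a single sequence, and that the predicted exons do not overlap one another. The function name `covered_fraction` and the toy coordinates are hypothetical.

```python
from bisect import bisect_right

def covered_fraction(reference, predicted):
    """Fraction of reference intervals that overlap any predicted interval.

    Intervals are half-open (start, end) integer pairs on one sequence.
    Assumes the predicted intervals are mutually non-overlapping.
    """
    if not reference:
        return 0.0
    preds = sorted(predicted)
    starts = [start for start, _ in preds]
    covered = 0
    for r_start, r_end in reference:
        # Index of the last predicted interval that starts before r_end.
        i = bisect_right(starts, r_end - 1) - 1
        # Because predictions do not overlap, that interval has the largest
        # end among all candidates, so checking it alone is sufficient.
        if i >= 0 and preds[i][1] > r_start:
            covered += 1
    return covered / len(reference)

# Toy data (made up): two of the five reference exons are overlapped.
reference_exons = [(100, 200), (300, 400), (500, 600), (700, 800), (900, 1000)]
predicted_exons = [(150, 180), (390, 450), (600, 650), (820, 880)]
print(covered_fraction(reference_exons, predicted_exons))
```

A stricter definition, for example requiring both exon boundaries to match exactly, would give lower coverage values, so the definition should be stated alongside any reported percentage.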

SLAM covers 151,770 ENSEMBL exons in human and 152,548 in mouse, suggesting that the sensitivity of ENSEMBL is very consistent between human and mouse. On the other hand, ENSEMBL covers only 119,275 SLAM exons in mouse, versus 125,773 in human, implying a small (but not insignificant) difference in specificity. Twinscan and Genscan show similar human/mouse discrepancies in sensitivity and specificity.
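
The size of these differences can be made concrete with a little arithmetic. The sketch below assumes that the 178,750 predicted exons in the summary statistics is the SLAM exon count in each genome (plausible given the identical exon structure of paired predictions, but not stated outright), and uses it as the denominator for the ENSEMBL-covered SLAM exons. Variable names are illustrative only.

```python
# Counts quoted in the text.
ensembl_covered_by_slam = {"human": 151_770, "mouse": 152_548}
slam_covered_by_ensembl = {"human": 125_773, "mouse": 119_275}

# Assumption: the summary-statistics exon count applies to each genome.
slam_exons = 178_750

# Human/mouse gap in ENSEMBL exons covered by SLAM.
ensembl_gap = ensembl_covered_by_slam["mouse"] - ensembl_covered_by_slam["human"]

# Share of SLAM exons that ENSEMBL covers in each genome.
share = {sp: n / slam_exons for sp, n in slam_covered_by_ensembl.items()}
share_gap = share["human"] - share["mouse"]

print(f"ENSEMBL exons covered by SLAM differ by {ensembl_gap} exons")
for sp, frac in share.items():
    print(f"SLAM exons covered by ENSEMBL in {sp}: {frac:.1%}")
print(f"difference: {share_gap:.1%} of SLAM exons")
```

Under that assumption, ENSEMBL covers about 70.4% of SLAM exons in human and about 66.7% in mouse, a gap of roughly 3.6 percentage points, while the ENSEMBL exons covered by SLAM differ by only 778 exons between the two genomes.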

In summary, the SLAM whole-genome human/mouse run demonstrates the feasibility of de novo prediction of orthologous genes in the human and mouse genomes, and yields thousands of new coding exons not predicted by other methods. The SLAM CNS set is the first de novo prediction of conserved non-coding regions in the human and mouse genomes, and should be useful for many applications. In addition, the symmetric nature of SLAM allows inferences about problems in the existing human and/or mouse gene sets.

Downloads

Acknowledgements

Citations

If you use any of the results on these pages in a publication, please cite the following papers:
bio.math.berkeley.edu