Ciliates are a group of protists that, together with their close relatives the parasitic apicomplexans, diverged from the rest of the eukaryotes very early in evolution (Figure 1). Besides representing an alternative eukaryotic branch, ciliates have yielded many notable discoveries in biology: at least three nonstandard genetic codes, group I self-splicing RNA and telomerase in Tetrahymena, and programmed elimination of massive amounts of genomic DNA and scrambled DNA rearrangements in stichotrichs. The recent completion of the macronuclear (MAC) genome sequences of two ciliates, Tetrahymena thermophila and Paramecium tetraurelia, together with numerous ongoing genome and EST projects for other ciliates (such as Oxytricha and Euplotes), therefore offers an unprecedented opportunity not only to the ciliate community but also to the broader molecular and evolutionary biology community.
[From Doak, Cavalcanti, Landweber, et al., Trends in Genetics (19) 603-607, 2003.]
The Ciliate Ortholog Database (COD) is designed to compute and store ortholog groups among ciliate species as complete genome or transcriptome sequences from various ciliates become available. Currently, the database contains the ortholog groups between the two completed ciliate genomes, T. thermophila and P. tetraurelia, computed using Inparanoid. In total, 4,909 ortholog groups were detected. The distribution of the number of genes each of the two ciliates contributes to an ortholog group is shown in Figure 2. Consistent with the hypothesis of whole-genome duplication in P. tetraurelia, most ortholog groups contain only one gene from T. thermophila but more than one gene from P. tetraurelia.
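The core idea behind Inparanoid-style ortholog grouping can be sketched as follows. This is a simplified illustration, not Inparanoid itself (which also applies score cutoffs, overlap filters, confidence values and rules for merging overlapping groups): reciprocal best hits between the two species seed each group, and same-species genes that are more similar to a seed than the two seeds are to each other are added as in-paralogs. The gene IDs and scores below are invented for illustration, and the score table is assumed to be symmetric. The final tally per group mirrors the kind of composition summary shown in Figure 2.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical input: similarity scores (e.g. BLAST bit scores) keyed by
# gene-ID pairs, assumed symmetric; pairs with no hit are simply absent.
Scores = Dict[Tuple[str, str], float]


def score(scores: Scores, a: str, b: str) -> float:
    """Look up a score in either orientation; 0.0 means no hit."""
    return scores.get((a, b), scores.get((b, a), 0.0))


def best_hit(gene: str, candidates: List[str], scores: Scores) -> Optional[str]:
    """Return the highest-scoring candidate with a nonzero score, or None."""
    hits = [(score(scores, gene, c), c) for c in candidates]
    hits = [h for h in hits if h[0] > 0]
    return max(hits)[1] if hits else None


def ortholog_groups(genes_a: List[str], genes_b: List[str], scores: Scores):
    """Seed groups with reciprocal best hits, then add in-paralogs."""
    groups = []
    for a in genes_a:
        b = best_hit(a, genes_b, scores)
        # Keep only reciprocal best hits: a's best hit is b and b's best hit is a.
        if b is None or best_hit(b, genes_a, scores) != a:
            continue
        seed = score(scores, a, b)
        # In-paralogs: same-species genes closer to the seed gene than the
        # two seed orthologs are to each other.
        in_a = [x for x in genes_a if x != a and score(scores, a, x) > seed]
        in_b = [y for y in genes_b if y != b and score(scores, b, y) > seed]
        groups.append((sorted([a] + in_a), sorted([b] + in_b)))
    return groups


# Toy example (invented IDs): P1a and P1b mimic a duplicated gene pair.
tth = ["T1", "T2"]
ptet = ["P1a", "P1b", "P2"]
toy_scores = {
    ("T1", "P1a"): 200.0, ("T1", "P1b"): 190.0, ("P1a", "P1b"): 400.0,
    ("T2", "P2"): 150.0, ("T1", "P2"): 20.0,
}
groups = ortholog_groups(tth, ptet, toy_scores)
# Count groups by (genes from species A, genes from species B).
composition = Counter((len(a), len(b)) for a, b in groups)
print(groups)
print(composition)
```

In this toy case the duplicated pair P1a/P1b ends up in one group with the single T1 gene, giving a 1:2 group alongside a 1:1 group, the same kind of pattern that whole-genome duplication produces in the real data.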
Apart from the ortholog groups from completed ciliate genomes, COD also serves as an integrative platform for other publicly available ciliate sequences. These include sequences from ongoing (unfinished) ciliate genome projects, such as that of Oxytricha trifallax, and GenBank sequences deposited by individual studies. Each such sequence is assigned to the ortholog group that contains its best BLAST hit, which provides putative ortholog information in the absence of a completed genome.
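The best-hit assignment described above can be sketched as follows, assuming BLAST+ tabular output (`-outfmt 6`, whose default 12 columns put the query ID first, the subject ID second and the bit score last) and a hypothetical `group_of` table mapping each grouped gene to its ortholog group ID. How COD breaks ties or treats sequences whose best hit lies outside every group is not stated in the text; this sketch simply leaves such sequences unassigned.

```python
from typing import Dict, Iterable, Iterator, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bit_score)


def parse_blast_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Read BLAST+ tabular output (-outfmt 6, default 12 columns).

    Column 1 is the query ID, column 2 the subject ID, column 12 the bit score.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        yield cols[0], cols[1], float(cols[11])


def assign_to_groups(hits: Iterable[Hit], group_of: Dict[str, str]) -> Dict[str, str]:
    """Map each query to the ortholog group of its single best BLAST hit.

    Queries whose best hit is not a member of any ortholog group stay unassigned.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, bits in hits:
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {q: group_of[s] for q, (s, _) in best.items() if s in group_of}


# Hypothetical example: IDs, group labels and scores are invented.
group_of = {"T1": "OG1", "P1a": "OG1", "P1b": "OG1", "T2": "OG2", "P2": "OG2"}
hits = [
    ("Oxy_1", "T1", 310.0),
    ("Oxy_1", "P2", 120.0),
    ("Oxy_2", "P2", 95.0),
    ("Oxy_3", "T9", 80.0),  # best hit is not in any ortholog group
]
assignment = assign_to_groups(hits, group_of)
print(assignment)
```

Here Oxy_1 joins OG1 and Oxy_2 joins OG2, while Oxy_3 stays ungrouped because its best hit belongs to no ortholog group.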
Besides standard features such as name and keyword searches, sequence retrieval tools, and hyperlinks to related databases, a key innovation of COD is BlastO, a novel search method that treats each ortholog group as a unit. BlastO organizes and ranks BLAST results by ortholog group, which is often a much more useful format for evolutionary studies than a ranked list of individual sequence hits, biased by their redundancy in the database (for details, see the FAQ). COD also incorporates ClustalW for multiple sequence alignment and phylogenetic tree construction among orthologs and additional user-defined sequences.
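To illustrate why grouping hits helps, here is a minimal sketch of collapsing one query's BLAST hits into ortholog-group units. It is not BlastO's actual implementation: COD's ranking criterion is described in its FAQ and is not reproduced here, so this sketch assumes each group is ranked by the bit score of its best-scoring member. `group_of` is a hypothetical gene-to-group table, and sequences outside any group are treated as their own single-member unit.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Member = Tuple[str, float]  # (subject_id, bit_score)


def rank_by_group(hits: Iterable[Member], group_of: Dict[str, str]):
    """Collapse per-sequence hits for one query into ranked ortholog-group units.

    Returns a list of (unit_id, best_bit_score, members), best units first;
    members within a unit are sorted by decreasing bit score.
    """
    by_group: Dict[str, List[Member]] = defaultdict(list)
    for subject, bits in hits:
        # Assumption: an ungrouped sequence forms its own unit, keyed by its ID.
        by_group[group_of.get(subject, subject)].append((subject, bits))
    ranked = []
    for unit, members in by_group.items():
        members.sort(key=lambda m: m[1], reverse=True)
        ranked.append((unit, members[0][1], members))
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked


# Hypothetical hits for one query: the top three raw hits are redundant
# members of the same ortholog group.
group_of = {"T1": "OG1", "P1a": "OG1", "P1b": "OG1", "T2": "OG2", "P2": "OG2"}
hits = [("P1a", 250.0), ("P1b", 245.0), ("T1", 240.0),
        ("P2", 60.0), ("T2", 58.0), ("X9", 40.0)]
ranked = rank_by_group(hits, group_of)
for unit, best, members in ranked:
    print(unit, best, [s for s, _ in members])
```

In a flat hit list, OG1's three members would occupy the top three ranks; grouped, OG1 appears once and OG2 moves up to second place, which is the redundancy effect the paragraph above describes.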
Edited on June 28th, 2006