Sliding Window Analysis of KA and KS Web Server

Overview

Background

Mutations and substitutions are fundamental changes at the molecular level over evolutionary time. Among the well-established methods of studying evolution in protein-coding genes, the ratio of the non-synonymous substitution rate (KA, amino acid replacing) to the synonymous substitution rate (KS, silent) is one of the most powerful measures of selective pressure on a protein. Because non-synonymous and synonymous sites are interspersed in the same DNA segment, this approach in effect compares the amino acid replacement rate with the underlying mutation rate. Traditionally, if KA/KS < 1, the gene is inferred to be under negative (purifying) selection; if KA/KS = 1, the gene is inferred to be evolving neutrally; and if KA/KS > 1, the gene is inferred to be under positive (adaptive) selection, since mutations in the region have a higher probability of being fixed in the population than expected under neutral evolution.
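The traditional interpretation above can be sketched in a few lines of Python. This is only an illustration of the three-way rule, not code from SWAKK: the function names and the `tol` parameter (a tolerance around 1 for calling a ratio neutral) are assumptions made for this example, and a real analysis would rely on a statistical test rather than a fixed cutoff.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Return KA/KS, handling the case where KS is zero."""
    if ka < 0 or ks < 0:
        raise ValueError("substitution rates must be non-negative")
    if ks == 0:
        # The ratio is undefined when no synonymous change is observed.
        return float("inf") if ka > 0 else float("nan")
    return ka / ks


def classify_selection(ka: float, ks: float, tol: float = 0.0) -> str:
    """Apply the traditional KA/KS rule; `tol` is a hypothetical tolerance."""
    ratio = ka_ks_ratio(ka, ks)
    if ratio != ratio:  # NaN: neither rate carries information
        return "undefined"
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "negative (purifying)" if ratio < 1.0 else "positive (adaptive)"


if __name__ == "__main__":
    for ka, ks in [(0.1, 0.5), (0.3, 0.3), (0.6, 0.2)]:
        print(ka, ks, classify_selection(ka, ks))
```

Returning a label rather than a bare number keeps the zero-KS cases explicit, since a window with no synonymous changes cannot be interpreted with the simple rule.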

However, this approach in effect averages the substitution rate over all amino acid sites in the sequence. Because most amino acids are expected to be under purifying selection and positive selection most likely affects only a few sites, the whole-gene ratio often lacks the power to detect positive selection. To improve detection power, sliding window analysis along the primary sequence was introduced. Recent studies further indicate that when the three-dimensional (3D) protein structure is available, positive selection is detected with much greater sensitivity if windows in 3D space are used. For example, in the analysis of major histocompatibility complex (MHC) alleles, positive selection was detected at the antigen recognition sites (ARS) but not across the whole gene. These sites are close together in 3D space but discontinuous in the primary sequence.
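As a rough illustration of a sliding window along the primary sequence, the sketch below lists the codon positions that fall in each window. It assumes windows centered on every codon and truncated at the sequence ends; the name `sliding_windows_1d` and the `half_width` parameter are hypothetical and chosen only for this example.

```python
from typing import List


def sliding_windows_1d(n_codons: int, half_width: int) -> List[range]:
    """Return one window of codon indices per codon position.

    Each window holds every codon within `half_width` positions of its
    center and is clipped at the ends of the sequence.
    """
    if n_codons < 0 or half_width < 0:
        raise ValueError("n_codons and half_width must be non-negative")
    return [
        range(max(0, center - half_width), min(n_codons, center + half_width + 1))
        for center in range(n_codons)
    ]


if __name__ == "__main__":
    for center, window in enumerate(sliding_windows_1d(5, 1)):
        print(center, list(window))
```

KA/KS would then be estimated separately on the codons in each window, so a short stretch of positively selected sites is no longer averaged with the rest of the gene.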

We developed a bioinformatic web server (SWAKK) whose primary purpose is to detect regions under positive selection using sliding window KA/KS analysis. Given two coding DNA sequences, one reference protein 3D structure, and other user-defined parameters, the web server automatically aligns the sequences, calculates KA/KS in each 3D window, and displays the results on the 3D structure. If the structure is not available, the server can instead perform the analysis on the primary sequence.

Methods

SWAKK takes as input a pair of coding DNA sequences and a reference protein structure (PDB file). The DNA sequences are translated to amino acids and aligned with the amino acid sequence parsed from the PDB file using ClustalW. The protein alignment is then back-translated to obtain a codon-based sequence alignment. Several genetic code tables are available to account for the variability of the genetic code. Each amino acid in the reference structure is represented by its Cα atom. SWAKK constructs the 3D windows by placing each amino acid at the center in turn and including all amino acids within a pre-specified distance of that center. For each window, the corresponding codons are extracted to form a sub-alignment, and the KA/KS score is calculated using the PAML package. Finally, amino acid sites under different selective pressures are colored differently and visualized on the 3D structure using the Chime plug-in. If a reference structure is not available, the server can instead perform the analysis on the primary sequence. In that case, the window size is defined as a distance along the 1D sequence rather than in 3D space, and the results are displayed in a graph drawn with GNUPLOT.
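The back-translation and 3D window steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not SWAKK's implementation: all function names are hypothetical, the coding sequences are assumed to contain no stop codon, residue i of the reference structure is assumed to correspond to alignment column i, and columns containing a gap are simply skipped. Parsing the PDB file, running ClustalW, and estimating KA/KS with PAML are not shown.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def back_translate(aligned_protein: str, cds: str) -> List[str]:
    """Map an aligned protein (gaps as '-') back to one codon per column."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) % 3 or n_residues != len(codons):
        raise ValueError("coding sequence does not match the aligned protein")
    remaining = iter(codons)
    return ["---" if aa == "-" else next(remaining) for aa in aligned_protein]


def windows_3d(ca_coords: Sequence[Coord], radius: float) -> List[List[int]]:
    """For each residue, list every residue whose C-alpha lies within `radius`."""
    return [
        [j for j, other in enumerate(ca_coords) if math.dist(center, other) <= radius]
        for center in ca_coords
    ]


def sub_alignment(cols_a: Sequence[str], cols_b: Sequence[str],
                  members: Sequence[int]) -> Tuple[str, str]:
    """Join the codon columns of one window, skipping gapped columns (assumption)."""
    kept = [i for i in members if "---" not in (cols_a[i], cols_b[i])]
    return ("".join(cols_a[i] for i in kept), "".join(cols_b[i] for i in kept))


if __name__ == "__main__":
    # Toy data invented for illustration only.
    cols_a = back_translate("MAKF", "ATGGCTAAATTT")
    cols_b = back_translate("MA-F", "ATGGCCTTT")
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 3.8, 0.0)]
    for center, members in enumerate(windows_3d(coords, radius=4.0)):
        print(center, members, sub_alignment(cols_a, cols_b, members))
```

Each pair of strings returned by `sub_alignment` is the kind of codon sub-alignment that would be passed to a KA/KS estimator for that window.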

If you have any questions about this web server, please email Han Liang at the University of Chicago.

Copyright@2006