The <mds_ies_db> is a dynamic, interactive, and mobile-friendly database that features an assortment of searches and visualizations. This document serves as a user's manual for searching the database and controlling the interactive displays.
This manual is organized as follows:
- Search - contains information about the quick searches in the navigation bar and the advanced search forms on the search tab.
- Display - contains information about the contig display pages, which show how the MAC and MIC genomes are associated through MDS, IES, and pointer annotations and through chord diagram visualizations.
- Data - contains information about data processing methods and the internal naming convention for stored sequences.
- Technical Notes - contains information about the database architecture and the software libraries used to create the website.
Questions not covered in this documentation should be directed to Jonathan Burns (firstname.lastname@example.org).
This database showcases annotations for features of ciliate genome rearrangement, namely, macronuclear destined segments (MDSs), internal eliminated segments (IESs), and pointer sequences. Additionally, the <mds_ies_db> contains genome and proteome assemblies, and their corresponding domains and gene annotations, for the macronucleus (MAC) and micronucleus (MIC) of Oxytricha trifallax, Tetrahymena thermophila, and other ciliates.
The Quick Name and Quick Sequence navigation searches provide a way to quickly access the data of a contig by directly entering its ID or alias. The Contig Search, Gene Search, and Sequence Search advanced searches return contig and gene names that satisfy the criteria specified in the search form. The sections below describe the navigation and advanced searches in greater detail.
In addition to navigation searches, several custom searches are also available. The advanced search forms can be used without knowledge of contig and gene ID numbers, so we recommend these searches to our first-time visitors.
The Contig Search can be used to browse through all stored contigs and scaffolds, and filter by chromosomal properties, e.g., organism, nucleus type, and length. The search form contains filtering options at the top of the page, and displays the search results at the bottom of the page. There are four filter panels: Organism, Contig, Gene, and Macronuclear Destined Sequence (MDS). By default, the Gene and MDS filter panels are initially hidden, but they can be shown by clicking on either the "Genes" or "MDSs" buttons found at the top of the Contig Search panel.
Organism: The Organism filter panel consists of two fields: the Organism drop-down list and the Nucleus checkboxes. The Organism drop-down list filters for sequences that are specific to the selected ciliate organisms. The Nucleus checkboxes filter for sequences that come from the MAC, the MIC, or both.
Contig: The Contig filter panel consists of two fields: the Length field and the Telomeres checkboxes. The Length field filters sequences by their length in base pairs. It has a text box for entering a length and a drop-down list for choosing whether a sequence's length must be at least ("≥"), at most ("≤"), or equal to ("=") that number. The Telomeres checkboxes specify whether the desired sequences should have both 5′ and 3′ telomeres, only a 5′ telomere, only a 3′ telomere, or no telomeres at all.
Gene: The Gene panel consists of the Count field, which restricts results by the number of genes on a contig. A user can specify an exact number of genes ("=" option), a maximum number of genes ("≤" option), or a minimum number of genes ("≥" option).
Macronuclear Destined Sequence: The MDS panel works like the Gene panel and lets a user specify the exact, maximum, or minimum number of MDSs that a contig can have.
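The way the four filter panels combine can be pictured with a short Python sketch. This is only an illustration of the filtering logic described above, not the database's actual implementation: the `Contig` record, its field names, the `matches` function, and the example contigs are all hypothetical, and the Telomeres checkboxes are simplified to a single required (5′, 3′) combination.

```python
import operator
from dataclasses import dataclass

# Comparison choices offered by the Length, Gene Count, and MDS Count drop-down lists.
COMPARATORS = {"≥": operator.ge, "≤": operator.le, "=": operator.eq}


@dataclass
class Contig:
    """Hypothetical record; the field names are assumptions made for this sketch."""
    name: str
    nucleus: str            # "MAC" or "MIC"
    length: int             # length in base pairs
    has_5p_telomere: bool
    has_3p_telomere: bool
    gene_count: int
    mds_count: int


def matches(contig, nuclei=("MAC", "MIC"), length=None, telomeres=None,
            genes=None, mds=None):
    """Return True if the contig passes every filter that was supplied.

    length, genes, mds: optional (comparator, number) pairs such as ("≥", 1000).
    telomeres: optional (has_5p, has_3p) pair of booleans.
    """
    if contig.nucleus not in nuclei:
        return False
    numeric_rules = ((contig.length, length),
                     (contig.gene_count, genes),
                     (contig.mds_count, mds))
    for value, rule in numeric_rules:
        if rule is not None:
            symbol, number = rule
            if not COMPARATORS[symbol](value, number):
                return False
    if telomeres is not None:
        if (contig.has_5p_telomere, contig.has_3p_telomere) != telomeres:
            return False
    return True


# Made-up example contigs.
contigs = [
    Contig("OXYTRI_MAC_1", "MAC", 3200, True, True, 1, 4),
    Contig("OXYTRI_MAC_2", "MAC", 900, True, False, 0, 2),
    Contig("OXYTRI_MIC_3", "MIC", 52000, False, False, 0, 0),
]

# MAC contigs of at least 1000 bp that carry both telomeres.
selected = [c.name for c in contigs
            if matches(c, nuclei=("MAC",), length=("≥", 1000), telomeres=(True, True))]
print(selected)
```

Leaving a filter unset (`None`) corresponds to leaving that field empty, or its panel hidden, in the form.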
Filtered results can be ordered by one of the columns: Name, Length, Genes, or MDSs. Clicking on a column title orders the results in ascending (or alphabetical) order according to the column selected, and clicking the column title again orders the results in descending (or reverse alphabetical) order. Users can also specify the number of entries shown by clicking on the "Show" drop-down list and selecting a value. Clicking on the desired contig name opens the contig information display page.
Click the "Search" button below to go to the Contig Search form.
The gene search is helpful for finding genes that are present on either a MAC or MIC contig. This search form contains all filtering options at the top of the page, and the search results at the bottom of the page. There are two panels for gene filters: Organism and Gene.
Organism: The Organism filter panel consists of two fields: the Organism drop-down list and the Nucleus checkboxes. The Organism drop-down list filters for genes that are specific to the selected ciliate organisms. The Nucleus checkboxes filter for MIC-specific genes, MAC-specific genes, or both.
Gene: The Gene filter panel consists of two fields: the Description text field and the Restriction checkboxes. The Description text field filters for genes whose description contains the given text. The Restriction checkboxes filter for nucleus-limited (MIC or MAC) genes. When the "Limited" checkbox is checked, genes that appear only in the MAC (or only in the MIC) are added to the search results. When the "Both" checkbox is checked, genes that appear in both the MAC and the MIC are added to the search results.
Search results can be ordered by one of the following columns: Organism, Nucleus, Name, or Description. Click on a column name to order the results alphabetically, and click on the same column name again to order them in reverse alphabetical order. Users can also change the number of entries shown by clicking on the "Show" drop-down list and selecting a different number. Click on the desired gene to open detailed information about it.
Click the "Search" button below to go to the Gene Search window.
This search interfaces with wwwblast to match a provided nucleotide or peptide sequence against the sequences stored in the database. The search form consists of a Database panel for selecting the database to BLAST against, a text field for entering the query sequence, and an Advanced Settings panel. The latter is minimized by default; click on the "Advanced settings" link to expand it.
Database: The Database panel consists of three drop-down lists: Organism, Nucleus, and Sequence. The Organism list selects the organism to BLAST against. The Nucleus list specifies whether the sequence lies in the MIC or in the MAC. The Sequence list specifies whether the given sequence is a nucleotide sequence or a protein sequence.
Advanced settings: The Advanced settings panel lets the user specify additional BLAST parameters. The Low complexity checkbox specifies whether low-complexity segments of the query sequence should be masked. The Ungapped alignment checkbox specifies whether BLAST allows gaps in the alignment. The user can also choose the E-value threshold and the substitution matrix from the corresponding drop-down lists.
Once all search parameters are specified and the sequence is entered in the sequence text box, press the "BLAST" button to run BLAST.
The results panel appears at the bottom of the page with all matched contigs/scaffolds. As with all other search results, the user can sort them by one of the columns shown: Hit Name, From, To, Length, Identity%, Bitscore, or Evalue. Click on a column name once to order the results in ascending (or alphabetical) order, and click on it again to order them in descending (or reverse alphabetical) order. Click on the desired contig/scaffold to open detailed information about it.
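For readers who export the results and want to reproduce the column ordering offline, the click-to-sort behavior can be sketched in Python as follows. The rows and their values are made up for illustration, the dictionary keys simply mirror the column titles listed above, and `sort_hits` is a hypothetical helper rather than part of the website.

```python
# Hypothetical rows shaped like the results table; the values are made up.
hits = [
    {"Hit Name": "OXYTRI_MIC_20", "From": 10, "To": 410, "Length": 401,
     "Identity%": 99.5, "Bitscore": 730.0, "Evalue": 0.0},
    {"Hit Name": "OXYTRI_MIC_7", "From": 500, "To": 620, "Length": 121,
     "Identity%": 97.5, "Bitscore": 210.0, "Evalue": 2e-55},
    {"Hit Name": "OXYTRI_MIC_113", "From": 30, "To": 95, "Length": 66,
     "Identity%": 100.0, "Bitscore": 120.0, "Evalue": 4e-28},
]


def sort_hits(rows, column, clicks=1):
    """Order rows by one column.

    An odd number of clicks sorts ascending (or alphabetically);
    an even number sorts descending (or reverse alphabetically).
    """
    return sorted(rows, key=lambda row: row[column], reverse=(clicks % 2 == 0))


# Two clicks on "Bitscore": best-scoring hit first.
by_bitscore_desc = sort_hits(hits, "Bitscore", clicks=2)
# One click on "Hit Name": plain alphabetical order of the names.
by_name = sort_hits(hits, "Hit Name")

print([row["Hit Name"] for row in by_bitscore_desc])
print([row["Hit Name"] for row in by_name])
```

Note that text columns in this sketch are compared character by character, so "OXYTRI_MIC_113" sorts before "OXYTRI_MIC_20"; whether the website orders names the same way is not specified here.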
Click the "Search" button below to go to the Sequence Search window.
Once the desired sequence is found, its sequence page can be opened for further details. The sequence information page contains a Genome Browser, Chord Diagram window, Hits Table, Download Data window, and DNA, MDS, and Gene Information fields.
At the top of the sequence information page is a Genoverse display. This display shows the nucleotide sequence of the MIC/MAC contig/scaffold together with any gene information available for the sequence and BLAST high scoring pair information. If the selected sequence is from the MAC, Genoverse displays the MAC nucleotide sequence, MDSs, MAC genes, and the high scoring pairs of MIC contigs/scaffolds that match segments of the selected MAC sequence. If the selected sequence is from the MIC, Genoverse displays the nucleotide sequence and the high scoring pairs of MAC contigs/scaffolds that match segments of the selected MIC sequence.
Use the scroll wheel of the mouse to zoom in and out of nucleotide regions. Certain tracks can be hidden or shown by clicking on the "Tracks" button in the top left corner and using the menu that opens. For more information on how to use Genoverse, please refer to this tutorial.
A chord diagram (a.k.a. Circos plot) is a visual representation of sequence alignment information. The selected MAC/MIC sequence and all matching MIC/MAC sequences are placed on a circle. Matching nucleotide segments of the MAC and MIC are connected by an arc, and different MAC-to-MIC matches are drawn with arcs of different colors.
The chord diagram makes it easy to see which regions of the MIC map to which regions of the MAC, and vice versa. It is also possible to look at only one particular set of MIC-to-MAC (or MAC-to-MIC) arcs: hover the mouse over the circle segment belonging to the desired MIC/MAC sequence, and the other arcs disappear.
To see the chord diagram showing the rearrangement map for the contig, click on the "Chord Diagram" button under the genome browser.
The MDS-IES annotations window displays information about MDSs, IESs, and pointer sequences in table format. At the top of the window, an "Annotations" drop-down list selects which information to display (MDS, IES, and/or pointer). Five buttons extract or save the data in the table. The "Copy" button copies the table content to the clipboard. The "CSV", "Excel", and "PDF" buttons each generate a file of the corresponding type containing all the table data. The "Print" button prints the table.
The table consists of five columns: entry name (MDS, IES, or Pointer), start position, end position, length, and the sequence corresponding to the entry. The table can be ordered in ascending/descending (alphabetical/reverse alphabetical) order by any column by clicking on the column name. To close the MDS-IES window, click on the "Close" button at the bottom.
Click on the "MDS-IES" button under the Genoverse display to open the MDS-IES annotations window.
The hits table displays high scoring pair information in table format. A drop-down list at the top of the table selects which MIC/MAC sequences the hits are shown for, and also displays the number of hits from each MIC/MAC sequence. Below the drop-down list, the "Copy", "CSV", "Excel", "PDF", and "Print" buttons can be used to extract the table data.
The hits table itself contains 10 columns: the number of the hit in the table, MAC name, start position in MAC, end position in MAC, MIC name, start position in MIC, end position in MIC, length, number of errors, and a button that opens the nucleotide sequence of the hit. The table can be ordered in ascending/descending (alphabetical/reverse alphabetical) order by any column by clicking on the column name.
Click on the "Hits Table" button under the Genoverse display to see the hits table.
The download data window lets users download sequences, annotations, domains, and other information for the currently displayed contig. There are three download categories: Sequences, Annotations, and Other. Click on the "Downloads" button under the Genoverse display to open the download window.
Sequences: At the top of the download window is the Sequences field, which consists of Nucleotide and Protein checkboxes and a Format field. Check Nucleotide and/or Protein to download the corresponding sequences. The only format for this category is FASTA, and it is selected by default.
Annotations: The next download category is Annotations, which includes Genes, Domains, MDSs, and Telomeres checkboxes. The data can be downloaded in GFF3 or BED format.
Other: The last category, Other, is located at the bottom of the download window. It contains RNA Expressions and MIC Arrangements checkboxes. The data can be downloaded in CSV or Excel format.
Download: Once all desired information is checked, click on the "Download" button. The database generates a zip archive with all requested data and makes it available for download.
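The two annotation formats offered above use different coordinate conventions: GFF3 positions are 1-based and inclusive, while BED positions are 0-based with an exclusive end. The Python sketch below shows one way to read GFF3 lines and convert them to BED-style tuples. The sample lines, the "MDS" feature type, the "example" source, and the `ID` attribute values are assumptions made for illustration; the files produced by the database may label features differently.

```python
def parse_gff3(text):
    """Parse GFF3 text into a list of dictionaries, skipping comments and blank lines."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # GFF3 defines nine tab-separated columns.
        (seqid, source, ftype, start, end,
         score, strand, phase, attributes) = line.split("\t")
        attrs = dict(pair.split("=", 1)
                     for pair in attributes.split(";") if "=" in pair)
        records.append({
            "seqid": seqid,
            "type": ftype,
            "start": int(start),   # 1-based, inclusive
            "end": int(end),       # 1-based, inclusive
            "strand": strand,
            "attributes": attrs,
        })
    return records


def to_bed(record):
    """Convert one parsed GFF3 record to a (chrom, start, end, name) BED tuple."""
    name = record["attributes"].get("ID", record["type"])
    # BED starts are 0-based and BED ends are exclusive, so only the start shifts.
    return (record["seqid"], record["start"] - 1, record["end"], name)


# Made-up sample; the feature type and IDs are hypothetical.
SAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "\t".join(["OXYTRI_MAC_1001", "example", "MDS", "1", "250", ".", "+", ".", "ID=MDS1"]),
    "\t".join(["OXYTRI_MAC_1001", "example", "MDS", "240", "600", ".", "+", ".", "ID=MDS2"]),
])

records = parse_gff3(SAMPLE_GFF3)
bed_rows = [to_bed(record) for record in records]
for row in bed_rows:
    print("\t".join(str(field) for field in row))
```

The feature length is the same in both conventions: `end - start + 1` in GFF3 and `end - start` in BED.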
Next are the information fields related to the selected sequence.
The DNA Information field provides the length of the sequence (in nucleotides), information about telomeres (for MAC contigs), and cross-reference links to other databases.
The MDS Information field shows the MDS count and the number of MIC matches for a MAC sequence, or the number of MAC matches for a MIC sequence. The last number can be clicked to open the Arrangement Maps list.
The Gene Information section contains a table of all genes present on the displayed sequence. Click on the green "+" symbol to expand the information for an individual gene. To filter for a particular gene name or gene description, type text into the Search field.
This section talks about the sources of data for the <mds_ies_db>.
The MDS-IES annotation comes from the MDS/IES Annotation sequence software (MI-ASS), a free and open source program developed by the USF Math-Bio Lab. The annotation process consists of BLASTing MAC contigs/scaffolds against MIC contigs/scaffolds and using the high scoring pair information to identify MDSs on the MAC (and then on the MIC). Using the MDS information for each MIC sequence, MI-ASS also identifies its IESs. Besides the MDS-IES annotation, the program also produces MAC telomeric sequence information and MIC MDS arrangement pattern information. Both types of information are currently stored in the <mds_ies_db>.
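The last annotation step, deriving IESs from the MDS positions on a MIC sequence, can be illustrated with a small Python sketch. It rests on a simplifying assumption that is not taken from MI-ASS itself: each IES is treated as the stretch of MIC sequence lying between two neighboring MDSs. The function name `ies_between_mdss` and the coordinates are hypothetical, and the actual program may handle pointers, overlaps, and sequence ends differently.

```python
def ies_between_mdss(mds_intervals):
    """Return the gaps between MDS intervals on one MIC sequence.

    mds_intervals: iterable of (start, end) pairs, 1-based and inclusive.
    Overlapping or touching MDSs produce no gap between them.
    """
    ordered = sorted(mds_intervals)
    if not ordered:
        return []
    gaps = []
    covered_to = ordered[0][1]          # rightmost position covered so far
    for start, end in ordered[1:]:
        if start > covered_to + 1:      # at least one uncovered base in between
            gaps.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, end)
    return gaps


# Made-up MDS positions on a MIC contig, given in arbitrary order.
mds_on_mic = [(101, 200), (1, 50), (260, 300)]
ies = ies_between_mdss(mds_on_mic)
print(ies)
```

Sorting first matters because MDSs may appear on the MIC in a different order than in the MAC, which is what the stored arrangement patterns record.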
The <mds_ies_db> assigns its own name to every sequence that is stored in the database. The naming convention is described as follows:
- 6 uppercase letters derived from the organism name (e.g., Oxytricha trifallax - OXYTRI)
- Underscore symbol "_"
- "MIC" or "MAC" string to indicate whether this is a MAC or MIC nucleus
- Underscore symbol "_"
- Unique number that is assigned to the sequence
An example of an assigned MAC contig name for Oxytricha trifallax is OXYTRI_MAC_1001, and an example of an assigned MIC contig name for Tetrahymena thermophila is TTHERM_MIC_1464.
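Names that follow this convention can be split into their parts with a regular expression. The Python sketch below is a minimal illustration: the pattern is inferred from the five rules listed above, and `parse_name` is a hypothetical helper, not a function provided by the database.

```python
import re

# Six uppercase letters, "_", "MAC" or "MIC", "_", then the unique number.
NAME_PATTERN = re.compile(r"(?P<organism>[A-Z]{6})_(?P<nucleus>MAC|MIC)_(?P<number>\d+)")


def parse_name(name):
    """Split a sequence name into (organism code, nucleus, number)."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid sequence name: {name!r}")
    return match["organism"], match["nucleus"], int(match["number"])


print(parse_name("OXYTRI_MAC_1001"))
print(parse_name("TTHERM_MIC_1464"))
```

Aliases imported from external databases follow their own naming schemes and would be rejected by this pattern.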
The genomes, proteomes, gene expression, and other annotations found in the <mds_ies_db> were collected from external databases such as GenBank, TGD, TFGD, Broad, etc., each with their own naming schemes.
This section describes the database architecture and lists the software and libraries used during the development of the <mds_ies_db>.
Currently, there are more than 20 tables in the database and the largest ones are listed below.
- Alias table - contains information about sequence alias names found in different databases.
- Arrangement table - contains information about MDS arrangement patterns for each MIC.
- Domain table - contains domain information.
- Gene table - contains information about genes that are found on MAC and MIC sequences.
- HSP table - contains information about high scoring pairs that were produced by BLAST during the MDS-IES annotation process.
- MDS-IES table - contains information about MDSs and IESs that were identified during the MDS-IES annotation process.
- Nucleotide table - contains information about each MAC and MIC sequence.
- Protein table - contains information about MAC proteins.
- Telomere table - contains information about each MAC telomere that was identified during the MDS-IES annotation process.
The <mds_ies_db> uses a number of open source software programs and libraries: