Help Documentation

The <mds_ies_db> is a dynamic, interactive, and mobile-friendly database that features an assortment of searches and visualizations. This document serves as a user's manual for searching the database and controlling the interactive displays.

This manual is organized as follows:

  • Search - contains information about the quick searches in the navigation bar and the advanced search forms on the Search tab.
  • Display - contains information about the contig display pages, which show the association between the MAC and MIC genomes using MDS, IES, and pointer annotations and chord diagram visualizations.
  • Data - contains information about the data processing methods and the internal naming convention for stored sequences.
  • Technical Notes - contains information about the database architecture and the software libraries used to create the website.

Questions not covered in this documentation should be directed to Jonathan Burns (jtburns@mail.usf.edu).

Display

Once the desired sequence is found, open its sequence page to see further details. The sequence information page contains a Genome Browser, Chord Diagram window, Hits Table, Download Data window, and DNA, MDS, and Gene Information fields.

Genome Browser

At the top of the sequence information page there is a Genoverse display. This display shows the nucleotide sequence of the MIC/MAC contig/scaffold together with any available gene information for the sequence and BLAST high scoring pair information. If the selected sequence is from the MAC nucleus, Genoverse displays the MAC nucleotide sequence, MDSs, MAC genes, and high scoring pairs of MIC contigs/scaffolds that match segments of the selected MAC sequence. If the selected sequence is from the MIC nucleus, Genoverse displays the nucleotide sequence and high scoring pairs of MAC contigs/scaffolds that match segments of the selected MIC sequence.

Figure 12: Genoverse browser containing information of MAC sequence, MDS annotation, MIC Hits, and MAC genes.

Use the mouse scroll wheel to zoom in and out of nucleotide regions. Individual tracks can be hidden or shown by clicking on the "Tracks" button in the top left corner and using the menu that opens. For more information on how to use Genoverse, please refer to this tutorial.

Chord Diagram

A chord diagram (also known as a Circos plot) is a visual representation of sequence alignment information. The selected MAC/MIC sequence and all matching MIC/MAC sequences are placed on a circle. Matching nucleotide segments of the MAC and MIC are connected with an arc, and arcs for different MAC-to-MIC matches are drawn in distinct colors.

The chord diagram shows which regions of the MIC map to which regions of the MAC and vice versa. It is also possible to view only one particular set of MIC-to-MAC (or MAC-to-MIC) arcs: hover the mouse over the circle segment belonging to the desired MIC/MAC sequence, and all other arcs disappear.

Figure 13: Chord diagram picture.
Figure 14: In this picture, the mouse is over MAC15801 contig. As a result, only arcs from this contig to MIC70082 are shown.

To see the chord diagram, which shows the rearrangement map for the contig, click on the "Chord Diagram" button under the genome browser.
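The hover behavior of the chord diagram can be thought of as a filter over the arcs. The following Python sketch is illustrative only and is not the site's actual D3.js code: the `Arc` layout, the coordinates, and the contig MAC20000 are hypothetical, and only the names MAC15801 and MIC70082 are taken from the Figure 14 caption.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    """One arc of the chord diagram (hypothetical layout, for illustration)."""
    mac_name: str
    mac_start: int
    mac_end: int
    mic_name: str
    mic_start: int
    mic_end: int


def visible_arcs(arcs: List[Arc], hovered: str) -> List[Arc]:
    """Keep only the arcs that touch the hovered MAC or MIC sequence."""
    return [a for a in arcs if hovered in (a.mac_name, a.mic_name)]


# Made-up coordinates; MAC20000 is a hypothetical second MAC contig.
arcs = [
    Arc("MAC15801", 1, 120, "MIC70082", 500, 619),
    Arc("MAC15801", 121, 300, "MIC70082", 900, 1079),
    Arc("MAC20000", 1, 80, "MIC70082", 2000, 2079),
]

# While hovering over MAC15801, only its own arcs remain visible.
for arc in visible_arcs(arcs, "MAC15801"):
    print(arc.mac_name, "->", arc.mic_name, arc.mac_start, arc.mac_end)
```

Hovering over MIC70082 in this sketch would keep all three arcs, because every arc touches that sequence.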

MDS-IES Annotations

The MDS-IES annotations window displays information about MDS, IES, and pointer sequences in table format. At the top of the window, the "Annotations" drop-down list selects which information to display (MDS, IES, and/or pointer). Five buttons extract or save the data in the table: the "Copy" button copies the table content to the clipboard; the "CSV", "Excel", and "PDF" buttons each generate a file in the corresponding format with all table data; and the "Print" button prints the table.

The table consists of five columns: entry name (MDS, IES, or Pointer), start position, end position, length, and the sequence corresponding to the entry. The table can be sorted in ascending or descending (alphabetical or reverse alphabetical) order by any column by clicking on the column name. To close the MDS-IES window, click on the "Close" button at the bottom.

Figure 15: MDS-IES table window. Note that inferred MDSs are marked with the "†" symbol. In this case MDS 3 is an inferred one.

Click on the "MDS-IES" button under the Genoverse display to open the MDS-IES annotations window.

Hits Table

The hits table displays high scoring pair information in table format. A drop-down list at the top of the table selects which MIC/MAC sequences the hits are shown for. The drop-down list also displays the number of hits from each MIC/MAC sequence. Below the drop-down list, there are "Copy", "CSV", "Excel", "PDF", and "Print" buttons that can be used to extract the table data.

The hits table itself contains 10 columns: the number of the hit in the table, MAC name, start position in MAC, end position in MAC, MIC name, start position in MIC, end position in MIC, length, number of errors, and a button that opens the nucleotide sequence of the hit. The table can be sorted in ascending or descending (alphabetical or reverse alphabetical) order by any column by clicking on the column name.

Figure 16: Hits table window.

Click on the "Hits Table" button under the Genoverse display to see the hits table.
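For readers who export the hits table and work with it offline, the sketch below models a row with the data columns listed above and reproduces two of the table's behaviors: the per-sequence hit counts shown in the drop-down list and sorting by a column. The field names, the sequence names, and the values are hypothetical; the actual headers in an exported CSV file may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    """One row of the hits table; field names are assumptions."""
    mac_name: str
    mac_start: int
    mac_end: int
    mic_name: str
    mic_start: int
    mic_end: int
    length: int
    errors: int


def hits_per_mic(hits: List[Hit]) -> Dict[str, int]:
    """Count hits per MIC sequence, like the numbers in the drop-down list."""
    return dict(Counter(h.mic_name for h in hits))


def sort_hits(hits: List[Hit], column: str, descending: bool = False) -> List[Hit]:
    """Sort by any column, like clicking on a column name in the table."""
    return sorted(hits, key=lambda h: getattr(h, column), reverse=descending)


# Made-up example rows.
hits = [
    Hit("OXYTRI_MAC_1001", 301, 500, "OXYTRI_MIC_2002", 4000, 4199, 200, 1),
    Hit("OXYTRI_MAC_1001", 1, 300, "OXYTRI_MIC_2002", 1000, 1299, 300, 0),
    Hit("OXYTRI_MAC_1001", 501, 650, "OXYTRI_MIC_2003", 70, 219, 150, 2),
]

print(hits_per_mic(hits))
print([h.mac_start for h in sort_hits(hits, "mac_start")])
```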

Download Data

The download data window is used to download sequences, annotations, domains, and other information for the currently displayed contig. There are three download categories: Sequences, Annotations, and Other. Click on the "Downloads" button under the Genoverse display to open the download window.

Sequences: At the top of the download window, there is a Sequences field that consists of Nucleotide and Protein checkboxes and a Format field. Check Nucleotide and/or Protein to download the corresponding sequences. The only format for this category is FASTA, and it is selected by default.

Annotations: The next download category is Annotations, which includes Genes, Domains, MDSs, and Telomeres checkboxes. The data can be downloaded in GFF3 or BED format.

Other: The last category is Other, located at the bottom of the download window. It contains RNA Expressions and MIC Arrangements checkboxes. The data can be downloaded in CSV or Excel format.

Download: Once all desired information is checked, click on the "Download" button. The database will generate a zip archive containing all requested data and offer it for download.

Figure 17: Download data window.
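The two annotation formats, GFF3 and BED, use different coordinate conventions: GFF3 positions are 1-based with an inclusive end, while BED positions are 0-based with an exclusive end. The sketch below converts a single GFF3 feature line to a six-column BED line to show the difference. The example record is made up, and the attributes present in files downloaded from the database may differ.

```python
def gff3_to_bed6(gff3_line: str) -> str:
    """Convert one GFF3 feature line to a six-column BED line."""
    fields = gff3_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GFF3 feature line has 9 tab-separated columns")
    seqid, _source, ftype, start, end, _score, strand, _phase, attributes = fields

    # Use the ID attribute as the BED name if present, else the feature type.
    name = ftype
    for pair in attributes.split(";"):
        key, _, value = pair.partition("=")
        if key == "ID" and value:
            name = value
            break

    # GFF3 is 1-based and end-inclusive; BED is 0-based and end-exclusive,
    # so only the start shifts by one and the end stays numerically the same.
    bed_start = int(start) - 1
    bed_end = int(end)

    # The BED score column is set to 0 here as a placeholder.
    return "\t".join([seqid, str(bed_start), str(bed_end), name, "0", strand])


# Made-up record: an MDS covering positions 101..250 of a MAC contig.
example = "OXYTRI_MAC_1001\texample\tMDS\t101\t250\t.\t+\t.\tID=MDS1"
print(gff3_to_bed6(example))
```

In both formats the feature length is the same (150 nucleotides here); only the way the interval is written changes.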

Information fields

Next, there are several information fields related to the selected sequence.

The DNA Information field provides the length of the sequence (in nucleotides), information about telomeres (for MAC contigs), and cross-reference links to other databases.

The MDS Information field shows the MDS count and the number of MIC matches for a MAC sequence, or the number of MAC matches for a MIC sequence. The last number can be clicked to open the Arrangement Maps list.

Figure 18: DNA Information and MDS Information fields.
Figure 19: Arrangement Maps list, which contains the positions and orientations of the MDSs of a particular MAC sequence relative to a particular MIC sequence.

The Gene Information section contains a table of all genes that are present on the displayed sequence. Clicking on the green "+" symbol next to a gene expands the information about it. The table can be filtered by gene name or gene description by typing text into the Search field.

Figure 20: Gene table that lists all gene names and gene descriptions.
Figure 21: Expanded information field of a gene inside the gene table.

Data

This section describes the sources of data for the <mds_ies_db>.

MDS-IES Annotation

The MDS-IES annotation comes from the MDS/IES Annotation sequence software (MI-ASS), a free and open source program developed by the USF Math-Bio Lab. The annotation process consists of running BLAST with MAC contigs/scaffolds against MIC contigs/scaffolds and using the resulting high scoring pair information to identify MDSs on the MAC (and then later on the MIC). Using the MDS information for each MIC, MI-ASS also identifies the IESs of that MIC.

Besides the MDS-IES annotation, the program also produces MAC telomeric sequence information and MDS arrangement pattern information for each MIC. Both types of information are currently stored in the <mds_ies_db>.
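As a rough illustration of how IESs can follow from MDS positions on a MIC sequence, the sketch below treats every stretch lying between annotated MDSs as an IES. This is a simplified assumption and not necessarily how the annotation software works: the function name, the 1-based inclusive coordinates, and the example intervals are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), assumed 1-based and inclusive


def ies_intervals(mds_intervals: List[Interval]) -> List[Interval]:
    """Return the gaps between MDS intervals on one MIC sequence.

    Overlapping MDSs (for example, MDSs that share a pointer) leave
    no gap between them and therefore produce no IES in this sketch.
    """
    ordered = sorted(mds_intervals)
    if not ordered:
        return []

    gaps = []
    covered_end = ordered[0][1]  # rightmost position covered so far
    for start, end in ordered[1:]:
        if start > covered_end + 1:
            gaps.append((covered_end + 1, start - 1))
        covered_end = max(covered_end, end)
    return gaps


# Made-up MDS positions on a MIC contig; the middle two overlap.
mds = [(1, 100), (151, 300), (290, 400), (501, 600)]
print(ies_intervals(mds))
```

Sequence before the first MDS or after the last one is not reported here, since it is not flanked by MDSs on both sides.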

Notation

The <mds_ies_db> assigns its own name to every sequence that is stored in the database. The naming convention is described as follows:

  • Six uppercase letters derived from the organism name (e.g., Oxytricha trifallax - OXYTRI)
  • Underscore symbol "_"
  • "MIC" or "MAC" string to indicate whether the sequence comes from the MIC or MAC nucleus
  • Underscore symbol "_"
  • Unique number that is assigned to the sequence

An example of the assigned name for a MAC contig of Oxytricha trifallax is OXYTRI_MAC_1001, and for a MIC contig of Tetrahymena thermophila it is TTHERM_MIC_1464.
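The naming convention above can be checked with a short regular expression. The following Python sketch is only an illustration: it assumes the organism code is exactly six uppercase letters A-Z (as in both examples) and that the unique number consists of decimal digits. The function and class names are hypothetical and not part of the database.

```python
import re
from typing import NamedTuple

# Organism code, nucleus type, and unique number, separated by underscores.
NAME_PATTERN = re.compile(r"([A-Z]{6})_(MIC|MAC)_(\d+)")


class SequenceName(NamedTuple):
    organism_code: str
    nucleus: str
    number: int


def parse_sequence_name(name: str) -> SequenceName:
    """Split a database sequence name into its three parts."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid sequence name: {name!r}")
    code, nucleus, number = match.groups()
    return SequenceName(code, nucleus, int(number))


print(parse_sequence_name("OXYTRI_MAC_1001"))
print(parse_sequence_name("TTHERM_MIC_1464"))
```

Names taken from external databases follow their own schemes and would be rejected by this pattern.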

The genomes, proteomes, gene expression, and other annotations found in the <mds_ies_db> were collected from external databases such as GenBank, TGD, TFGD, Broad, etc., each with their own naming schemes.

Technical Notes

This section describes the database architecture and lists the software and libraries used during the development of the <mds_ies_db>.

Database Architecture

The <mds_ies_db> is a relational database powered by the MySQL database management system.

Currently, there are more than 20 tables in the database; the largest ones are listed below.

  • Alias table - contains information about sequence alias names found in different databases.
  • Arrangement table - contains information about MDS arrangement patterns for each MIC.
  • Domain table - contains domain information.
  • Gene table - contains information about genes that are found on MAC and MIC sequences.
  • HSP table - contains information about high scoring pairs that were produced by BLAST during the MDS-IES annotation process.
  • MDS-IES table - contains information about MDSs and IESs that were identified during the MDS-IES annotation process.
  • Nucleotide table - contains information about each MAC and MIC sequence.
  • Protein table - contains information about MAC proteins.
  • Telomere table - contains information about each MAC telomere that was identified during the MDS-IES annotation process.
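To give a feel for how two of the tables listed above might relate, the sketch below builds a toy version of the Nucleotide and HSP tables and joins them. It is not the real schema: every table name, column name, and row is a hypothetical assumption, and Python's built-in sqlite3 module stands in for MySQL so that the example runs without a database server.

```python
import sqlite3

# Hypothetical schema: each HSP row links one MAC and one MIC sequence.
SCHEMA = """
CREATE TABLE nucleotide (
    id      INTEGER PRIMARY KEY,
    name    TEXT UNIQUE NOT NULL,
    nucleus TEXT NOT NULL CHECK (nucleus IN ('MAC', 'MIC')),
    length  INTEGER NOT NULL
);
CREATE TABLE hsp (
    id        INTEGER PRIMARY KEY,
    mac_id    INTEGER NOT NULL REFERENCES nucleotide(id),
    mic_id    INTEGER NOT NULL REFERENCES nucleotide(id),
    mac_start INTEGER NOT NULL,
    mac_end   INTEGER NOT NULL,
    mic_start INTEGER NOT NULL,
    mic_end   INTEGER NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO nucleotide (id, name, nucleus, length) VALUES (?, ?, ?, ?)",
        [(1, "OXYTRI_MAC_1001", "MAC", 3000),
         (2, "OXYTRI_MIC_2002", "MIC", 50000)],
    )
    conn.executemany(
        "INSERT INTO hsp (mac_id, mic_id, mac_start, mac_end, mic_start, mic_end)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 2, 301, 500, 4000, 4199),
         (1, 2, 1, 300, 1000, 1299)],
    )
    return conn


def hsps_for_mac(conn: sqlite3.Connection, mac_name: str):
    """List (MIC name, MAC start, MAC end) for every HSP on a MAC sequence."""
    query = """
        SELECT mic.name, hsp.mac_start, hsp.mac_end
        FROM hsp
        JOIN nucleotide AS mac ON mac.id = hsp.mac_id
        JOIN nucleotide AS mic ON mic.id = hsp.mic_id
        WHERE mac.name = ?
        ORDER BY hsp.mac_start
    """
    return conn.execute(query, (mac_name,)).fetchall()


print(hsps_for_mac(build_demo_db(), "OXYTRI_MAC_1001"))
```

Storing sequence names once in the Nucleotide table and referring to them by key from the HSP table is a common relational design choice; whether the <mds_ies_db> does exactly this is not stated in this manual.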

Software

The <mds_ies_db> uses a number of open source software programs and libraries:

  • Bootstrap

    Bootstrap is the most popular HTML, CSS, and JS framework for developing responsive, mobile first projects on the web.
  • D3.js - Data-Driven Documents

    A JavaScript library for manipulating documents based on data using HTML, SVG and CSS.
  • DataTables

    DataTables is a plug-in for the jQuery JavaScript library. It is a highly flexible tool, based upon the foundations of progressive enhancement, and will add interaction controls to any HTML table.
  • Genoverse

    Genoverse is a portable, customizable, back-end independent JavaScript and HTML5 based genome browser which allows the user to explore data interactively.
  • MDS / IES - DNA Annotation Software

    MDS / IES - DNA Annotation Software (MIDAS) is a Python program that aligns a ciliate's macronuclear and micronuclear genome assemblies, annotates the corresponding MDS, IES, and pointer sequences, and generates the resulting rearrangement maps between the two assemblies.
  • wwwblast

    The term wwwblast refers to a suite of standalone BLAST programs produced by NCBI.