MSAVis Code and Data

Overview

MSAVis is a multiple sequence alignment visualization system that integrates the display of conserved domain data. It is written in the Python programming language and uses BioPython and OpenGL.

Installation

To use MSAVis, one needs Python, BioPython, NumPy, PyOpenGL, GLUT, and wxPython.

The BioPython project provides instructions for installing Python, BioPython, and NumPy; the PyOpenGL site provides instructions for installing PyOpenGL and GLUT; and the wxPython download page provides instructions for installing wxPython.

To install MSAVis, unzip the archive into a directory of your choice.
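
Before starting MSAVis, it can help to confirm that the Python packages are importable. The following is a minimal sketch, not part of the MSAVis distribution: the helper name missing_dependencies is hypothetical, and the script assumes a Python version that provides importlib.util. It checks only the Python modules (Bio, numpy, OpenGL, wx); the native GLUT library is not checked.

```python
import importlib.util

# Top-level import names for the Python packages MSAVis depends on.
REQUIRED_MODULES = {
    "BioPython": "Bio",
    "NumPy": "numpy",
    "PyOpenGL": "OpenGL",
    "wxPython": "wx",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the display names of packages whose module cannot be found."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All Python dependencies found.")
```

If the script reports a missing package, revisit the installation instructions for that package before running msavis.py.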

Using MSAVis

Starting MSAVis

Assuming everything is installed properly, start MSAVis from the command line in the directory where you unzipped it:

  python msavis.py

On Windows or Mac OS X, double-clicking the msavis.py file in Explorer/Finder should also start MSAVis.

Reading Alignments

You can then load your own alignment file or the provided DNMT2chick.aln alignment used for the paper's figures. Select "File > Load Alignment in Domains View..." from the menu to choose the alignment. MSAVis will then load the conserved domain details for the given proteins from the NCBI CDD. A progress bar indicates how far the loading has advanced.

MSAVis currently supports two protein alignment file formats (Clustal and PHYLIP). Support for additional formats is underway.
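
To check that a file is in one of the two supported formats before loading it, the file can be read with BioPython directly. The sketch below is an illustration under stated assumptions, not MSAVis's own loading code: the function names are hypothetical, and the format guess is a simple heuristic (a Clustal file begins with a "CLUSTAL" header line; a PHYLIP file begins with two integers giving the number of sequences and the alignment length). It also assumes a BioPython version whose AlignIO.read accepts a file name.

```python
def guess_alignment_format(path):
    """Guess 'clustal' or 'phylip' from the first non-blank line of a file."""
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip leading blank lines
            if line.startswith("CLUSTAL"):
                return "clustal"
            fields = line.split()
            if len(fields) == 2 and all(f.isdigit() for f in fields):
                return "phylip"
            break  # first non-blank line matched neither format
    raise ValueError("unrecognized alignment format: %s" % path)


def read_alignment(path):
    """Read an alignment with BioPython, using the guessed format."""
    # Imported here so the format guess above works without BioPython.
    from Bio import AlignIO
    return AlignIO.read(path, guess_alignment_format(path))
```

For example, read_alignment("DNMT2chick.aln") should return a BioPython alignment object if the provided file parses as Clustal.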

The MSAVis Display

MSAVis displays all the conserved domains (CDs) for the given alignment file, one at a time. For each CD, the proteins are displayed as rows running from left to right. Each position is shaded from light to dark according to the amount of amino acid conservation at that position, calculated using a BLOSUM64 sum-of-pairs score; a lighter color indicates less conservation and a darker color indicates more. A column position for a given protein is colored with a unique hue if the CD occurs at that position in the protein's sequence. Gaps of no color (white and grey) indicate that the CD is not found at that sequence position.
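
The shading described above can be illustrated with a small sum-of-pairs calculation. This is a sketch under assumptions, not the MSAVis implementation: toy_score is a hypothetical stand-in for a real substitution matrix lookup, gap characters are simply skipped, and the min-max scaling to a 0..1 shade is one possible normalization.

```python
from itertools import combinations


def toy_score(a, b):
    """Hypothetical stand-in for a substitution matrix lookup."""
    return 1 if a == b else -1


def sum_of_pairs(column, score):
    """Sum score(a, b) over all unordered residue pairs in one column."""
    residues = [r for r in column if r != "-"]  # assumption: gaps are skipped
    return sum(score(a, b) for a, b in combinations(residues, 2))


def column_scores(rows, score):
    """Score every column of an alignment given as equal-length strings."""
    return [sum_of_pairs(column, score) for column in zip(*rows)]


def to_shades(scores):
    """Scale scores to 0.0 (least conserved) .. 1.0 (most conserved)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]  # all columns equally conserved
    return [(s - lo) / (hi - lo) for s in scores]


rows = ["ACD", "ACE", "A-F"]
scores = column_scores(rows, toy_score)
shades = to_shades(scores)
```

With these three toy sequences, the fully conserved first column receives the darkest shade and the fully mismatched last column the lightest.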

Navigation

Initially, MSAVis provides an overview of the entire sequences. To zoom into a subset of the sequence, drag over the area with the left mouse button held; subsequent drag operations will zoom further. To scroll through the alignment, use the mouse wheel: scrolling up moves left through the alignment, and scrolling down moves right. To return to the overview, right-click.
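
The navigation behavior above can be modeled as a window over the alignment columns. The following sketch is illustrative only: the View class and its method names are hypothetical and do not come from the MSAVis source, and the mapping from mouse position to a fraction of the window width is assumed to be done elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Visible window [start, end) over an alignment of `length` columns."""
    length: int
    start: int = field(default=0, init=False)
    end: int = field(init=False)

    def __post_init__(self):
        self.end = self.length  # begin with the full overview

    def zoom(self, frac_a, frac_b):
        """Zoom to the span between two fractions (0..1) of the window."""
        lo, hi = sorted((frac_a, frac_b))
        width = self.end - self.start
        first = int(lo * width)
        last = max(int(hi * width), first + 1)  # keep at least one column
        new_start = self.start + first
        new_end = min(self.start + last, self.length)
        self.start, self.end = new_start, new_end

    def scroll(self, columns):
        """Shift the window; negative moves left. Clamped to the alignment."""
        width = self.end - self.start
        self.start = max(0, min(self.start + columns, self.length - width))
        self.end = self.start + width

    def reset(self):
        """Return to the overview of the whole alignment."""
        self.start, self.end = 0, self.length
```

In this model a drag calls zoom, a wheel step up calls scroll with a negative value, a wheel step down calls scroll with a positive value, and a right-click calls reset.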

The sequence position range can also be set manually: Clicking on the "Zoom to Region" button will prompt the user for the sequence range. Similarly, clicking on "Zoom Out to Full Length" will return the user to the full sequence display.
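
A manually entered range has to be checked against the alignment length before it can be displayed. The sketch below shows one way to do that; the function name clamp_region is hypothetical, and the use of 1-based inclusive positions is an assumption, since the form of the "Zoom to Region" prompt is not specified here.

```python
def clamp_region(first, last, length):
    """Return a valid 1-based inclusive (first, last) range within 1..length."""
    if length < 1:
        raise ValueError("alignment is empty")
    first, last = sorted((first, last))  # accept the bounds in either order
    first = max(1, min(first, length))
    last = max(1, min(last, length))
    return first, last
```

Under these assumptions, a request for positions 340 to 120 in a 500-column alignment is treated as 120 to 340, and out-of-range bounds are pulled back to the ends of the alignment.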

Other Interactions

Manipulating the Conserved Domains

The order of the conserved domains may be changed by clicking and dragging a CD to its new location. A CD may be hidden from the display by unchecking the checkbox with its name above the sequences; checking the box again will return it to the display. Finally, clicking on the name of a CD will take you to the NCBI CDD entry for that domain.
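
Reordering and hiding can be thought of as two small operations on a list of domain names. The following sketch is illustrative only: the function names and the domain names "A" through "D" are hypothetical, and MSAVis may keep this state differently.

```python
def move_item(items, old_index, new_index):
    """Return a copy of items with one entry dragged to a new position."""
    reordered = list(items)
    reordered.insert(new_index, reordered.pop(old_index))
    return reordered


def visible_domains(order, hidden):
    """Return the domains to draw, in display order, skipping hidden ones."""
    return [name for name in order if name not in hidden]


order = move_item(["A", "B", "C", "D"], 0, 2)  # drag "A" to the third slot
shown = visible_domains(order, hidden={"C"})   # "C" is unchecked
```

The same move operation would apply to reordering the proteins.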

Manipulating the Proteins

The order of the proteins may be changed by clicking and dragging a protein's color box or name at the top of the display to its new location. Clicking once on the color box or name will cause every occurrence of the protein in the display to flash for easy identification.

Contact

If you find MSAVis to be useful, or have any questions, contact Dr. T.J. Jankun-Kelly, tjk@acm.org.

License

MSAVis is licensed under a BSD-like license.

Copyright 2009 Andrew Lindeman, Susan Bridges, and T.J. Jankun-Kelly <tjk@acm.org>

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  1. Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.

  2. Redistributions in binary form must reproduce the above
  copyright notice, this list of conditions and the following
  disclaimer in the documentation and/or other materials provided
  with the distribution.

  3. The name of the authors may not be used to endorse or promote
  products derived from this software without specific prior written
  permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

T. J. Jankun-Kelly, Andrew Lindeman, Susan M. Bridges. 2009. Exploratory visual analysis of conserved domains on multiple sequence alignment. BMC Bioinformatics In Press.