Saturday, January 8, 2011


Just this week I had a 'methodology' article published in BMC Bioinformatics.  Robert Lücking of the Field Museum was the first author, and we worked with Alexis Stamatakis (of RAxML fame) and Reed Cartwright (creator of Ngila and Dawg).  The paper is entitled "PICS-Ord: Unlimited Coding of Ambiguous Regions by Pairwise Identity and Cost Scores Ordination" and it presents a method for encoding data found in ambiguously-aligned regions of multiple sequence alignments in a way that makes it possible to integrate such data into standard molecular phylogenetic analyses.  Most researchers simply exclude data found in ambiguously-aligned regions of nucleotide or amino-acid alignments when conducting phylogenetic inferences.  While such practices are perfectly sound, a large amount of potentially informative data is subsequently left out of downstream analyses.  However, using a method to recode these regions and integrate the data into phylogenetic analyses allows one to consider all of the data present in the larger molecular regions being analyzed.

Until PICS-Ord, no method had been devised for properly integrating this type of data into likelihood-based analyses (e.g., ML, Bayesian).  INAASE (Lutzoni et al. 2000) is a program that recodes ambiguously-aligned regions, but since the distances between different sequence types are encoded as cost matrices, its utility is limited to parsimony-based analyses.  It also has a finite number of symbols, making it impractical for large data sets.  For each ambiguously-aligned region, PICS-Ord uses ordination of scores (which are based on pairwise alignments between the sequences for each taxon) to create a series of axes that are converted to discrete characters, which can then be appended to a multiple sequence alignment.  The matrix of the sequence alignment plus the recoded characters can then be analyzed phylogenetically based on any number of criteria, including maximum likelihood (ML) and Bayesian inference.
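To make the idea of "ordination, then discrete characters" a bit more concrete, here is a rough Python sketch.  It is not the PICS-Ord code (that is the R program) and it does not reproduce the scoring or coding scheme from the paper.  It assumes you already have a symmetric matrix of pairwise distances for one ambiguous region, uses classical principal coordinates analysis as a stand-in ordination, and bins the first axis into equal-width states.  The function names, the number of states, and the toy distances are all made up for illustration.

```python
import math

def double_center(dist):
    """Gower-centre the squared distances (the first step of classical PCoA)."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def leading_axis(b, iters=500):
    """Power iteration on the centred matrix; returns taxon coordinates on axis 1."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    # Scale the unit eigenvector by sqrt(eigenvalue) to get coordinates.
    return [x * math.sqrt(max(eigval, 0.0)) for x in v]

def discretize(coords, n_states=4):
    """Equal-width binning of one axis into integer states (an assumed scheme)."""
    lo, hi = min(coords), max(coords)
    if hi == lo:
        return [0] * len(coords)
    width = (hi - lo) / n_states
    return [min(int((c - lo) / width), n_states - 1) for c in coords]

def recode_region(dist, n_states=4):
    """Distance matrix for one region -> one column of discrete codes per taxon."""
    return discretize(leading_axis(double_center(dist)), n_states)

if __name__ == "__main__":
    # Toy distances for four taxa: two tight pairs that are far from each other.
    toy = [[0, 1, 5, 6],
           [1, 0, 4, 5],
           [5, 4, 0, 1],
           [6, 5, 1, 0]]
    print(recode_region(toy))
```

In this toy case the two close pairs of taxa end up sharing a state within each pair, while the two pairs get different states.  A real analysis would keep several axes per region, not just the first, and the orientation of an ordination axis is arbitrary, so only the grouping of taxa is meaningful, not which group gets which number.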

PICS-Ord is available here as an R-based program.  As academic software goes, it's pretty friendly, but please let us know if you run into any troubles.  The publication of this method along with a program for implementation represents a great leap forward in phylogenetics, with the ability to finally integrate data from ambiguously-aligned regions into likelihood-based analyses!


P.S. Find out more about PICS-Ord here on Reed Cartwright's blog.

Works Cited:

Lücking, R., B. P. Hodkinson, A. Stamatakis, and R. A. Cartwright. 2011. PICS-Ord: Unlimited Coding of Ambiguous Regions by Pairwise Identity and Cost Scores Ordination. BMC Bioinformatics 12: 10.
Download publication (PDF file)
Download R-based PICS-Ord program (zipped program package)
View program wiki (website)

Lutzoni, F., P. Wagner, V. Reeb, and S. Zoller. 2000. Integrating ambiguously aligned regions of DNA sequences in phylogenetic analyses without violating positional homology. Systematic Biology 49: 628-651.
Download publication (PDF file)
Download Java-based INAASE program (zipped program package)
