Friday, December 16, 2011

Using R for community analyses

For the central chapter of my doctoral dissertation (published in Environmental Microbiology), I wanted to visualize similarities and differences between lichen-associated bacterial communities, find which factors were most strongly correlated with community differences, and determine the significance of these correlations.  When communities are characterized using large sets of DNA sequence data, this can be achieved by using both genetic measures (e.g., UniFrac distance) and taxon-based measures (e.g., Bray-Curtis OTU-based dissimilarity).

This post gives an example of a script (or a series of commands) to be run with the R statistical package (R Development Core Team 2010) for performing ANOSIM and NMDS analyses of (1) an OTU-based relative abundance matrix derived from a set of clone libraries and (2) a UniFrac distance matrix derived from a 454 amplicon library (see example files at the bottom of this post).  These examples are derived from Hodkinson et al. (2012), although the scripts and example files shown here are abbreviated in order to save space.  Full example files can be downloaded from the Dryad data package associated with this publication [Hodkinson et al. 2011; http://dx.doi.org/10.5061/dryad.t99b1].  The matrices were obtained using the Fast UniFrac program (see instructions here) and Mothur (using the get.relabund command; for the full pipeline that I authored to process these data, see the Dryad data package).  Some minor modifications were required to get the matrices into the precise format needed for R (edits were performed in Microsoft Excel and files were saved in .csv format... note that all of the spaces/tabs in the examples below would actually be commas in the raw text versions of the files).

Running the script as it is presented below (with hash marks in front of the 'plot' commands) will allow one to obtain an output file with the numerical results of all analyses.  However, in order to visualize the NMDS plots, one can run each of the commands line-by-line in the R-GUI (of course, all 'anosim' commands can be ignored if the only goal is NMDS), removing the hash marks in front of the plot commands (note that R has many fancy options for visualization using 'plot'; these may be worth investigating further for your own purposes).  Here is a link to the version of R that I used for these analyses: http://cran.r-project.org/bin/windows/base/old/2.12.0/.  Once R is installed, you can call the script with a command like this one:
C:\\"Program Files"\R\R-2.12.0\bin\i386\Rterm.exe --verbose --no-restore --file=C:\\ANOSIM_NMDS_Clon_454_win.R > C:\\ANOSIM_NMDS_Clon_454_win.out

Here is the script, named 'ANOSIM_NMDS_Clon_454_win.R' (abbreviated example script; full file available at http://dx.doi.org/10.5061/dryad.t99b1):

# load the necessary libraries
library(ecodist)
library(vegan)
# set the output file
sink("ANOSIM_nmds.out", append=TRUE, split=TRUE)
# load the clone data set
samplesClon<-read.csv("C:\\SamplesClon.csv", header=TRUE)
samplesClon
# load the Bray-Curtis OTU-based matrix
OTUsClon<-read.csv("C:\\HodkinsonClon16S_OTUs_1per_171.csv", head = FALSE, row.names = 1)
# transform the OTU matrix into a Bray-Curtis matrix
OTUsClon.dist<-vegdist(OTUsClon, "bray")
# run Bray-Curtis OTU-based ANOSIM
anosim(dis = OTUsClon.dist, grouping = samplesClon$Photobiont, strata = samplesClon$Site)
anosim(dis = OTUsClon.dist, grouping = samplesClon$Mycobiont, strata = samplesClon$Site)
anosim(dis = OTUsClon.dist, grouping = samplesClon$Mycobiont, strata = samplesClon$Photobiont)
anosim(dis = OTUsClon.dist, grouping = samplesClon$Site, strata = samplesClon$Photobiont)
anosim(dis = OTUsClon.dist, grouping = samplesClon$Site, strata = samplesClon$Mycobiont)
# run Bray-Curtis OTU-based NMDS
nmds <- nmds(as.dist(OTUsClon.dist))
nmin <- nmds.min(nmds)
nmin
#plot(nmin, pch=as.numeric(samplesClon$Photobiont))
#plot(nmin, pch=as.numeric(samplesClon$Mycobiont))
#plot(nmin, pch=as.numeric(samplesClon$Site))
# load the 454 data set
samples454<-read.csv("C:\\Samples454.csv", header=TRUE)
samples454
# load the normalized weighted UniFrac matrix
unifracnw454<-read.csv("C:\\DistUFnw454.csv", head = FALSE, row.names = 1)
# run normalized weighted UniFrac ANOSIM
anosim(dat = as.dist(unifracnw454), grouping = samples454$Photobiont, strata = samples454$Site)
anosim(dat = as.dist(unifracnw454), grouping = samples454$Site, strata = samples454$Mycobiont)
anosim(dat = as.dist(unifracnw454), grouping = samples454$Site, strata = samples454$Photobiont)
anosim(dat = as.dist(unifracnw454), grouping = samples454$Mycobiont, strata = samples454$Photobiont)
anosim(dat = as.dist(unifracnw454), grouping = samples454$Mycobiont, strata = samples454$Site)
# run normalized weighted UniFrac NMDS
nmds <- nmds(as.dist(unifracnw454))
nmin <- nmds.min(nmds)
nmin
#plot(nmin, pch=as.numeric(samples454$Photobiont))
#plot(nmin, pch=as.numeric(samples454$Mycobiont))
#plot(nmin, pch=as.numeric(samples454$Site))
# close the output file
sink()
# unload the libraries
detach("package:ecodist")
detach("package:vegan")


Samples454.csv (referenced in the above script)

SampleID Photobiont Site Mycobiont
CLCl Ch C Cl
CLDi Cy C Di
CLLe Cy C Le
CLPe Cy C Pe
CLSt Cy C St
CLUs Ch C Us
ELCl Ch E Cl
ELFl Ch E Fl
ELOp Ch E Op
ELPe Cy E Pe
ELUm Ch E Um
HLCl Ch H Cl
HLLe Cy H Le
HLPe Cy H Pe
HLSt Cy H St
HLUs Ch H Us
NLCl Ch N Cl
NLFl Ch N Fl
NLOp Ch N Op
NLPe Cy N Pe
NLUm Ch N Um


DistUFnw454.csv (referenced in the above script; full file available at http://dx.doi.org/10.5061/dryad.t99b1)

CLCl
0.16 0.17 0.11 0.08 0.12 ...
CLDi 0.16
0.24 0.18 0.13 0.18 ...
CLLe 0.17 0.24
0.11 0.16 0.20 ...
CLPe 0.11 0.18 0.11
0.10 0.18 ...
CLSt 0.08 0.13 0.16 0.10
0.13 ...
CLUs 0.12 0.18 0.20 0.18 0.13
...
ELCl 0.20 0.22 0.30 0.25 0.19 0.17
ELFl 0.14 0.18 0.24 0.19 0.14 0.15 ...
ELOp 0.33 0.35 0.40 0.37 0.34 0.31 ...
ELPe 0.16 0.21 0.24 0.19 0.15 0.19 ...
ELUm 0.22 0.24 0.32 0.27 0.22 0.22 ...
HLCl 0.19 0.18 0.27 0.24 0.17 0.15 ...
HLLe 0.13 0.16 0.20 0.15 0.11 0.12 ...
HLPe 0.09 0.17 0.15 0.11 0.11 0.16 ...
HLSt 0.14 0.16 0.21 0.18 0.12 0.13 ...
HLUs 0.10 0.20 0.18 0.15 0.14 0.11 ...
NLCl 0.19 0.21 0.28 0.25 0.19 0.15 ...
NLFl 0.18 0.21 0.27 0.24 0.19 0.15 ...
NLOp 0.24 0.27 0.32 0.29 0.25 0.25 ...
NLPe 0.10 0.18 0.13 0.09 0.10 0.14 ...
NLUm 0.09 0.18 0.21 0.15 0.13 0.16 ...


SamplesClon.csv (referenced in the above script)

SampleID Photobiont Site Mycobiont
CL01 Cy C Peltigera
CL02 Ch C Alectoria
CL03 Ch C Cladonia
CL04 Tr C Placopsis
CL05 Cy C Erioderma
CL06 Cy C Sticta
CL07 Cy C Dictyonema
CL08 Cy C Leptogium
CL09 Tr C Stereocaulon
CL10 Ch C Usnea
EL01 Cy E Peltigera
EL01t Tr E Peltigera
EL02 Ch E Alectoria
EL03 Ch E Cladonia
EL04 Ch E Rhizocarpon
EL05 Ch E Ophioparma
EL06 Ch E Flavocetraria
EL07 Ch E Arctoparmelia
EL08 Ch E Umbilicaria
EL09 Ch E Masonhalea
EL10 Ch E Dactylina
HL01 Cy H Peltigera
HL02 Ch H Platismatia
HL03 Ch H Cladonia
HL04 Ch H Parmotrema
HL05 Ch H Flavoparmelia
HL06 Cy H Sticta
HL07 Ch H Xanthoparmelia
HL08 Cy H Leptogium
HL09 Tr H Stereocaulon
HL10 Ch H Usnea
NL01 Cy N Peltigera
NL01t Tr N Peltigera
NL03 Ch N Cladonia
NL04 Tr N Amygdalaria
NL05 Ch N Ophioparma
NL06 Ch N Flavocetraria
NL07 Ch N Parmelia
NL08 Ch N Umbilicaria
NL09 Tr N Stereocaulon
NL10 Ch N Cetraria


HodkinsonClon16S_OTUS_1per_171.csv (referenced in the above script; full file available at http://dx.doi.org/10.5061/dryad.t99b1)

CL01 0.00 0.00 0.00 0.00 0.00 0.00 ...
CL02 0.00 0.00 0.00 0.00 0.00 0.00 ...
CL03 0.00 0.25 0.00 0.00 0.00 0.00 ...
CL04 0.11 0.00 0.00 0.00 0.00 0.00 ...
CL05 0.00 0.03 0.34 0.05 0.00 0.00 ...
CL06 0.00 0.10 0.02 0.03 0.00 0.02 ...
CL07 0.00 0.56 0.00 0.00 0.00 0.19 ...
CL08 0.00 0.03 0.00 0.00 0.00 0.00 ...
CL09 0.02 0.13 0.06 0.15 0.00 0.00 ...
CL10 0.00 0.00 0.00 0.00 0.00 0.00 ...
EL01 0.00 0.00 0.00 0.00 0.00 0.17 ...
EL01t 0.00 0.00 0.00 0.00 0.00 0.31 ...
EL02 0.00 0.26 0.00 0.00 0.00 0.00 ...
EL03 0.00 0.03 0.00 0.00 0.00 0.00 ...
EL04 0.00 0.10 0.00 0.00 0.00 0.00 ...
EL05 0.00 0.04 0.00 0.00 0.00 0.00 ...
EL06 0.00 0.07 0.00 0.00 0.00 0.00 ...
EL07 0.00 0.00 0.00 0.00 0.00 0.00 ...
EL08 0.00 0.34 0.00 0.00 0.00 0.00 ...
EL09 0.00 0.21 0.00 0.00 0.00 0.00 ...
EL10 0.00 0.21 0.00 0.00 0.04 0.00 ...
HL01 0.00 0.00 0.00 0.00 0.00 0.00 ...
HL02 0.13 0.00 0.00 0.00 0.00 0.00 ...
HL03 0.00 0.00 0.00 0.00 0.00 0.00 ...
HL04 0.06 0.00 0.00 0.00 0.00 0.00 ...
HL05 0.00 0.03 0.00 0.00 0.00 0.00 ...
HL06 0.00 0.00 0.00 0.06 0.00 0.00 ...
HL07 0.00 0.14 0.00 0.00 0.00 0.00 ...
HL08 0.00 0.00 0.00 0.00 0.00 0.00 ...
HL09 0.00 0.00 0.00 0.00 0.00 0.00 ...
HL10 0.00 0.02 0.00 0.00 0.00 0.00 ...
NL01 0.00 0.15 0.00 0.00 0.00 0.00 ...
NL01t 0.00 0.00 0.00 0.00 0.00 0.15 ...
NL03 0.00 0.00 0.00 0.00 0.00 0.00 ...
NL04 0.00 0.15 0.00 0.00 0.00 0.00 ...
NL05 0.00 0.17 0.00 0.00 0.00 0.00 ...
NL06 0.00 0.02 0.00 0.00 0.00 0.00 ...
NL07 0.00 0.33 0.00 0.00 0.00 0.00 ...
NL08 0.00 0.80 0.00 0.00 0.00 0.00 ...
NL09 0.00 0.00 0.00 0.00 0.00 0.00 ...
NL10 0.00 0.05 0.00 0.00 0.00 0.00 ...


Here are some examples of non-metric multidimensional scaling plots produced from OTU-based Bray-Curtis dissimilarities between lichen-associated bacterial communities (Hodkinson et al. 2012). Photobiont is indicated by color (light green = green algae; dark blue = Cyanobacteria; black = Tripartite), while the site is indicated with symbols (• = Eagle Summit, AK; + = Nome, AK; x = Highlands, NC, and * = Cerro de la Muerte, CR). Numbers indicate the identity of different mycobiont types (see associated publication for details). Continuous lines act as visual aids to delimit communities associated with the two major photobiont-types, whereas dashed lines delimit communities associated with chlorolichens from northern versus southern sites.



















Plot A shows results obtained from clone library data of 16S sequences from the order Rhizobiales.



Plot B was produced from 454 barcoded 16S amplicon data (representing approximately half the number of samples as the clone library set but ~100 times as many sequences from a much wider range of bacterial diversity; since this plot was produced through OTU-based methods it does not precisely correlate with what would be produced through the abbreviated script above, which uses UniFrac distances to generate NMDS plots based on 454 data).

For additional information and documentation, including results of ANOSIM analyses, please see the references below!

- Brendan

---------------------------------------

References

Hodkinson, B. P. 2011. A phylogenetic, ecological, and functional characterization of non-photoautotrophic bacteria in the lichen microbiome. Doctoral Dissertation, Duke University, Durham, NC.
Download Dissertation (PDF file)

Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2012. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology 14(1): 147-161. [doi:10.1111/j.1462-2920.2011.02560.x]
Download publication (PDF file)
Download supplementary phylogeny (PDF file)
View data and analysis file webportal (website)
Download data and analysis file archive (ZIP file)

Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2011. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository. doi:10.5061/dryad.t99b1

R Development Core Team. 2010. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.r-project.org/

Thursday, December 8, 2011

Editor's Pick

I recently received an email notifying me that the journal The Lichenologist had just released a compilation of papers entitled 'Editor's Pick of Papers: from Genome to Ecosystems.' This group consists of the editor's top 15 articles published in the journal over the past two years (volumes 42 and 43), during which well over 100 works have been published in The Lichenologist. I was excited to see that one of my papers was selected! To read my previous blog post outlining the content of the paper, please click here. Many thanks to Peter Crittenden for his work as editor-in-chief and to the British Lichen Society for publishing the journal!

- Brendan

Tuesday, December 6, 2011

Two new orders of fungi

This week I had a manuscript published in which two entirely new orders of fungi were established! They are Sarrameanales and Trapeliales, and they both belong to the subclass Ostropomycetidae in the class Lecanoromycetes (the main class of lichen-forming fungi). To read more about the taxonomy of these two orders, see the publication in the December 2011 issue of Phytologia. This research can partly be seen as a taxonomic extension of the NSF-funded Assembling the Fungal Tree of Life project, since the distinctness of these orders was only recognized as a result of multi-gene phylogenetic analyses conducted as part of this effort.

We are making progress taxonomically, but even the higher-level work is not done yet; there are still many groups of 'incertae sedis' fungi out there and probably many more legitimate orders to be described... which means that there's plenty more exciting work to be done in the future!

- Brendan

------------------------------------

Reference

Hodkinson, B. P., and J. C. Lendemer. 2011. The orders of Ostropomycetidae (Lecanoromycetes, Ascomycota): Recognition of Sarrameanales and Trapeliales with a request to retain Pertusariales over Agyriales. Phytologia 93(3): 407-412.
Download publication (PDF file)

Sunday, November 20, 2011

My New Supercomputing Center

Recently I applied for and received an XSEDE (Extreme Science and Engineering Discovery Environment) Startup Allocation Award (DEB-110024), allowing me to use the San Diego Supercomputing Center (SDSC) Trestles cluster. The cluster itself contains over 10,000 processor cores; you can read more about it here. While there are many different types of research being conducted using this cluster, my work focuses on ecological and evolutionary analyses of molecular sequence data. Previously I conducted my computationally-intensive analyses using the Duke Shared Cluster Resource (DSCR). However, since leaving Duke University a few months ago, I have been without a cluster. I am happy to say that I now have access to a cluster once more! Let the supercomputing resume!

- Brendan

Thursday, November 10, 2011

Outreach and training through Youtube videos

Over the past few months I have acted as the New York Botanical Garden's Workflow Coordinator for the NSF-funded "Digitization TCN Collaborative Research: North American Lichens and Bryophytes: Sensitive Indicators of Environmental Quality and Change" (EF-1115086).  This project is a collaboration between multiple institutions across North America and is aimed at cataloging label data from the vast majority of North American lichen and bryophyte specimens.  Recently, as part of this project, we at NYBG released two videos on Youtube.  The first acts as an introduction to the project for the general public and gives some of the rationale behind it:


The second is a training video that can be used by members of the partner institutions or others who are thinking about taking on similar projects:


As you will probably notice from the videos themselves, Charlie Zimmerman, the imager that I have supervised for this project here at the garden, was the one who did most of the writing, filming and editing... and we greatly appreciate all of his work!

Please enjoy the videos!

- Brendan

Sunday, November 6, 2011

Building linkage-probability-based RNA secondary structure models for phylogenetic inference

RNA secondary structure models are increasingly being integrated into likelihood-based phylogenetic inferences, but the dynamic structure of functional RNA molecules makes any single structural inference necessarily inaccurate. In this post I present an objective method for determining which elements of secondary structure are most stable based on the statistical significance of linkage probabilities between sites on a given RNA molecule. I briefly outline how this information can be integrated into a phylogenetic analysis by creating an input file that contains these statistically significant structural elements.

For some additional background on RNA secondary structure, see this previous post:
http://squamules.blogspot.com/2011/08/its-rna-secondary-structure.html

Functional RNA molecules include pairs of nucleotide sites that are linked to one another physically, resulting in specific secondary structures that define the shape of each molecule.  This linkage causes certain sites to evolve in tandem with their counterparts.  As such, the secondary structure of RNA molecules has been recognized for some time as a significant consideration in the inference of phylogenies from functional RNA-encoding genes (Kimura 1985, Tillier and Collins 1995).  Typically, RNA secondary structure is used to optimize multiple-sequence-alignment accuracy for functional RNA-encoding genes (Gutell et al. 1992, Kjer 1995, Lendemer and Hodkinson 2009).  However, in recent years, the use of secondary structure in modeling evolution for likelihood-based phylogenetic inferences has begun to gain popularity (Hodkinson and Lendemer in review, Savill et al. 2001, Telford et al. 2005).  This approach requires defining the pairs of linked sites on the encoded RNA molecule and treating these pairs as states that are separate from the standard independent nucleotide states (A, C, T, G) in a phylogenetic inference.  This method allows one to properly account for the interdependency of interacting nucleotides, since paired RNA nucleotides are no longer required to be treated as independent sites, leading to a more accurate approach for modeling sequence evolution.

Current protocols for integrating RNA secondary structure data into phylogenetic analyses require a single hypothetical structure to be used as an input.  Structures are typically inferred using algorithms that minimize free energy or use other thermodynamic considerations to produce the best single structural inference (Mathews & Turner 2006).  However, RNA secondary structure is dynamic, frequently changing in the cell as RNA catalyzes reactions and performs various cellular functions.  When RNA molecules encounter certain enzymes and cellular components, the thermodynamic rules that previously favored one structure might strongly favor another.  Additionally, different methods of structural inference are not always comparable, and small differences in algorithms can favor significantly different structures; the use of differing structural models in a phylogenetic context can have consequences in terms of both topology inference and the calculation of support (Ullrich et al. 2010).

These problems can largely be solved by removing statistically non-significant linkages from phylogenetic analyses, leaving only the most probable structural elements to be incorporated into downstream inferences.  The determination of which RNA secondary structural elements are supported with statistical significance is often overlooked and is certainly not a standard part of the current work flow for scientists integrating secondary structural data into phylogenetic analyses.

Since RNA secondary structure can serve as such a useful tool for revealing the evolutionary history of certain groups, it is essential that objective criteria be established for incorporating structural elements into phylogenetic inferences.  The simple method outlined here allows one (a) to evaluate the probability that each site on an RNA-encoding gene is linked to each other site and (b) to produce an 'elemental' secondary structure model for phylogenetic inference containing only the statistically-supported elements of the structure.

The UNAFold package provides a particularly useful set of tools for exploring various aspects of RNA secondary structure (Markham and Zuker 2008).  UNAFold's 'hybrid-ss.exe' yields a set of '.plot' files that give the probability of each base binding to each other base for all reasonable pairings.  After installing UNAFold and running 'hybrid-ss.exe' on a FASTA-formatted sequence, one can choose the '.plot' file with the number that most closely approximates the typical cellular temperature (in degrees Celsius) of the organism from which the sequence is derived.  This '.plot' file can be modified in Excel by sorting according to 'P(i,j)' values (the probability of pairing) and isolating only the rows for which 'P(i,j)' is above 0.95.  This stringent 95% pairing probability cut-off seems most easily justifiable; however, other cut-off values could potentially be used in the context of this method.

For integrating this type of data into a phylogenetic analysis (e.g., using RAxML 7.2.8; http://wwwkramer.in.tum.de/exelixis/software.html; Stamatakis 2006), the standard 'Vienna' dot-bracket notation is used (Hofacker et al. 1994).  Any standard secondary structure inference program can be used to create an initial structure that may serve as a template; parentheses can be converted to periods using a standard text-editor or secondary structure editing program (e.g., 4SALE; http://4sale.bioapps.biozentrum.uni-wuerzburg.de/; Seibel et al. 2006) for sites whose linkage is statistically non-significant.  These procedures will produce a secondary structure model that includes only the statistically supported elements of structure.  When this 'elemental' secondary structure model is incorporated into phylogenetic analyses, it could serve to decrease the degree of uncertainty inserted into the standard secondary structure-based inferences.

Future advances may allow the integration of various intermediate linkage probabilities to be considered in the calculation of tree likelihoods.  However, it seems that certain theoretical hurdles remain to be overcome before this type of analysis can be possible.  Meanwhile, a methodology like the one outlined here could be beneficial if one wishes to reduce the amount of chance introduced into phylogenetic analyses while still accounting for the fact that certain sites are inextricably linked.

- Brendan

---------------------------------------------------

References

Gutell, R. R., A. Power, G. Z. Hertz, E. J. Putz, and G. D. Stormo. 1992. Identifying constraints on the higher-order structure of RNA: continued development and application of comparative sequence analysis methods. Nucleic Acids Research 20(21): 5785–5795.

Hodkinson, B. P., and J. C. Lendemer. In review. Systematics of a enigmatic sterile crustose lichen. 

Hofacker, I. L., W. Fontana, P. F. Stadler, S. Bonhoeffer, M. Tacker, and P. Schuster. 1994. Fast folding and comparison of RNA secondary structures. Monatshefte für Chemie / Chemical Monthly 125: 167-188.

Kimura, M. 1985. The role of compensatory neutral mutations in molecular evolution. Journal of Genetics 64(1):7-19.

Kjer, K. M. 1995. Use of rRNA secondary structure in phylogenetic studies to identify homologous positions: an example of alignment and data presentation from frogs. Molecular Phylogenetics and Evolution 4: 314-330.

Lendemer, J. C., and B. P. Hodkinson. 2009. The wisdom of fools: new molecular and morphological insights into the North American apodetiate species of Cladonia. Opuscula Philolichenum 7: 79-100.

Markham, N., and M. Zuker. 2008. UNAFold: software for nucleic acid folding and hybridization. Methods in Molecular Biology 453: 3-31.

Mathews, D. H., and D. H. Turner. 2006. Prediction of RNA secondary structure by free energy minimization. Journal of Molecular Biology 16(3): 270-278.

Savill N. J., D. C. Hoyle, and P. G. Higgs. 2001. RNA sequence evolution with secondary structure constraints: comparison of substitution rate models using maximum likelihood methods. Genetics 157: 399-411.

Seibel P. N., T. Müller, T. Dandekar, J. Schultz, and M. Wolf. 2006. 4SALE - A tool for synchronous RNA sequence and secondary structure alignment and editing. BMC Bioinformatics 7: 498.

Stamatakis, A. 2006. RAxML-VI-HPC: maximum likelihood- based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688-2690.

Telford, M., M. Wise, and V. Gowri-Shankar. 2005. Consideration of RNA secondary structure significantly improves likelihood-based estimates of phylogeny: examples from the Bilateria. Molecular Biology and Evolution 22: 1129-1136.

Tillier, E. R. M., and R. A. Collins. 1995. Neighbor-joining and maximum likelihood with RNA sequences: addressing interdependence of sites. Molecular Biology and Evolution 12: 7-15.

Ullrich, B., K. Reinhold, O. Niehuis, and B. Misof. 2010. Secondary structure and phylogenetic analysis of the internal transcribed spacers 1 and 2 of bush crickets (Orthoptera: Tettigoniidae: Barbitistini). Journal of Zoological Systematics and Evolutionary Research 48(3): 219-228.

---------------------------------------------------

This article can be cited as:
Hodkinson, B. P. 2011. Building linkage-probability-based RNA secondary structure models for phylogenetic inference. Squamules Unlimited, New York. [Available at: http://squamules.blogspot.com/2011/11/building-linkage-probability-based-rna.html]