I have recently been conducting phylogenetic and taxonomic studies of selected groups of lichen-forming fungi using sequences from the quickly evolving nuclear ribosomal ITS (internal transcribed spacer) region to examine relationships within and between species (e.g., Hodkinson & Lendemer 2011, Hodkinson et al. 2010, Lendemer & Hodkinson 2009, 2010, in prep). In order to properly analyze the evolutionary relationships between the organisms from which these molecules were derived, I built secondary structure models for the RNA molecules encoded by ITS1 and ITS2 (the two rapidly evolving sections of the ITS region) for some of the groups.
The ITS1 and ITS2 spacer regions encode stretches of RNA that fold up in specific conformations and help to assemble the ribosomes (the pieces of cellular machinery that build protein molecules based on specific messenger RNA sequences transcribed from DNA). The particular folding pattern is referred to as the molecule's "secondary structure." Here is an example of a secondary structure model that I put together for ITS2 of Parmotrema perforatum:
Notice the A(adenine)-U(uracil) pairings and the G(guanine)-C(cytosine) pairings, just like the complementary strands of DNA (except that with DNA you have T for thymine instead of U for uracil).
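To make the pairing rule concrete, here is a minimal Python sketch that checks whether two short RNA strands could zip together into a stem. The function names and the example strands are hypothetical, invented for illustration only. The sketch considers only the A-U and G-C pairs mentioned above and ignores G-U wobble pairs, which also occur in real RNA structures.

```python
# Canonical Watson-Crick pairs in RNA (U takes the place of the T found in DNA).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(base_a: str, base_b: str) -> bool:
    """Return True if the two RNA bases form an A-U or G-C pair."""
    return (base_a.upper(), base_b.upper()) in WATSON_CRICK


def stem_is_complementary(strand_a: str, strand_b: str) -> bool:
    """Check whether two strands, both written 5'->3', can form a stem.

    The strands pair antiparallel, so the first base of one strand
    is compared with the last base of the other.
    """
    if len(strand_a) != len(strand_b):
        return False
    return all(can_pair(a, b) for a, b in zip(strand_a, reversed(strand_b)))


# Hypothetical strands, invented for illustration (not real ITS2 sequence).
print(stem_is_complementary("GGCAU", "AUGCC"))
```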
There are two main reasons that one might want to have a secondary structure model when inferring phylogeny:
[1] Nucleotide Alignment - An understanding of the overall structure of the molecule can aid in discerning which sets of sites in different organisms actually represent the same character when they have different states and there are adjacent nucleotides that have been inserted or deleted in some taxa (Kjer 1995). Many studies use principles of secondary structure to aid in alignment.
[2] Phylogenetic Inference - Since paired sites in some sense evolve in tandem (if one nucleotide changes, the linked nucleotide will often change to compensate over evolutionary time), it is most appropriate within a likelihood framework to apply a different model of evolution to the paired nucleotides so that this can be taken into consideration. This type of inference can be done with RAxML (Stamatakis 2006) and I have recently integrated this into my workflow (Hodkinson & Lendemer in prep).
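As a rough illustration of how paired and unpaired sites can be told apart before assigning them different models, the sketch below parses a structure written in dot-bracket (Vienna) notation, where matching parentheses mark paired positions and dots mark unpaired ones. This is not the exact input that RAxML or any other program expects. The function names and the example structure string are assumptions made up for this example.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return sorted 0-based (i, j) index pairs for matching parentheses."""
    stack = []  # positions of '(' still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def partition_sites(structure: str) -> tuple[list[tuple[int, int]], list[int]]:
    """Split positions into paired (stem) sites and unpaired (loop) sites."""
    pairs = parse_dot_bracket(structure)
    paired_positions = {pos for pair in pairs for pos in pair}
    unpaired = [pos for pos in range(len(structure)) if pos not in paired_positions]
    return pairs, unpaired


# Hypothetical structure string, invented for illustration.
pairs, unpaired = partition_sites("((..((...))..))")
print("paired sites:", pairs)
print("unpaired sites:", unpaired)
```

The paired list is the part that would receive a model accounting for compensatory change, while the unpaired positions would be treated under an ordinary nucleotide model.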
The really interesting thing to consider is that this type of macromolecule needs to be able to move in order to function, which means that its structure is dynamic rather than static. While we usually use the 'best' structure for phylogenetic inference, there are many structures that are nearly equally good, and the molecule changes its conformation through space and time, flipping between these conformations in order to perform its functions in the cell. To drive the point home, here is a quick video I made of the ITS2 molecule of Cladonia stipitata Lendemer & Hodkinson (2009) shifting between different likely conformations:
------------------------------
Sources cited:
Hodkinson, B. P., and J. C. Lendemer. In prep. Systematics of an enigmatic sterile crustose lichen.
Hodkinson, B. P., J. C. Lendemer, and T. L. Esslinger. 2010. Parmelia barrenoae, a macrolichen new to North America and Africa. North American Fungi 5(3): 1-5.
Kjer, K. M. 1995. Use of rRNA secondary structure in phylogenetic studies to identify homologous positions: an example of alignment and data presentation from the frogs. Molecular Phylogenetics and Evolution 4: 314–330.
Lendemer, J. C., and B. P. Hodkinson. 2009. The Wisdom of Fools: new molecular and morphological insights into the North American apodetiate species of Cladonia. Opuscula Philolichenum 7: 79-100.
Lendemer, J. C., and B. P. Hodkinson. 2010. A new perspective on Punctelia subrudecta in North America: previously-rejected morphological characters corroborate molecular phylogenetic evidence and provide insight into an old problem. The Lichenologist 42(4): 405-421.
Stamatakis, A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
Hi Brendan!
I just want to thank you for your post. I'm new to this subject and I was just wondering how to align my ITS data. Just one question: what software do you recommend for getting the secondary structure?
Hi!
Thanks for your comment! The easiest way to align with RNA secondary structure is to use a model of the molecule that is already published. There may be models available for representatives of your particular genus/family or not. If there are no models available for any organisms that are even close to yours, then you can work on developing your own models if you have sequences representing the full length of the RNA molecule. You can use the CentroidAliFold software in the CentroidFold package [ http://www.ncrna.org/software/centroidfold/download/ ] to infer secondary structure from a Clustal-formatted alignment. It may seem a bit circular to get a structure from an alignment so that you can make an alignment, but the point of having secondary structure is not to allow one to make an alignment from scratch (plenty of programs like ClustalW and MAFFT can do that), but rather to improve an existing alignment. So the initial alignment without secondary structure should be good enough to allow CentroidAliFold to infer the underlying structure, which then can be used to further improve the alignment manually, mostly by matching up the nucleotides along the stems of the stem-loop structures (Kjer 1995; http://dx.doi.org/10.1006/mpev.1995.1028 ). Of course we must remember that secondary structure itself evolves, and if we try to align two sequences of the same gene that are so distant that the structures themselves have diverged, there will be large sections that remain unalignable. This is where software like PICS-Ord can come in handy ( http://www.biomedcentral.com/1471-2105/12/10 ).
It is also worth noting that I use the VARNA Java Web Start application to visualize the Vienna-formatted secondary structure output from CentroidAliFold. I am about to submit a paper that details my workflow; after that, I will have additional posts here giving more information regarding logistics!