Recently, I co-authored a paper (Schmull et al. 2011) presenting the results of analyses aimed at determining the phylogenetic placement of numerous lineages of lichen-forming fungi that had previously been placed in the genus Lecidea based solely on morphology. It has long been known that the assemblage of species placed in Lecidea by Zahlbruckner does not form a single evolutionary lineage; however, placing all of these species in known families has been problematic. For our paper, we conducted two separate 6-gene analyses of lichen-forming fungi in the class Lecanoromycetes in order to infer the placement of twenty-five Lecidea taxa. Most species fell within three families: Lecanoraceae, Pilocarpaceae, and Lecideaceae (the family of the 'real' Lecidea). Those within the first two families will unquestionably need to be given new generic names in the near future. The main story that I hope will come out of this paper is that there is much more work to be done! We have used molecular data to demonstrate the scope of the problem with the genus Lecidea, but the definitive placement of all described species will require a great deal of additional study. I'm looking forward to continuing work on this group in the future!
- Brendan
-----------------------------------------------
Reference
Schmull, M., J. Miadlikowska, M. Pelzer, E. Stocker-Wörgötter, V. Hofstetter, E. Fraker, B. P. Hodkinson, V. Reeb, M. Kukwa, H. T. Lumbsch, F. Kauff, and F. Lutzoni. 2011. Phylogenetic affiliations of members of the heterogeneous lichen-forming genus Lecidea sensu Zahlbruckner (Lecanoromycetes, Ascomycota). Mycologia 103(5): 983-1003.
Download publication (PDF file)
Download nucleotide alignment (NEXUS file)
Download supplementary data table 1 (PDF file)
Download supplementary data table 2 (PDF file)
Tuesday, September 27, 2011
Wednesday, September 14, 2011
Musical Lichenology
I recently received an email from Sean Beeching, well known to readers of this blog for his poetry. This is what he wrote:
"Here is a video that Nancy Lowe of discoverlife shot during our lichen workshop [http://www.youtube.com/watch?v=rvxnv-6Z6rg]. I am sawing up branches to show the students the lichens that were growing on them in time to Tommy [Jordan]'s banjo playing. He and I play together in the evenings during the workshop. You might also have a look at the lichen key I made for our students at discoverlife, go to the nature guides at the discoverlife.org website and then page down to 'Lichens, Georgia.' The site just passed a billion hits."
For anyone in the southeastern US interested in learning a thing or two about lichens, I would highly recommend any workshop with Sean!
- Brendan
Wednesday, September 7, 2011
Diversity of Lichenology
Not too long ago I reviewed the 100th anniversary issue of the journal/book series Bibliotheca Lichenologica for The Bryologist. Here is what I wrote:
The 100th volume of Bibliotheca Lichenologica (‘‘Diversity of Lichenology — Anniversary Volume’’) provides important contributions to the field and gives us further insights into the biology of lichens while connecting us to our historical roots. While the volumes of this series have taken many different forms, this edition appears as a standard journal volume, with 18 scientific and historical articles from a total of 37 authors representing a diverse array of lichenologists. It should be noted that the emphasis is on lichens of Eurasia and/or the Southern Hemisphere; however, many of the articles will appeal to a general, worldwide audience.
In terms of taxonomy, this volume will be of particular interest to those following the changes in the family Teloschistaceae. Kondratyuk et al. describe 35 new species in the family. Many of these are known only from the type locality or from a small handful of specimens, making further evaluation of some of the taxa difficult, although the authors are to be commended for providing excellent color photographs of the thalli. The work by Fedorenko et al. focusing on the phylogeny of ‘xanthorioid’ lichens represents a significant contribution in terms of both the data generated and the provisional generic concepts articulated. However, both of the aforementioned works seem to highlight the fact that the largest task still remains: an integrated systematic revision of the family Teloschistaceae that includes crustose, foliose, and fruticose forms. In different ways, both works improve our understanding of a family whose taxonomy continues to evolve.
This volume also includes important insights into the taxonomy of the oft-neglected lichenicolous fungi. Hafellner presents an excellent ‘traditional’ treatment of the lichenicolous genera Phacothecium and Phacographa. The work is notably thorough, articulating precisely what is known about the genera, while highlighting areas where additional data and analyses are needed. The author also provides a useful key to opegraphoid lichenicolous fungi with widely exposed hymenia, along with a summary table of the phenotypic characters separating the five opegraphoid arthonialean genera with lichenicolous members discussed in the text (i.e., Opegrapha, Lecanographa, Phacothecium, Phacographa, and Plectocarpon).
Among the works that will appeal to a broader audience is Kärnefelt’s contribution entitled ‘‘Fifty influential lichenologists.’’ This portion of the volume provides a veritable ‘‘Who’s Who’’ of lichenology. The careers of some of the world’s major players in our field, both modern and historical, are briefly summarized, starting with ‘‘the father of lichenology,’’ Erik Acharius. Another work with broad appeal is by Lücking et al., entitled ‘‘How many tropical lichens are there… really?’’ This piece discusses the various factors involved in calculating a ‘ballpark’ estimate of the overall diversity of lichens in the neotropics, the tropics in general, and the globe as a whole (the estimate for the latter comes in at 28,000 species!). Although any estimate is subjective in nature, various points that have not been explicitly integrated into previous estimates are considered (e.g., taxonomic ‘orphans,’ species pairs, photosymbiodemes, chemotypes, and cryptic species). Another portion of the volume that will appeal to amateurs and professionals alike is the section by Randlane et al., which provides what is undoubtedly the best synthetic work on the species of Usnea for the continent of Europe. Range maps and photographs of small-scale features make this section both informative and interesting.
Reading this volume also makes it apparent that the changing landscape of lichenological research has led to certain problems that require special attention. Many of the problematic issues plaguing our field seem to be associated with the process of adjusting to the molecular age. A number of the studies published herein leave the reader wanting to know more, especially in terms of molecular data and how they were analyzed (or how they could be analyzed differently). Beyond the simple deposition of sequences in a public repository, alignments must be reviewed and made available if alignment-based phylogenetic analyses are to be reproducible. Authors will find that making their assembled molecular datasets freely available to readers (on their own personal websites if necessary) increases the impact and relevance of their work by permitting others to easily build on their studies. Ultimately, this practice will allow our field to advance more quickly and will raise the bar in terms of research quality. In summary, this volume of Bibliotheca Lichenologica, as its title suggests, provides an excellent picture of the diversity of lichenology and represents quite well the overall state of the field as we enter this next decade. Many of the works contained in this anniversary edition provide important contributions, and any lichenologist’s collection would be enriched by the addition of this volume.
- Brendan
-----------------------------
Citation:
Hodkinson, B. P. 2010. Lichenological Diversity. The Bryologist 113(4): 828-829.
Wednesday, August 31, 2011
ITS RNA secondary structure
I have recently been conducting phylogenetic and taxonomic studies of selected groups of lichen-forming fungi using sequences from the quickly evolving nuclear ribosomal ITS (internal transcribed spacer) region to examine relationships within and between species (e.g., Hodkinson & Lendemer 2011, Hodkinson et al. 2010, Lendemer & Hodkinson 2009, 2010, in prep). In order to properly analyze the evolutionary relationships between the organisms from which these molecules were derived, I built secondary structure models for the RNA molecules encoded by ITS1 and ITS2 (the two rapidly evolving sections of the ITS region) for some of the groups.
The ITS1 and ITS2 spacer regions encode stretches of RNA that are transcribed along with the ribosomal RNA genes and later cut out; before removal, they fold up in specific conformations that help guide the assembly of the ribosomes (the pieces of cellular machinery that build protein molecules based on specific messenger RNA sequences transcribed from DNA). The particular folding pattern is referred to as the molecule's "secondary structure." Here is an example of a secondary structure model that I put together for ITS2 of Parmotrema perforatum:
Notice the A(adenine)-U(uracil) pairings and the G(guanine)-C(cytosine) pairings, just like the complementary strands of DNA (except that with DNA you have T for thymine instead of U for uracil).
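The pairing rules above can be illustrated with a minimal Perl sketch. It assumes the structure model is written in dot-bracket notation (matching parentheses mark paired positions and dots mark unpaired ones), and it also accepts G-U 'wobble' pairs, which RNA helices commonly tolerate. The sequence and structure in the example are a made-up hairpin, not the Parmotrema perforatum model, and the subroutine names are hypothetical.

```perl
#!/usr/bin/perl
# Sketch: list the base pairs in a dot-bracket structure and flag any
# pair that is not A-U, G-C, or a G-U wobble pair.
use strict;
use warnings;

# Return a list of [i, j] index pairs (0-based) from a dot-bracket string.
sub pairs_from_dot_bracket {
    my ($structure) = @_;
    my (@stack, @pairs);
    for my $i (0 .. length($structure) - 1) {
        my $c = substr($structure, $i, 1);
        if ($c eq '(') {
            push @stack, $i;                      # remember the opening position
        }
        elsif ($c eq ')') {
            die "Unbalanced ')' at position $i\n" unless @stack;
            push @pairs, [ pop(@stack), $i ];     # close the most recent '('
        }
    }
    die "Unbalanced '(' in structure\n" if @stack;
    return sort { $a->[0] <=> $b->[0] } @pairs;
}

# Allowed RNA pairings: Watson-Crick plus G-U wobble (an assumption here).
my %allowed = map { $_ => 1 } qw(AU UA GC CG GU UG);

# Return the pairs whose nucleotides cannot pair under the rules above.
sub noncanonical_pairs {
    my ($seq, $structure) = @_;
    die "Sequence and structure differ in length\n"
        unless length($seq) == length($structure);
    $seq = uc $seq;
    return grep {
        !$allowed{ substr($seq, $_->[0], 1) . substr($seq, $_->[1], 1) }
    } pairs_from_dot_bracket($structure);
}

# Hypothetical hairpin: a 4-bp stem closed by a 4-nt loop.
my $seq       = 'GCAUUUCGAUGC';
my $structure = '((((....))))';
for my $p (pairs_from_dot_bracket($structure)) {
    printf "%d-%d %s-%s\n", $p->[0] + 1, $p->[1] + 1,
        substr($seq, $p->[0], 1), substr($seq, $p->[1], 1);
}
```

Run as-is, it prints each pair of the example hairpin (1-12 G-C, 2-11 C-G, 3-10 A-U, 4-9 U-A); noncanonical_pairs() returns any pairs that break the rules, which can help catch errors when a model is fitted to a new sequence.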
There are two main reasons that one might want to have a secondary structure model when inferring phylogeny:
[1] Nucleotide Alignment - An understanding of the overall structure of the molecule can help in determining which sites in different organisms actually represent the same character (i.e., are homologous) when those sites have different states and adjacent nucleotides have been inserted or deleted in some taxa (Kjer 1995). Many studies use principles of secondary structure to aid in alignment.
[2] Phylogenetic Inference - Since paired sites in some sense evolve in tandem (if one nucleotide changes, its partner will often change to compensate over evolutionary time), it is most appropriate within a likelihood framework to apply a separate model of evolution to the paired nucleotides so that this non-independence can be taken into account. This type of inference can be done with RAxML (Stamatakis 2006), and I have recently integrated it into my workflow (Hodkinson & Lendemer in prep).
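To see why paired sites are not independent, the following Perl sketch compares two aligned sequences that are assumed to share one secondary structure (in dot-bracket notation) and labels each base pair as unchanged, compensatory (both sides changed and the pair still forms), one-sided (one side changed and the pair still forms, e.g. G-C to G-U), or broken. It only illustrates the idea; it is not how RAxML's paired-site models are implemented, and the sequences and subroutine names are hypothetical.

```perl
#!/usr/bin/perl
# Sketch: compare two aligned sequences that share one secondary structure
# and classify how each base pair changed between them.
use strict;
use warnings;

# Allowed RNA pairings: Watson-Crick plus G-U wobble.
my %can_pair = map { $_ => 1 } qw(AU UA GC CG GU UG);

# 0-based [i, j] pairs from a dot-bracket string (assumed balanced).
sub pairs_from_dot_bracket {
    my ($structure) = @_;
    my (@stack, @pairs);
    for my $i (0 .. length($structure) - 1) {
        my $c = substr($structure, $i, 1);
        push @stack, $i if $c eq '(';
        push @pairs, [ pop(@stack), $i ] if $c eq ')';
    }
    return sort { $a->[0] <=> $b->[0] } @pairs;
}

# Label each pair: 'same' (no change), 'broken' (new pair cannot form),
# 'compensatory' (both sides changed, pair still forms), or
# 'one-sided' (one side changed, pair still forms).
sub classify_pairs {
    my ($seq1, $seq2, $structure) = @_;
    my @labels;
    for my $p (pairs_from_dot_bracket($structure)) {
        my ($i, $j) = @$p;
        my $d1 = uc(substr($seq1, $i, 1) . substr($seq1, $j, 1));
        my $d2 = uc(substr($seq2, $i, 1) . substr($seq2, $j, 1));
        # Count how many of the two paired positions differ (0, 1, or 2)
        my $changed = (substr($d1, 0, 1) ne substr($d2, 0, 1))
                    + (substr($d1, 1, 1) ne substr($d2, 1, 1));
        my $label = $changed == 0   ? 'same'
                  : !$can_pair{$d2} ? 'broken'
                  : $changed == 2   ? 'compensatory'
                  :                   'one-sided';
        push @labels, sprintf('%d-%d %s->%s %s', $i + 1, $j + 1, $d1, $d2, $label);
    }
    return @labels;
}

# Hypothetical hairpins: the outer G-C pair becomes C-G in the second taxon.
print "$_\n" for classify_pairs('GCAUUUCGAUGC', 'CCAUUUCGAUGG', '((((....))))');
```

For the example, the first pair is reported as "1-12 GC->CG compensatory" and the other three as "same": two substitutions that, under a model treating sites independently, would look like two unrelated events.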
What is really interesting to think about is that this type of macromolecule needs to be able to move in order to function, which means that its structure is not static, but dynamic. While we usually use the 'best' structure for phylogenetic inference, there are many alternative structures that are nearly as good, and the molecule changes its conformation through space and time, flipping between these different conformations in order to perform its functions in the cell. To drive the point home, here is a quick video I made of the ITS2 molecule of Cladonia stipitata Lendemer & Hodkinson (2009) shifting between different likely conformations:
------------------------------
Sources cited:
Download publication (PDF file)
Download nucleotide alignment (NEXUS file)
Hodkinson, B. P., and J. C. Lendemer. In prep. Systematics of an enigmatic sterile crustose lichen.
Hodkinson, B. P., J. C. Lendemer, and T. L. Esslinger. 2010. Parmelia barrenoae, a macrolichen new to North America and Africa. North American Fungi 5(3): 1-5.
Download publication (PDF file)
Kjer, K. M. 1995. Use of rRNA secondary structure in phylogenetic studies to identify homologous positions: an example of alignment and data presentation from the frogs. Molecular Phylogenetics and Evolution 4: 314–330.
Lendemer, J. C., and B. P. Hodkinson. 2009. The Wisdom of Fools: new molecular and morphological insights into the North American apodetiate species of Cladonia. Opuscula Philolichenum 7: 79-100.
Download publication (PDF file)
Download nucleotide alignment (NEXUS file)
Lendemer, J. C., and B. P. Hodkinson. 2010. A new perspective on Punctelia subrudecta in North America: previously-rejected morphological characters corroborate molecular phylogenetic evidence and provide insight into an old problem. The Lichenologist 42(4): 405-421.
Download publication (PDF file)
Download nucleotide alignment (NEXUS file)
Stamatakis, A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22: 2688–2690.
Wednesday, August 17, 2011
Taxonomy: Art or Science?
When I Googled "science definition," the first result was "The intellectual and practical activity encompassing the systematic study of the structure and behavior of the physical and natural world through observation and experiment." After a little more research, I was surprised to see that this seems to be one of the stricter definitions of science (others may be as broad as "the state of knowing" or some such...), but it is one with which I can get on board. I tend to think of science itself in a very strict sense, as the process of developing and testing hypotheses. However, my big caveat is that there are many activities that are involved in (and are absolutely essential to) the practice of science that are not science per se according to that definition. This does not diminish their value to science. Some of this has to do with the acquisition of background knowledge that informs the hypotheses to be tested, while some of it is associated with making the results of inquiry available and comprehensible to the scientific community and the public.
So then is taxonomy art or science? With taxonomy, there is not a "right" answer, although there are plenty of wrong answers if one wishes to have a system that is informed by the results of scientific inquiry. Taxonomic units are all in some sense arbitrary. Although a group of organisms may form a "clade," whether we recognize that clade with a certain name is somewhat arbitrary. I personally like to think of taxonomic units being defined by specific innovations (morphological, molecular, ecological, etc.) that have changed the evolutionary trajectory of a group, but that rule is by no means universally applied, and there could be many alternative taxonomies even if such standards were applied.
For me, the argument for taxonomy as an art does not actually diminish taxonomy in any way as part of what we must do in order to be effective and responsible scientists. In fact, having this perspective on taxonomy can help to enhance the understanding of the significance of taxonomy for science. As scientists, we must use what we discover through the scientific process to help facilitate communication about natural phenomena. Taxonomy is a tool that we use to communicate ideas about organisms, so taxonomy is an absolutely necessary part of the pursuit of scientific truth, even if it is not "science" itself.
One test for me of whether taxonomy is itself a science in the very strictest sense of the word is whether it is directly involved in the process of hypothesis testing. One can use principles of phylogenetics, ecology, or molecular biology to test hypotheses, but taxonomic principles would not be used. When we begin to dissect some of the scientific questions that are often deemed "taxonomic questions," it can be argued that they are not actually taxonomic in nature, and that the taxonomic repercussions would really only be a byproduct of obtaining results through scientific inquiry. For instance, a question like "Is this a good genus?" is really asking something like "Do the species form a distinct clade?", which is a question that is evolutionary in nature. Likewise, the question "Do these individuals make up one species?" is perhaps just a way of saying "How can we properly apply a biological, morphological, chemical, ecological, and/or phylogenetic species concept to this group of individuals?", a question that draws on different fields of biology.
I can see that many systematists would hesitate to state that taxonomy is an art, because of what it implies. If it is an art, then that opens the door for others to say that people who do taxonomy are not really scientists at all. But a consummate scientist is not just someone who constantly tests hypotheses one after another without consideration for anything else. To be a scientist, one must also lay the groundwork for scientific pursuits, and defining the terms used to communicate ideas about specific units of the tree of life (whether or not it is itself an artistic pursuit) is crucial to the advancement of science.
- Brendan
Thursday, August 4, 2011
Using Sequencher for Multiple Sequence Alignments
Much of the molecular research that I have done over the years has involved working with DNA sequences generated through Sanger sequencing. These sequences are never perfect, and always require manual correction. It is especially helpful to correct sequences and align them to other similar sequences simultaneously. In this way, alignment and structural data can be taken into consideration when interpreting the chromatograms for the DNA sequences.
So I wrote a couple of simple Perl scripts that would allow me to make my alignments in Sequencher (the standard program for editing raw sequence reads) and easily move them over to Mesquite or MacClade (standard programs for assembling data matrices for downstream phylogenetic analyses) so that they could be joined with a reference alignment that I had made previously. In this way, I could avoid completely realigning all sequences to one another with an automatic alignment program, thereby preserving certain sequence alignment patterns (note that I often deal with over 1000 sequences at a time). If you use Linux or Macintosh, running a Perl script is generally a pretty simple matter (since a Perl interpreter is typically included with the operating system). If you use Windows, you will probably need to download an interpreter like Strawberry Perl or ActivePerl.
The type of data that I was dealing with was a set of bidirectional Sanger sequences (one read from a forward primer and one from a reverse primer for each sequence) of fragments ~650 bp in length. These sequences were cloned and therefore had vector overhang on both ends of both strands, which had to be deleted. If you have similar data, here is a procedure that can be used to preserve the Sequencher alignment pattern and bring it into MacClade/Mesquite (potentially for merging with a curated reference alignment, if you have one):
[a] In the Sequencher alignment, make sure at least one sequencing strand of each pair of strands (from the bidirectionally-sequenced pool of DNA fragments) has all of the corrected bases, and delete the second strand for each pair. This gives an alignment with one strand for each sequence. [This Sequencher alignment can be tweaked visually to align with a reference set that is already pre-aligned by introducing gaps into the Sequencher alignment to accommodate the gaps in the reference alignment.]
[b] The Sequencher alignment can then be exported as a contig in aligned fasta format and subsequently opened in MacClade/Mesquite. [Note: If you have exported the sequences from Sequencher as a concatenated set of sequence fragments, it might use ':' instead of '-' to represent the gaps; make sure all of the gaps are changed to '-' for integration into MacClade or Mesquite (this can be done as a simple search and replace with any text editor).]
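The search and replace in step [b] can also be scripted. This minimal Perl sketch (fix_gaps.pl is a hypothetical name) changes ':' to '-' only on sequence lines, so any colons in the sequence names are left alone; it assumes a standard fasta file in which name lines begin with '>'.

```perl
#!/usr/bin/perl
# Sketch: convert ':' gap characters to '-' in the sequence lines of a fasta
# file, leaving name lines (those starting with '>') untouched.
use strict;
use warnings;

sub fix_gaps {
    my ($line) = @_;
    $line =~ tr/:/-/ unless $line =~ /^>/;   # only edit sequence lines
    return $line;
}

# Process the files named on the command line and write to standard output.
if (@ARGV) {
    while (my $line = <>) {
        print fix_gaps($line);
    }
}
```

Example: perl fix_gaps.pl exported.fasta > fixed.fasta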
For my particular sequences, I had to deal with the issue of all of the sequence names being preceded by my initials and having strand-specific information tacked on to the end (both standard pieces of information added by the sequencing facility). Another blog post provides the Perl script that I wrote for editing the fasta file to extract the 10-character alphanumeric code used to identify my sequences. I also had to line my sequence block up with the portion of my reference alignment to which it corresponded. In my particular situation, the block of sequences that I had aligned began 488 bases into the reference alignment. Here is the script that I used to add 488 gap characters to the front of each sequence in the fasta file (this script relies on each sequence having a 10-character code name):
#!/usr/bin/perl
# Add a block of leading gaps to every sequence in an aligned fasta file so that
# the block lines up with its position in a reference alignment.
# Relies on each sequence name beginning with a 10-character code.
use strict;
use warnings;

my $offset = 488;    # number of gap characters to add to the front of each sequence

print "\nPlease type the name of your input file: ";
my $filename = <STDIN>;
chomp $filename;

# Name the output file after the input, replacing the extension with ".ed.fasta"
(my $outname = $filename) =~ s/\.[^.]*$//;
$outname .= '.ed.fasta';

open(my $in,  '<', $filename) or die "Cannot open $filename: $!\n";
open(my $out, '>', $outname)  or die "Cannot write $outname: $!\n";

while (my $line = <$in>) {
    if ($line =~ /^>(.{10})/) {
        # Keep only the 10-character code and start the sequence with the gaps
        print $out ">$1\n", '-' x $offset, "\n";
    }
    else {
        print $out $line;
    }
}

close $in;
close $out;
The final step was to simply open up my reference alignment in MacClade and import the newly-generated fasta file of aligned cloned sequences... and they lined up perfectly! I then tweaked exclusion sets, saved the full alignment, and was ready for downstream phylogenetic analyses.
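Before importing, it may be worth a quick check that every sequence in the edited file has the same length, since an aligned block with one short or long sequence will not line up with the reference. This is a minimal Perl sketch (check_lengths.pl is a hypothetical name) that assumes standard fasta input, possibly with sequences wrapped over several lines.

```perl
#!/usr/bin/perl
# Sketch: report the length of each sequence in an aligned fasta file and
# whether they all match (a quick sanity check before importing).
use strict;
use warnings;

# Return a hash of name => sequence length, joining multi-line sequences.
sub sequence_lengths {
    my @lines = @_;
    my (%len, $name);
    for my $line (@lines) {
        $line =~ s/\s+$//;                 # strip newline and trailing spaces
        next unless length $line;
        if ($line =~ /^>(\S*)/) {
            $name = $1;
            $len{$name} = 0;
        }
        elsif (defined $name) {
            $len{$name} += length $line;   # gaps count toward the length
        }
    }
    return %len;
}

if (@ARGV) {
    my %len = sequence_lengths(<>);
    my %distinct = map { $_ => 1 } values %len;
    printf "%s\t%d\n", $_, $len{$_} for sort keys %len;
    print keys(%distinct) == 1 ? "All sequences have the same length\n"
                               : "Warning: sequence lengths differ\n";
}
```

Example: perl check_lengths.pl aligned.ed.fasta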
Even though MacClade and Mesquite are very good programs overall, aligning a set of 1000+ sequences in them is extremely cumbersome, and Sequencher can be much faster and easier as long as the sequences are relatively conserved. I hope the Perl scripts discussed above will remove some of the impediments to a workflow in which relatively conserved sequences are aligned and corrected in Sequencher (with all of the raw sequence data at hand) before being moved over to MacClade/Mesquite for final data set assembly and formatting.
- Brendan
----------------------------------------------
References
The above protocols are published in the following sources:
Hodkinson, B. P. 2011. A Phylogenetic, Ecological, and Functional Characterization of Non-Photoautotrophic Bacteria in the Lichen Microbiome. Doctoral Dissertation, Duke University, Durham, NC.
Download Dissertation (PDF file)
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2011. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository doi:10.5061/dryad.t99b1.
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. In press. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology.
----------------------------------------------
This work was funded in part by NSF DEB-1011504.
Monday, July 25, 2011
Taxonomy as Wastewater Treatment
I was thinking about taxonomic treatments of specific groups of organisms; in some cases a 'treatment' is essentially a rehash and synthesis of what has been published previously (but just for a specific subregion, etc.). While focusing on the mechanistic details of how my colleagues and I put together treatments (not through rehashing), I thought of the process by which sewage is treated. So I did a Google image search for "taxonomic treatment," and basically got a bunch of photos of journal pages and covers, some phylogenetic trees, and some photos of organisms. I think that's pretty much how most people see biological taxonomy, which explains a lot about why taxonomy is often seen as dull and sometimes (even worse) unscientific. But then I did a Google image search for "wastewater treatment," and I was amazed to see that the images there generally matched my concept of how to do a taxonomic treatment much better than what I saw in the previous search! What I saw were mainly flowcharts... they showed processes like screening, pre-treatment, cleaning, clarification, digestion, storage, and disposal... and the processes all flowed into one another and ended up making products for public consumption! Yes! This is it! This is how we really need to be doing taxonomy! Instead of perpetuating the problems that exist, take them head-on... get the junk out of the way, and make something that people can use. Many researchers seem to be of the mind that if it's mostly right, it's good enough; but as any wastewater treatment plant manager will tell you, even if it's only 10-20% sewage, it's not fit for public consumption. Let us view our taxonomy in the same manner!
Lichenologist James Lendemer is famously quoted as saying "I think of myself as a bounty hunter." Perhaps I should think of myself as a manager of a wastewater treatment plant. Maybe that's not as glorious, but it certainly is important. So much work remains to be done before we get close to having a reasonable set of names for the organisms on the Earth. As long as humans are involved, our nomenclatural system will be imperfect and will require constant cleaning, management, and enforcement of standards... and there I will stand, ready to take on the nastiest and dirtiest of the problems!
- Brendan
P.S. Some recent news in the NYC area has been the big fire at a wastewater treatment plant that sent sewage spewing into the Hudson River. Thought exercise for taxonomists: can you think of any events like this one (speaking metaphorically) that have affected your particular group of organisms?
Tuesday, July 12, 2011
Man It Feels Good 2 B A Lichen
Recently I was thinking about the plight of the sterile crustose lichens, specifically those in Eastern North America. One could feel sorry for them; they are so taxonomically neglected and so underrepresented in all major surveys of biodiversity. But in a certain kind of way, I think that they must be very proud. It's certainly amazing how they've been able to get by on so little (so little sex, so little attention from humans). Inspired by the story of the sterile crusts, I decided to write some lyrics, which I entitled "Man, It Feels Good 2 B A Lichen," and set the words to the music of the similarly titled Geto Boys song (made famous by the movie Office Space), which is about being a gangsta, not a lichen. Some people have said that it's groundbreaking... that it's a whole new genre (known as "Lichen Rap" or, more commonly, "Lich-Hop")... I just like to think of it as one of the products of the inspiration that I get from the amazing organisms that I study!
Man, It Feels Good 2 B A Lichen (a sterile crust’s song)
Man, it feels good to be a lichen
Live sterile crust lichens ain’t indoors
Real sterile crust lichens are hard to identify
‘cause sterile crust lichens don’t make spores
Man, it feels good to be a lichen
I mean one that you don’t really know
Livin’ as a sterile crust, drivin’ people crazy
‘cause I can’t be identified for sho’
Now sterile crust lichens come in all shapes and colors
Some got killed in the past
But if NSF could just get their back with some fundin’
We could study them and hopefully they’ll last
Now all I gotta say to you
Sexually-reproducin’ lichen-formin’ fungi recombinin’
When your spore can’t find no algae what you think you gonna do?
Man, it feels good to be a lichen
Man, it feels good to be a lichen
Flying round on the currents all day
‘cause when a sterile crust lichen goes and tries to reproduce
It bundles up the partners and it blows away
Now when a lichen like this one is livin’ in your ‘hood
It’s most likely that you ain’t gonna know
‘cause lots of sterile crust lichens are small and inconspicuous
But under UV they might glow
Now all I gotta say to you
Sexually-reproducin’ lichen-formin’ fungi recombinin’
When your spore can’t find no algae what you think you gonna do?
Man, it feels good to be a lichen
Stay tuned for news about the record release party in the Bronx, to take place later this year!
- Brendan
Tuesday, July 5, 2011
Perl: Renaming DNA Sequences
When I began my dissertation studies, I did not know the wonders of the Perl programming language. However, within the past year, it has proven to be an invaluable tool for manipulating DNA sequence data sets and has helped me to tackle projects that once seemed too large in scope. In this post I will give just one example of a Perl script that I wrote after getting some training from Dr. Bob Thomson, whom I met at the NSF-sponsored "Fast, Free Phylogenies" workshop at NIMBioS (Knoxville, TN).
When processing large numbers of DNA sequences, it always helps to have a standardized naming system so that the sequences can be handled in an automated way during downstream analyses. For a recent large-scale cloning experiment that involved picking 2880 clonal bacterial colonies (to amplify and sequence a vector-inserted 16S gene fragment from each), I developed a 10-digit alpha-numeric code that allowed me to encode all of the necessary data about my sequences into each specific sequence identifier. However, the sequencing facility also needed to use its own codes to keep track of my sequences, so I ended up with long names in which my own code sat in the middle, preceded by information identifying the sequences as mine and followed by information about the individual sequence reads. Therefore, to recover my codes (without the sequencing facility's additions) in an automated way, I wrote a simple Perl script to edit a fasta file containing the sequences (this was run after manual sequence correction had been finished).
The following script ('Clon_16S_fasta_renamer.pl') allowed me to extract the 10-digit alpha-numeric codes that I used in my dissertation studies (Hodkinson 2011) from the long names (with extraneous information) that come from the sequencing facility. It creates a new fasta file with these modified identifiers. Specifically, it takes sequences that have "BH_" (my initials), followed by a 10-digit code, followed by additional characters, and simply renames each sequence using just the 10-digit code (effectively stripping out "BH_" at the beginning and the extra characters at the end). The new file will have the same name, but the extension will be replaced by ".ed.fasta". This can be easily modified for any set of sequences that are identified using a standardized naming scheme.
#!/usr/bin/perl
use strict;
use warnings;
# Ask for the name of the input fasta file
print "\nPlease type the name of your input file: ";
my $filename = <STDIN>;
chomp $filename;
open(my $fasta, '<', $filename) or die "Cannot open $filename: $!\n";
# Output file: same name, with the extension replaced by ".ed.fasta"
(my $base = $filename) =~ s/\.[^.]*$//;
open(my $out, '>', "$base.ed.fasta") or die "Cannot create $base.ed.fasta: $!\n";
while (my $line = <$fasta>)
{
    # Header line: keep only the 10 characters that follow "BH_"
    if ($line =~ /^>BH_(.{10})/)
    {
        print $out ">$1\n";
    }
    # Sequence line: starts with an IUPAC nucleotide code or a gap (':' or '-')
    elsif ($line =~ /^[ACGTRYKMSWBDHVN:-]/)
    {
        print $out $line;
    }
}
close $fasta;
close $out;
If you need to know how to run a Perl script, you can look it up on Google, but here is one example of how to run a Perl script using Windows (it's actually easier on almost any other type of operating system). Since I was performing a simple task with a very specific data set, it was easy for me to use basic Perl commands. However, for more complex sequence manipulations, BioPerl provides an excellent collection of Perl modules for biological applications.
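For comparison, the same renaming logic can be sketched in Python 3 with the standard re module. This is not the script used in the dissertation; rename_fasta and output_name are hypothetical names. Like the Perl version, it expects uppercase bases and drops any header that does not start with "BH_" (that header's sequence lines are kept, so they end up attached to the previous record).

```python
import os
import re

# ">BH_" followed by a 10-character code, then anything else.
HEADER = re.compile(r"^>BH_(.{10})")
# A sequence line starts with an uppercase IUPAC code or a gap (':' or '-').
SEQ_LINE = re.compile(r"^[ACGTRYKMSWBDHVN:-]")


def rename_fasta(fasta_text: str) -> str:
    """Shorten 'BH_' headers to their 10-character code and keep sequence lines."""
    out_lines = []
    for line in fasta_text.splitlines():
        match = HEADER.match(line)
        if match:
            out_lines.append(">" + match.group(1))
        elif SEQ_LINE.match(line):
            out_lines.append(line)
        # any other line (e.g. a header without 'BH_') is dropped
    return "\n".join(out_lines) + "\n"


def output_name(filename: str) -> str:
    """Replace (or add) the file extension with '.ed.fasta'."""
    return os.path.splitext(filename)[0] + ".ed.fasta"


raw = ">BH_A1B2C3D4E5_F_read1\nACGT\nNN:-\n"
print(rename_fasta(raw), end="")
# >A1B2C3D4E5
# ACGT
# NN:-
print(output_name("clones.fasta"))  # clones.ed.fasta
```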
- Brendan
----------------------------------------------
References
The above script is published in the following sources:
Hodkinson, B. P. 2011. A Phylogenetic, Ecological, and Functional Characterization of Non-Photoautotrophic Bacteria in the Lichen Microbiome. Doctoral Dissertation, Duke University, Durham, NC.
Download Dissertation (PDF file)
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2011. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository doi:10.5061/dryad.t99b1.
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. In press. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology.
----------------------------------------------
This work was funded in part by NSF DEB-1011504 and EF-0832858.
Saturday, June 18, 2011
Molecular Phylogenetics Workshop
Next week I will be running a short Molecular Phylogenetics Workshop at Roan Mountain State Park in Tennessee (June 22, 10:30-2:00). The workshop coincides with the meeting of the American Bryological and Lichenological Society, but I will be presenting general principles of molecular evolution and phylogenetic inference that are applicable to any set of organisms.
Here is the abstract:
"Cryptogams are notorious for their paucity of morphological characters when compared with higher plants and animals. As a result, an understanding of molecular data and what they can reveal in terms of evolution is perhaps more crucial in these organisms than in many others. Workshop participants will explore principles of molecular phylogenetics and learn basic protocols for running phylogenetic analyses. The main objectives will be (1) to promote an understanding of how events in the course of molecular sequence evolution affect phylogenetic inference, (2) to explore the advantages and disadvantages of different phylogenetic methods, and (3) to facilitate sound research into the phylogenetic history of life. The workshop will include both lecture and discussion. Participants are invited to bring their own data sets for more detailed evaluation at the conclusion of the workshop."
For those scheduled to attend, I look forward to seeing you there! For those not attending, I hope to see you at a future workshop!
- Brendan
----------------------------------------------
This work was made possible in part by NSF (DEB-1011504) and the American Bryological and Lichenological Society.
Friday, June 17, 2011
Writing a Phycas Script
For a while I have been wary of phylogenetic results supported only by Bayesian analyses, because of the so-called 'star-tree paradox' that haunts MrBayes and even some other programs like it. As I have mentioned previously, one of the best features of the Bayesian phylogenetic program Phycas is that it gives one the opportunity to allow polytomies in the trees sampled as part of the posterior (which can often deflate the inflated posterior probability values seen with programs like MrBayes). The specific command for this is:
"mcmc.allow_polytomies = True"
To run Phycas, it is best to write a script to go with a standard NEXUS-formatted sequence alignment. There is some basic information on how to install and run Phycas in my previous post:
http://squamules.blogspot.com/2011/06/installing-and-running-phycas.html
However, that post does not go into any of the details of scripting for Phycas. Recently, I ran a multigene analysis with mtSSU, ITS1, 5.8S, and ITS2 in different partitions, with a different evolutionary model for each. Here is what my Phycas script looked like:
from phycas import *
setMasterSeed(98765)
mcmc.data_source = 'Input_file_name.nex'
mcmc.out.log = 'Output_file_name.log'
mcmc.out.log.mode = REPLACE
mcmc.allow_polytomies = True
mcmc.polytomy_prior = False
mcmc.topo_prior_C = 1.0
mcmc.out.trees.prefix = 'Output_file_name'
mcmc.out.params.prefix = 'Output_file_name'
mcmc.ncycles = 50000
mcmc.sample_every = 10
# Set up the K80+I model for 5pt8S
model.type="hky"
model.state_freqs = [0.25, 0.25, 0.25, 0.25]
model.fix_freqs = True
model.kappa = 2.0
model.kappa_prior = BetaPrime(1.0, 1.0)
model.pinvar_model = True
# Save the K80+I model for 5pt8S
m3 = model()
# Set up the GTR+I model for mtSSU
model.type="gtr"
model.state_freqs = [0.3338, 0.1493, 0.1983, 0.3187]
model.fix_freqs = False
model.relrates = [1.4783, 5.8050, 3.3222, 0.6768, 7.6674, 1.0000]
model.pinvar_model = True
# Save the GTR+I model for mtSSU
m1 = model()
# Set up the HKY+G model for ITS1
model.type="hky"
model.state_freqs = [0.1487, 0.3566, 0.2704, 0.2244]
model.fix_freqs = False
model.kappa = 2.0
model.kappa_prior = BetaPrime(1.0, 1.0)
model.num_rates = 4
model.gamma_shape = 0.5
model.gamma_shape_prior = Exponential(1.0)
model.pinvar_model = False
# Save the HKY+G model for ITS1
m2 = model()
# Set up the HKY+G model for ITS2
model.state_freqs = [0.1419, 0.3069, 0.3199, 0.2314]
# Save the HKY+G model for ITS2
m4 = model()
# Define partition subsets
mtssu = subset(1, 1080)
its1 = subset(1081, 1607)
fivept8S = subset(1608, 1768)
its2 = subset(1769, 2041)
# Assign partition models to subsets
partition.addSubset(mtssu, m1, "mtSSU")
partition.addSubset(its1, m2, "ITS1")
partition.addSubset(fivept8S, m3, "5pt8S")
partition.addSubset(its2, m4, "ITS2")
partition()
# Start the run
mcmc()
# Summarize the posterior (the tree file is named after mcmc.out.trees.prefix plus '.t')
sumt.trees = 'Output_file_name.t'
sumt.burnin = 500
sumt.tree_credible_prob = 1.0
sumt()
Although I have some notes within the script, please see the Phycas manual for instructions on what each of the individual commands does. Hopefully more people will be using Phycas (and allowing polytomies!) in the future!
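Because the subset() ranges have to tile the alignment with no gaps or overlaps, a small stand-alone Python 3 check can catch off-by-one mistakes before a long run. This is not part of Phycas: check_partition is a hypothetical helper, the ranges are copied from the script above, and the total of 2041 sites is inferred from the last subset. The last lines do the sampling arithmetic, assuming that one tree is saved every sample_every cycles and that sumt.burnin counts saved trees (please check the Phycas manual; the tree file may also contain the starting tree).

```python
def check_partition(ranges: dict, total_sites: int) -> list:
    """Return problems found in 1-based, inclusive character ranges."""
    problems = []
    ordered = sorted(ranges.items(), key=lambda item: item[1][0])
    expected_start = 1
    for name, (start, end) in ordered:
        if start > end:
            problems.append(f"{name}: start {start} is after end {end}")
        if start != expected_start:
            problems.append(f"{name}: starts at {start}, expected {expected_start}")
        expected_start = end + 1
    if expected_start != total_sites + 1:
        problems.append(f"last subset ends at {expected_start - 1}, expected {total_sites}")
    return problems


# Ranges copied from the subset() calls in the Phycas script above.
ranges = {
    "mtSSU": (1, 1080),
    "ITS1": (1081, 1607),
    "5pt8S": (1608, 1768),
    "ITS2": (1769, 2041),
}
print(check_partition(ranges, total_sites=2041))  # [] means no gaps or overlaps

# Sampling arithmetic for the mcmc and sumt settings above.
ncycles, sample_every, burnin = 50000, 10, 500
n_samples = ncycles // sample_every
print(n_samples, burnin / n_samples)  # 5000 0.1
```

Under those assumptions, a burn-in of 500 discards roughly the first 10% of the sampled trees.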
- Brendan
Note: One question that I had about running Phycas was how to define exclusion sets; however, Phycas apparently can read the EXSET line of the ASSUMPTIONS block of the NEXUS file in the same way that Mesquite, MacClade, and PAUP* can.
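For reference, the kind of line meant here looks like EXSET * UNTITLED = 1-24 1075-1080; inside a BEGIN ASSUMPTIONS; ... END; block (the numbers are only an illustration). To double-check which sites an EXSET line actually excludes, the hypothetical Python 3 helper parse_exset below expands it into site numbers. It is a sketch that handles only plain numbers, ranges, and ranges with a backslash step (such as 2-100\3); it does not understand '.', named character sets, or other NEXUS shortcuts.

```python
import re

# Matches tokens such as "12", "1-24", or "2-100\3" (every third site).
TOKEN = re.compile(r"^(\d+)(?:-(\d+)(?:\\(\d+))?)?$")


def parse_exset(line: str) -> set:
    """Return the 1-based site numbers listed in a NEXUS EXSET command.

    Sketch only: single numbers, ranges and '\\step' ranges are supported.
    """
    body = line.split("=", 1)[1].strip().rstrip(";")
    sites = set()
    for token in body.split():
        match = TOKEN.match(token)
        if not match:
            raise ValueError(f"unsupported EXSET token: {token!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        step = int(match.group(3) or 1)
        sites.update(range(start, end + 1, step))
    return sites


excluded = parse_exset("EXSET * UNTITLED = 1-5 10 20-26\\3;")
print(sorted(excluded))  # [1, 2, 3, 4, 5, 10, 20, 23, 26]
```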
Tuesday, June 14, 2011
Installing and Running Phycas
Phycas has recently earned a high spot on my short list of favorite computer programs for phylogenetics. Phycas is an amazing program that can run a Bayesian phylogenetic inference without being susceptible to the 'star-tree paradox,' because it allows for the existence of polytomies in the sampled trees.
From an academic perspective, Phycas is actually a pretty easy program to install and run. Still, some additional tips and tricks were helpful to one of my colleagues who was having real trouble getting it to work. Here are my instructions for installing Phycas on a Windows machine:
1) Install Python 2.7. [I use the Enthought Python Distribution, available here: http://www.enthought.com/products/epd.php. Everything is bundled together so components like SciPy, NumPy, etc., never need to be installed individually and the different versions of the components are all guaranteed to play well together.]
2) Follow the instructions here:
http://hydrodictyon.eeb.uconn.edu/projects/phycas/index.php/Telling_Windows_where_to_find_Python
to append Python27 (a different version from the ones listed on that page) to your PATH. Keep whatever is already in your PATH and just add ;C:\Python27 to the end of it (if your PATH is truly empty, leave out the semicolon). [It might also be important to make sure that the PYTHON-STARTUP environment variable says C:\Python27, if you have that variable; mine was still set to 2.6, meaning that the wrong version of Python would likely open by default. Also make sure that all of this is done at both the system level and the user level; I was only doing it at the user level for a while, and it got me mixed up.]
3) Do the 4-step Phycas installation as outlined on the "Windows XP/Windows Vista/Windows 7" section of this website (the manual itself is apparently wrong, so be careful here).
4) For your own particular analysis, put the NEXUS-formatted alignment file ('.nex') and the Phycas script file ('.py') on the Desktop where you have the shortcut to the '.bat' file. [For more on writing a Phycas script, stay tuned to this blog!]
5) Drag and drop the phycas script file (.py) onto 'Shortcut to phycas.bat'.
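The PATH rule in step 2 (append ;C:\Python27, but leave out the semicolon if PATH is empty) can be expressed as a small string function. This Python 3 sketch only computes the new value; actually setting the variable is still done through the Windows dialogs described above. The helper name append_to_path is hypothetical, and the case-insensitive duplicate check is an assumption based on Windows treating paths case-insensitively.

```python
def append_to_path(path_value: str, new_dir: str, sep: str = ";") -> str:
    """Return path_value with new_dir appended, following the rule in step 2.

    An empty PATH gets just the directory (no leading separator); otherwise the
    directory is added after a separator, unless it is already listed.
    """
    entries = [p for p in path_value.split(sep) if p]
    wanted = new_dir.rstrip("\\").lower()
    if any(p.rstrip("\\").lower() == wanted for p in entries):
        return path_value  # already present, nothing to do
    if not path_value:
        return new_dir
    return path_value.rstrip(sep) + sep + new_dir


print(append_to_path("", r"C:\Python27"))  # C:\Python27
print(append_to_path(r"C:\Windows;C:\Tools", r"C:\Python27"))
# C:\Windows;C:\Tools;C:\Python27
```

After updating PATH, opening a new command prompt, starting python, and running import sys; print(sys.version_info[:2]) should report (2, 7) if the intended interpreter is the one being found.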
I'll have another blog post that goes more into the details of Phycas scripting, but I hope this post helps jump-start some of those eager to deflate their inflated posterior probability values!
- Brendan
Monday, June 13, 2011
A New Home
I have relocated and am finally settled in New York! I will be doing my post-doctoral work at the New York Botanical Garden (NYBG) using advanced bioinformatics tools to study lichen ecology and evolution with Richard C. Harris and James C. Lendemer. Here is my new address:
International Plant Science Center
The New York Botanical Garden
2900 Southern Blvd.
Bronx, NY 10458-5126
NYBG has an amazing research program that is not typical for a botanical garden. The departments within the International Plant Science Center include the following: Institute of Systematic Botany, Cullman Program for Molecular Systematics, Plant Research Laboratory, NY Plant Genomics Consortium, Steere Herbarium, Graduate Studies, Mertz Library, and NYBG Press (plus more). My research will likely connect in some way with all of these departments, which is why NYBG is a perfect environment for my post-doctoral research.
Those who have been following my research closely might say "obviously you study diverse organisms from across the tree of life, but since when do you study plants?" Since the concept of "plants" once included fungi, the "plant" research program at NYBG (which began taking shape over 100 years ago) provides amazing resources for the study of fungal biology as well. Of course, I will specifically be focusing my energy on the lichen-forming fungi.
Please stay tuned for more on my research as it unfolds in New York City!
- Brendan
Monday, May 23, 2011
Graduation/Dissertation
Upon my graduation, I thought I'd give a little update. The thesis defense last month was successful and the final draft of my dissertation has been submitted! My thesis was on the communities of non-photoautotrophic bacteria associated with lichens. This research broke a lot of new ground in terms of using high-throughput pyrosequencing and cutting-edge bioinformatics to examine the phylogenetic, ecological, and functional complexity of the lichen microbiome. My hope is that my dissertation will contribute to science through both the data that I have generated and the set of tools that I have developed. I have talked a little about some of the bioinformatics tools that I have developed in previous posts here, but I will continue to post additional elements of my thesis, especially as they are published in peer-reviewed journals.
The graduation itself gave me a final chance to reflect on the great opportunities that have been made available to me at Duke University. Here you can see me chillin' out after the hooding ceremony:
This week I'm in the process of moving up to NY to start my postdoctoral research on the systematics of lichen-forming fungi at the New York Botanical Garden. I will be working with Dr. Richard C. Harris and James Lendemer, focusing on long-standing problems in Eastern North American lichen taxonomy, while testing a number of specific ecological and biogeographical hypotheses. Addressing many of these issues will require cutting-edge bioinformatics tools that I have developed or am in the process of developing with a network of collaborators in diverse fields.
- Brendan
Monday, May 16, 2011
Investing in Science
Several months back I worked with a group of other student members of the Botanical Society of America (BSA) to draft an open letter to lawmakers to express our hope that policymakers in Washington, DC, would sustain a national commitment to invest in our nation's scientific research, development, and education. This has now evolved into a petition. Please read the email below from the BSA student representatives to find out more about this effort:
"
Attention Students: Ask Lawmakers to Support Science Education and Research
First, we'd like to thank all of you who have taken action and responded to this call. Please take the extra step and ask your friends to consider doing so as well.
The end of the academic year is quickly approaching. Before you venture off for the summer research season or for a short break from classes, you have one more assignment. I am writing to ask that you join with other science students from across the country to sign an online petition to lawmakers. This statement reminds our elected leaders that scientific research and education are keys to our future and asks them to continue to make important investments in the scientific programs that will support your education and preparation for future careers in research, teaching, or the myriad fields that grow and benefit from scientific research.
We already have more than 2,750 signatures, but we would like to have more than 5,000 by the end of May. So, if you have not already signed this online petition, please do so today at http://www.aibs.org/public-policy/science_students_letter.html. You may also sign the petition and encourage your friends to sign via Facebook -- http://www.facebook.com/pages/Students-Sign-the-Open-Letter-to-Policymakers-About-Investments-in-Science/183684855001704.
Thank you for your time and support!
Sincerely,
Botanical Society of America Student Representatives, Marian Chau (University of Hawai`i at Manoa) and Rachel Meyer (New York Botanical Garden)
"
Friday, May 13, 2011
Punctelia eganii Hodkinson & Lendemer, sp. nov., a rare chemical oddity
Just today I had an article published in the journal Opuscula Philolichenum in which I, along with James Lendemer, describe a new species in the genus Punctelia that has only been found once and contains a chemical compound (lichexanthone) that has otherwise not been seen in this genus. The species was collected along the Alabama River in historic Monroe County, Alabama, near Monroeville (childhood home of Harper Lee and Truman Capote; "the literary capital of Alabama").
Under ultraviolet light, the lichen shows pin-pricks of bright light across its surface (due to the localized presence of lichexanthone). The species is named Punctelia eganii, after Dr. Bob Egan of the University of Nebraska at Omaha, who first collected the species and brought it to our attention. The paper ends with a discussion of 'chemotaxonomy' and the evolving views on the role of secondary chemistry in lichen taxonomy.
- Brendan
--------------------------------------------------------------
Reference:
Hodkinson, B. P., and J. C. Lendemer. 2011. Punctelia eganii, a new species in the P. rudecta group with a novel secondary compound for the genus. Opuscula Philolichenum 9: 35-38.
Download publication (PDF file)
Tuesday, May 10, 2011
PICS-Ord Tricks
Those who have ever used the R statistical package know that it can be a bit tricky, especially if you're first starting out.
The R-based PICS-Ord program (developed to recode ambiguously-aligned regions for phylogenetic analyses; see here and here and here for more information) was written to be as simple as possible while remaining flexible and adjustable. A set of pretty comprehensive instructions was produced to facilitate analyses and provide recommendations for basic use (see 'manual.pdf' available here). The manual is where everyone should look first for help with PICS-Ord. However, those who are not experts in R and/or the command line may benefit from some additional information on how to implement PICS-Ord without really having to know much background information on the R statistical package. Therefore, the goal of this post is to detail one relatively simple way of implementing the PICS-Ord program on a PC.
Here are instructions for one way of running PICS-Ord on Microsoft Windows. Note that these instructions will only work with installations of the Windows version of R (try 2.12.0; the most recent version of R did not work at the time that this was last updated) [http://cran.r-project.org/bin/windows/base/old/2.12.0/] and the Ngila Windows executable [http://scit.us/projects/ngila/] (for the latter, choose the option of putting Ngila in your PATH upon installation).
1) Place picsord.R (found in the picsord.zip archive) in the same folder as ngila.exe (probably C:\Program Files\Ngila\bin\).
[Note: If Ngila is not in your path, you can go into the picsord.R file and change "ngila" (the one in quotes) to "ngila.exe" in the first line that is not preceded by a hash mark (#), or you can type out the full path (but this will not be necessary if you are following the rest of this procedure). Some users have manually edited picsord.R by deleting the first line (the line specifying the location of Rscript); this seems unnecessary but may be worthwhile on certain machines.]
2) Save/copy the input fasta file to a directory (e.g., your home directory, Desktop, or C:\).
3) In the Command Prompt window, use the 'cd' or 'chdir' command to navigate to the directory in which the input fasta file is stored (or simply put the input fasta file in the home directory so that navigation is not necessary), then type the following (this line can be modified based on the version of R being used or the specific location on the drive where Rscript and picsord.R are located):
"C:\Program Files\R\R-2.12.0\bin\Rscript.exe" "C:\Program Files\Ngila\bin\picsord.R" input.fas > output.phy
The output phylip file (output.phy) should then appear in the working directory.
Multiple regions can be processed by running the command in step 3 separately for each region, or one can use the .bat file that comes as part of the PICS-Ord package (use of the .bat file is outlined in the manual). After this, the phylip-formatted PICS-Ord alignment portions can be pasted at the end of the original nucleotide alignment alongside the unambiguously-aligned sites. Please see the manual for further recommendations regarding implementation.
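Pasting the PICS-Ord columns onto the end of the nucleotide alignment amounts to concatenating each taxon's row across files. The Python 3 sketch below is not part of the PICS-Ord package; it just shows the idea under some loud assumptions: every file is sequential PHYLIP with one taxon per line, names contain no spaces and are separated from the data by whitespace, and all files list the same taxa (the order may differ). Interleaved files or strict 10-character names would need a different parser, and the manual's recommendations on how to treat the recoded characters in the analysis still apply.

```python
def read_phylip(text):
    """Parse sequential, relaxed PHYLIP text into (taxon order, {name: row}).

    Assumes an 'ntax nchar' header followed by one taxon per line, with the
    name separated from its characters by whitespace.
    """
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])
    order, rows = [], {}
    for ln in lines[1:ntax + 1]:
        name, data = ln.split(None, 1)
        data = "".join(data.split())  # tolerate spaces inside the data
        if len(data) != nchar:
            raise ValueError(f"{name}: expected {nchar} characters, got {len(data)}")
        order.append(name)
        rows[name] = data
    if len(order) != ntax:
        raise ValueError(f"header says {ntax} taxa, found {len(order)}")
    return order, rows


def append_blocks(base_text, *extra_texts):
    """Append the columns of each extra PHYLIP block to the base alignment."""
    order, merged = read_phylip(base_text)
    for text in extra_texts:
        _, extra = read_phylip(text)
        if set(extra) != set(merged):
            raise ValueError("taxon names differ between files")
        for name in order:
            merged[name] += extra[name]
    nchar = len(merged[order[0]])
    width = max(len(name) for name in order) + 2
    out = [f"{len(order)} {nchar}"]
    out += [name.ljust(width) + merged[name] for name in order]
    return "\n".join(out) + "\n"
```

For several regions, pass each PICS-Ord output in turn, e.g. append_blocks(nucleotide_text, region1_text, region2_text); the result keeps the taxon order of the nucleotide alignment, and mismatched taxon sets raise an error instead of silently misaligning rows.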
-Brendan
References:
Lücking, R., B. P. Hodkinson, A. Stamatakis, and R. A. Cartwright. 2011. PICS-Ord: Unlimited Coding of Ambiguous Regions by Pairwise Identity and Cost Scores Ordination. BMC Bioinformatics 12: 10.
Download publication (PDF file)
Download R-based PICS-Ord program (zipped program package)
View program wiki (website)
Sunday, May 8, 2011
Semi-Cryptic Species
Last week the most recent volume of Bibliotheca Lichenologica (dedicated to Tom Nash) was released. There were 33 peer-reviewed contributions by 70 authors, and one of these contributions was a paper that I wrote with James Lendemer of the New York Botanical Garden. In short, the paper demonstrates with molecular data that the species Xanthoparmelia tasmanica (Parmeliaceae) contains at least two species that cannot be differentiated based on any known morphological or chemical characters. However, the two species belong to two larger clades within the genus Xanthoparmelia, one of which seems to be exclusively Australasian, and the other of which is distributed across the Earth's other continents.
The evolutionary pattern seen here is strikingly similar to what is found in placental mammals and marsupial mammals, with two larger clades of organisms (one of which is almost exclusively Australasian) in which certain pairs of species have converged on similar morphologies. The pair of Xanthoparmelia tasmanica and Xanthoparmelia hypofusca (the new name of the other species, which we sampled in North America) is interesting because it is the first known example of complete convergence in this group, where no distinguishing morphological or chemical characters could be identified for two species found to be in these two major clades. However, we use the term 'semi-cryptic' (as opposed to 'cryptic') to describe the pair of species, since geography would seem to indicate which species is represented by any given sample.
A press release came out from Duke for this story. The following list represents a handful of websites that have covered the story:
http://today.duke.edu/2011/05/lichen
http://dukemagazine.duke.edu/article/symbiotic-association
http://www.evoscience.com/2423/lichen-could-be-a-fungal-equivalent-at-least-evolutionarily/
http://www.sciencedaily.com/releases/2011/05/110502110622.htm
http://www.biologynews.net/archives/2011/05/02/lichen_evolved_on_2_tracks_like_marsupials_and_mammals.html
http://archaeologynewsnetwork.blogspot.com/2011/05/lichen-evolved-on-2-tracks-like.html
http://anpron.eu/?p=2023
http://scienceblog.com/44922/lichen-evolved-on-2-tracks-like-marsupials-and-mammals/
http://www.geneticarchaeology.com/research/Lichen_evolved_on_2_tracks_like_marsupials_and_mammals.asp
http://www.australasianscience.com.au/news/may-2011/lichen-evolved-two-tracks-marsupials-and-mammals.html
http://feelsynapsis.com/pg/blog/read/64210/the-lichens-are-a-good-example-of-convergent-evolution
http://www.noodls.com/viewNoodl/9870752/duke-university/lichen-evolved-on-two-tracks-like-mammals-and-marsupials
http://7thspace.com/headlines/381043/lichen_that_seem_identical_in_all_outward_appearances_are_in_fact_two_different_species.html
http://esciencenews.com/articles/2011/05/02/lichen.evolved.2.tracks.marsupials.and.mammals
http://pda.physorg.com/news/2011-05-lichen-evolved-tracks-marsupials-mammals.html
http://www.uux.cn/viewnews-27222.html
http://www.cnfossil.com/?action-viewnews-itemid-205
http://tieba.baidu.com/f?kz=1070028772
http://www.nigpas.cas.cn/kxcb/kpwz/201105/t20110506_3128772.html
http://www.uua.cn/news/show-11534-1.html
http://www.sciencetechmag.com/lichen-evolved-on-two-tracks-like-marsupials-and-mammals.html
http://www.ebionews.com/news-center/general-research/evolution/37344-lichen-evolved-on-two-tracks-like-marsupials-and-mammals.html
http://www.astrobio.net/pressrelease/3948/lichens-two-track-evolution
http://www.eurekalert.org/pub_releases/2011-05/du-leo050211.php
http://forum.grasscity.com/science-nature/806894-evolution-action-convergent-species.html
http://www.labspaces.net/110538/Lichen_evolved_on___tracks__like_marsupials_and_mammals
http://www.verticalnews.com/premium_newsletters/Journal-of-Technology-and-Science/2011-05-22/213JTS.html
http://www.solociencia.com/biologia/11061009.htm [In Spanish]
http://www.firstscience.com/home/news/breaking-news-all-topics/lichen-evolved-on-2-tracks-like-marsupials-and-mammals_104827.html
http://www.irlab.org/now/GetSummary.do?id=3590
http://www.bioquicknews.com/node/494
http://i.bioknow.cn/portal/root/rsp/zx_nr.jsp?id=64585741
http://www.noticias21.com/node/3503 [In Spanish]
http://riktningnews.bio-medicine.org/biology-news-1/Lichen-evolved-on-2-tracks--like-marsupials-and-mammals-19054-1/
http://www.sciencenewsline.com/biology/2011050313000026.html?continue=y
http://www.sciencecodex.com/lichen_evolved_on_2_tracks_like_marsupials_and_mammals
http://www.onenewspage.com/news/Science/20110503/21924744/Lichen-Evolved-On-Tracks-Like-Marsupials-And.htm
http://www.astrobio.net/pdffiles/news_3948.pdf
http://chemical-and-chemistry.verticalnews.com/articles/5287667.html
http://www.freeusenetnewsgroup.com/lichen-evolved-on-2-tracks-like-marsupials-and-mammals.html
http://foronatura.mforos.com/1926434/10384172-2-sp-de-liquen-transformadas-en-una-sola-a-efectos-practicos/ [In Spanish]
http://newswithscience.blogspot.com/2011/05/lichen-evolved-on-2-tracks-like.html
http://i-science77.blogspot.com/2011/05/lichens-two-track-evolution.html
http://www.sciguru.com/newsitem/8367/lichen-evolved-on-two-tracks-like-marsupials-and-mammals
http://www.sciencenewsworld.com/science-articles/lichen-evolved-on-2-tracks-like-marsupials-and-mammals.html
- Brendan
-------------------------------
CITATIONS:
Hodkinson, B. P., and J. C. Lendemer. 2011. Molecular analyses reveal semi-cryptic species in Xanthoparmelia tasmanica. Bibliotheca Lichenologica 106: 115-126.
Download publication (PDF file)
Download nucleotide alignment (NEXUS file)
Hodkinson, B. P., and J. C. Lendemer. 2010. How do you solve a problem like Xanthoparmelia? Molecular analyses reveal semi-cryptic species in an Australasian-American 'disjunct' taxon. In: Botany 2010. Botanical Society of America, St. Louis, Missouri, abs. 355.
View abstract (website)
View poster
-------------------------------
UPDATE:
I even found a site that tries to use my article to disprove evolution!
http://www.evolutionnews.org/2011/05/convergent_genetic_evolution_i046651.html
I consider it to be a badge of honor. As an evolutionary biologist, I think that when the intelligent design people start paying attention to your work, it's an indication that you've "arrived"!
Wednesday, April 13, 2011
WOODSmont #2 - Continuing Outreach
I recently set up an exhibit all about lichens at my second WOODSmont Children's Festival. Some readers may recall my post about last year's event. Once again, it was a great success; it was both a lot of fun and a great chance to expose people to lichens at a young age!
The festival was sponsored by Duke University's Wilderness Outdoor Opportunities for Durham Students ('WOODS'). My table included 'touch and feel' lichens, a dissecting scope (so that children could see the lichens up close), and samples of local lichens for children to take home. I also had opportunities to interact with people of all ages from the local community and a number of K-12 educators.
There were many lichens to put under the scope.
I was sure to accommodate even the smallest of lichen observers.
The whole family can learn to love lichens!
A lesson on minute graphids.
Saturday, April 2, 2011
Phylogenetic tree editing: Reinserting removed identical sequences
In phylogenetic analyses, a large number of identical sequences can sometimes prove to be problematic. This post outlines a protocol for creating and running a customized Unix shell script that reinserts identical sequences into a phylogenetic tree file (NEWICK or NEXUS format), for situations in which identical sequences were removed pre-analysis.
Identical sequences may have been removed using the Mothur 'unique.seqs' function (in which case a '.names' file would have been generated, storing the information about which sequences were removed) or RAxML (which generates a '.reduced.phy' file for phylogenetic analysis and a log file that contains a list of the removed sequences and their remaining representatives in a format that can be easily extracted using Unix or Microsoft Excel). The protocol described here relies on using a '.names' file. If sequences were not removed using Mothur, the '.names' file can be manually generated (here are notes on the basic format: http://www.mothur.org/wiki/Names_file ), or the original sequence file can be processed using the Mothur 'unique.seqs' function.
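If the identical sequences were removed by RAxML rather than Mothur, the retained/removed pairs listed in the RAxML log can be collected into a '.names'-style file with a short awk script. The sketch below is hypothetical: it assumes the log reports each removed sequence exactly once, on a line of the form 'IMPORTANT WARNING: Sequences KEPT and REMOVED are exactly identical', with the retained sequence named first. Check this against your own log before relying on it. The function name raxml_log_to_names is illustrative only. Representatives without any removed duplicates are simply omitted, since they would only produce no-op substitutions.

```shell
#!/bin/bash
# Hypothetical helper: build a Mothur-style '.names' file from RAxML's
# identical-sequence warnings.
# ASSUMPTION: each relevant log line looks like
#   IMPORTANT WARNING: Sequences KEPT and REMOVED are exactly identical
# with the retained sequence first and each removed sequence listed once.
raxml_log_to_names() {
    local log_file="$1"
    awk '
        /IMPORTANT WARNING: Sequences .* and .* are exactly identical/ {
            kept = $4      # 4th field: retained representative
            removed = $6   # 6th field: removed identical sequence
            # First time we see this representative: remember its order
            # and start its list with the representative itself.
            if (!(kept in members)) { order[++n] = kept; members[kept] = kept }
            members[kept] = members[kept] "," removed
        }
        END {
            # Output: representative<TAB>comma-separated list (names format)
            for (i = 1; i <= n; i++) print order[i] "\t" members[order[i]]
        }
    ' "$log_file"
}
```

For example, `raxml_log_to_names RAxML_info.my_run > my_sequences.names` (file names hypothetical) would produce a file that can be used in place of the Mothur '.names' file in the steps that follow.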
The script needs to be built as a customized Unix shell script for your own sequence set. It can be assembled easily in Microsoft Excel or a similar spreadsheet program, with one row per representative sequence:
Column A: 'sed -i s/' all the way down the column
Column B: sequence IDs for representative sequences (Column 1 of the '.names' file)
Column C: forward slashes ('/') all the way down the column
Column D: lists of sequences represented by each representative sequence (including the representative itself) separated by commas; each line must correlate with the Column B identifiers (Column D corresponds to Column 2 of the Mothur '.names' file)
Column E: '/g file_name.tre' all the way down the column
Once the spreadsheet is assembled, save it as tab-delimited text, open it in a text editor that can search and replace tabs (e.g., TextWrangler or TextPad), delete all tabs (replace each tab with nothing), and manually add the header lines (the first four lines of the example below) to make it a working script. [Note: If one sequence name anywhere in the tree file or '.names' file is nested within another (e.g., 'bacterium' and 'bacterium2'), the search term for the shorter name will also match inside the longer one. To avoid this, add a colon immediately after the name of the representative sequence with the shorter name, and also add a colon after the list of sequences it represents. In a tree with branch lengths, each tip name is followed by a colon, so the colon-anchored search matches only the exact name.] The script can then be run on the original tree file, transforming it into a tree file that contains all of the sequences in the original sequence set (i.e., before identical sequences were removed).
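To illustrate the nested-name problem from the note above, here is a toy demonstration on a made-up three-tip tree (the tree string is hypothetical; the names 'bacterium', 'bacterium2', and HL08B03c26 are borrowed from the text). It compares a plain substitution with the colon-anchored version, piping the tree through sed instead of editing a file in place.

```shell
#!/bin/bash
# Hypothetical toy tree: 'bacterium' is nested inside 'bacterium2'.
tree='(bacterium:0.10,bacterium2:0.20,other:0.30);'

# Naive substitution: 'bacterium' also matches inside 'bacterium2',
# corrupting that tip label.
naive=$(printf '%s\n' "$tree" | sed 's/bacterium/bacterium,HL08B03c26/g')

# Colon-anchored substitution: only the exact tip label, which is
# followed by ':' and its branch length, is matched.
anchored=$(printf '%s\n' "$tree" | sed 's/bacterium:/bacterium,HL08B03c26:/g')

echo "naive:    $naive"
# naive:    (bacterium,HL08B03c26:0.10,bacterium,HL08B03c262:0.20,other:0.30);
echo "anchored: $anchored"
# anchored: (bacterium,HL08B03c26:0.10,bacterium2:0.20,other:0.30);
```

In the naive result, 'bacterium2' has been turned into the nonsense label 'HL08B03c262'; the anchored version leaves it untouched.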
Here's an example:
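#!/bin/bash
# Sun Grid Engine (SGE) directives: run under bash, start in the current
# working directory, and merge stdout/stderr into search_replace.log
#$ -S /bin/bash
#$ -cwd
#$ -o search_replace.log -j y
# One in-place substitution per representative: tip label -> comma-separated list of identical sequences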
sed -i s/5005c2/5005c2,HL06C03c12/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/5005c4/5005c4,CL08C02c09,uncultured_bacterium_FD01A08,uncultured_bacterium_FD04E06/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/5015c31/5015c31,EL02B02c77,EL02C01c63,EL02C02c85,EL02C03c68,EL04C01c65,EL04C01c68,EL04C01c71,EL04C01c72,EL05B03c02,EL06C03c65,EL06C03f17,EL08B01c10,EL08B01c13,EL08B03c17,EL08B03c19,EL09A01c65,EL09A01c67,EL09A01c68,EL09A01c70,EL09A03c19,EL09A03c20,EL09B02c36,EL09B02c39,EL10B01c41,HL10A02c32,NL07B01c12,NL08C03c25,NL08C03f89,NL08C03f90,NL08C03f93/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/5027c58/5027c58,EL08B01c09/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium:/uncultured_bacterium,HL08B03c26:/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_5C231311/uncultured_bacterium_5C231311,GQ109020/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_FD02D06/uncultured_bacterium_FD02D06,EL02C01c61,EL02C02c84,EL02C03c67,EL08C01c23,EL08C03c09,EL10C03c15,NL01B03c63,NL07B01c10,NL07B01d89,NL07B03d84,NL10A02c32/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_nbw397h09c1/uncultured_bacterium_nbw397h09c1,HL05A03c20/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_Sed3/uncultured_bacterium_Sed3,EF064161/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
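As an alternative to the spreadsheet, a script in the same format can be generated directly from the '.names' file with awk. This is a minimal sketch, assuming the '.names' file is tab-delimited with the representative in the first column and its comma-separated list in the second (as described above). make_reinsert_script is a hypothetical helper name, and it simply reproduces the header lines from the example above. It does not apply the colon fix for nested names automatically, so those lines still need to be edited by hand. Like the original, the generated lines use `sed -i`, which assumes GNU sed; BSD/macOS sed expects an argument after -i.

```shell
#!/bin/bash
# Hypothetical helper: write the search-and-replace script straight from a
# Mothur '.names' file, skipping the spreadsheet step.
# ASSUMPTION: each .names line is representative<TAB>comma-separated list
# (the list includes the representative itself).
make_reinsert_script() {
    local names_file="$1"
    local tree_file="$2"
    # Same header lines as the example script (SGE directives).
    printf '#!/bin/bash\n#$ -S /bin/bash\n#$ -cwd\n#$ -o search_replace.log -j y\n'
    # One sed line per representative that stands in for at least one
    # removed sequence; lines where the list is just the representative
    # would be no-ops, so they are skipped.
    awk -F'\t' -v tree="$tree_file" '
        $1 != $2 { printf "sed -i s/%s/%s/g %s\n", $1, $2, tree }
    ' "$names_file"
}
```

For example, `make_reinsert_script my_sequences.names file_name.tre > search_replace.sh` (file names hypothetical) would write a script that can then be run with bash, or submitted with qsub on a Sun Grid Engine cluster.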
This information and many more bioinformatics tricks, tips, and scripts can be found in my doctoral dissertation (Hodkinson 2011), which will be coming out soon!
- Brendan
Update: These instructions are now published as part of a paper in Environmental Microbiology (Hodkinson et al. 2012) and the data/analysis/instruction files are available from the Dryad data repository (Hodkinson et al. 2011).
----------------------------------------------
References
The above instructions are published in the following sources:
Hodkinson, B. P. 2011. A phylogenetic, ecological, and functional characterization of non-photoautotrophic bacteria in the lichen microbiome. Doctoral Dissertation, Duke University, Durham, NC.
Download Dissertation (PDF file)
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2011. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository doi:10.5061/dryad.t99b1.
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2012. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology 14(1): 147-161. [doi:10.1111/j.1462-2920.2011.02560.x]
Download publication (PDF file)
Download supplementary phylogeny (PDF file)
View data and analysis file web-portal (website)
Download data and analysis file archive (ZIP file)
----------------------------------------------
This work was funded in part by NSF DEB-1011504.
- Brendan
Update: These instructions are now published as part of a paper in Environmental Microbiology (Hodkinson et al. 2012) and the data/analysis/instruction files are available from the Dryad data repository (Hodkinson et al. 2011).
----------------------------------------------
References
The above instructions are published in the following sources:
Hodkinson, B. P. 2011. A phylogenetic, ecological, and functional characterization of non-photoautotrophic bacteria in the lichen microbiome. Doctoral Dissertation, Duke University, Durham, NC.
Download Dissertation (PDF file)
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2011. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository doi:10.5061/dryad.t99b1.
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. In press. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology 14(1): 147-161. [doi:10.1111/j.1462-2920.2011.02560.x]
Download publication (PDF file)
Download supplementary phylogeny (PDF file)
View data and analysis file web-portal (website)
Download data and analysis file archive (ZIP file)
----------------------------------------------
This work was funded in part by NSF DEB-1011504.
Subscribe to:
Posts (Atom)