Next week I will be running a short Molecular Phylogenetics Workshop at Roan Mountain State Park in Tennessee (June 22, 10:30-2:00). The workshop coincides with the meeting of the American Bryological and Lichenological Society, but I will be presenting general principles of molecular evolution and phylogenetic inference that are applicable to any set of organisms.
Here is the abstract:
"Cryptogams are notorious for their paucity of morphological characters when compared with higher plants and animals. As a result, an understanding of molecular data and what they can reveal in terms of evolution is perhaps more crucial in these organisms than in many others. Workshop participants will explore principles of molecular phylogenetics and learn basic protocols for running phylogenetic analyses. The main objectives will be (1) to promote an understanding of how events in the course of molecular sequence evolution affect phylogenetic inference, (2) to explore the advantages and disadvantages of different phylogenetic methods, and (3) to facilitate sound research into the phylogenetic history of life. The workshop will include both lecture and discussion. Participants are invited to bring their own data sets for more detailed evaluation at the conclusion of the workshop."
For those scheduled to attend, I look forward to seeing you there! For those not attending, I hope to see you at a future workshop!
- Brendan
----------------------------------------------
This work was made possible in part by NSF (DEB-1011504) and the American Bryological and Lichenological Society.
Saturday, June 18, 2011
Friday, June 17, 2011
Writing a Phycas Script
For a while I have been wary of phylogenetic results supported only by Bayesian analyses, because of the so-called 'star-tree paradox' that haunts MrBayes and some other programs like it. As I have mentioned previously, one of the best features of the Bayesian phylogenetic program Phycas is that it lets you allow polytomies in the trees sampled from the posterior, which can often deflate the inflated posterior probability values seen with programs like MrBayes. The specific command for this is:
"mcmc.allow_polytomies = True"
To run Phycas, it is best to write a script to go with a standard NEXUS-formatted sequence alignment. There is some basic information on how to install and run Phycas in my previous post:
http://squamules.blogspot.com/2011/06/installing-and-running-phycas.html
However, that post does not go into any of the details of scripting for Phycas. Recently, I ran a multigene analysis with mtSSU, ITS1, 5.8S, and ITS2 in different partitions, with a different evolutionary model for each. Here is what my Phycas script looked like:
from phycas import *
setMasterSeed(98765)
mcmc.data_source = 'Input_file_name.nex'
mcmc.out.log = 'Output_file_name.log'
mcmc.out.log.mode = REPLACE
mcmc.allow_polytomies = True
mcmc.polytomy_prior = False
mcmc.topo_prior_C = 1.0
mcmc.out.trees.prefix = 'Output_file_name'
mcmc.out.params.prefix = 'Output_file_name'
mcmc.ncycles = 50000
mcmc.sample_every = 10
# Set up the K80+I model for 5pt8S
model.type="hky"
model.state_freqs = [0.25, 0.25, 0.25, 0.25]
model.fix_freqs = True
model.kappa = 2.0
model.kappa_prior = BetaPrime(1.0, 1.0)
model.pinvar_model = True
# Save the K80+I model for 5pt8S
m3 = model()
# Set up the GTR+I model for mtSSU
model.type="gtr"
model.state_freqs = [0.3338, 0.1493, 0.1983, 0.3187]
model.fix_freqs = False
model.relrates = [1.4783, 5.8050, 3.3222, 0.6768, 7.6674, 1.0000]
model.pinvar_model = True
# Save the GTR+I model for mtSSU
m1 = model()
# Set up the HKY+G model for ITS1
model.type="hky"
model.state_freqs = [0.1487, 0.3566, 0.2704, 0.2244]
model.fix_freqs = False
model.kappa = 2.0
model.kappa_prior = BetaPrime(1.0, 1.0)
model.num_rates = 4
model.gamma_shape = 0.5
model.gamma_shape_prior = Exponential(1.0)
model.pinvar_model = False
# Save the HKY+G model for ITS1
m2 = model()
# Set up the HKY+G model for ITS2 (only the base frequencies change; the other settings carry over from the ITS1 model)
model.state_freqs = [0.1419, 0.3069, 0.3199, 0.2314]
# Save the HKY+G model for ITS2
m4 = model()
# Define partition subsets
mtssu = subset(1, 1080)
its1 = subset(1081, 1607)
fivept8S = subset(1608, 1768)
its2 = subset(1769, 2041)
# Assign partition models to subsets
partition.addSubset(mtssu, m1, "mtSSU")
partition.addSubset(its1, m2, "ITS1")
partition.addSubset(fivept8S, m3, "5pt8S")
partition.addSubset(its2, m4, "ITS2")
partition()
# Start the run
mcmc()
# Summarize the posterior
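# Read the tree file written by mcmc() (assumed to be the trees prefix set above plus '.t')
sumt.trees = 'Output_file_name.t'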
sumt.burnin = 500
sumt.tree_credible_prob = 1.0
sumt()
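Because each subset() call above takes a first and last site (apparently 1-based and inclusive, judging from the boundaries used), it is easy to leave a gap or an overlap when typing the ranges by hand. The plain-Python sketch below is not part of Phycas; the helper name `check_partition` is hypothetical. It checks that the four ranges from the script tile the 2,041-site alignment exactly, assuming the alignment length equals the last site of the ITS2 subset.

```python
def check_partition(subsets, nsites):
    """Check that 1-based, inclusive (first, last) site ranges tile 1..nsites.

    `subsets` maps a subset name to its (first, last) pair, mirroring the
    arguments passed to subset() in the Phycas script. Returns a dict of
    subset lengths; raises ValueError on a gap, an overlap, or short coverage.
    """
    lengths = {}
    expected_start = 1
    # Walk the ranges in order of their first site
    for name, (first, last) in sorted(subsets.items(), key=lambda item: item[1][0]):
        if first != expected_start:
            kind = "gap" if first > expected_start else "overlap"
            raise ValueError("%s before subset %s at site %d" % (kind, name, first))
        if last < first:
            raise ValueError("subset %s ends before it starts" % name)
        lengths[name] = last - first + 1
        expected_start = last + 1
    if expected_start - 1 != nsites:
        raise ValueError("subsets cover %d sites, expected %d" % (expected_start - 1, nsites))
    return lengths


# The same boundaries as the subset() calls in the script
partitions = {
    "mtSSU": (1, 1080),
    "ITS1": (1081, 1607),
    "5pt8S": (1608, 1768),
    "ITS2": (1769, 2041),
}
print(check_partition(partitions, 2041))
```

For these ranges the check passes and reports subsets of 1080, 527, 161, and 273 sites, which add up to 2041.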
Although I have included some comments within the script, please see the Phycas manual for instructions on what each of the individual commands does. Hopefully more people will be using Phycas (and allowing polytomies!) in the future!
- Brendan
Note: One question that I had about running Phycas was how to define exclusion sets; however, Phycas apparently can read the EXSET line of the ASSUMPTIONS block of the NEXUS file in the same way that Mesquite, MacClade, and PAUP* can.
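For anyone who has not written an exclusion set by hand, it lives in an ASSUMPTIONS block of the NEXUS file. The plain-Python sketch below builds such a block from a list of site ranges. It assumes the standard NEXUS `EXSET * name = ...;` form, where `*` marks the set to be applied by default; the helper names `format_range` and `exset_block`, the set name `excluded`, and the example ranges are all hypothetical. Since Phycas support for EXSET is described above only as apparent, it is worth confirming the result against the Phycas manual.

```python
def format_range(first, last):
    """Render a 1-based, inclusive site range in NEXUS notation."""
    return str(first) if first == last else "%d-%d" % (first, last)


def exset_block(ranges, name="excluded", default=True):
    """Return a NEXUS ASSUMPTIONS block containing a single EXSET command.

    `ranges` is a list of (first, last) pairs, 1-based and inclusive.
    With default=True the set is marked with '*' so that programs which
    honor it apply the exclusion automatically.
    """
    star = "* " if default else ""
    sites = " ".join(format_range(a, b) for a, b in sorted(ranges))
    return "BEGIN ASSUMPTIONS;\n    EXSET %s%s = %s;\nEND;\n" % (star, name, sites)


# Hypothetical example: drop an ambiguous stretch of ITS1 and the final site
print(exset_block([(1200, 1255), (2041, 2041)]))
```

The printed block can be pasted after the DATA block of the input NEXUS file.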
Tuesday, June 14, 2011
Installing and Running Phycas
Phycas has recently earned a high spot on my short list of favorite computer programs for phylogenetics. It is an amazing program that can run a Bayesian phylogenetic analysis without being susceptible to the 'star-tree paradox', because it allows polytomies in the sampled trees.
From an academic perspective, Phycas is actually a pretty easy program to install and run. Still, some additional tips on running it proved helpful to one of my colleagues who was having real trouble getting it to work. Here are my instructions for installing Phycas on a Windows machine:
1) Install Python 2.7. [I use the Enthought Python Distribution, available here: http://www.enthought.com/products/epd.php. Everything is bundled together so components like SciPy, NumPy, etc., never need to be installed individually and the different versions of the components are all guaranteed to play well together.]
2) Follow the instructions here:
http://hydrodictyon.eeb.uconn.edu/projects/phycas/index.php/Telling_Windows_where_to_find_Python
to append C:\Python27 (a different version from the ones listed there) to your PATH. If your PATH is truly empty, you can presumably just leave out the semicolon; otherwise, keep whatever is already in your PATH and simply add ;C:\Python27 to the end of it. [It might also be important to make sure that the PYTHON-STARTUP environment variable says C:\Python27, if you have that variable (mine was still set to 2.6, meaning that the wrong version of Python would likely open by default), and that all of this is done at both the system level and the user level. I was only doing it at the user level for a while, and it got me mixed up.]
3) Complete the 4-step Phycas installation as outlined in the "Windows XP/Windows Vista/Windows 7" section of the same website (the manual itself is apparently wrong, so be careful here).
4) For your own analysis, put the NEXUS-formatted alignment file ('.nex') and the Phycas script file ('.py') on the Desktop, where you have the shortcut to the '.bat' file. [For more on writing a Phycas script, stay tuned to this blog!]
5) Drag and drop the phycas script file (.py) onto 'Shortcut to phycas.bat'.
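The PATH edit in step 2 boils down to a small string rule: keep what is already there, add ;C:\Python27 at the end, and skip the semicolon if PATH is empty. The plain-Python sketch below expresses that rule and reports which interpreter is actually running, a quick way to catch the wrong-version problem mentioned in step 2. The helper name `append_to_path` is hypothetical; it only computes a new value and does not change any Windows setting, which still has to be done as the Phycas wiki describes. It is written to run under Python 2.7 as well as Python 3.

```python
import sys


def append_to_path(path_value, new_dir=r"C:\Python27"):
    """Return path_value with new_dir appended, Windows-style.

    An empty PATH gets new_dir alone (no leading semicolon); an existing
    PATH keeps its entries and gains ';' + new_dir at the end. If new_dir
    is already listed (ignoring case and a trailing backslash), the value
    is returned unchanged.
    """
    entries = [e for e in path_value.split(";") if e]
    normalized = [e.rstrip("\\").lower() for e in entries]
    if new_dir.rstrip("\\").lower() in normalized:
        return path_value
    if not entries:
        return new_dir
    return path_value.rstrip(";") + ";" + new_dir


# Which interpreter is this? Step 1 calls for Python 2.7.
print("Running Python %d.%d from %s" % (sys.version_info[0], sys.version_info[1], sys.executable))
print(append_to_path(r"C:\Windows;C:\Windows\system32"))
```

Running this with the same Python that double-clicking a '.py' file launches shows immediately whether an old 2.6 installation is still being picked up.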
I'll have another blog post that goes more into the details of Phycas scripting, but I hope this post helps jump-start some of those eager to deflate their inflated posterior probability values!
- Brendan
Monday, June 13, 2011
A New Home
I have relocated and am finally settled in New York! I will be doing my post-doctoral work at the New York Botanical Garden (NYBG) using advanced bioinformatics tools to study lichen ecology and evolution with Richard C. Harris and James C. Lendemer. Here is my new address:
International Plant Science Center
The New York Botanical Garden
2900 Southern Blvd.
Bronx, NY 10458-5126
NYBG has an amazing research program that is not typical for a botanical garden. The departments within the International Plant Science Center include the following: Institute of Systematic Botany, Cullman Program for Molecular Systematics, Plant Research Laboratory, NY Plant Genomics Consortium, Steere Herbarium, Graduate Studies, Mertz Library, and NYBG Press (plus more). My research will likely connect in some way with all of these departments, which is why NYBG is a perfect environment for my post-doctoral research.
Those who have been following my research closely might say "obviously you study diverse organisms from across the tree of life, but since when do you study plants?" Since the concept of "plants" once included fungi, the "plant" research program at NYBG (which began taking shape over 100 years ago) provides amazing resources for the study of fungal biology as well. Of course, I will specifically be focusing my energy on the lichen-forming fungi.
Please stay tuned for more on my research as it unfolds in New York City!
- Brendan
Monday, May 23, 2011
Graduation/Dissertation
Upon my graduation, I thought I'd give a little update. The thesis defense last month was successful and the final draft of my dissertation has been submitted! My thesis was on the communities of non-photoautotrophic bacteria associated with lichens. This research broke a lot of new ground in terms of using high-throughput pyrosequencing and cutting-edge bioinformatics to examine the phylogenetic, ecological, and functional complexity of the lichen microbiome. My hope is that my dissertation will contribute to science through both the data that I have generated and the set of tools that I have developed. I have talked a little about some of the bioinformatics tools that I have developed in previous posts here, but I will continue to post additional elements of my thesis, especially as they are published in peer-reviewed journals.
The graduation itself gave me a final chance to reflect on the great opportunities that have been made available to me at Duke University. Here you can see me chillin' out after the hooding ceremony:
This week I'm in the process of moving up to NY to start my postdoctoral research on the systematics of lichen-forming fungi at the New York Botanical Garden. I will be working with Dr. Richard C. Harris and James Lendemer, focusing on long-standing problems in Eastern North American lichen taxonomy, while testing a number of specific ecological and biogeographical hypotheses. Addressing many of these issues will require cutting-edge bioinformatics tools that I have developed or am in the process of developing with a network of collaborators in diverse fields.
- Brendan
Monday, May 16, 2011
Investing in Science
Several months back I worked with a group of other student members of the Botanical Society of America (BSA) to draft an open letter to lawmakers to express our hope that policymakers in Washington, DC, would sustain a national commitment to invest in our nation's scientific research, development, and education. This has now evolved into a petition. Please read the email below from the BSA student representatives to find out more about this effort:
"
Attention Students: Ask Lawmakers to Support Science Education and Research
First, we'd like to thank all of you who have taken action and responded to this call. Please take the extra step and ask your friends to consider doing so as well.
The end of the academic year is quickly approaching. Before you venture off for the summer research season or for a short break from classes, you have one more assignment. I am writing to ask that you join with other science students from across the country to sign an online petition to lawmakers. This statement reminds our elected leaders that scientific research and education are keys to our future and asks them to continue to make important investments in the scientific programs that will support your education and preparation for future careers in research, teaching, or the myriad fields that grow and benefit from scientific research.
We already have more than 2,750 signatures, but we would like to have more than 5,000 by the end of May. So, if you have not already signed this online petition, please do so today at http://www.aibs.org/public-policy/science_students_letter.html. You may also sign the petition and encourage your friends to sign via Facebook -- http://www.facebook.com/pages/Students-Sign-the-Open-Letter-to-Policymakers-About-Investments-in-Science/183684855001704.
Thank you for your time and support!
Sincerely,
Botanical Society of America Student Representatives, Marian Chau (University of Hawai`i at Manoa) and Rachel Meyer (New York Botanical Garden)
"
Friday, May 13, 2011
Punctelia eganii Hodkinson & Lendemer, sp. nov., a rare chemical oddity
Just today I had an article published in the journal Opuscula Philolichenum in which I, along with James Lendemer, describe a new species in the genus Punctelia that has only been found once and contains a chemical compound (lichexanthone) that has otherwise not been seen in this genus. The species was collected along the Alabama River in historic Monroe County, Alabama, near Monroeville (childhood home of Harper Lee and Truman Capote; "the literary capital of Alabama").
Under ultraviolet light, it shows pinpricks of bright light across its surface (due to the localized presence of lichexanthone). The species is named Punctelia eganii, after Dr. Bob Egan of the University of Nebraska at Omaha, who first collected the species and brought it to our attention. The paper ends with a discussion of 'chemotaxonomy' and the evolving views on the role of secondary chemistry in lichen taxonomy.
- Brendan
--------------------------------------------------------------
Reference:
Hodkinson, B. P., and J. C. Lendemer. 2011. Punctelia eganii, a new species in the P. rudecta group with a novel secondary compound for the genus. Opuscula Philolichenum 9: 35-38.
Download publication (PDF file)
Tuesday, May 10, 2011
PICS-Ord Tricks
Those who have used the R statistical package know that it can be a bit tricky, especially when you are just starting out.
The R-based PICS-Ord program (developed to recode ambiguously-aligned regions for phylogenetic analyses; see here and here and here for more information) was written to be as simple as possible while remaining flexible and adjustable. A set of pretty comprehensive instructions was produced to facilitate analyses and provide recommendations for basic use (see 'manual.pdf' available here). The manual is where everyone should look first for help with PICS-Ord. However, those who are not experts in R and/or the command line may benefit from some additional information on how to implement PICS-Ord without really having to know much background information on the R statistical package. Therefore, the goal of this post is to detail one relatively simple way of implementing the PICS-Ord program on a PC.
Here are instructions for one way of running PICS-Ord on Microsoft Windows. Note that these instructions will only work with installations of the Windows version of R (try 2.12.0; the most recent version of R did not work at the time this was last updated) [http://cran.r-project.org/bin/windows/base/old/2.12.0/] and the Ngila Windows executable [http://scit.us/projects/ngila/] (for the latter, choose the option of putting Ngila in your PATH during installation).
1) Place picsord.R (found in the picsord.zip archive) in the same folder as ngila.exe (probably C:\Program Files\Ngila\bin\).
[Note: If Ngila is not in your path, you can go into the picsord.R file and change "ngila" (the one in quotes) to "ngila.exe" in the first line that is not preceded by a hash mark (#), or you can type out the full path (but this will not be necessary if you are following the rest of this procedure). Some users have manually edited picsord.R by deleting the first line (the line specifying the location of Rscript); this seems unnecessary but may be worthwhile on certain machines.]
2) Save/copy the input fasta file to a directory (e.g., your home directory, Desktop, or C:\).
3) In the Command Prompt window, use the 'cd' or 'chdir' command to navigate to the directory in which the input fasta file is stored (or simply put the input fasta file in the home directory so that navigation is not necessary), then type the following (this line can be modified based on the version of R being used or the specific location on the drive where Rscript and picsord.R are located):
"C:\Program Files\R\R-2.12.0\bin\Rscript.exe" "C:\Program Files\Ngila\bin\picsord.R" input.fas > output.phy
After this, the output PHYLIP file should appear in the working directory.
Multiple regions can be processed by running the command in step 3 separately for each region, or by using the .bat file that comes with the PICS-Ord package (its use is outlined in the manual). The PHYLIP-formatted PICS-Ord alignment portions can then be pasted at the end of the original nucleotide alignment, alongside the unambiguously aligned sites. Please see the manual for further recommendations regarding implementation.
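Both the per-region runs and the final pasting step can be scripted. The Python sketch below is only an illustration and is not part of the PICS-Ord package: it builds the same Rscript command as step 3 for a region file, and it appends PICS-Ord PHYLIP blocks to the nucleotide alignment taxon by taxon. The install paths are the ones used in step 3; the helper names (`picsord_command`, `run_region`, `read_phylip`, `append_blocks`) are hypothetical. The reader assumes relaxed sequential PHYLIP (a header line with the taxon and character counts, then one line per taxon starting with a name that contains no spaces) and single-character states, so check the actual PICS-Ord output against those assumptions first.

```python
import subprocess

# Paths from step 3; adjust to your own R version and Ngila folder
RSCRIPT = r"C:\Program Files\R\R-2.12.0\bin\Rscript.exe"
PICSORD = r"C:\Program Files\Ngila\bin\picsord.R"


def picsord_command(fasta_path):
    """Argument list equivalent to: Rscript.exe picsord.R input.fas"""
    return [RSCRIPT, PICSORD, fasta_path]


def run_region(fasta_path, phylip_path):
    """Run PICS-Ord on one region, redirecting stdout to a PHYLIP file."""
    with open(phylip_path, "w") as out:
        subprocess.check_call(picsord_command(fasta_path), stdout=out)


def read_phylip(text):
    """Parse relaxed sequential PHYLIP into an ordered list of (name, seq)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    ntax, nchar = int(rows[0][0]), int(rows[0][1])
    # First token is the taxon name; any remaining tokens form the sequence
    records = [(fields[0], "".join(fields[1:])) for fields in rows[1:]]
    if len(records) != ntax:
        raise ValueError("expected %d taxa, found %d" % (ntax, len(records)))
    for name, seq in records:
        if len(seq) != nchar:
            raise ValueError("%s has %d characters, expected %d" % (name, len(seq), nchar))
    return records


def append_blocks(alignment_text, block_texts):
    """Append each PICS-Ord block to the end of every taxon's sequence."""
    taxa = read_phylip(alignment_text)
    blocks = [dict(read_phylip(text)) for text in block_texts]
    merged = []
    for name, seq in taxa:
        for block in blocks:
            if name not in block:
                raise ValueError("taxon %s missing from a PICS-Ord block" % name)
            seq += block[name]
        merged.append((name, seq))
    lines = ["%d %d" % (len(merged), len(merged[0][1]))]
    lines.extend("%s %s" % (name, seq) for name, seq in merged)
    return "\n".join(lines) + "\n"
```

For example, `run_region("region1.fas", "output1.phy")` would mirror the step 3 command for one region (the file names are hypothetical), and `append_blocks` could then be fed the main alignment plus the text of each output file. The .bat file described in the manual remains the documented route.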
-Brendan
References:
Lücking, R., B. P. Hodkinson, A. Stamatakis, and R. A. Cartwright. 2011. PICS-Ord: Unlimited Coding of Ambiguous Regions by Pairwise Identity and Cost Scores Ordination. BMC Bioinformatics 12: 10.
Download publication (PDF file)
Download R-based PICS-Ord program (zipped program package)
View program wiki (website)
Sunday, May 8, 2011
Semi-Cryptic Species
Last week the most recent volume of Bibliotheca Lichenologica (dedicated to Tom Nash) was released. There were 33 peer-reviewed contributions by 70 authors, and one of these contributions was a paper that I wrote with James Lendemer of the New York Botanical Garden. In short, the paper demonstrates with molecular data that the species Xanthoparmelia tasmanica (Parmeliaceae) contains at least two species that cannot be differentiated based on any known morphological or chemical characters. However, the two species belong to two larger clades within the genus Xanthoparmelia, one of which seems to be exclusively Australasian, and the other of which is distributed across the Earth's other continents.
The evolutionary pattern seen here is strikingly similar to what is found in placental mammals and marsupial mammals, with two larger clades of organisms (one of which is almost exclusively Australasian) in which certain pairs of species have converged on similar morphologies. The pair of Xanthoparmelia tasmanica and Xanthoparmelia hypofusca (the new name of the other species, which we sampled in North America) is interesting because it is the first known example of complete convergence in this group, where no distinguishing morphological or chemical characters could be identified for two species found to be in these two major clades. However, we use the term 'semi-cryptic' (as opposed to 'cryptic') to describe the pair of species, since geography would seem to indicate which species is represented by any given sample.
A press release came out from Duke for this story. The following list represents a handful of websites that have covered the story:
http://today.duke.edu/2011/05/lichen
http://dukemagazine.duke.edu/article/symbiotic-association
http://www.evoscience.com/2423/lichen-could-be-a-fungal-equivalent-at-least-evolutionarily/
http://www.sciencedaily.com/releases/2011/05/110502110622.htm
http://www.biologynews.net/archives/2011/05/02/lichen_evolved_on_2_tracks_like_marsupials_and_mammals.html
http://archaeologynewsnetwork.blogspot.com/2011/05/lichen-evolved-on-2-tracks-like.html
http://anpron.eu/?p=2023
http://scienceblog.com/44922/lichen-evolved-on-2-tracks-like-marsupials-and-mammals/
http://www.geneticarchaeology.com/research/Lichen_evolved_on_2_tracks_like_marsupials_and_mammals.asp
http://www.australasianscience.com.au/news/may-2011/lichen-evolved-two-tracks-marsupials-and-mammals.html
http://feelsynapsis.com/pg/blog/read/64210/the-lichens-are-a-good-example-of-convergent-evolution
http://www.noodls.com/viewNoodl/9870752/duke-university/lichen-evolved-on-two-tracks-like-mammals-and-marsupials
http://7thspace.com/headlines/381043/lichen_that_seem_identical_in_all_outward_appearances_are_in_fact_two_different_species.html
http://esciencenews.com/articles/2011/05/02/lichen.evolved.2.tracks.marsupials.and.mammals
http://pda.physorg.com/news/2011-05-lichen-evolved-tracks-marsupials-mammals.html
http://www.uux.cn/viewnews-27222.html
http://www.cnfossil.com/?action-viewnews-itemid-205
http://tieba.baidu.com/f?kz=1070028772
http://www.nigpas.cas.cn/kxcb/kpwz/201105/t20110506_3128772.html
http://www.uua.cn/news/show-11534-1.html
http://www.sciencetechmag.com/lichen-evolved-on-two-tracks-like-marsupials-and-mammals.html
http://www.ebionews.com/news-center/general-research/evolution/37344-lichen-evolved-on-two-tracks-like-marsupials-and-mammals.html
http://www.astrobio.net/pressrelease/3948/lichens-two-track-evolution
http://www.eurekalert.org/pub_releases/2011-05/du-leo050211.php
http://forum.grasscity.com/science-nature/806894-evolution-action-convergent-species.html
http://www.labspaces.net/110538/Lichen_evolved_on___tracks__like_marsupials_and_mammals
http://www.verticalnews.com/premium_newsletters/Journal-of-Technology-and-Science/2011-05-22/213JTS.html
http://www.solociencia.com/biologia/11061009.htm [In Spanish]
http://www.firstscience.com/home/news/breaking-news-all-topics/lichen-evolved-on-2-tracks-like-marsupials-and-mammals_104827.html
http://www.irlab.org/now/GetSummary.do?id=3590
http://www.bioquicknews.com/node/494
http://i.bioknow.cn/portal/root/rsp/zx_nr.jsp?id=64585741
http://www.noticias21.com/node/3503 [In Spanish]
http://riktningnews.bio-medicine.org/biology-news-1/Lichen-evolved-on-2-tracks--like-marsupials-and-mammals-19054-1/
http://www.sciencenewsline.com/biology/2011050313000026.html?continue=y
http://www.sciencecodex.com/lichen_evolved_on_2_tracks_like_marsupials_and_mammals
http://www.onenewspage.com/news/Science/20110503/21924744/Lichen-Evolved-On-Tracks-Like-Marsupials-And.htm
http://www.astrobio.net/pdffiles/news_3948.pdf
http://chemical-and-chemistry.verticalnews.com/articles/5287667.html
http://www.freeusenetnewsgroup.com/lichen-evolved-on-2-tracks-like-marsupials-and-mammals.html
http://foronatura.mforos.com/1926434/10384172-2-sp-de-liquen-transformadas-en-una-sola-a-efectos-practicos/ [In Spanish]
http://newswithscience.blogspot.com/2011/05/lichen-evolved-on-2-tracks-like.html
http://i-science77.blogspot.com/2011/05/lichens-two-track-evolution.html
http://www.sciguru.com/newsitem/8367/lichen-evolved-on-two-tracks-like-marsupials-and-mammals
http://www.sciencenewsworld.com/science-articles/lichen-evolved-on-2-tracks-like-marsupials-and-mammals.html
- Brendan
-------------------------------
CITATIONS:
Hodkinson, B. P., and J. C. Lendemer. 2011. Molecular analyses reveal semi-cryptic species in Xanthoparmelia tasmanica. Bibliotheca Lichenologica 106: 115-126.
Download publication (PDF file)
Download nucleotide alignment (NEXUS file)
Hodkinson, B. P., and J. C. Lendemer. 2010. How do you solve a problem like Xanthoparmelia? Molecular analyses reveal semi-cryptic species in an Australasian-American 'disjunct' taxon. In: Botany 2010. Botanical Society of America, St. Louis, Missouri, abs. 355.
View abstract (website)
View poster
-------------------------------
UPDATE:
I even found a site that tries to use my article to disprove evolution!
http://www.evolutionnews.org/2011/05/convergent_genetic_evolution_i046651.html
I consider it to be a badge of honor. As an evolutionary biologist, I think that when the intelligent design people start paying attention to your work, it's an indication that you've "arrived"!
Wednesday, April 13, 2011
WOODSmont #2 - Continuing Outreach
I recently set up an exhibit all about lichens at my second WOODSmont Children's Festival. Some readers may recall my post about last year's event. Once again, it was a great success; it was both a lot of fun and a great chance to expose people to lichens at a young age!
The festival was sponsored by Duke University's Wilderness Outdoor Opportunities for Durham Students ('WOODS'). My table included 'touch and feel' lichens, a dissecting scope (so that children could see the lichens up close), and samples of local lichens for children to take home. I also had opportunities to interact with people of all ages from the local community and a number of K-12 educators.
There were many lichens to put under the scope.
I was sure to accommodate even the smallest of lichen observers.
The whole family can learn to love lichens!
A lesson on minute graphids.
Saturday, April 2, 2011
Phylogenetic tree editing: Reinserting removed identical sequences
In phylogenetic analyses, a large number of identical sequences can sometimes prove to be problematic. This post outlines a protocol for creating and running a customized Unix shell script that reinserts identical sequences into a phylogenetic tree file (NEWICK or NEXUS format), for situations in which identical sequences were removed pre-analysis.
Identical sequences may have been removed using the Mothur 'unique.seqs' function (in which case a '.names' file would have been generated, storing the information about which sequences were removed) or RAxML (which generates a '.reduced.phy' file for phylogenetic analysis and a log file that contains a list of the removed sequences and their remaining representatives in a format that can be easily extracted using Unix or Microsoft Excel). The protocol described here relies on using a '.names' file. If sequences were not removed using Mothur, the '.names' file can be manually generated (here are notes on the basic format: http://www.mothur.org/wiki/Names_file ), or the original sequence file can be processed using the Mothur 'unique.seqs' function.
This script will need to be built from the ground up as a customized Unix shell script for your sequence set. This can be assembled easily in Microsoft Excel or one of its clones:
Column A: 'sed -i s/' all the way down the column
Column B: sequence IDs for representative sequences (Column 1 of the '.names' file)
Column C: forward slashes ('/') all the way down the column
Column D: lists of sequences represented by each representative sequence (including the representative itself) separated by commas; each line must correlate with the Column B identifiers (Column D corresponds to Column 2 of the Mothur '.names' file)
Column E: '/g file_name.tre' all the way down the column
After this is put together, save it as tab-delimited text, open it with an advanced text editor (one that can perform a search and replace on tabs, e.g., TextWrangler or TextPad), remove all tabs (search for tabs and replace them with nothing), and add the first few lines manually to make it a working script. [Note: If one sequence name anywhere in the tree file or '.names' file is nested within another (e.g., 'bacterium' and 'bacterium2'), a colon can be added immediately after the name of the representative sequence with the shorter name, as long as a colon is added after the list of sequences being represented by that sequence.] The script can now be run on the original tree file and it will transform it into a tree file containing all of the sequences in the original sequence set (before removing identical sequences).
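A quick way to see why the colon matters is to run a plain pattern and a colon-anchored pattern over a two-leaf tree. This is a minimal sketch with made-up names ('bacterium', 'bacterium2', 'dup1'), and it assumes the tree carries branch lengths, so that every leaf name in the NEWICK string is immediately followed by a colon.

```shell
#!/bin/bash
# Toy NEWICK tree with branch lengths; 'bacterium' is nested inside 'bacterium2'.
# All sequence names here are hypothetical.
tree='(bacterium:0.12,bacterium2:0.30);'

# Plain pattern: also rewrites the 'bacterium' at the start of 'bacterium2'
naive=$(printf '%s\n' "$tree" | sed 's/bacterium/bacterium,dup1/g')
echo "$naive"     # (bacterium,dup1:0.12,bacterium,dup12:0.30);

# Colon-anchored pattern: matches 'bacterium' only where it is directly
# followed by the branch-length colon, so 'bacterium2' is left alone
anchored=$(printf '%s\n' "$tree" | sed 's/bacterium:/bacterium,dup1:/g')
echo "$anchored"  # (bacterium,dup1:0.12,bacterium2:0.30);
```

The colon only guards against longer names that extend the shorter one on the right (as with 'bacterium2'); a name such as 'uncultured_bacterium' still contains 'bacterium:', so names nested on the left need a separate check.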
Here's an example:
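#!/bin/bash
# Grid Engine (qsub) directives: run the job under bash, start it in the
# current working directory, and merge stdout/stderr into search_replace.log.
# When the script is run directly with bash, these lines are ordinary comments.
#$ -S /bin/bash
#$ -cwd
#$ -o search_replace.log -j y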
sed -i s/5005c2/5005c2,HL06C03c12/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/5005c4/5005c4,CL08C02c09,uncultured_bacterium_FD01A08,uncultured_bacterium_FD04E06/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/5015c31/5015c31,EL02B02c77,EL02C01c63,EL02C02c85,EL02C03c68,EL04C01c65,EL04C01c68,EL04C01c71,EL04C01c72,EL05B03c02,EL06C03c65,EL06C03f17,EL08B01c10,EL08B01c13,EL08B03c17,EL08B03c19,EL09A01c65,EL09A01c67,EL09A01c68,EL09A01c70,EL09A03c19,EL09A03c20,EL09B02c36,EL09B02c39,EL10B01c41,HL10A02c32,NL07B01c12,NL08C03c25,NL08C03f89,NL08C03f90,NL08C03f93/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/5027c58/5027c58,EL08B01c09/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium:/uncultured_bacterium,HL08B03c26:/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_5C231311/uncultured_bacterium_5C231311,GQ109020/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_FD02D06/uncultured_bacterium_FD02D06,EL02C01c61,EL02C02c84,EL02C03c67,EL08C01c23,EL08C03c09,EL10C03c15,NL01B03c63,NL07B01c10,NL07B01d89,NL07B03d84,NL10A02c32/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_nbw397h09c1/uncultured_bacterium_nbw397h09c1,HL05A03c20/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_Sed3/uncultured_bacterium_Sed3,EF064161/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
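Instead of assembling Columns A through E in a spreadsheet, the same sed lines can be generated straight from the '.names' file. The function below is a minimal sketch under a few assumptions: the '.names' file is tab-delimited with the representative ID in the first column and the comma-separated list in the second (as in the Mothur format described above), sequence names contain no slashes or regex metacharacters, and `make_reinsert_sed` is a hypothetical name, not part of Mothur or any other package. It does not add the colons described in the note above, so nested names still need a manual check.

```shell
#!/bin/bash
# make_reinsert_sed NAMES_FILE TREE_FILE   (hypothetical helper)
# Prints a bash script with one 'sed -i s/REP/LIST/g TREE_FILE' line per
# representative sequence, mirroring spreadsheet Columns A-E.
make_reinsert_sed() {
  local names_file="$1" tree_file="$2"

  # Header line that would otherwise be added by hand
  printf '#!/bin/bash\n'

  # Field 1 = representative (Column B), field 2 = the list it stands for
  # (Column D). Rows whose list is only the representative itself would be
  # no-op substitutions, so they are skipped.
  awk -F '\t' -v tree="$tree_file" '
    { sub(/\r$/, "") }                     # tolerate Windows line endings
    NF >= 2 && $1 != $2 {
      printf "sed -i s/%s/%s/g %s\n", $1, $2, tree
    }
  ' "$names_file"
}
```

A typical run would be `make_reinsert_sed seqs.names RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre > search_replace.sh` followed by `bash search_replace.sh` (the file name 'seqs.names' is only an example). The generated lines use the GNU sed form of `-i`; BSD/macOS sed expects `-i ''` instead.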
This information and many more bioinformatics tricks, tips, and scripts can be found in my doctoral dissertation (Hodkinson 2011), which will be coming out soon!
- Brendan
Update: These instructions are now published as part of a paper in Environmental Microbiology (Hodkinson et al. 2012) and the data/analysis/instruction files are available from the Dryad data repository (Hodkinson et al. 2011).
----------------------------------------------
References
The above instructions are published in the following sources:
Hodkinson, B. P. 2011. A phylogenetic, ecological, and functional characterization of non-photoautotrophic bacteria in the lichen microbiome. Doctoral Dissertation, Duke University, Durham, NC.
Download Dissertation (PDF file)
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2011. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository doi:10.5061/dryad.t99b1.
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2012. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology 14(1): 147-161. [doi:10.1111/j.1462-2920.2011.02560.x]
Download publication (PDF file)
Download supplementary phylogeny (PDF file)
View data and analysis file web-portal (website)
Download data and analysis file archive (ZIP file)
----------------------------------------------
This work was funded in part by NSF DEB-1011504.
- Brendan
Update: These instructions are now published as part of a paper in Environmental Microbiology (Hodkinson et al. 2012) and the data/analysis/instruction files are available from the Dryad data repository (Hodkinson et al. 2011).
----------------------------------------------
References
The above instructions are published in the following sources:
Hodkinson, B. P. 2011. A phylogenetic, ecological, and functional characterization of non-photoautotrophic bacteria in the lichen microbiome. Doctoral Dissertation, Duke University, Durham, NC.
Download Dissertation (PDF file)
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2011. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository doi:10.5061/dryad.t99b1.
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. In press. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology 14(1): 147-161. [doi:10.1111/j.1462-2920.2011.02560.x]
Download publication (PDF file)
Download supplementary phylogeny (PDF file)
View data and analysis file web-portal (website)
Download data and analysis file archive (ZIP file)
----------------------------------------------
This work was funded in part by NSF DEB-1011504.
Sunday, March 20, 2011
Even More Florida Lichens
Two years ago the Tuckerman Foray and Workshop was held in southern Florida. 'The Tuckerman,' as it's often called, was begun by Dick Harris, and is best summarized in his own words:
"This series of workshops, aimed at amateurs interested in lichens, was initiated because of a concern over the decline in organismal lichenology in North America. The workshops were begun as a way to pass along knowledge about lichens as living organisms and their systematics to a group of amateurs who could then keep the knowledge alive until academia once again becomes interested. The first workshop in 1994 had only about 10 participants. However, as the reputation of the workshops grew, and the original participants acted as teachers themselves, the workshops grew to 30-40 participants. The 19th workshop was held in 2010 in Georgia. For several of the workshops, major keys or other publications have been written and distributed to the participants. For the first several workshops I was the only mentor, but through the years the workshops have attracted other professionals who welcome the opportunity to interact, and learn (!), from enthusiastic amateurs. There are no registration fees for the workshops and all professionals pay their own expenses." (excerpted from Harris's own NSF biographical sketch)
In 2001, Dick won the first Peter Raven Award, the American Society of Plant Taxonomists' award for public outreach to nonscientists, for organizing the Tuckerman Workshops. In an age when morphological studies of organisms are out of fashion, it is important to keep these traditions alive. Once the initial pizazz of molecular studies wears off (perhaps it already has?), we will really only be able to break significant new ground in lichenology if we have an integrated approach that utilizes both the old and the new. I am very much with E.S. Luttrell when he says that:
"Little progress can be made if new techniques are used only to replace, rather than complement, the old" (E.S. Luttrell, 1989).
So the Tuckerman Foray keeps some of the old traditions and techniques alive in lichenology. At the particular one referenced in this post, we had a great time exploring Fakahatchee Strand near the Everglades and collecting an amazing diversity of lichens! Over a dozen new species are described as part of the final field trip publication (Lücking et al. 2011) and nearly 100 species are newly reported for North America. Be sure to check out the amazing supplementary photos!
I'm currently working on finishing up my dissertation on lichen-associated bacteria so I might not write for a while... the defense is April 1st!
- Brendan
*UPDATE*
This article has been featured on sciencedaily.com and earthsky.org!
------------------------------------------------
Citation:
Lücking, R., F. Seavey, R. S. Common, S. Q. Beeching, O. Breuss, W. R. Buck, L. Crane, M. Hodges, B. P. Hodkinson, E. Lay, J. C. Lendemer, R. T. McMullin, J. A. Mercado-Díaz, M. P. Nelsen, E. Rivas Plata, W. Safranek, W. B. Sanders, H. P. Schaefer Jr. and J. Seavey. 2011. The lichens of Fakahatchee Strand Preserve State Park, Florida: Proceedings from the 18th Tuckerman Workshop. Bulletin of the Florida Museum of Natural History 49(4):127-186.
Publication: http://www.flmnh.ufl.edu/bulletin/vol49no4/vol49no4.pdf
Supplementary photos: http://www.flmnh.ufl.edu/bulletin/vol49no4supplmats.htm
Monday, March 14, 2011
Lichen Research in Science News
OK, so I guess this is old news, but maybe it'll be new to you! The Lutzoni Lab was featured a couple of years back in Science News. You can check out the article here:
http://www.sciencenewsdigital.org/sciencenews/20091107?pg=20#pg18
If for nothing else, have a look at the great spread of photos by Stephen Sharnoff!
- Brendan
Friday, March 4, 2011
Usnea Launch Project
Certain lichens need a little help. That's the philosophy behind the Usnea Launch Project out in the Pacific Northwest region of the United States. Their motto is "A Better Future for Usnea longissima." From what I can glean from the promotional film, the project entails taking fallen lichens, packing them up in little latex capsules with plenty of water, and launching them high into the trees using a giant 3-person catapult.
If you haven't gotten a chance to see it yet, have a look at the video. It will show you some of the lengths that lichenologists go to for conservation!
- Brendan
Friday, February 25, 2011
The Licheniad
by Sean Q. Beeching
Canto I
A lichen, one may theorize,
When on the future casts his eyes,
His dear descendants he descries.
Eternal life, so it appears,
And a youth that lasts a thousand years,
The lichen spurns as cause for tears.
He dreams, or she, perhaps I’ll say,
Of numerous, happy, progeny,
With whom it would, perforce, parté.
The truth to which one must attest,
Is that our lives may not be blest,
And nuclear winter may end the fest.
In which event the lichen too,
Will perish along with me and you,
Unless a lifeboat he can construe.
But how to effect the goal sublime?
To reproduce two souls entwined,
Involves a course most labyrinthine.
Too sadly he must bid adieu,
To sex, I’m sorry, but it’s true,
The ordinary method won’t work for two.
By sex what here we represent,
Does not demand adult consent,
Simple meiosis is all that’s meant.
These poor dears, for aught we know,
Lack the genders and hence forgo,
What here to say would not be apropos.
And had they genders, why they’d be four,
Two for the fungus, for the alga two more,
A state of affairs one would deplore.
It would lead to confusion,
To mishap and exclusion,
And not, in fine, to the hoped for diffusion.
One half, if the better, I’ll not say,
May be engendered in the usual way,
What results, I’m afraid, is a lichen manqué.
Oft tales are told of the sailing spores,
Which travel the heavens beyond our shores
And by means unknown the lichen restores.
Yet how the bionts reunite,
On what rare moonlit starry night;
It remains conjecture, that secret rite.
Yet how the lichen reunites,
By what fantastic arcane rites
Remains unknown, they’re unseen sights.
(Betwixt the preceding tercets twain
Neither of the other could the advantage gain,
I’ve left them both as a short refrain.)
Though it may happen, it seems farfetched,
Belief must needs be sorely stretched,
To credit the procedure I have thus far sketched.
Instead the lichen puts his trust,
In structures far removed from lust;
They seem to us no more than dust.
Within the thallus, all unseen,
Assembles the lichen his breeding machine,
As in an ant hill he were the queen.
At length the surface of its skin,
Reveals the tumult deep within,
It writhes and wrinkles and waxes thin.
Upon its face begin to vent,
Depending on the creature’s bent,
Features odd but of small extent.
Their smallness is indeed a test
Of our students’ eyesight. They protest,
And think our labels a cruel jest.
They’re each quite different, we declare,
Which drives them all to black despair,
As at the mocking plants they stare.
And truth to tell not even we,
Are always sure which one we see,
Isidia, soredia, which could it be?
Regardless which these fly aloft,
Both symbionts combined but soft,
They fall to earth, not seldom but oft.
Back on the earth, behold, it sprouts,
It lives, it breathes, convention flouts,
And grows to manhood, or thereabouts.
Thus, dear readers, have you now heard,
A tale as marvelous but far less blurred,
Than that of the logos, the living word.
Should that comparison seem extreme;
By it am I seen to blaspheme:
It’s nothing, I assure you, but blown off steam.
Tuesday, February 22, 2011
Fast UniFrac for barcoded 454 16S amplicons
When analyzing large 16S sequence data sets from multiple samples to infer community patterns, one of the best analytical methods available is Fast UniFrac. Barcoded 454 sequencing (a type of 'pyrosequencing') is quickly becoming the method of choice for generating the data for these types of analyses, and Fast UniFrac has been built especially to handle the large data sets generated using this method. However, simple data management issues can keep researchers from using this methodology. One major hindrance with Fast UniFrac is that, if one wants to take advantage of the 16S reference tree that is built into the program (something that becomes almost obligatory with sufficiently large data sets), all of the sequence names must be put in a specific format that reveals the sample of origin while preserving the individual sequence identifiers. Below I describe two distinct procedures that I developed for assembling the input file for the first step of the Fast UniFrac pyrosequencing analysis procedure. I have tested the first procedure on a data set of ~120,000 sequences of ~500 bp in length (this required sending an email to request a higher Fast UniFrac quota, which is typically capped at 100,000 sequences) and the second procedure on a data set of ~40,000 sequences of the same approximate length.
Files needed:
Sequence files ('.fna'+'.qual' or '.sff' or '.fasta'+'.groups') with sequence names that are all the same length (standard for 454 data)
Oligos file with primers and barcodes for each sample (for use with Mothur) [http://www.mothur.org/wiki/Trim.seqs]
Special programs needed:
Mothur
A text editor that can perform search and replace on returns (e.g., TextWrangler for Macintosh or TextPad for Windows)
Microsoft Excel 2008 (previous versions max out at ~65000 rows and may do other unexpected things to large data sets)
BLAST (local)
PyCogent
Enthought Python Distribution (I used v6.3; alternatively, you can download Python and NumPy separately, but the EPD has versions of these that play well together and should work with PyCogent as long as they are first in your path... getting Python, NumPy, and PyCogent to talk to one another can be more complicated than it would seem)
Method #1 (no UNIX scripting required):
1) Take the '.fna' (fasta) and '.qual' (quality) files (or alternatively the '.sff' file) plus the manually-produced '.oligos' file for your data set and process them with Mothur using the 'trim.seqs' function [http://www.mothur.org/wiki/Trim.seqs]. This will produce a '.fasta' file with your sequences and a '.groups' file that shows which sequences come from which samples ('environments' in the language of UniFrac). I recommend using the many functions of Mothur to further cull your data set; however, the amazing power of Mothur will be the subject of future posts. If you do decide to change the composition of the '.fasta' file in any way (this is always necessary at least to some degree), you can use the Mothur 'list.seqs' function (performed on the finalized '.fasta' file) to generate an '.accnos' file, then follow that by the 'get.seqs' function (performed on the '.accnos' file and the original '.groups' file) to generate a finalized '.groups' file that correlates with the finalized '.fasta' file.
2) Open the '.groups' file in Microsoft Excel and edit the file to create a four-column spreadsheet: (A) '>' on every row; (B) sample names; (C) delimiter (I use '#') on every row, and (D) sequence names. Highlight all four columns and sort ascending according to column D (sequence names). This is done by going to 'Data' > 'Sort' > 'Column D', 'ascending'. Save as 'file1.csv' (comma-delimited text).
3) Open the new 'file1.csv' file in a text editor and remove all commas (this can be easily done in TextWrangler for Mac by opening the file and going to 'Search' > 'Find', putting ',' in the 'Find:' box, typing nothing in the 'Replace:' box, and clicking 'Replace All'). Save as 'file2.txt' (plain text).
4) Open the '.fasta' file with a text editor (e.g., TextWrangler). Remove all returns (e.g., Find: '\r', Replace: '' in TextWrangler with the 'Grep' box checked on the Find/Replace screen). Replace all instances of '>' with a return and '>' (e.g., '\r>'; remember that 'Grep' must be checked if you are using TextWrangler). Now the first line is empty; manually delete that line. Save as 'file3.txt' (plain text).
5) Import 'file2.txt' into the first column of Microsoft Excel (column A). On the same spreadsheet, import the names and sequences from file3.txt to the second and third columns (columns B and C) by clicking on cell B1 (column B, row 1) and importing 'file3.txt' as a text file with 'fixed width'; set the field width for the column break manually at the interface between the sequence name and the beginning of the nucleotide sequence (for my data, it was around the 15th place; if this doesn't work with the text import wizard, fixed-width delimitation can be performed under 'Text to Columns...' in the 'Data' menu). Highlight columns B and C *only* (if you highlight column A as well, this will not work). Go to 'Data' > 'Sort' > 'Column B', 'ascending'. At this point, you should have rows that look something like: '>CLSt#GR5DBVW03HJKND' '>GR5DBVW03HJKND' 'TACGATCGATCGATCAGCATCGATCA...' where the columns are correlated with one another and reading across a row should show the same identifier in column A as in column B. Be sure that this is the case throughout the spreadsheet. Now delete column B (the one with the identifiers from the imported fasta file). Save as 'file4.csv' (comma-delimited text).
6) Open 'file4.csv' with a text editor. Replace all instances of ',,' with a single return (e.g., find ',,' and replace with '\r' in TextWrangler... remember to have the 'Grep' box checked). Save as 'file5.fasta'.
7) Follow the instructions for "The BLAST to GreenGenes protocol" and 1-7 of the "Steps" in the Fast UniFrac tutorial (download raw pairwise distance matrices for further analyses), all found here:
http://bmf2.colorado.edu/fastunifrac/tutorial.psp
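For very large data sets, the spreadsheet work in steps 2-6 can also be done in a single pass with a short script. The Python sketch below is an alternative to those steps, not part of the procedure I tested: it assumes a Mothur '.groups' file with one 'read_ID<tab>sample' pair per line and a '.fasta' file whose header lines begin with the read ID, and it writes headers in the '>sample#read_ID' form shown in step 5. The function names are hypothetical.

```python
def read_groups(lines):
    """Map read ID -> sample name from Mothur .groups lines ('read_ID<TAB>sample')."""
    id_to_sample = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            id_to_sample[fields[0]] = fields[1]
    return id_to_sample


def relabel_fasta(fasta_lines, id_to_sample, delimiter="#"):
    """Yield FASTA records with '>sample#read_ID' headers and one-line sequences."""
    header, chunks = None, []

    def record():
        # The read ID is the first word of the header line (without the '>').
        read_id = header[1:].split()[0]
        if read_id not in id_to_sample:
            raise KeyError(f"read {read_id} is missing from the .groups file")
        sample = id_to_sample[read_id]
        if delimiter in sample:
            raise ValueError(f"sample name {sample!r} contains {delimiter!r}")
        return f">{sample}{delimiter}{read_id}\n{''.join(chunks)}\n"

    for raw in fasta_lines:
        line = raw.strip()  # also drops Mac-style '\r' line endings
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield record()
            header, chunks = line, []
        else:
            chunks.append(line)  # multi-line sequences are joined into one line
    if header is not None:
        yield record()
```

Used on real files, this might look like reading the sample assignments with read_groups(open('final.groups')) and then writing ''.join(relabel_fasta(open('final.fasta'), groups)) to 'file5.fasta'. Because reads are matched by ID rather than by sorted row position, the rows cannot drift out of register as they can in step 5; extra entries in the '.groups' file are simply ignored, while a read missing from it stops the script with an error.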
Method #2 (simpler overall, but requires assembling and running a customized UNIX shell script):
1) Take the '.fna' (fasta) and '.qual' (quality) files (or alternatively the '.sff' file) plus the manually-produced '.oligos' file for your data set and process them with Mothur using the 'trim.seqs' function [http://www.mothur.org/wiki/Trim.seqs]. This will produce a '.fasta' file with your sequences and a '.groups' file that shows which sequences come from which samples ('environments' in the language of UniFrac). I recommend using the many functions of Mothur to further cull your data set; however, the amazing power of Mothur will be the subject of future posts. If you do decide to change the composition of the '.fasta' file in any way (this is always necessary at least to some degree), you can use the Mothur 'list.seqs' function (performed on the finalized '.fasta' file) to generate an '.accnos' file, then follow that by the 'get.seqs' function (performed on the '.accnos' file and the original '.groups' file) to generate a finalized '.groups' file that correlates with the finalized '.fasta' file.
2) Create and run a shell script on the '.fasta' file that looks something like this:
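#!/bin/bash
# Grid Engine (qsub) directives: run under bash, work in the current directory,
# and merge stdout/stderr into search_replace.log
# (ignored as ordinary comments if the script is run directly with bash)
#$ -S /bin/bash
#$ -cwd
#$ -o search_replace.log -j y
# One in-place substitution (GNU sed -i) per read: prefix each 454 read ID
# with its sample name and the '#' delimiter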
sed -i s/GMQ03P202B3SVL/MID08#GMQ03P202B3SVL/g file_name.fasta
sed -i s/GMQ03P202BTOMD/MID08#GMQ03P202BTOMD/g file_name.fasta
sed -i s/GMQ03P202CGYZW/MID08#GMQ03P202CGYZW/g file_name.fasta
sed -i s/GMQ03P202BT4E9/MID08#GMQ03P202BT4E9/g file_name.fasta
sed -i s/GMQ03P202BXELV/MID08#GMQ03P202BXELV/g file_name.fasta
It will have to contain a row for every sequence. This sort of thing can be assembled in Microsoft Excel:
Column A: 'sed -i s/' all the way down the column
Column B: sequence IDs (from a Mothur '.groups' file)
Column C: forward slashes ('/') all the way down the column
Column D: sample identifiers (correlated with the sequence IDs... easily done if the sequence IDs and sample identifiers are both taken from a Mothur '.groups' file and the order is preserved)
Column E: delimiter (e.g., #) all the way down the column
Column F: identical to column B
Column G: '/g file_name.fasta' all the way down the column
After that is put together, save it as comma-delimited text, open it with a text editor, remove all commas, add the first few lines manually to make it a working script, and run.
3) Follow the instructions for "The BLAST to GreenGenes protocol" and 1-7 of the "Steps" in the Fast UniFrac tutorial (download raw pairwise distance matrices for further analyses), all found here: http://bmf2.colorado.edu/fastunifrac/tutorial.psp
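The Excel assembly in step 2 can likewise be scripted. This Python sketch (the function name is hypothetical; it assumes the same tab-delimited '.groups' format and keeps the placeholder name 'file_name.fasta') writes the header lines plus one 'sed -i' line per read, in the same form as the example script above.

```python
def sed_script(group_lines, fasta_name="file_name.fasta", delimiter="#"):
    """Return a bash script with one in-place sed substitution per read."""
    header = [
        "#!/bin/bash",
        "#$ -S /bin/bash",
        "#$ -cwd",
        "#$ -o search_replace.log -j y",
    ]
    body = []
    for line in group_lines:
        fields = line.split()  # expected: read_ID <tab> sample
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        read_id, sample = fields[0], fields[1]
        # Same pattern as the Excel columns: sed -i s/ID/SAMPLE#ID/g file
        body.append(f"sed -i s/{read_id}/{sample}{delimiter}{read_id}/g {fasta_name}")
    return "\n".join(header + body) + "\n"
```

Writing the result to a file, e.g. open('search_replace.sh', 'w').write(sed_script(open('final.groups'))), gives a script that can be submitted or run as before. Keep in mind that each 'sed -i' line reads and rewrites the entire FASTA file, so with tens of thousands of reads the script makes tens of thousands of passes over the data; a tool that rewrites all headers in a single pass avoids that cost.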
Once you have the raw pairwise distance matrices, they can be analyzed in R or another statistical package/program. This will be the subject of a later post. As part of the Fast UniFrac protocol (when using the GreenGenes core backbone tree), it will be necessary to use PyCogent. If you have trouble with PyCogent itself (or the command line in general), please see my earlier post with some tricks and tips on getting that part of the Fast UniFrac 'BLAST-to-GreenGenes' protocol to work. Hopefully the current post can help someone in constructing the initial input file for the first part of the Fast UniFrac procedure!
- Brendan
P.S. This procedure places '#' in the sequence names as the delimiter. This information will be needed during the PyCogent-dependent portion of the Fast UniFrac protocol. Here is an example of what I type in the command line once I have navigated to the folder where I have placed the 'create_unifrac_env_file_BLAST.py' python script and the 'blast_output...' file (the confusing thing with my nomenclature is that the Python input is named 'output' because it is the BLAST output, while the Python output is named 'input' because it becomes the Fast UniFrac input; the logic behind this is that the essence of this step is the transformation of the BLAST output file into the Fast UniFrac input file):
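python create_unifrac_env_file_BLAST.py blast_output_allreplaced_ready4python.txt fastunifrac_input_file.txt "#"
(In a UNIX shell the '#' needs to be quoted as shown; left unquoted, bash reads it as the start of a comment and the delimiter is never passed to the script.)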
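Since the delimiter is passed to the script, presumably so that it can separate each sample name from its read ID, it is safest to keep '#' out of the sample names themselves. The Python sketch below is not the Fast UniFrac script; it only illustrates the assumed 'sample#read_ID' split and gives a quick per-sample read count that can be checked against expectations before running BLAST. The function names are hypothetical.

```python
from collections import Counter


def split_label(label, delimiter="#"):
    """Split 'sample#read_ID' (a leading '>' is allowed) at the first delimiter."""
    sample, sep, read_id = label.strip().lstrip(">").partition(delimiter)
    if not (sep and sample and read_id):
        raise ValueError(f"expected sample{delimiter}read_ID, got {label!r}")
    return sample, read_id


def reads_per_sample(fasta_lines, delimiter="#"):
    """Count relabeled FASTA records per sample name."""
    return Counter(
        split_label(line, delimiter)[0]
        for line in fasta_lines
        if line.startswith(">")
    )
```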
Update: These instructions are now published as part of my dissertation (Hodkinson 2011) and an article in Environmental Microbiology (Hodkinson et al. 2012a); the supporting data/analysis/instruction files for the latter are available from the Dryad data repository (Hodkinson et al. 2012b).
----------------------------------------------
References
The above instructions are published in the following sources:
Hodkinson, B. P. 2011. A phylogenetic, ecological, and functional characterization of non-photoautotrophic bacteria in the lichen microbiome. Doctoral Dissertation, Duke University, Durham, NC.
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2012a. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology 14(1): 147-161.
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2012b. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository doi:10.5061/dryad.t99b1.
----------------------------------------------
The development of the above protocols was supported in part by NSF (DEB-1011504) and the US Department of Energy.
Update: These instructions are now published as part of my dissertation (Hodkinson 2011) and an article in Environmental Microbiology (Hodkinson et al. 2012a); the supporting data/analysis/instruction files for the latter are available from the Dryad data repository (Hodkinson et al. 2012b).
----------------------------------------------
References
The above instructions are published in the following sources:
Hodkinson, B. P. 2011. A phylogenetic, ecological, and functional characterization of non-photoautotrophic bacteria in the lichen microbiome. Doctoral Dissertation, Duke University, Durham, NC.
Download Dissertation (PDF file)
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2012a. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology 14(1): 147-161.
Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2012b. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository doi:10.5061/dryad.t99b1.
----------------------------------------------
The development of the above protocols was supported in part by NSF (DEB-1011504) and the US Department of Energy.
Thursday, February 10, 2011
PyCogent for Fast UniFrac
As someone studying the composition of lichen-associated bacterial communities, I have generated several data sets of 16S rRNA gene sequences from bacteria that live in this specialized niche. Beyond the simple question of "who lives there?" we can start to use phylogenetic inferences to examine the ecology of this niche by comparing sets of 16S sequences from different communities and taking into account where the different members fall in a phylogeny. UniFrac is a tool that allows the integration of phylogenetic information into ecological comparative community analyses, and its hip new cousin Fast UniFrac is all the rage these days. But, alas, fully utilizing the special features of Fast UniFrac (such as mapping pyrosequencing reads to a reference phylogeny) requires PyCogent, the installation of which has given me much grief recently.
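For readers new to the metric, here is a toy illustration of the idea behind unweighted UniFrac: the distance between two communities is the fraction of the observed branch length (branches leading to reads from either community) that leads to reads from only one of them. This is a minimal Python sketch under simple assumptions (a small rooted tree stored in hypothetical 'parent' and 'length' dictionaries, and made-up community labels); it is not Fast UniFrac's implementation, and it does no weighting by abundance.

```python
def unweighted_unifrac(parent, length, membership, a, b):
    """Unweighted UniFrac distance between communities a and b on a rooted tree.

    parent:     node -> its parent (the root has no entry)
    length:     node -> length of the branch above that node (same keys as parent)
    membership: leaf -> set of community labels with reads at that leaf
    """
    # Collect, for every branch, the communities that have reads below it.
    below = {node: set() for node in length}
    for leaf, communities in membership.items():
        node = leaf
        while node in parent:
            below[node] |= communities
            node = parent[node]

    unique = observed = 0.0
    for node, communities in below.items():
        in_a, in_b = a in communities, b in communities
        if in_a or in_b:
            observed += length[node]      # branch leads to reads of a or b
            if in_a != in_b:
                unique += length[node]    # ...but only to one of them
    return unique / observed if observed else 0.0
```

For example, with reads s1 (community A), s2 (community B), and s3 (both) on the tree ((s1:0.2,s2:0.3):0.1,s3:0.4), only the branches above s1 and s2 are unique, so the distance is (0.2 + 0.3) / (0.2 + 0.3 + 0.1 + 0.4) = 0.5.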
PyCogent is a great Python-based toolkit that can be used for conducting a number of analyses on biological sequence data (DNA, RNA, proteins); it is billed as "making sense from sequence" (Knight et al. 2007). There is a good guide to PyCogent known as the PyCogent Cookbook. Some programs/packages/pipelines that depend on PyCogent include QIIME and Fast UniFrac (for the latter, PyCogent is required only if you have a large 16S data set that requires a guide tree).
I have had trouble getting the different versions of Python, NumPy, and PyCogent to communicate with one another through UNIX (on both CentOS and MacOSX... although all of the various versions of the different dependencies may have been an issue, since I do not own the machines and I run several versions of Python myself locally), but I ran through the simple 2-step protocol listed below on Windows XP and Windows 7 and it worked very well for running the Python script associated with the Fast UniFrac 'BLAST-to-GreenGenes' protocol. This is a little odd since it is written that installation of PyCogent by itself is not supported for Windows... and the procedure that I outline below seems to be a pretty simple way to get it installed.
Installing and running PyCogent requires using the command line. If you would like to do this on a Windows machine and you are unfamiliar with the Windows command line, you can google tutorials on "MS-DOS" and/or "command prompt". There is a decent introductory guide here. The instructions below are written in a broad, inclusive way so that they should work with a UNIX-based system as well (including Macintosh; if you are a Mac user and are unfamiliar with the command line, you can google something like "Mac OSX Terminal" or find a good beginners' tutorial here).
Whatever type of system it is, the PATH variable must be set correctly so that the programs can find one another. As long as you do not have previous versions of Python, NumPy, or PyCogent installed, Windows should automatically set the environment variables so that this protocol will work without a hitch (Macintosh most likely will not set the variables automatically because it usually comes with a pre-installed Python that it will always want to use). Click here to see a post that further addresses one of the issues with the wrong version of Python/NumPy getting in the way.
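Before running the protocol, it can help to confirm which interpreter the command line actually launches and where NumPy and PyCogent would be imported from. The sketch below uses only the Python standard library (sys and importlib.util.find_spec); it assumes PyCogent's import name is 'cogent', and it reports a missing module instead of failing. The function names are hypothetical.

```python
import importlib.util
import sys


def locate(module_name):
    """Return where a module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


def report(modules=("numpy", "cogent")):
    """Print the running interpreter and the resolved location of each module."""
    print("Python:", sys.executable, sys.version.split()[0])
    for name in modules:
        print(f"{name}: {locate(name) or 'not found for this interpreter'}")


if __name__ == "__main__":
    report()
```

Saved as, e.g., check_env.py (a placeholder name) and run with 'python check_env.py', it shows whether the 'python' found first on your PATH is the EPD interpreter and whether that interpreter can see NumPy and PyCogent.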
Here is my simplistic protocol for getting PyCogent moving enough to run the Python script mentioned above (I should note that this protocol is not approved by the makers of PyCogent, since it may not produce a fully-functional package, but it does allow me to run the script):
1) Installing Python, NumPy, etc.:
Install the most recent version of the Enthought Python Distribution package (free for academics).
2) Installing PyCogent:
Download the most recent version of PyCogent ('.tgz' file).
Unzip the folder (using, e.g., WinRAR, WinZip, or 7-Zip; an automatic partial unzip might leave it as '.gz' but one of the previously mentioned programs will allow you to unzip it fully and you can drag the folder to your desktop if necessary).
In the command line, navigate to the PyCogent directory.
Type in the command line:
python setup.py install
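If no archive utility is handy, the Python installed in step 1 can itself fully unpack the '.tgz' in one step. This is a minimal sketch using the standard-library tarfile module; the function name is hypothetical, and it applies tarfile's 'data' extraction filter when the running Python provides it.

```python
import tarfile
from pathlib import Path


def unpack_tgz(archive, dest="."):
    """Fully extract a .tgz/.tar.gz archive and return its top-level folders."""
    dest = Path(dest)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # safer extraction, newer Pythons
        else:
            tar.extractall(dest)
        tops = sorted({Path(m.name).parts[0] for m in tar.getmembers() if m.name})
    return [dest / top for top in tops]
```

For example, unpack_tgz('PyCogent-x.y.z.tgz') (a placeholder file name) returns the extracted PyCogent folder, which is the directory to navigate to before typing 'python setup.py install'.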
There are some further notes on installation here and in the README, but please note that it was the fact that these instructions didn't quite get me to where I was going that inspired me to write this post. Still, they are likely to provide exactly what is needed for most situations.
Depending on the sort of jobs you need to run using PyCogent, a single computer may or may not have enough computing power. I have an interest in PyCogent because I need it to run the aforementioned script that makes the Fast UniFrac '.env' input file (see the Fast UniFrac tutorial for more details on how this fits into the overall Fast UniFrac protocol). A single computer processor has more than enough computing power to handle this job, but some of the more advanced QIIME functions will certainly require greater power for sufficiently large data sets.
Hopefully the notes here can make Fast UniFrac accessible to more people (specifically, when pyrosequencing reads must be mapped to a reference tree), since the various errors that can arise with PyCogent, NumPy, Python, etc. can be difficult to troubleshoot. If you wish to use PyCogent directly, you will probably need some familiarity with the Python programming language, although the cookbook has enough examples that one may be able to stumble through it naively (not that I would recommend it). If you're like me and only use PyCogent so that you can map sequences to a reference tree for Fast UniFrac, then everything else you'll need to know can probably be found in the excellent Fast UniFrac tutorial. The Fast UniFrac 'BLAST-to-GreenGenes' procedure also requires a local installation of BLAST (installation instructions for PC, Mac, Linux, etc.). Making the initial input file for this specific type of Fast UniFrac analysis can require some creative thinking, and will be the subject of a future post.
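To give a feel for what the '.env' file contains, here is a rough Python sketch. It is not the script referred to above, and it assumes my understanding of the UniFrac environment-file layout: one tab-delimited line per sequence, giving the sequence (tree tip) name, the sample name, and an optional count. The input record layout (read ID, sample ID, best reference hit), the function names, and the reference IDs like "gg_101" are all hypothetical; check the Fast UniFrac tutorial for the exact conventions it expects.

```python
from collections import Counter


def build_env_lines(hits):
    """Collapse best-hit records into UniFrac-style environment-file lines.

    `hits` is an iterable of (read_id, sample_id, reference_id) tuples, i.e. one
    best BLAST hit per read (a hypothetical record layout). Each output line is
    reference_id <TAB> sample_id <TAB> count, where count is the number of reads
    from that sample assigned to that reference sequence.
    """
    counts = Counter((ref, sample) for _read, sample, ref in hits)
    # Sort so the output is stable: by reference ID, then by sample
    return ["{}\t{}\t{}".format(ref, sample, n)
            for (ref, sample), n in sorted(counts.items())]


def write_env_file(hits, path):
    """Write the lines produced by build_env_lines to a plain-text file."""
    with open(path, "w") as handle:
        for line in build_env_lines(hits):
            handle.write(line + "\n")


if __name__ == "__main__":
    example_hits = [
        ("read1", "soil", "gg_101"),
        ("read2", "soil", "gg_101"),
        ("read3", "leaf", "gg_202"),
    ]
    for line in build_env_lines(example_hits):
        print(line)
```

In the example, the two soil reads that hit the same reference collapse into a single line with a count of 2, and the leaf read gets its own line with a count of 1.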
- Brendan
Sunday, January 23, 2011
Poemules
Those involved in Southeastern U.S. licheneering have probably met Sean Q. Beeching on a foray. His knowledge of the local lichen flora of Georgia and neighboring states surpasses that of most lichenologists for any region. He even has a species named after him: Megalaria beechingii (Lendemer 2007). He has also written a book of essays, which I highly recommend: the fascinatingly titled "'I Like You But What Can You Do, Can You Be a Bird?': Adventures In The Lichen Trade." Page 3 of this bulletin provides a review of the book that gives some clues as to the kinds of essays you will find between the covers.
I recently received a message from Sean regarding the video of James Lendemer on the NPR website and some of the thoughts that it inspired in him:
"
I sent James's lichen video, by way of your blog, to a friend of mine, the botanist Lisa Kruse, and she, being sharper than me, noticed that the narrator says that the lichen fungus eats the algae. She then asked me if that is true because her understanding, and mine, was that the fungus appropriates from the algae, in one way or another, its nutrients without actually consuming the little green fellows. Instead of answering the question I sent her these three poems, or perhaps, poemules, would more accurately describe them. And thus I send them to you in the hope that you find them amusing.... on the other hand, if recent investigations have determined that the fungus does indeed devour the algae, I would like to know about that.
Three rhymes on the lichen symbiosis occasioned by a lichen video’s narrator having said that the lichen fungus eats the lichen algae.
Were I to venture to explain
How doth the fungus entertain,
I’d not have said he eats his guests
But rather shares in their bequests.
If to me the burden fell,
To illustrate the lichen, well,
More like the cow-maid, I would hold,
The fungus cultivates her fold,
She slaughters not her gentle beasts,
But rather milks them by their teats.
How doth the lichen food obtain?
The fungus, I would ascertain,
The alga probes with fingers shrewd.
His ticklish damsels something lewd,
And makes the ladies to surrender,
Unto its hands the purloined provender.
"
My only thoughts on whether the fungus 'eats' the algae are that [1] the fungi eat the products made by the algae (and often even seem to enhance the algae's food production) and [2] the fungi presumably must devour the carcasses of dead algae. I don't think it is known whether the fungi may, under certain conditions, 'kill' some of the algae or hasten their demise, but I would welcome additional commentary on the matter!
- Brendan
Works Cited:
Lendemer, J. C. 2007. Megalaria beechingii, a new species from the southern Appalachian Mountains of eastern North America. Opuscula Philolichenum 4: 41-44.
Download publication (PDF file)
Saturday, January 8, 2011
PICS-Ord
Just this week I had a 'methodology' article published in BMC Bioinformatics. Robert Lücking of the Field Museum was the first author, and we worked with Alexis Stamatakis (of RAxML fame) and Reed Cartwright (creator of Ngila and Dawg). The paper is entitled "PICS-Ord: Unlimited Coding of Ambiguous Regions by Pairwise Identity and Cost Scores Ordination" and it presents a method for encoding data found in ambiguously-aligned regions of multiple sequence alignments so that such data can be integrated into standard molecular phylogenetic analyses. Most researchers simply exclude data found in ambiguously-aligned regions of nucleotide or amino-acid alignments when conducting phylogenetic inferences. While such practices are perfectly sound, a large amount of potentially informative data is consequently left out of downstream analyses. Using a method to recode these regions and integrate them into phylogenetic analyses, however, allows one to consider all of the data present in the larger molecular regions being analyzed.
Until PICS-Ord, no method had been devised for properly integrating this type of data into likelihood-based analyses (e.g., ML, Bayesian). INAASE (Lutzoni et al. 2000) is a program that recodes ambiguously-aligned regions, but because the distances between different sequence types are encoded as cost matrices, its utility is limited to parsimony-based analyses. It also has a limited number of available symbols, which makes it impractical for large data sets. For each ambiguously-aligned region, PICS-Ord uses ordination of scores (which are based on pairwise alignments between the sequences of each taxon) to create a series of axes that are converted into discrete characters, which can be appended to a multiple sequence alignment. The combined matrix of aligned sequences plus recoded characters can then be analyzed phylogenetically under any number of criteria, including maximum likelihood (ML) and Bayesian inference.
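For readers who think in code, here is a minimal Python sketch of the general pipeline described above (pairwise scores, then ordination, then discrete characters) for a single ambiguously-aligned region. It is NOT the PICS-Ord implementation, which is written in R and derives its scores from pairwise alignments (pairwise identity and cost scores). Assumptions made here for brevity: a plain edit distance stands in for those alignment scores, only the first ordination axis is kept (PICS-Ord produces a series of axes), and the four-state equal-width binning is arbitrary. All function names are hypothetical.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance, used here as a simple stand-in for a pairwise alignment cost."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # gap in b
                            curr[j - 1] + 1,             # gap in a
                            prev[j - 1] + (ca != cb)))   # match / mismatch
        prev = curr
    return prev[-1]


def first_ordination_axis(dist, iters=500):
    """Principal coordinates (classical scaling) of a distance matrix, leading axis only."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_means = [sum(row) / n for row in sq]
    grand = sum(row_means) / n
    # Double-centred matrix B = -1/2 * J D^2 J
    b = [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand) for j in range(n)]
         for i in range(n)]
    # Shift by a Gershgorin bound so power iteration finds the largest eigenvalue of B
    shift = max(sum(abs(x) for x in row) for row in b)
    if shift == 0:
        return [0.0] * n
    v = [1.0 + 0.01 * i for i in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient recovers the eigenvalue of B itself (without the shift)
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(x * y for x, y in zip(v, bv))
    if lam <= 0:
        return [0.0] * n
    return [x * math.sqrt(lam) for x in v]


def discretize(coords, n_states=4):
    """Bin continuous axis coordinates into integer states 0..n_states-1 (equal-width bins)."""
    lo, hi = min(coords), max(coords)
    if hi == lo:
        return [0] * len(coords)
    width = (hi - lo) / n_states
    return [min(int((c - lo) / width), n_states - 1) for c in coords]


def recode_region(seqs, n_states=4):
    """Map {taxon: unaligned region sequence} to {taxon: one discrete character}."""
    names = list(seqs)
    dist = [[edit_distance(seqs[a], seqs[b]) for b in names] for a in names]
    states = discretize(first_ordination_axis(dist), n_states)
    return {name: str(s) for name, s in zip(names, states)}
```

For example, `recode_region({"A": "ACGTTT", "B": "ACGTTT", "C": "ACGGGGAA", "D": "ACGGGGAA"})` gives A and B one state and C and D the other; which group gets "0" depends on the arbitrary sign of the ordination axis. Each returned character could then be appended to the corresponding taxon's row of the alignment.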
PICS-Ord is available here as an R-based program. As academic software goes, it's pretty friendly, but please let us know if you run into any trouble. The publication of this method, together with a program that implements it, represents a great leap forward in phylogenetics: data from ambiguously-aligned regions can finally be integrated into likelihood-based analyses!
-Brendan
P.S. Find out more about PICS-Ord here on Reed Cartwright's blog:
http://pandasthumb.org/archives/2011/01/pics-ord-unlimi.html
Works Cited:
Lücking, R., B. P. Hodkinson, A. Stamatakis, and R. A. Cartwright. 2011. PICS-Ord: Unlimited Coding of Ambiguous Regions by Pairwise Identity and Cost Scores Ordination. BMC Bioinformatics 12: 10.
Download publication (PDF file)
Download R-based PICS-Ord program (zipped program package)
View program wiki (website)
Lutzoni, F., P. Wagner, V. Reeb, and S. Zoller. 2000. Integrating ambiguously aligned regions of DNA sequences in phylogenetic analyses without violating positional homology. Systematic Biology 49: 628-651.
Download publication (PDF file)
Download Java-based INAASE program (zipped program package)
Friday, December 31, 2010
Snow Camp Lichens
A few years back I stopped in a little town called Snow Camp, NC, and saw some amazing rooftops with wooden shingles filled with Cladonia lichens! I had never seen colonies so large in this region before. Recently, I realized that I would be passing by the town again and decided to take some photographs of this phenomenon. I had to do some acrobatics to get close-ups of them. You can click on the photos to see greater detail!
Standing on the back of a car to photograph the lowest of the Cladonia-filled rooftops.
One half of a roof almost entirely covered with lichens.
A roof corner with Cladonia cristatella and other species.
A close-up of just one shingle.
Happy New Year!
- Brendan