Wednesday, April 13, 2011

WOODSmont #2 - Continuing Outreach

I recently set up an exhibit all about lichens at my second WOODSmont Children's Festival. Some readers may recall my post about last year's event. Once again, it was a great success; it was both a lot of fun and a great chance to expose people to lichens at a young age!

The festival was sponsored by Duke University's Wilderness Outdoor Opportunities for Durham Students ('WOODS'). My table included 'touch and feel' lichens, a dissecting scope (so that children could see the lichens up close), and samples of local lichens for children to take home. I also had opportunities to interact with people of all ages from the local community and a number of K-12 educators.

There were many lichens to put under the scope.

I was sure to accommodate even the smallest of lichen observers.

The whole family can learn to love lichens!

 A lesson on minute graphids.
 

Saturday, April 2, 2011

Phylogenetic tree editing: Reinserting removed identical sequences

In phylogenetic analyses, a large number of identical sequences can sometimes prove to be problematic.  This post outlines a protocol for creating and running a customized Unix shell script that reinserts identical sequences into a phylogenetic tree file (NEWICK or NEXUS format), for situations in which identical sequences were removed pre-analysis.

Identical sequences may have been removed using the Mothur 'unique.seqs' function (in which case a '.names' file would have been generated, storing the information about which sequences were removed) or RAxML (which generates a '.reduced.phy' file for phylogenetic analysis and a log file that contains a list of the removed sequences and their remaining representatives in a format that can be easily extracted using Unix or Microsoft Excel). The protocol described here relies on using a '.names' file. If sequences were not removed using Mothur, the '.names' file can be manually generated (here are notes on the basic format: http://www.mothur.org/wiki/Names_file ), or the original sequence file can be processed using the Mothur 'unique.seqs' function.
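To make the '.names' layout concrete, here is a minimal sketch that writes a toy two-column file and counts what it describes. The temporary file and the 'lone_seq' singleton are hypothetical and only for illustration; the other IDs are borrowed from the example further down. It assumes the usual layout of one representative per line, a tab, then a comma-separated list that includes the representative itself.

```shell
#!/bin/bash
# Toy '.names' file (hypothetical; a real one comes from Mothur 'unique.seqs').
# Column 1: representative ID. Column 2: every ID it represents, comma-separated.
names_file=$(mktemp)
printf '%s\t%s\n' \
  '5005c2'   '5005c2,HL06C03c12' \
  '5005c4'   '5005c4,CL08C02c09,uncultured_bacterium_FD01A08,uncultured_bacterium_FD04E06' \
  'lone_seq' 'lone_seq' > "$names_file"

# Number of representatives = number of lines in the file.
reps=$(awk -F'\t' 'END { print NR }' "$names_file")

# Total sequences in the original set = number of IDs summed over column 2.
total=$(awk -F'\t' '{ n += split($2, ids, ",") } END { print n }' "$names_file")

echo "Representatives: $reps, total sequences: $total"
```

The total is the number of tips the tree should contain once every identical sequence has been reinserted, which makes it a handy figure to check against later.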

This script will need to be built from the ground up as a customized Unix shell script for your sequence set. This can be assembled easily in Microsoft Excel or one of its clones:
Column A: 'sed -i s/' all the way down the column
Column B: sequence IDs for representative sequences (Column 1 of the '.names' file)
Column C: forward slashes ('/') all the way down the column
Column D: lists of sequences represented by each representative sequence (including the representative itself) separated by commas; each line must correlate with the Column B identifiers (Column D corresponds to Column 2 of the Mothur '.names' file)
Column E: '/g file_name.tre' all the way down the column
After this is put together, save it as tab-delimited text, open it with an advanced text editor (one that can perform a search and replace on tabs, e.g., TextWrangler or TextPad), and remove all tabs (search for tabs and replace them with nothing). Then add the first few lines manually to make it a working script (the '#!/bin/bash' line, plus any job-scheduler directives your system needs, as in the example below). [Note: If one sequence name anywhere in the tree file or '.names' file is nested within another (e.g., 'bacterium' and 'bacterium2'), add a colon immediately after the name of the representative sequence with the shorter name in the search pattern, and add a matching colon after the list of sequences represented by that sequence, so that only the complete name is matched.] The script can now be run on the original tree file, and it will transform it into a tree file containing all of the sequences in the original sequence set (as it was before identical sequences were removed).
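As an alternative to the spreadsheet route, the same search-and-replace script can be generated straight from the '.names' file. This is only a sketch under stated assumptions, not the method used for the published analysis: the file names 'example.names', 'example.tre', and 'reinsert.sh' and the 'lone_seq' singleton are hypothetical, and the '.names' file is assumed to be tab-delimited with no characters in the IDs that are special to sed.

```shell
#!/bin/bash
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)

# Toy '.names' file (hypothetical): representative, tab, comma-separated list.
printf '%s\t%s\n' \
  '5005c2'   '5005c2,HL06C03c12' \
  '5027c58'  '5027c58,EL08B01c09' \
  'lone_seq' 'lone_seq' > "$workdir/example.names"

# Write the script header, then one sed command per representative.
# Rows where column 1 equals column 2 are singletons and need no replacement,
# so they are skipped (a choice made here, not part of the original protocol).
{
  printf '#!/bin/bash\n'
  awk -F'\t' -v tree='example.tre' \
    '$1 != $2 { printf "sed -i s/%s/%s/g %s\n", $1, $2, tree }' \
    "$workdir/example.names"
} > "$workdir/reinsert.sh"

cat "$workdir/reinsert.sh"
```

Each generated line has the same shape as the Column A to E layout above. Nested names are not detected automatically here, so the colon workaround described in the note would still need to be applied by hand to the affected lines.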

Here's an example:

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -o search_replace.log -j y

sed -i s/5005c2/5005c2,HL06C03c12/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/5005c4/5005c4,CL08C02c09,uncultured_bacterium_FD01A08,uncultured_bacterium_FD04E06/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/5015c31/5015c31,EL02B02c77,EL02C01c63,EL02C02c85,EL02C03c68,EL04C01c65,EL04C01c68,EL04C01c71,EL04C01c72,EL05B03c02,EL06C03c65,EL06C03f17,EL08B01c10,EL08B01c13,EL08B03c17,EL08B03c19,EL09A01c65,EL09A01c67,EL09A01c68,EL09A01c70,EL09A03c19,EL09A03c20,EL09B02c36,EL09B02c39,EL10B01c41,HL10A02c32,NL07B01c12,NL08C03c25,NL08C03f89,NL08C03f90,NL08C03f93/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/5027c58/5027c58,EL08B01c09/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium:/uncultured_bacterium,HL08B03c26:/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_5C231311/uncultured_bacterium_5C231311,GQ109020/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_FD02D06/uncultured_bacterium_FD02D06,EL02C01c61,EL02C02c84,EL02C03c67,EL08C01c23,EL08C03c09,EL10C03c15,NL01B03c63,NL07B01c10,NL07B01d89,NL07B03d84,NL10A02c32/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_nbw397h09c1/uncultured_bacterium_nbw397h09c1,HL05A03c20/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
sed -i s/uncultured_bacterium_Sed3/uncultured_bacterium_Sed3,EF064161/g RAxML_bipartitions.Rhizo_RAxML_topo_BP_50plus.tre
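The 'uncultured_bacterium:' line in the script above uses the colon workaround for nested names. The following sketch shows why it is needed, using a toy NEWICK string. All names in it are hypothetical, and it assumes the tree includes branch lengths, so that every tip name is directly followed by a colon.

```shell
#!/bin/bash
# Toy tree in which one name ('bacterium') is nested inside another ('bacterium2').
tree='(bacterium:0.1,bacterium2:0.2,other:0.3);'

# Without the colon, the pattern also matches the start of 'bacterium2'
# and corrupts that tip name.
naive=$(printf '%s\n' "$tree" | sed 's/bacterium/bacterium,dupA/g')

# With the colon in both the pattern and the replacement, only the
# complete shorter name is matched.
anchored=$(printf '%s\n' "$tree" | sed 's/bacterium:/bacterium,dupA:/g')

echo "$naive"     # (bacterium,dupA:0.1,bacterium,dupA2:0.2,other:0.3);
echo "$anchored"  # (bacterium,dupA:0.1,bacterium2:0.2,other:0.3);
```

The sketch pipes the tree through sed instead of using '-i', so it leaves no files behind and can be tried safely before editing a real tree file.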

This information and many more bioinformatics tricks, tips, and scripts can be found in my doctoral dissertation (Hodkinson 2011), which will be coming out soon!

- Brendan

Update: These instructions are now published as part of a paper in Environmental Microbiology (Hodkinson et al. 2012) and the data/analysis/instruction files are available from the Dryad data repository (Hodkinson et al. 2011).

----------------------------------------------

References

The above instructions are published in the following sources:

Hodkinson, B. P. 2011. A phylogenetic, ecological, and functional characterization of non-photoautotrophic bacteria in the lichen microbiome. Doctoral Dissertation, Duke University, Durham, NC.

Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2011. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository doi:10.5061/dryad.t99b1.

Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2012. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology 14(1): 147-161. [doi:10.1111/j.1462-2920.2011.02560.x]

----------------------------------------------

This work was funded in part by NSF DEB-1011504.