Monday, July 25, 2011

Taxonomy as Wastewater Treatment

I was thinking about taxonomic treatments of specific groups of organisms; in some cases a 'treatment' is essentially a rehash and synthesis of what has been published previously (but just for a specific subregion, etc.). While focusing on the mechanistic details of how my colleagues and I put together treatments (not through rehashing), I thought of the process by which sewage is treated. So I did a Google image search for "taxonomic treatment," and basically got a bunch of photos of journal pages/covers, some phylogenetic trees, and some photos of organisms. I think that's pretty much how most people see biological taxonomy, which explains a lot about why taxonomy is often seen as dull and sometimes (even worse) unscientific.

But then I did a Google image search for "wastewater treatment" and I was amazed to see that the images there generally matched my concept of how to do a taxonomic treatment much better than what I saw in the previous search! What I saw were mainly flowcharts... they showed processes like screening, pre-treatment, cleaning, clarification, digestion, storage, disposal... and the processes all flowed into one another and ended up making products for public consumption! Yes! This is it! This is how we really need to be doing taxonomy! Instead of perpetuating the problems that exist, take them head-on... get the junk out of the way, and make something that people can use. Many researchers seem to be of the mind that if it's mostly right, it's good enough; but as any wastewater treatment plant manager will tell you, even if it's only 10-20% sewage, it's not fit for public consumption. Let us view our taxonomy in the same manner!

Lichenologist James Lendemer is famously quoted as saying "I think of myself as a bounty hunter." Perhaps I should think of myself as a manager of a wastewater treatment plant. Maybe that's not as glorious, but it certainly is important. So much work remains to be done before we get close to having a reasonable set of names for the organisms on the Earth. As long as humans are involved, our nomenclatural system will be imperfect and will require constant cleaning, management, and enforcement of standards... and there I will stand, ready to take on the nastiest and dirtiest of the problems!

- Brendan

P.S. Some recent big news in the NYC area has been the big fire at a wastewater treatment plant that sent sewage spewing into the Hudson River. Thought exercise for taxonomists: Can you think of any events like this one (speaking metaphorically) that affected your particular group of organisms?

Tuesday, July 12, 2011

Man It Feels Good 2 B A Lichen

Recently I was thinking about the plight of the sterile crustose lichens, specifically those in Eastern North America.  One could feel sorry for them, being so taxonomically neglected and underrepresented in all major surveys of biodiversity.  But in a certain kind of way, I think that they must be very proud.  It's certainly amazing how they've been able to get by on so little (so little sex, so little attention from humans).  Inspired by the story of the sterile crusts, I decided to write some lyrics, which I entitled "Man, It Feels Good 2 B A Lichen," and set the words to the music of the similarly-titled song (made famous by the movie Office Space) by the Geto Boys, about being a gangsta, not a lichen. Some people have said that it's groundbreaking... that it's a whole new genre (known as "Lichen Rap" or, more commonly, "Lich-Hop")... I just like to think of it as one of the products resulting from the inspiration that I get from the amazing organisms that I study!



Man, It Feels Good 2 B A Lichen (a sterile crust’s song)

Man, it feels good to be a lichen
Live sterile crust lichens ain’t indoors
Real sterile crust lichens are hard to identify
‘cause sterile crust lichens don’t make spores

Man, it feels good to be a lichen
I mean one that you don’t really know
Livin’ as a sterile crust, drivin’ people crazy
‘cause I can’t be identified for sho’

Now sterile crust lichens come in all shapes and colors
Some got killed in the past
But if NSF could just get their back with some fundin’
We could study them and hopefully they’ll last

Now all I gotta say to you
Sexually-reproducin’ lichen-formin’ fungi recombinin’
When your spore can’t find no algae what you think you gonna do?
Man, it feels good to be a lichen

Man, it feels good to be a lichen
Flying round on the currents all day
‘cause when a sterile crust lichen goes and tries to reproduce
It bundles up the partners and it blows away

Now when a lichen like this one is livin’ in your ‘hood
It’s most likely that you ain’t gonna know
‘cause lots of sterile crust lichens are small and inconspicuous
But under UV they might glow

Now all I gotta say to you
Sexually-reproducin’ lichen-formin’ fungi recombinin’
When your spore can’t find no algae what you think you gonna do?
Man, it feels good to be a lichen


Stay tuned for news about the record release party in the Bronx, to take place later this year! 

Tuesday, July 5, 2011

Perl: Renaming DNA Sequences

When I began my dissertation studies, I did not know the wonders of the Perl programming language. However, within the past year, it has proven to be an invaluable tool for manipulating DNA sequence data sets and helping me to tackle projects that once seemed too large in scope. In this post I will just give one example of a Perl script that I wrote after getting some training from Dr. Bob Thomson, whom I met at the NSF-sponsored "Fast, Free Phylogenies" workshop at NIMBioS (Knoxville, TN).

When processing large numbers of DNA sequences, it always helps to have a standardized naming system so that the sequences can be handled in an automated way during downstream analyses. For a recent large-scale cloning experiment that involved picking 2880 clonal bacterial colonies (to amplify and sequence a vector-inserted 16S gene fragment from each), I developed a 10-digit alpha-numeric code that allowed me to encode all of the necessary data about my sequences into each specific sequence identifier. However, the sequencing facility also needed to use its own codes to keep track of my sequences, so I ended up with long names that had my own codes in the middle with information identifying them as my sequences in front and information about the individual sequence reads themselves tacked on at the end. Therefore, to recover the names (without retaining the sequencing facility's additions) in an automated way, I wrote a simple Perl script to edit a fasta file containing the sequences (this was run after the process of manual sequence correction had been finished).
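To make the naming problem concrete, here is a minimal sketch of pulling a 10-digit alphanumeric code out of a longer identifier. The sample name (including everything after the code) is invented for illustration, extract_code is a hypothetical helper, and restricting the code to letters and digits is an assumption about the naming scheme.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the 10-character code that follows "BH_",
# or undef if the name does not fit that pattern.
# Assumption: the code contains only letters and digits.
sub extract_code {
    my ($name) = @_;
    return $name =~ /^BH_([A-Za-z0-9]{10})/ ? $1 : undef;
}

# Invented example name: prefix, 10-character code, then facility additions.
my $long_name = 'BH_CL01A2B3C4_F_read0001.ab1';
my $code = extract_code($long_name);
print defined $code ? "$code\n" : "no code found\n";
```

Returning undef for names that do not fit the pattern makes it easy to spot identifiers that were not labeled according to the scheme.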

The following script (‘Clon_16S_fasta_renamer.pl’) allowed me to extract the 10-digit alpha-numeric codes that I used in my dissertation studies (Hodkinson 2011) from the long names (with extraneous information) that came from the sequencing facility. It creates a new fasta file with these modified identifiers. Specifically, it takes sequences that have "BH_" (my initials), followed by a 10-digit code, followed by additional characters, and simply renames each sequence using just the 10-digit code (effectively stripping out "BH_" at the beginning and any extra characters at the end). The new file will have the same name as the input file, but with the extension replaced by ".ed.fasta". The script can easily be modified for any set of sequences that are identified using a standardized naming scheme.

#!/usr/bin/perl
use strict;
use warnings;

# Ask for the input fasta file and open it for reading.
print "\nPlease type the name of your input file: ";
my $filename = <STDIN>;
chomp $filename;
open(my $fasta, '<', $filename) or die "Cannot open '$filename': $!\n";

# Name the output after the input, replacing the extension with ".ed.fasta".
(my $outname = $filename) =~ s/\.[^.\/\\]*$//;
$outname .= '.ed.fasta';
open(my $out, '>', $outname) or die "Cannot write '$outname': $!\n";

while (my $line = <$fasta>)
    {
    # Header line: keep only the 10 characters that follow "BH_".
    if ($line =~ /^>BH_(.{10})/)
        {
        print $out ">$1\n";
        }
    # Sequence line: uppercase nucleotide/ambiguity codes, gaps (-), and colons only.
    elsif ($line =~ /^[ACGTRYKMSWBDHVN:-]+\s*$/)
        {
        print $out $line;
        }
    }

close $fasta;
close $out;
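The script above hard-codes the "BH_" prefix and the code length. As a rough sketch of how it might be adapted to another naming scheme, the version below takes both as parameters and works on FASTA text held in memory. The function name rename_fasta, its parameters, and the sample record are all hypothetical, and this is not the published script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rename the headers in a block of FASTA text.
# $prefix and $length are assumptions standing in for "BH_" and 10.
sub rename_fasta {
    my ($fasta, $prefix, $length) = @_;
    my $out = '';
    for my $line (split /\n/, $fasta) {
        if ($line =~ /^>\Q$prefix\E(.{$length})/) {
            $out .= ">$1\n";      # header: keep only the code
        }
        elsif ($line !~ /^>/ && $line =~ /\S/) {
            $out .= "$line\n";    # non-header, non-blank line: pass through
        }
        # Headers without the expected prefix are skipped, as in the script above.
    }
    return $out;
}

# Invented example record.
my $example = ">BH_CL01A2B3C4_F_read0001\nACGTTGCA\nGGA\n";
print rename_fasta($example, 'BH_', 10);
```

The \Q...\E around the prefix keeps characters such as "-" or "." in a prefix from being interpreted as regular expression syntax.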

If you need to know how to run a Perl script, you can look it up on Google; in general, once Perl is installed, typing "perl Clon_16S_fasta_renamer.pl" at a command prompt in the folder containing the script will start it (this is actually easier to set up on almost any operating system other than Windows). Since I was performing a simple task with a very specific data set, it was easy for me to use basic Perl commands. However, for more complex sequence manipulations, BioPerl provides an excellent collection of Perl modules for biological applications.

- Brendan

----------------------------------------------

References

The above script is published in the following sources:

Hodkinson, B. P. 2011. A Phylogenetic, Ecological, and Functional Characterization of Non-Photoautotrophic Bacteria in the Lichen Microbiome. Doctoral Dissertation, Duke University, Durham, NC.

Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. 2011. Data from: Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Dryad Digital Repository doi:10.5061/dryad.t99b1.

Hodkinson, B. P., N. R. Gottel, C. W. Schadt, and F. Lutzoni. In press. Photoautotrophic symbiont and geography are major factors affecting highly structured and diverse bacterial communities in the lichen microbiome. Environmental Microbiology.

----------------------------------------------

This work was funded in part by NSF DEB-1011504 and EF-0832858.