As someone studying the composition of lichen-associated bacterial communities, I have generated several data sets of 16S rRNA gene sequences from bacteria that live in this specialized niche. Beyond the simple question of "who lives there?" we can start to use phylogenetic inferences to examine the ecology of this niche by comparing sets of 16S sequences from different communities and taking into account where the different members fall in a phylogeny. UniFrac is a tool that allows the integration of phylogenetic information into ecological comparative community analyses, and its hip new cousin Fast UniFrac is all the rage these days. But, alas, fully utilizing the special features of Fast UniFrac (such as mapping pyrosequencing reads to a reference phylogeny) requires PyCogent, the installation of which has given me much grief recently.
PyCogent is a great Python-based toolkit that can be used for conducting a number of analyses on biological sequence data (DNA, RNA, proteins); it is billed as "making sense from sequence" (Knight et al. 2007). There is a good guide to PyCogent known as the PyCogent Cookbook. Some programs/packages/pipelines that depend on PyCogent include QIIME and Fast UniFrac (for the latter, PyCogent is required only if you have a large 16S data set that requires a guide tree).
I have had trouble getting the different versions of Python, NumPy, and PyCogent to communicate with one another on UNIX-based systems (both CentOS and Mac OS X). The many versions of the various dependencies may have been part of the problem, since I do not own those machines and I run several versions of Python myself locally. However, I ran through the simple 2-step protocol listed below on Windows XP and Windows 7, and it worked very well for running the Python script associated with the Fast UniFrac 'BLAST-to-GreenGenes' protocol. This is a little odd, since it is written that installation of PyCogent by itself is not supported for Windows, yet the procedure that I outline below seems to be a pretty simple way to get it installed.
Installing and running PyCogent requires using the command line. If you would like to do this on a Windows machine and you are unfamiliar with the Windows command line, you can google tutorials on "MS-DOS" and/or "command prompt". There is a decent introductory guide here. The instructions below are written in a broad, inclusive way so that they should work with a UNIX-based system as well (including Macintosh; if you are a Mac user and are unfamiliar with the command line, you can google something like "Mac OSX Terminal" or find a good beginners' tutorial here).
Whatever type of system you use, the PATH variable must be set correctly so that the programs can find one another. As long as you do not have previous versions of Python, NumPy, or PyCogent installed, Windows should automatically set the environment variables so that this protocol will work without a hitch (Macintosh most likely will not set the variables automatically, because it usually comes with a pre-installed Python that it will always want to use). Click here to see a post that further addresses one of the issues with the wrong version of Python/NumPy getting in the way.
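One quick way to see which Python is actually answering at the command line, and whether it can find NumPy and PyCogent, is to ask Python itself. The snippet below is only a minimal sketch: the function name is my own invention, and I am assuming that PyCogent is imported under the module name 'cogent' (check this against your own installation).

```python
import importlib
import sys


def describe_environment(module_names=("numpy", "cogent")):
    """Report which interpreter is running and which modules it can import."""
    report = {
        "executable": sys.executable,              # path of the Python being used
        "python_version": sys.version.split()[0],  # e.g. "2.7.3"
        "modules": {},
    }
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as error:
            # A missing module raises ImportError; a broken or mismatched
            # install can fail in other ways, so catch those as well.
            report["modules"][name] = {"found": False, "detail": str(error)}
        else:
            report["modules"][name] = {
                "found": True,
                "version": getattr(module, "__version__", "unknown"),
                "location": getattr(module, "__file__", "built-in"),
            }
    return report


if __name__ == "__main__":
    info = describe_environment()
    print("Interpreter: %s (Python %s)" % (info["executable"], info["python_version"]))
    for name, details in sorted(info["modules"].items()):
        if details["found"]:
            print("%s %s loaded from %s" % (name, details["version"], details["location"]))
        else:
            print("%s could not be imported: %s" % (name, details["detail"]))
```

If the interpreter path or the module locations point somewhere you did not expect (for example, a pre-installed system Python), that is a strong hint that the PATH is sending you to the wrong installation.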
Here is my simple protocol for getting PyCogent working well enough to run the Python script mentioned above (I should note that this protocol is not approved by the makers of PyCogent, since it may not produce a fully functional package, but it does allow me to run the script):
Download the most recent version of PyCogent (a '.tgz' file) and extract the archive using, e.g., WinRAR, WinZip, or 7-Zip. An automatic partial unzip might leave it as a '.gz' file, but any of these programs will let you extract it fully, and you can drag the resulting folder to your desktop if necessary.
In the command line, navigate to the PyCogent directory.
Type in the command line:
python setup.py install
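If you would rather not depend on a separate unzipping program, the extraction in the first step can also be done with Python's standard 'tarfile' module, which handles the gzip and tar layers in one pass. This is only a sketch under my own assumptions: the function name and the file names in the usage note are hypothetical, and you should only extract archives that come from a source you trust.

```python
import os
import tarfile


def extract_source_archive(archive_path, destination):
    """Unpack a gzip-compressed tar archive; return the folder holding setup.py."""
    # Mode "r:gz" reads through both layers (gzip, then tar), so there is
    # no half-extracted intermediate file left behind.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(destination)

    # Look for the first extracted folder that contains setup.py, since
    # that is the directory to navigate to before installing.
    for folder, _subfolders, files in os.walk(destination):
        if "setup.py" in files:
            return folder
    return None  # nothing that looks like an installable package was found
```

For example, calling extract_source_archive("PyCogent.tgz", "pycogent_src") (both names are placeholders for wherever you saved the download) returns the directory in which the install command above should be run.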
There are some further notes on installation here and in the README, but please note that it was the fact that these instructions didn't quite get me to where I was going that inspired me to write this post. Still, they are likely to provide exactly what is needed for most situations.
Depending on the sort of jobs you need to run using PyCogent, a single computer may or may not have enough computing power. I have an interest in PyCogent because I need it to run the aforementioned script that makes the Fast UniFrac '.env' input file (see the Fast UniFrac tutorial for more details on how this fits into the overall Fast UniFrac protocol). A single computer processor has more than enough computing power to handle this job, but some of the more advanced QIIME functions will certainly require greater power for sufficiently large data sets.
Hopefully the notes here can make Fast UniFrac more accessible to more people (specifically, when the mapping of pyrosequencing reads to a reference tree is required), since the various errors that may occur with PyCogent, NumPy, Python, etc. can be difficult to troubleshoot. If you wish to use PyCogent directly, you will probably have to be somewhat familiar with the Python programming language, although the cookbook has enough examples that one may be able to stumble through it naively (not that I would recommend it). If you're like me, and only use PyCogent so that you can map sequences to a reference tree for Fast UniFrac, then everything else you'll need to know can probably be found in the excellent Fast UniFrac tutorial. The Fast UniFrac 'BLAST-to-GreenGenes' procedure also requires a local installation of BLAST (installation instructions are available for PC, Mac, Linux, etc.). Making the initial input file for this specific type of Fast UniFrac analysis can require some creative thinking, and will be the subject of a future post.
Portions of this blog are based upon work supported by the National Science Foundation (NSF) and the National Institutes of Health (NIH) under grants specified in the individual posts. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the NSF or NIH.