Friday, June 17, 2011

Writing a Phycas Script

For a while I have been wary of phylogenetic results supported only by Bayesian analyses, because of the so-called 'star-tree paradox' that haunts MrBayes and even some other programs like it. As I have mentioned previously, one of the best features of the Bayesian phylogenetic program Phycas is that it gives one the opportunity to allow polytomies in the trees sampled as part of the posterior (which can often deflate the inflated posterior probability values seen with programs like MrBayes). The specific command for this is:
"mcmc.allow_polytomies = True"

To run Phycas, it is best to write a script to go with a standard NEXUS-formatted sequence alignment. There is some basic information on how to install and run Phycas in my previous post:
http://squamules.blogspot.com/2011/06/installing-and-running-phycas.html
However, that post does not go into any of the details of scripting for Phycas. Recently, I ran a multigene analysis with mtSSU, ITS1, 5.8S, and ITS2 in different partitions, with a different evolutionary model for each. Here is what my Phycas script looked like:

from phycas import *
setMasterSeed(98765)
mcmc.data_source = 'Input_file_name.nex'
mcmc.out.log = 'Output_file_name.log'
mcmc.out.log.mode = REPLACE
mcmc.allow_polytomies = True
mcmc.polytomy_prior = False
mcmc.topo_prior_C = 1.0
mcmc.out.trees.prefix = 'Output_file_name'
mcmc.out.params.prefix = 'Output_file_name'
mcmc.ncycles = 50000
mcmc.sample_every = 10
# Set up the K80+I model for 5pt8S
model.type="hky"
model.state_freqs = [0.25, 0.25, 0.25, 0.25]
model.fix_freqs = True
model.kappa = 2.0
model.kappa_prior = BetaPrime(1.0, 1.0)
model.pinvar_model = True
# Save the K80+I model for 5pt8S
m3 = model()
# Set up the GTR+I model for mtSSU
model.type="gtr"
model.state_freqs = [0.3338, 0.1493, 0.1983, 0.3187]
model.fix_freqs = False
model.relrates = [1.4783, 5.8050, 3.3222, 0.6768, 7.6674, 1.0000]
model.pinvar_model = True
# Save the GTR+I model for mtSSU
m1 = model()
# Set up the HKY+G model for ITS1
model.type="hky"
model.state_freqs = [0.1487, 0.3566, 0.2704, 0.2244]
model.fix_freqs = False
model.kappa = 2.0
model.kappa_prior = BetaPrime(1.0, 1.0)
model.num_rates = 4
model.gamma_shape = 0.5
model.gamma_shape_prior = Exponential(1.0)
model.pinvar_model = False
# Save the HKY+G model for ITS1
m2 = model()
# Set up the HKY+G model for ITS2
model.state_freqs = [0.1419, 0.3069, 0.3199, 0.2314]
# Save the HKY+G model for ITS2
m4 = model()
# Define partition subsets
mtssu = subset(1, 1080)
its1 = subset(1081, 1607)
fivept8S = subset(1608, 1768)
its2 = subset(1769, 2041)
# Assign partition models to subsets
partition.addSubset(mtssu, m1, "mtSSU")
partition.addSubset(its1, m2, "ITS1")
partition.addSubset(fivept8S, m3, "5pt8S")
partition.addSubset(its2, m4, "ITS2")
partition()
# Start the run
mcmc()
# Summarize the posterior
sumt.trees = 'trees.t'
sumt.burnin = 500
sumt.tree_credible_prob = 1.0
sumt()

Although I have some notes within the script, please see the Phycas manual for instructions on what each of the individual commands does. Hopefully more people will be using Phycas (and allowing polytomies!) in the future!

- Brendan

Note: One question that I had about running Phycas was how to define exclusion sets; however, Phycas apparently can read the EXSET line of the ASSUMPTIONS block of the NEXUS file in the same way that Mesquite, MacClade, and PAUP* can.

No comments:

Post a Comment