In this practical we will first choose our own data and then apply two versions of the Mk model to our data.
First, go to this website and choose a data set you want to analyse. Take the Nexus file. Once you have chosen you data set add it to your data folder for this exercise. In the scripts folder should create three files.
You can choose what extensions to add to your model based on what you think is important for your data set
Read in our data
morpho <- readDiscreteCharacterData("data/file-name.nex")
Set up helper variables
taxa <- morpho.names()
num_taxa <- taxa.size()
# num of branches for an unrooted tree
num_branches <- 2 * num_taxa - 3
# moves
moves = VectorMoves()
br_len_lambda <- 10
# We assume a uniform prior on topology.
phylogeny ~ dnUniformTopologyBranchLength(taxa, branchLengthDistribution=dnExponential(br_len_lambda))
# compute the tree length from the phylogeny
tree_length := phylogeny.treeLength()
moves.append( mvNNI(phylogeny, weight=num_branches/2.0) )
moves.append( mvSPR(phylogeny, weight=num_branches/10.0) )
moves.append( mvBranchLengthScale(phylogeny, weight=num_branches) )
Source the Mk file
source("scripts/Mk.Rev")
Replace Mk.Rev
with what ever you have named your
script
Define the model
# We define our model.
# We can use any node of our model as a handle, here we chose to use the tree.
mymodel = model(phylogeny)
The object mymodel
is a wrapper around the entire model
graph and allows us to pass the model to various functions that are
specific to our MCMC analysis.
Set up our monitors for the analysis Change MK-NAME
to
the version of the model you are running. You can call it anything, just
make sure you change it between your two versions.
monitors.append( mnModel(filename= "output/" + MK-NAME + "_posterior.log",printgen=10, separator = TAB) )
monitors.append( mnFile(filename= "output/" + MK-NAME + "_posterior.trees",printgen=10, separator = TAB, phylogeny) )
monitors.append( mnScreen(printgen=1000, tree_length) )
monitors.append( mnStochasticVariable(filename= "output/" + MK-NAME +"_posterior.var",printgen=10) )
Run the anaylsis
mymcmc = mcmc(mymodel, monitors, moves, nruns=2, combine="mixed")
mymcmc.burnin(generations=200,tuningInterval=200)
mymcmc.run(generations= 10000 ,tuningInterval=200)
Here you will create two scripts. You can have an unpartitioned or partitioned version of the model it is up to you. Decide which extensions you think are interesting/important and build your sequence evolution model
#### specify the Jukes-Cantor substitution model applied uniformly to all sites ###
# change num_state to the maximum number of states in your data set
Q := fnJC(num_states)
alpha_inv ~ dnExponential(1)
alpha_morpho := 1 / alpha_inv
rates_morpho := fnDiscretizeGamma( alpha_morpho, alpha_morpho, 4 )
# Moves on the parameters to the Gamma distribution.
moves.append( mvScale(alpha_inv,lambda=1, weight=2.0) )
To do this, we don’t need to create any new variables. We just add it
to out PhyloCTMC object by coding="variable"
# the sequence evolution model.
# This is for just the Mk model, no extensions
seq ~ dnPhyloCTMC(tree=phylogeny, Q=Q, type="Standard")
# to add gamma add
siteRates=rates_morpho
# to account for Ascertainment Bias add
coding="variable
# attach the data
seq.clamp(morpho)
This code will partition the data by the number of character states. This code builds a model of sequance evolution for each partition. This is for an Mk model. To add G and V add the lines as above.
n_max_states <- num_states
idx = 1
morpho_bystate[1] <- morpho
for (i in 2:n_max_states) {
morpho_bystate[i] <- morpho # make local tmp copy of data
morpho_bystate[i].setNumStatesPartition(i) # only keep character blocks with state space equal to size i
nc = morpho_bystate[i].nchar() # get number of characters per character size with i-sized states
if (nc > 0) { # for non-empty character blocks
q[idx] <- fnJC(i) # make i-by-i rate matrix
m_morph[idx] ~ dnPhyloCTMC( tree=phylogeny,
Q=q[idx],
nSites=nc,
type="Standard") # create model of evolution for the character block
m_morph[idx].clamp(morpho_bystate[i]) # attach the data
idx = idx + 1 # increment counter
idx
}
}