Exercise 1 - Simple codon model
We will use the codeml program from the PAML package by Ziheng Yang. Use the command-line mode for the tasks below. First, you need to understand which control file options to use. Next, try to reproduce the same analyses with codeml.
You will need a dataset of homologous protein-coding DNA sequences (each starting with the 1st codon position and ending with the 3rd). We will use data from published articles and reproduce the published results:
Site-models: Yang, Z., R. Nielsen, N. Goldman, A.-M. K. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155: 431-449.
Data 1: bglobin.nuc Tree 1: bglobin.tree
Data 2: HIVenvSweden.nuc Tree 2: HIVenvSweden.trees
Data 3: adh.nuc Tree 3: adh.trees
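The requirement that every sequence starts at a 1st codon position and ends at a 3rd one can be checked before running codeml. The following is a minimal sketch, assuming the alignment has already been read into a dictionary mapping sequence names to aligned sequences; the function name `check_codon_alignment` is hypothetical and not part of PAML.

```python
def check_codon_alignment(seqs):
    """Return a list of problems found in a dict of name -> aligned sequence.

    An empty list means all sequences have the same length and that
    length is a multiple of 3, as a codon alignment requires.
    """
    problems = []
    lengths = {len(seq) for seq in seqs.values()}
    if len(lengths) > 1:
        problems.append(f"sequences differ in length: {sorted(lengths)}")
    for name, seq in seqs.items():
        if len(seq) % 3 != 0:
            problems.append(f"{name}: length {len(seq)} is not a multiple of 3")
    return problems
```

This only checks lengths. It does not verify that the reading frame is correct or that there are no internal stop codons.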
The simple codon model with constant ω.
Choose a dataset from the publication above and fit model M0, the simplest codon model, with constant ω over time and across sites. Run model M0 twice: once with branch lengths fixed to those in the tree file, and once with branch lengths estimated by ML.
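The two M0 runs differ only in the `fix_blength` setting of the control file. The sketch below generates both control files from one template. The option names are standard codeml control variables, but the specific values are assumptions to check against the PAML documentation: `CodonFreq = 2` (F3x4), the starting values for kappa and omega, and `fix_blength = 1` (use the tree's branch lengths as initial values) for the ML run. The output file names and the helper `m0_control_text` are hypothetical.

```python
# Text after '*' is a comment in PAML control files.
M0_TEMPLATE = """\
seqfile = {seqfile}
treefile = {treefile}
outfile = {outfile}

noisy = 3
verbose = 1
runmode = 0      * 0: user tree
seqtype = 1      * 1: codon sequences
CodonFreq = 2    * 2: F3x4 (assumed choice)
clock = 0
model = 0        * one omega for all branches
NSsites = 0      * 0: one omega for all sites (M0)
icode = 0        * 0: universal genetic code
fix_kappa = 0    * 0: estimate kappa
kappa = 2        * initial value
fix_omega = 0    * 0: estimate omega
omega = 0.5      * initial value
fix_blength = {fix_blength}
"""


def m0_control_text(seqfile, treefile, outfile, fix_branch_lengths):
    """Return control file text for an M0 run.

    fix_blength = 2 keeps the branch lengths from the tree file fixed;
    fix_blength = 1 uses them only as starting values for ML estimation.
    """
    return M0_TEMPLATE.format(
        seqfile=seqfile,
        treefile=treefile,
        outfile=outfile,
        fix_blength=2 if fix_branch_lengths else 1,
    )


def write_m0_runs(seqfile="bglobin.nuc", treefile="bglobin.tree"):
    """Write one control file per run (hypothetical file names)."""
    for label, fixed in (("fixed", True), ("estimated", False)):
        text = m0_control_text(seqfile, treefile, f"m0_{label}.out", fixed)
        with open(f"m0_{label}.ctl", "w") as handle:
            handle.write(text)
```

After calling `write_m0_runs()`, each analysis can be started by passing the control file to codeml, for example `codeml m0_fixed.ctl`. The fixed-branch-length run only makes sense if the tree file actually contains branch lengths.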
Compare the optimised log-likelihoods of the two runs. In which case is the log-likelihood higher? Why?
Next, study the output file for the run with estimated branch lengths. Do you observe a bias in codon frequencies?
Study the statistics of nucleotide usage for different codon positions. Which position displays the most bias? Why? What is the ML estimate of the transition-transversion ratio κ? What is the ML estimate of the ω-ratio? How do you interpret these ML estimates?
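To collect the estimates asked for above without reading the whole output file by eye, a small parser can help. This is a sketch under the assumption that the codeml main output contains lines of the form `lnL(ntime: ...  np: ...):  <value>`, `kappa (ts/tv) = <value>` and `omega (dN/dS) = <value>`; the exact layout can differ between PAML versions, so verify it against your own output. The function names are hypothetical.

```python
import re

# Patterns assume the line formats described above.
PATTERNS = {
    "lnL": re.compile(r"lnL\(.*?\):\s*(-?\d+\.\d+)"),
    "kappa": re.compile(r"kappa \(ts/tv\)\s*=\s*(\d+\.\d+)"),
    "omega": re.compile(r"omega \(dN/dS\)\s*=\s*(\d+\.\d+)"),
}


def parse_m0_output(text):
    """Extract lnL, kappa and omega from codeml output text.

    Returns a dict containing only the quantities that were found.
    """
    results = {}
    for key, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            results[key] = float(match.group(1))
    return results


def describe_omega(omega):
    """Give the conventional reading of an omega (dN/dS) estimate."""
    if omega < 1:
        return "purifying (negative) selection dominates"
    if omega > 1:
        return "positive selection dominates"
    return "consistent with neutral evolution"
```

Under M0 the single ω is an average over all sites and all branches, so an estimate below 1 does not rule out positive selection at individual sites.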
For details, please refer to the PAML/codeml documentation.