Exercise 2 - Branch models
We will use the codeml program from PAML by Ziheng Yang. Use the command-line mode for the tasks below. First, you need to understand which control file options to use. Next, try to reproduce the same analyses with
You will need a dataset of homologous protein-coding DNA sequences that are in frame (starting with the 1st codon position and ending with the 3rd). We will use data from published articles and reproduce the published results:
Branch models: Yang, Z. 1998. Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol. Biol. Evol. 15:568-573.
Data 1: lysozymeSmall.nuc
Tree 1: lysozymeSmall.trees
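As a starting point for the control file, here is a minimal Python sketch that writes one for a branch-model run. The option names (`seqfile`, `treefile`, `model`, `NSsites`, and so on) are standard codeml control file options. The helper `make_ctl`, the output file names, and the particular values chosen for `noisy`, `CodonFreq`, `kappa`, and `omega` are assumptions made for illustration, so check them against the PAML documentation before relying on them.

```python
from pathlib import Path

# codeml branch-model settings:
#   model = 0  one omega ratio for all branches
#   model = 1  free-ratio model (one omega per branch)
#   model = 2  two or more ratios, taken from #1, #2, ... labels in the tree file
BRANCH_MODELS = {"one-ratio": 0, "free-ratio": 1, "labelled-ratios": 2}


def make_ctl(model_name, outfile,
             seqfile="lysozymeSmall.nuc", treefile="lysozymeSmall.trees"):
    """Return the text of a codeml control file (hypothetical helper)."""
    options = {
        "seqfile": seqfile,
        "treefile": treefile,
        "outfile": outfile,
        "noisy": 3,        # amount of screen output (assumed value)
        "verbose": 1,
        "runmode": 0,      # evaluate the user-supplied tree
        "seqtype": 1,      # codon sequences
        "CodonFreq": 2,    # F3x4 codon frequencies (assumed choice)
        "clock": 0,        # no molecular clock
        "model": BRANCH_MODELS[model_name],
        "NSsites": 0,      # no variation in omega among sites
        "icode": 0,        # universal genetic code
        "fix_kappa": 0,    # estimate kappa
        "kappa": 2,        # starting value (assumed)
        "fix_omega": 0,    # estimate omega
        "omega": 0.4,      # starting value (assumed)
    }
    return "\n".join(f"{key:>10} = {value}" for key, value in options.items()) + "\n"


ctl_text = make_ctl("free-ratio", "free_ratio.out")
Path("free_ratio.ctl").write_text(ctl_text)
print(ctl_text)
```

The resulting file can then be passed to codeml on the command line, for example `codeml free_ratio.ctl`. Note that the 2-ratio and 3-ratio models both use `model = 2`; they differ only in how many branch labels the tree file contains.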
Use the small lysozyme example to fit the free-ratio model for branches. Are the results consistent with those presented in the publication above?
Label the branch leading to the colobine clade and fit the 2-ratio branch model to your data.
In addition, label the branch leading to the hominoid clade (use a different label) and fit the 3-ratio branch model to your data.
Based on LRTs, which model fits your data best (among the 2-ratio, 3-ratio and free-ratio models)? What are the degrees of freedom for each comparison?
What can you tell about the evolution of your gene from the ML estimates under this best model?
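The likelihood ratio tests in the tasks above can be sketched in a few lines of Python. For nested models, the test statistic is twice the difference in log-likelihood, and the degrees of freedom equal the difference in the number of free parameters (codeml reports this count as `np` on the `lnL` line of its output). The log-likelihoods and parameter counts below are made-up placeholders, not results from the lysozyme data, and the helper names `chi2_sf` and `lrt` are hypothetical. The p-value uses the standard chi-square approximation, computed here with only the standard library.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if stat <= 0:
        return 1.0
    x = stat / 2.0
    # Regularized upper incomplete gamma Q(a, x) with a = df / 2, built up
    # from Q(1/2, x) = erfc(sqrt(x)) or Q(1, x) = exp(-x) by the recurrence
    # Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0 - 1e-9:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lrt(lnl_null, np_null, lnl_alt, np_alt):
    """Return (2 * delta lnL, df, p-value) for a pair of nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = np_alt - np_null
    return stat, df, chi2_sf(stat, df)


# Placeholder (lnL, np) pairs; replace them with values from your codeml runs.
fits = {
    "2-ratio": (-1000.0, 14),
    "3-ratio": (-999.0, 15),
    "free-ratio": (-995.0, 23),
}

for null, alt in [("2-ratio", "3-ratio"), ("3-ratio", "free-ratio")]:
    stat, df, p = lrt(*fits[null], *fits[alt])
    print(f"{null} vs {alt}: 2*dlnL = {stat:.2f}, df = {df}, p = {p:.4f}")
```

Each comparison is valid only when the simpler model is a special case of the more complex one, which holds for the 2-ratio, 3-ratio and free-ratio models as long as the labelled branches are the same across runs.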
Please refer to the PAML/codeml documentation.