Exercise 3 - Site models
We will use the codeml program from PAML by Ziheng Yang. Use the command-line mode for the tasks below. First, work out which control file options you need. Next, try to reproduce the same analyses with
You will need an alignment of homologous protein-coding DNA sequences in frame (starting at a 1st codon position and ending at a 3rd). We will use data from published articles and reproduce the published results:
Site-models: Yang, Z., R. Nielsen, N. Goldman, A.-M. K. Pedersen. 2000. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155: 431-449.
Data 1: bglobin.nuc Tree 1: bglobin.tree
Data 2: HIVenvSweden.nuc Tree 2: HIVenvSweden.trees
Data 3: adh.nuc Tree 3: adh.trees
Choose a dataset from publication 1 and fit the following site models to your data:
M1, M2, M3, M7, M8a, and M8 (always estimate branch lengths by ML).
Note 1: Model M0 was already fitted in exercise 1 (make sure you have the output file).
Note 2: M8a is model M8 with ω for the discrete category fixed to 1.
Which models are nested?
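As a starting point for working out the control file options, here is a minimal Python sketch that builds one codeml control file per site model. The option names (seqfile, treefile, NSsites, fix_omega, ncatG and so on) are standard codeml control variables, but the specific values are assumptions to check against the PAML documentation: F3x4 codon frequencies, 3 categories for M3, 10 categories for the beta models, and the initial values of kappa and omega. In current PAML versions, NSsites = 1 and NSsites = 2 run the modified models M1a and M2a, which differ slightly from M1 and M2 as defined in the 2000 paper. The helper names and the output file names (M1.mlc etc.) are hypothetical.

```python
# Per-model settings (assumed values; adapt them to your dataset).
SITE_MODELS = {
    "M1":  {"NSsites": 1, "ncatG": 3,  "fix_omega": 0, "omega": 0.4},
    "M2":  {"NSsites": 2, "ncatG": 3,  "fix_omega": 0, "omega": 0.4},
    "M3":  {"NSsites": 3, "ncatG": 3,  "fix_omega": 0, "omega": 0.4},
    "M7":  {"NSsites": 7, "ncatG": 10, "fix_omega": 0, "omega": 0.4},
    "M8":  {"NSsites": 8, "ncatG": 10, "fix_omega": 0, "omega": 0.4},
    # M8a: M8 with omega of the extra class fixed to 1
    "M8a": {"NSsites": 8, "ncatG": 10, "fix_omega": 1, "omega": 1},
}


def control_file_text(model, seqfile, treefile):
    """Return the text of a codeml control file for one site model."""
    s = SITE_MODELS[model]
    options = [
        ("seqfile", seqfile),
        ("treefile", treefile),
        ("outfile", f"{model}.mlc"),   # hypothetical output name
        ("noisy", 3),
        ("verbose", 1),
        ("runmode", 0),                # 0: use the tree(s) in treefile
        ("seqtype", 1),                # 1: codon sequences
        ("CodonFreq", 2),              # 2: F3x4 (assumption)
        ("model", 0),                  # 0: same omega on all branches
        ("NSsites", s["NSsites"]),     # site model code
        ("icode", 0),                  # 0: universal genetic code
        ("fix_kappa", 0),              # 0: estimate kappa
        ("kappa", 2),                  # initial kappa (assumption)
        ("fix_omega", s["fix_omega"]), # 1 only for M8a
        ("omega", s["omega"]),         # initial or fixed omega
        ("ncatG", s["ncatG"]),         # number of site categories
        ("fix_blength", 0),            # 0: estimate branch lengths by ML
    ]
    return "\n".join(f"{key:>12} = {value}" for key, value in options) + "\n"


def write_control_files(seqfile, treefile):
    """Write M1.ctl, M2.ctl, ... into the current directory."""
    for model in SITE_MODELS:
        with open(f"{model}.ctl", "w") as handle:
            handle.write(control_file_text(model, seqfile, treefile))


# Preview the control file for M8a on the beta-globin data.
print(control_file_text("M8a", "bglobin.nuc", "bglobin.tree"))
```

After calling `write_control_files("bglobin.nuc", "bglobin.tree")`, each model can be run from the command line by passing its control file to codeml, for example `codeml M8a.ctl`.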
Perform likelihood ratio tests (LRTs) of nested hypotheses. How many degrees of freedom do you use each time to test for significance of the LRT statistic? Do your tests suggest positive selection?
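The arithmetic of a likelihood ratio test can be sketched in a few lines of Python. The statistic is twice the difference in log-likelihood between the alternative and the null model, compared against a chi-square distribution. The function names are hypothetical, the log-likelihood values in the example are made up for illustration (they are not results from any of the datasets), and the degrees of freedom are left as an argument for you to decide for each pair of models.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of half-integer gamma terms
    total = math.erfc(math.sqrt(half))
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (k - 0.5) / math.gamma(k + 0.5)
    return total


def lrt(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)  # guard against tiny negatives
    return stat, chi2_sf(stat, df)


# Made-up log-likelihoods, for illustration only.
stat, p_value = lrt(lnl_null=-1000.0, lnl_alt=-995.0, df=2)
print(f"2*dlnL = {stat:.3f}, p = {p_value:.4g}")
```

The log-likelihood of each fitted model is reported in the codeml output file on the line starting with `lnL`. The plain chi-square comparison assumes the standard regularity conditions, so check the PAML documentation for tests where a parameter is fixed at the boundary of its range.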
Interpret the ML estimates relevant to selective pressure.
If LRTs suggest positive selection, which sites are inferred by the Bayesian approach to be under positive selection (models M2 and M8)?
Do NEB and BEB agree on the sites inferred?
Compare results from the LRT comparing M7 vs M8 and the LRT comparing M8a vs M8. Are they both significant (or both non-significant)? If they are both significant, does the Bayesian approach predict the same sites?
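Comparing the inferred sites, whether NEB against BEB or M2 against M8, comes down to comparing sets of site numbers above a posterior probability cutoff. A minimal sketch follows, assuming you have copied the site numbers and posterior probabilities from the codeml output by hand. The function names are hypothetical and the numbers below are invented for illustration.

```python
def selected_sites(posteriors, cutoff=0.95):
    """Sites whose posterior probability of omega > 1 exceeds the cutoff."""
    return {site for site, prob in posteriors.items() if prob > cutoff}


def compare_site_sets(first, second):
    """Split two sets of sites into shared and method-specific sites."""
    return {
        "both": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
    }


# Invented posterior probabilities (site number -> probability).
neb = {7: 0.991, 50: 0.972, 123: 0.958}
beb = {7: 0.984, 50: 0.931, 123: 0.961}

result = compare_site_sets(selected_sites(neb), selected_sites(beb))
print(result)
```

The same two functions can be reused with the site lists from M2 and M8 in place of the NEB and BEB lists.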
Please refer to the PAML/codeml documentation.