Phylogenetic trees


1) Before calculating a tree, you must have an ALIGNMENT in memory. This can be input in any format or you should have just carried out a full multiple alignment and the alignment is still in memory. Remember YOU MUST ALIGN THE SEQUENCES FIRST!!!!

The method used is the NJ (Neighbour Joining) method of Saitou and Nei. First you calculate distances (percent divergence) between all pairs of sequence from a multiple alignment; second you apply the NJ method to the distance matrix.

2) EXCLUDE POSITIONS WITH GAPS? With this option, any alignment positions where ANY of the sequences have a gap will be ignored. This means that 'like' will be compared to 'like' in all distances. It also, automatically throws away the most ambiguous parts of the alignment, which are concentrated around gaps (usually). The disadvantage is that you may throw away much of the data if there are many gaps.

3) CORRECT FOR MULTIPLE SUBSTITUTIONS? For small divergence (say <10%) this option makes no difference. For greater divergence, this option corrects for the fact that observed distances underestimate actual evolutionary dist- ances. This is because, as sequences diverge, more than one substitution will happen at many sites. However, you only see one difference when you look at the present day sequences. Therefore, this option has the effect of stretching branch lengths in trees (especially long branches). The corrections used here (for DNA or proteins) are both due to Motoo Kimura. See the documentation for details.

For VERY divergent sequences, the distances cannot be reliably corrected. You will be warned if this happens. Even if none of the distances in a data set exceed the reliable threshold, if you bootstrap the data, some of the bootstrap distances may randomly exceed the safe limit.

4) To calculate a tree, use option 4 (DRAW TREE NOW). This gives an UNROOTED tree and all branch lengths. The root of the tree can only be inferred by using an outgroup (a sequence that you are certain branches at the outside of the tree .... certain on biological grounds) OR if you assume a degree of constancy in the 'molecular clock', you can place the root in the 'middle' of the tree (roughly equidistant from all tips).

5) BOOTSTRAPPING is a method for deriving confidence values for the groupings in a tree (first adapted for trees by Joe Felsenstein). It involves making N random samples of sites from the alignment (N should be LARGE, e.g. 500 - 1000); drawing N trees (1 from each sample) and counting how many times each grouping from the original tree occurs in the sample trees. You must supply a seed number for the random number generator. Different runs with the same seed will give the same answer. See the documentation for details.

6) OUTPUT FORMATS: three different formats are allowed. None of these displays the tree visually. You must make the tree yourself (on paper) using the results OR get the PHYLIP package and use the tree drawing facilities there. (Get the PHYLIP package anyway if you are interested in trees).