Phylogenetic Inference

What is a phylogenetic tree?

This section provides a brief description of phylogenetic trees, as they are conceptualized in Phylogenetic Investigator. A phylogenetic tree is a diagram (Fig. 1) with time on the Y axis and evolutionary change (in PI this is assumed to be morphological change) on the X axis that illustrates a hypothesis of evolutionary relationships and the sequence of evolutionary events that gave rise to some group of taxa of interest (termed 'the ingroup'). In PI, phylogenetic trees are constructed of three kinds of pieces: nodes, links, and transitions.

Nodes represent taxa, for example species. Designations for nodes can have the prefix R, F, or P. Nodes that correspond to the observed taxa being studied are numbered and have a letter prefix, either R for Recent or F for Fossil. The ingroup in Figure 1 consists of R80, R86, R84, and R82. F98 is a fossil taxon from which the ingroup is believed to have descended. During tree construction, common ancestors of taxa are postulated to have existed in order to explain the data. Each of these postulated nodes is labeled with a letter (e.g. A, B, C) carrying the prefix P (for Postulated). Links connect nodes and represent hypothesized ancestor/descendant relationships between taxa. The slope of a link indicates the rate of morphological change: a vertical line indicates no change over time, and the closer a line is to horizontal, the more rapidly change is inferred to have taken place.

Transitions appear on links and represent the point at which evolutionary changes are believed to have occurred. Each transition represents some feature (character) of the taxa which has been numbered and described as having two conditions (states). One state is considered ancestral and is coded with a "0". The evolutionarily novel (or derived) state is coded with a "1". A transition shows the point where a character changes from "0" to "1" or from "1" to "0". Coded characters and states are organized by taxa in an associated data matrix.
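As a rough illustration of such a data matrix, the sketch below stores one row per taxon and one column per character. The taxon names are borrowed from Figure 1, but the states are invented for the example, and the names MATRIX and derived_taxa are hypothetical; they are not part of Phylogenetic Investigator.

```python
# Hypothetical data matrix: rows are taxa, columns are characters 1..3.
# 0 = ancestral state, 1 = derived state. All state values are invented.
MATRIX = {
    "R80": [1, 0, 0],
    "R82": [1, 1, 0],
    "R84": [1, 1, 1],
    "R86": [0, 0, 0],
}


def derived_taxa(matrix, character):
    """Return the taxa that show the derived state (1) for a character.

    `character` is the 1-based character number, matching the way
    characters are numbered in the text.
    """
    return {taxon for taxon, states in matrix.items()
            if states[character - 1] == 1}


if __name__ == "__main__":
    for number in (1, 2, 3):
        print(number, sorted(derived_taxa(MATRIX, number)))
```

Reading the matrix column by column in this way gives, for each character, the group of taxa that share its derived state.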

Phylogenetic Inference

Phylogenetic inference can be divided into 3 phases: identification of characters and states, assignment of polarity, and phylogenetic tree construction.

Determination of Characters and States

Any set of non-identical taxa can be divided into those that possess some feature "A" and those that do not. Any such feature can be used as a character for phylogenetic inference. For example, some plants contain enzyme A and some do not. "Enzyme A" would be the character, and "present" and "absent" would be its two states.

Some features do not seem to have just two states. For example, if we collected some evergreen branches, we might see that their bundles of needles contain 1, 2, or 5 needles. This kind of multistate feature can be coded as a pair of binary characters in two different ways, depending on what is believed about the evolutionary sequence of events. If it is believed that 1 is ancestral to 2 and 2 is ancestral to 5 (1 -> 2 -> 5), then the first binary character will be derived for those taxa with either 2 or 5, and the second binary character will be derived only for those with 5. If 1 is considered ancestral to both 2 and 5 (2 <- 1 -> 5), or if the sequence is unknown, then the first binary character will be derived only for those taxa with 2 and the second only for those with 5.
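The two coding schemes can be sketched as follows, using the needle counts from the example. This is only an illustration of the rule described above; the function names recode_linear and recode_branching are hypothetical.

```python
def recode_linear(value, sequence):
    """Recode a multistate value when the states form a linear sequence.

    `sequence` lists the states from ancestral to most derived,
    e.g. [1, 2, 5] for 1 -> 2 -> 5. Binary character k is derived for
    every state at or beyond the k-th step of the sequence.
    """
    position = sequence.index(value)
    return [1 if position >= step else 0 for step in range(1, len(sequence))]


def recode_branching(value, ancestral, derived_states):
    """Recode a multistate value when each derived state arises
    independently from the ancestral state, e.g. 2 <- 1 -> 5.

    Each derived state gets its own binary character.
    """
    if value != ancestral and value not in derived_states:
        raise ValueError(f"unknown state: {value!r}")
    return [1 if value == state else 0 for state in derived_states]


if __name__ == "__main__":
    for needles in (1, 2, 5):
        print(needles,
              recode_linear(needles, [1, 2, 5]),
              recode_branching(needles, 1, [2, 5]))
```

Under the linear scheme a taxon with 5 needles is derived for both binary characters; under the branching scheme it is derived only for the second.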

Assignment of Polarity

The assignment of character states as ancestral and derived, termed "polarity," is perhaps the most crucial step of phylogenetic inference. Phylogenetic methods require groupings based only on derived characters. Therefore, it is critical to be able to recognize them when they occur. Characters that have phylogenetic information will only contribute to the finished hypothesis if they are correctly polarized.

There are several methods for determining the polarity of characters. Three of the most important methods are outgroup, paleontological, and ingroup (Stuessy and Crisci, 1984). Each method has its strengths and weaknesses. Each can explain certain types of data and each has methods for explaining conflicting data. For all of the methods, conflicting data will be explained as homoplasy (convergent evolution) during tree construction.

Tree construction

Under parsimony, phylogenetic tree construction is a search among the possible arrangements of relationships among taxa and characters for those that require the fewest transitions of character states. For any data set, there is a finite number of possible arrangements of taxa and characters. For data sets with very few taxa, it is possible to construct all possible trees and see which require the fewest steps (transitions). The number of possible trees grows very rapidly (faster than exponentially) as taxa are added, however, and this method quickly becomes impractical to perform by hand. Fortunately, there are strategies and heuristics that allow the problem-solver to greatly limit the number of possibilities that must be considered. In most problems, only a few trees are actually supported by any of the data.
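To give a feel for how quickly the search space grows, the sketch below counts rooted, strictly bifurcating trees in which every taxon is a tip, using the standard formula (2n - 3)!! = 3 x 5 x ... x (2n - 3). This is a simplifying assumption: trees in PI may also place an observed taxon (such as a fossil) as an ancestor, so the true number of arrangements there will differ. The function name is hypothetical.

```python
def rooted_binary_tree_count(n_taxa):
    """Number of rooted, strictly bifurcating trees with n labeled tips.

    Uses the double-factorial formula (2n - 3)!! for n >= 2;
    a single taxon (or a pair) admits exactly one tree.
    """
    if n_taxa < 1:
        raise ValueError("need at least one taxon")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for factor in range(3, 2 * n_taxa - 2, 2):
        count *= factor
    return count


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, rooted_binary_tree_count(n))
```

Four taxa give 15 such trees and five give 105, but ten already give more than 34 million, which is why exhaustive construction by hand is practical only for the smallest problems.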

Each character in the data set defines a group of taxa potentially descended from a postulated ancestor, and therefore can be seen as direct support for the existence of a postulated common ancestor, or node. The real set of possible trees, then, consists only of those trees that could be constructed from the available nodes.

Characters are inclusive/exclusive when they define identical, nested, or exclusive groups. For example, assume that character 1 defines a group of {R81, R82, and R83}. If another character defines the same set of taxa, the two are identical characters. If another character defines a subset or a superset of those taxa (e.g. {R81 and R82} or {R81, R82, R83, and R84}), the characters are nested with respect to each other. If another character defines a completely different set of taxa (e.g. {R85 and R86}), the characters are exclusive with respect to one another. Characters conflict when they overlap incompletely. For example, assume that character 1 defines a group of {R81, R82, and R83} and character 4 defines a group of {R82, R83, and R84}. These two groups are contradictory because each character claims some, but not all, of the taxa of the other. Character compatibility groups can be formed that place some or all of the characters into a hierarchical arrangement, in order to evaluate how many of the characters will support a particular hypothesis (arrangement of the taxa) and how many extra steps will be needed to account for incompatible characters.
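The four relationships can be expressed directly as set comparisons. The sketch below is illustrative only: the function name relate is hypothetical, and the groups are the ones used in the examples above.

```python
def relate(group_a, group_b):
    """Classify how the taxon groups defined by two characters relate.

    Returns "identical", "nested", "exclusive", or "conflicting".
    """
    a, b = set(group_a), set(group_b)
    if a == b:
        return "identical"
    if a < b or b < a:          # one group is a proper subset of the other
        return "nested"
    if a.isdisjoint(b):         # no taxa in common
        return "exclusive"
    return "conflicting"        # partial overlap


if __name__ == "__main__":
    char1 = {"R81", "R82", "R83"}
    print(relate(char1, {"R81", "R82"}))          # subset of character 1
    print(relate(char1, {"R85", "R86"}))          # no shared taxa
    print(relate(char1, {"R82", "R83", "R84"}))   # character 4 in the text
```

Only the last outcome, a partial overlap, forces at least one of the two characters to be explained by homoplasy.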

Ideally, all of the characters will agree in defining a single tree. In practice, some characters will define contradictory groups (groups that overlap incompletely). The largest possible group of inclusive/exclusive characters can serve as a working hypothesis from which to construct a phylogenetic tree. This tree can then be optimized for parsimony if so desired.
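One simple way to find a largest group of inclusive/exclusive characters is to try subsets of characters from largest to smallest and keep the first one in which no pair conflicts. The sketch below does this by brute force, which is only reasonable for small data sets; it is not presented as the method used by PI, and the names compatible and largest_compatible_set are hypothetical.

```python
from itertools import combinations


def compatible(group_a, group_b):
    """Two groups are compatible if they are identical, nested, or exclusive."""
    return group_a <= group_b or group_b <= group_a or group_a.isdisjoint(group_b)


def largest_compatible_set(groups):
    """Return the character numbers of one largest mutually compatible set.

    `groups` maps a character number to the set of taxa showing its
    derived state. Subsets are tried from largest to smallest, so the
    first subset without a conflicting pair is a largest one. Ties are
    possible; only the first one found is returned.
    """
    names = sorted(groups)
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(compatible(groups[x], groups[y])
                   for x, y in combinations(subset, 2)):
                return list(subset)
    return []


if __name__ == "__main__":
    groups = {
        1: {"R81", "R82", "R83"},
        2: {"R81", "R82"},
        3: {"R85", "R86"},
        4: {"R82", "R83", "R84"},
    }
    print(largest_compatible_set(groups))
```

With these invented groups, characters 1, 2, and 3 form the working hypothesis, and character 4 is left to be explained as homoplasy.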

A phylogenetic tree is a branching path from a single point at which all of the character states are ancestral to several points where they are the same as those of the taxa in the ingroup. The lowest node, the node at the bottom of the tree, will be entirely ancestral. The postulated node above that will be linked to the lower node and will have a transition or transitions. Its states, then, are partially ancestral and partially derived. If it has the same states as any of the ingroup, they can be directly linked. The next postulated node has more derived states and may be linked to more recent taxa, and so on until all of the taxa have been accounted for.

Constructing the phylogenetic tree involves adding postulated ancestors for each of the unique inclusive/exclusive characters, linking the ancestors together and to the taxa in the ingroup, adding the transitions for the characters that support the structure, and then distributing the homoplasious (conflicting) characters either as parallel gains or as gains with subsequent reversals. (I suggest initially adding homoplasious characters as parallel gains wherever possible. This makes it easy to spot duplicated characters, each of which should be considered in order to evaluate alternate topologies and character optimizations.)

Once a tree has been constructed, it can be assessed and, if necessary, revised to ensure that it is a minimum-length (most parsimonious) tree. Tree assessment should begin by examining each homoplasious character, beginning with the one that requires the most transitions, and considering (1) how many steps could be saved by "fixing" the character (rearranging the tree so that this character would have a single transition) and (2) how many more steps would be required in each other character that would be affected by those changes. If an arrangement is found that results in fewer steps, the tree should be restructured and then assessed again from the beginning. If an arrangement is discovered that results in an equal number of steps, assessment should continue until it is confirmed that no better tree is possible, and then all equally parsimonious trees should be reported. The most difficult part of phylogenetic inference is ensuring that all most parsimonious trees have been discovered. Rigorous assessment and systematic consideration of each homoplasious character provides the best probability of success.
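The first step of this assessment, counting the transitions each character requires and ranking the homoplasious ones, can be sketched as below. The tree, its node names, and its states are invented for illustration, and the representation (each node mapped to its parent and its character states) is an assumption made for the example rather than PI's own format.

```python
# Hypothetical tree: each node maps to (parent, states); the root has
# parent None. PA and PB are postulated ancestors; states are invented.
TREE = {
    "PA":  (None, [0, 0]),
    "PB":  ("PA", [1, 0]),
    "R80": ("PA", [0, 1]),
    "R82": ("PB", [1, 1]),
    "R84": ("PB", [1, 0]),
}


def transitions_per_character(tree):
    """Count, for each character, the links on which its state changes."""
    n_chars = len(next(iter(tree.values()))[1])
    counts = [0] * n_chars
    for parent, states in tree.values():
        if parent is None:
            continue  # the root has no link below it
        parent_states = tree[parent][1]
        for i in range(n_chars):
            if states[i] != parent_states[i]:
                counts[i] += 1
    return counts


def homoplasious_characters(tree):
    """Return (character number, transitions) pairs for characters that
    need more than one transition, most transitions first."""
    counts = transitions_per_character(tree)
    ranked = [(i + 1, c) for i, c in enumerate(counts) if c > 1]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print("tree length:", sum(transitions_per_character(TREE)))
    print("homoplasious:", homoplasious_characters(TREE))
```

In this invented tree, character 2 is gained in parallel in R80 and R82, so it is the one character to examine when looking for a shorter arrangement.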

For each most parsimonious tree, there should also be consideration of alternate character optimizations. Each homoplasious character should be considered for how it could be distributed on each most parsimonious tree. One of the most important aspects of the interpretation of phylogenetic trees involves describing alternate hypotheses that could explain the data set and suggesting subsequent investigation that could provide insight into these uncertainties.


Revised 11/7/96 Brewer