information about the evolutionary history of populations.
|A gene tree for a sample of four individuals.|
One major result of neutral coalescent theory is that the expected lengths of branches are proportional to the population size. This makes sense: in a randomly mating, neutral population of size N, the probability that any two individuals share a parent in the previous generation is simply 1/N, so the expected time until two individuals share a parent is N generations. If a sample contains n individuals, there are (n choose 2) pairs that could coalesce, so the expected time to the first coalescence event is N/(n choose 2) generations. Thus, in a neutral population, we expect a specific relationship to hold among the branch lengths. However, many of the demographic scenarios discussed above distort the relative branch lengths. The simplest example is a population that is growing over time. In this case, the population size in the distant past is smaller than the population size in the recent past, so branches in the distant past are short relative to branches in the recent past. Mutations that arise on the long recent branches are inherited by only a few sampled individuals, so this implies an excess of rare mutations (mutations that appear in only a small number of individuals) relative to more common mutations.
|In an expanding population, branch lengths in the distant past are short relative to those in the recent past.|
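The neutral expectation quoted above, N/(n choose 2) generations until the first coalescence among n lineages, can be tabulated for each stage of a genealogy. The following is a minimal sketch, assuming a constant-size population; the function names and the example values (N = 1000, n = 4) are illustrative and not taken from any of the papers.

```python
from math import comb


def expected_coalescent_intervals(N, n):
    """Expected generations spent with k ancestral lineages, for k = n down to 2.

    Assumes a constant-size, neutral, randomly mating population in which
    any two lineages share a parent with probability 1/N per generation.
    """
    return {k: N / comb(k, 2) for k in range(n, 1, -1)}


def expected_tmrca(N, n):
    """Expected time until the whole sample shares a single common ancestor."""
    return sum(expected_coalescent_intervals(N, n).values())


# Illustrative values only: a sample of four individuals, as in the gene tree.
intervals = expected_coalescent_intervals(N=1000, n=4)
for k, t in intervals.items():
    print(f"{k} lineages: {t:.1f} generations")
print(f"expected time to common ancestor: {expected_tmrca(1000, 4):.1f}")
```

With these values the intervals are roughly 167, 333, and 1000 generations, so the deepest branches are expected to be the longest. This is the relationship that population growth distorts.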
|In a selected population, branch lengths in the distant past are short relative to those in the recent past.|
My research has focused on better understanding this effect. In the strong purifying selection regime (Ne s >> 1), the main strategy for understanding the effects of selection is through the structured coalescent (originally developed by Hudson and Kaplan). In this case, the population is subdivided into classes based upon the fitnesses of individuals. We then trace the ancestry of individuals backwards in time as before, this time allowing individuals to jump between fitness classes. In order for two individuals to coalesce, they must co-exist in the same fitness class, in which case they will coalesce with probability equal to the inverse of the size of the class. This provides a mathematical framework for calculating various statistics. We used this framework in The Structure of Genealogies in the Presence of Purifying Selection: A "Fitness-Class Coalescent" and The Structure of Allelic Diversity in the Presence of Purifying Selection to better understand the effects of selection on various genealogical statistics.
More recently, we have made the additional assumption that we may treat each ancestral lineage independently, which is reasonable provided selection is sufficiently strong (Ne s >> 1). When this is the case, rather than jointly considering the paths of multiple lineages through the population, we can simply calculate the probability that each independent lineage is in a particular class at a particular time. This drastically simplifies the analytical framework and allows us to describe a population using a time-dependent effective population size Ne(t), which we calculate in Distortions in Genealogies Due to Purifying Selection. Thus, we find that a population experiencing strong purifying selection is virtually indistinguishable from a neutral population evolving with this time-varying population size. This result has a number of significant advantages. It is extremely simple, so we can incorporate it directly into the neutral coalescent framework to calculate virtually any statistic describing a genealogy. Furthermore, we can incorporate the effects of selection directly into any neutral method of inference or estimation simply by using the appropriate time-dependent population size. However, this result also has a significant drawback: since a strongly selected population appears equivalent to a time-varying neutral population, it will be very difficult or even impossible to determine which effect is causing the distortions we see.
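To illustrate how a time-dependent population size enters a neutral calculation, the sketch below computes the expected time to the first coalescence among k lineages when Ne(t) is piecewise constant. It is only an illustration under stated assumptions: the piecewise-constant form, the function name, and the example epochs are hypothetical and are not the Ne(t) derived in the paper. It also uses the standard continuous-time approximation, in which k lineages coalesce at rate (k choose 2)/Ne(t).

```python
from math import comb, exp, inf


def expected_first_coalescence(epochs, k=2):
    """Expected generations until the first coalescence among k lineages.

    `epochs` is a list of (duration, size) pairs ordered from the present
    backwards in time; the last duration must be `inf`. Within an epoch of
    size Ne, lineages coalesce at rate comb(k, 2) / Ne (continuous-time
    approximation). The piecewise-constant Ne(t) is a hypothetical example.
    """
    pairs = comb(k, 2)
    survive = 1.0  # probability that no coalescence has happened yet
    total = 0.0
    for duration, size in epochs:
        rate = pairs / size
        if duration == inf:
            # Final epoch: remaining expected waiting time is 1 / rate.
            total += survive / rate
            break
        p_stay = exp(-rate * duration)  # no coalescence during this epoch
        total += survive * (1.0 - p_stay) / rate
        survive *= p_stay
    return total


# Constant size versus a population that was smaller in the distant past.
constant = expected_first_coalescence([(inf, 1000)])
smaller_past = expected_first_coalescence([(500, 1000), (inf, 100)])
print(constant, smaller_past)
```

In this example the pairwise coalescence time under the second history is shorter than under constant size, because lineages that survive into the smaller ancestral population coalesce quickly. The same neutral calculation applies unchanged whether the smaller past size reflects real demography or the effective size induced by selection.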
In a recent paper, Distortions in Genealogies Due to Purifying Selection and Recombination, we have shown that we can extend this result to also incorporate recombination, provided we make a further assumption that we may treat each site as independent.