Sunday, February 25, 2007
Study reduces Chimpanzee-Human split to 4 million years ago
Anthropology and Primatology - Excerpts from the February 23, 2007 PLoS Genetics paper "Genomic Relationships and Speciation Times of Human, Chimpanzee, and Gorilla Inferred from a Coalescent Hidden Markov Model" (Adapted) by Asger Hobolth, Ole F. Christensen, Thomas Mailund, and Mikkel H. Schierup:
[Related news story from Scientific American - "Humans, chimps split 4 million years ago: study": A new study, certain to be controversial, maintains that chimpanzees and humans split from a common ancestor just 4 million years ago - a much shorter time than current estimates of 5 million to 7 million years ago.]
Primate evolution is a central topic in biology and much information can be obtained from DNA sequence data. A key parameter is the time "when we became human," i.e., the time in the past when descendants of the human-chimp ancestor split into human and chimpanzee. Other important parameters are the time in the past when descendants of the human-chimp-gorilla ancestor split into descendants of the human-chimp ancestor and the gorilla ancestor, and population sizes of the human-chimp and human-chimp-gorilla ancestors. To estimate speciation times and ancestral population sizes we have developed a new methodology that explicitly utilizes the spatial information in contiguous genome alignments. Furthermore, we have applied this methodology to four long autosomal human-chimp-gorilla-orangutan alignments and estimated a very recent speciation time of human and chimp (around 4 million years) and ancestral population sizes much larger than the present-day human effective population size. We also analyzed X-chromosome sequence data and found that the X chromosome has experienced a different history from that of autosomes, possibly because of selection.
Citation: Hobolth A, Christensen OF, Mailund T, Schierup MH (2007) Genomic Relationships and Speciation Times of Human, Chimpanzee, and Gorilla Inferred from a Coalescent Hidden Markov Model. PLoS Genet 3(2): e7 doi:10.1371/journal.pgen.0030007
The genealogical relationship of human, chimpanzee, and gorilla varies along the genome. We develop a hidden Markov model (HMM) that incorporates this variation and relate the model parameters to population genetics quantities such as speciation times and ancestral population sizes. Our HMM is an analytically tractable approximation to the coalescent process with recombination, and in simulations we see no apparent bias in the HMM estimates. We apply the HMM to four autosomal contiguous human-chimp-gorilla-orangutan alignments comprising a total of 1.9 million base pairs. We find a very recent speciation time of human-chimp (4.1 ± 0.4 million years), and fairly large ancestral effective population sizes (65,000 ± 30,000 for the human-chimp ancestor and 45,000 ± 10,000 for the human-chimp-gorilla ancestor). Furthermore, around 50% of the human genome coalesces with chimpanzee after speciation with gorilla. We also consider 250,000 base pairs of X-chromosome alignments and find an effective population size much smaller than 75% of the autosomal effective population sizes. Finally, we find that the rate of transitions between different genealogies correlates well with the region-wide present-day human recombination rate, but does not correlate with the fine-scale recombination rates and recombination hot spots, suggesting that the latter are evolutionarily transient.
The recent evolutionary history of the human species can be investigated by comparative approaches using the genomes of the great apes: chimpanzee, gorilla, and orangutan. Nucleotide differences, accumulated by fixation of mutations, carry a wealth of information on important issues such as speciation times, properties of ancestral species (e.g., population sizes), and how speciation occurred. Genes or genomic fragments with unusual patterns of nucleotide differences and divergence may have been under strong natural selection during recent evolution of the human species. Sequence analyses can also aid interpretation of the incomplete primate fossil record and the assignment of dated fossils to evolutionary lineages. For instance, it is still debated whether the Millennium man, Orrorin tugenensis, which has been dated to 6 million years (Myr) ago, and Sahelanthropus tchadensis, which has been dated to 6-7 Myr ago, belong to the human lineage or the human-chimp (HC) lineage.
Comparative analyses of multiple alignments of small fragments of human, chimpanzee, gorilla, and orangutan sequence have revealed that the human genome is more similar to the gorilla genome than to the chimpanzee genome for a considerable fraction of single genes. Such a conflict between species and gene genealogy is expected if the time span between speciation events is small when measured in units of 2N generations, where N is the effective population size of the ancestral species (see Figure 1). In that case, N can be estimated from the proportion of divergent genealogies if one assumes that speciation is an instantaneous event. Indeed, this has been done in several studies that find an HC ancestral effective population size N_HC of 2-10 times the present-day human effective population size N_H = 10,000. Recently, Patterson et al. studied a very large number of small human-chimp-gorilla-orangutan-macaque alignments. They found, in agreement with O'hUigin et al., that a large proportion of sites supporting alternative genealogies are caused by hypermutability and that the fraction of the genome with alternative genealogies has therefore been overestimated in previous studies. After using a statistical correction for substitution rate heterogeneity, Patterson et al. found that the variance in coalescence times is too large to be accounted for by instantaneous speciation and a large ancestral effective population size, and that the speciation process therefore must have been complex. In particular, the X chromosome shows a deviant pattern, which also led them to conclude that HC gene flow ceased and final speciation occurred as recently as 4 Myr ago. This date is generally believed to be the most recent time compatible with the fossil record, if the Millennium man and Sahelanthropus are not on the human lineage.
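The estimate of N from the proportion of divergent genealogies can be made concrete with the standard three-species coalescent result. Assuming instantaneous speciation, a constant ancestral effective population size N, no gene flow, and no recombination within a locus, the human and chimpanzee lineages fail to coalesce during the t generations between the two speciation events with probability exp(−t/(2N)). In the human-chimp-gorilla ancestor all three pairings are then equally likely, and two of the three disagree with the species tree, so the discordant fraction is (2/3)·exp(−t/(2N)). The minimal Python sketch below implements this formula and its inverse; the function names and the example numbers are hypothetical illustrations, not values or code from the paper.

```python
import math


def discordance_probability(t_generations: float, n_ancestral: float) -> float:
    """Fraction of loci whose genealogy disagrees with ((H,C),G).

    Assumes (hypothetically, for illustration) instantaneous speciation,
    constant ancestral size, no gene flow and no intra-locus recombination.
    """
    # Probability that human and chimp lineages do not coalesce in the HC ancestor
    p_no_coalescence = math.exp(-t_generations / (2.0 * n_ancestral))
    # In the HCG ancestor the three pairings are equally likely; two are discordant
    return (2.0 / 3.0) * p_no_coalescence


def estimate_ancestral_size(p_discordant: float, t_generations: float) -> float:
    """Invert the formula above to estimate N from an observed discordant fraction."""
    if not 0.0 < p_discordant < 2.0 / 3.0:
        raise ValueError("discordant fraction must lie strictly between 0 and 2/3")
    return -t_generations / (2.0 * math.log(1.5 * p_discordant))


# Hypothetical example: 100,000 generations between speciations, 25% discordant loci
n_hat = estimate_ancestral_size(0.25, 100_000)
```

Under the same model, a smaller effective size, such as the 75% of the autosomal value expected for the X chromosome, lowers the predicted discordant fraction for the same t. The hypermutability and rate-heterogeneity effects described above are exactly what this simple inversion ignores.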
Whole genome sequences of gorilla and orangutan will soon supplement the already available whole genome sequences of human and chimpanzee. These four genomes are so closely related that alignments of large contiguous parts of the genomes can be constructed. Analysis of such large fragments is challenging because different parts of the alignment will have different evolutionary histories (and thus different genealogies, see Figure 1) because of recombination. Ideally, one would like to infer the genealogical changes directly from the data and then analyze each type of genealogy separately. A natural approach to this challenge is to move along the alignment, and simultaneously compute the probabilities of different relationships and speciation times. While recombination has been considered in previous likelihood models, the spatial information along the alignment has largely been ignored.
In this paper we describe a hidden Markov model (HMM) that allows the presence of different genealogies along large multiple alignments. The hidden states are different possible genealogies (labeled HC1, HC2, HG, and CG in Figures 1 and 2). Parameters of the HMM include population genetics parameters such as the HC and human-chimp-gorilla (HCG) ancestral effective population sizes, N_HC and N_HCG, and speciation times τ1 and τ2 (see Figure 1). We therefore name our approach a coalescent HMM (coal-HMM). The statistical framework of HMMs yields parameter estimates with associated standard errors, and posterior probabilities of hidden states. We show by simulation studies that the coal-HMM recovers parameters from the coalescent with recombination process, and we apply the coal-HMM to five long contiguous human-chimp-gorilla-orangutan (HCGO) alignments obtained from the NIH Intramural Sequencing Center comparative sequencing program (Targets 1, 106, 121, and 122 on four different autosomes and Target 46 on the X chromosome). We consistently find very recent estimates of HC speciation times and a large variance in the time to common ancestry along the genome. Similar to Patterson et al., we find that the X chromosome has a smaller effective population size than expected. The mapping of genealogical states further allows us to correlate transitions in genealogies with properties of the genome, and here we focus on fine-scale and region-wide recombination rate estimates.
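Posterior probabilities of hidden states in an HMM are conventionally obtained with the forward-backward algorithm, which moves along the alignment in both directions. The following is a minimal Python sketch of a scaled forward-backward pass over the four hidden states named above (HC1, HC2, HG, CG). It is not the coal-HMM itself: in the coal-HMM the transition and emission probabilities are derived from the model parameters, whereas here the initial distribution, the transition matrix, the four alignment-column categories, and their emission probabilities (INITIAL, TRANSITION, EMISSION) are all hypothetical numbers chosen only for illustration.

```python
import math

# Hidden states named in the paper; every number below is hypothetical.
STATES = ["HC1", "HC2", "HG", "CG"]
INITIAL = [0.4, 0.2, 0.2, 0.2]
STAY = 0.999  # hypothetical probability of keeping the same genealogy at the next column
TRANSITION = [[STAY if j == k else (1 - STAY) / 3 for k in range(4)] for j in range(4)]
# Hypothetical P(column category | genealogy); "HC" = column grouping human with chimp, etc.
EMISSION = {
    "HC1": {"HC": 0.020, "HG": 0.002, "CG": 0.002, "none": 0.976},
    "HC2": {"HC": 0.010, "HG": 0.004, "CG": 0.004, "none": 0.982},
    "HG": {"HC": 0.004, "HG": 0.010, "CG": 0.004, "none": 0.982},
    "CG": {"HC": 0.004, "HG": 0.004, "CG": 0.010, "none": 0.982},
}


def forward_backward(initial, transition, emissions):
    """Scaled forward-backward for a discrete HMM.

    initial[k]       : P(first state = k)
    transition[j][k] : P(next state = k | current state = j)
    emissions[i][k]  : P(column i | state k)
    Returns (posteriors, log_likelihood), where
    posteriors[i][k] = P(state k at column i | all columns).
    """
    n, m = len(emissions), len(initial)
    alpha, scales = [], []
    prev = None
    # Forward pass; each column is rescaled to sum to 1 to avoid underflow
    for i in range(n):
        if i == 0:
            row = [initial[k] * emissions[0][k] for k in range(m)]
        else:
            row = [sum(prev[j] * transition[j][k] for j in range(m)) * emissions[i][k]
                   for k in range(m)]
        c = sum(row)
        row = [x / c for x in row]
        alpha.append(row)
        scales.append(c)
        prev = row
    # Backward pass reusing the forward scaling constants
    beta = [[1.0] * m for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [sum(transition[j][k] * emissions[i + 1][k] * beta[i + 1][k]
                       for k in range(m)) / scales[i + 1]
                   for j in range(m)]
    # Combine and normalize to get per-column state posteriors
    posteriors = []
    for i in range(n):
        row = [alpha[i][k] * beta[i][k] for k in range(m)]
        total = sum(row)
        posteriors.append([x / total for x in row])
    return posteriors, sum(math.log(c) for c in scales)


# Hypothetical alignment: a short run of human-gorilla-supporting columns
columns = ["none"] * 20 + ["HG", "none", "HG", "HG"] + ["none"] * 20
em = [[EMISSION[s][c] for s in STATES] for c in columns]
posteriors, log_lik = forward_backward(INITIAL, TRANSITION, em)
best = [STATES[max(range(4), key=lambda k: p[k])] for p in posteriors]
```

The log-likelihood returned by the same pass is what a parameter search would maximize, and the per-column rescaling keeps the products from underflowing on alignments of millions of base pairs.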