Let them fall where they may: congruence analysis in massive phylogenetically messy data sets

Jessica W Leigh, Klaus Schliep, Philippe Lopez, Eric Bapteste

    Research output: Contribution to journal › Article › peer-review

    Abstract

    Interest in congruence in phylogenetic data has largely focused on issues affecting multicellular organisms, and animals in particular, in which the level of incongruence is expected to be relatively low. In addition, assessment methods developed in the past have been designed for reasonably small numbers of loci and scale poorly for larger data sets. However, there are currently over a thousand complete genome sequences available and of interest to evolutionary biologists, and these sequences are predominantly from microbial organisms, whose molecular evolution is much less frequently tree-like than that of multicellular life forms. As such, the level of incongruence in these data is expected to be high. We present a congruence method that accommodates both very large numbers of genes and high degrees of incongruence. Our method uses clustering algorithms to identify subsets of genes based on similarity of phylogenetic signal. It involves only a single phylogenetic analysis per gene, and therefore, computation time scales nearly linearly with the number of genes in the data set. We show that our method performs very well with sets of sequence alignments simulated under a wide variety of conditions. In addition, we present an analysis of core genes of prokaryotes, often assumed to have been largely vertically inherited, in which we identify two highly incongruent classes of genes. This result is consistent with the complexity hypothesis.
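The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance measure or the clustering algorithm, so this example assumes a Robinson-Foulds-style distance between gene trees and a simple average-linkage agglomerative clustering as stand-ins. The names `rf_distance`, `cluster_genes`, the `threshold` parameter, and the toy gene trees are all hypothetical.

```python
from itertools import combinations


def rf_distance(splits_a, splits_b):
    """Robinson-Foulds-style distance: number of bipartitions found in only one tree."""
    return len(splits_a ^ splits_b)


def cluster_genes(gene_splits, threshold):
    """Group genes whose trees are similar (assumed stand-in for the paper's clustering).

    gene_splits maps a gene name to the set of non-trivial bipartitions of its
    single inferred tree. Clusters are merged by average linkage until the
    closest pair of clusters is further apart than `threshold`.
    """
    names = list(gene_splits)
    # One pairwise distance per gene pair, computed from the precomputed trees.
    dist = {
        frozenset((a, b)): rf_distance(gene_splits[a], gene_splits[b])
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average distance between members of two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return [sorted(c) for c in clusters]


def split(*taxa):
    """Represent a bipartition of five taxa by its smaller side (toy convention)."""
    return frozenset(taxa)


# Hypothetical toy data: five genes, each with one unrooted tree on taxa A-E.
tree_1 = {split("A", "B"), split("D", "E")}
tree_2 = {split("A", "C"), split("D", "E")}
tree_3 = {split("A", "D"), split("B", "E")}
example = {
    "gene1": tree_1,
    "gene2": tree_1,
    "gene3": tree_3,
    "gene4": tree_3,
    "gene5": tree_2,
}

if __name__ == "__main__":
    for cluster in cluster_genes(example, threshold=2):
        print(cluster)
```

In this sketch each gene contributes exactly one tree, so the expensive phylogenetic inference step would grow linearly with the number of genes; only the comparatively cheap pairwise comparison of precomputed bipartition sets is quadratic.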

    Original language: English
    Pages (from-to): 2773-85
    Number of pages: 13
    Journal: Molecular Biology and Evolution
    Volume: 28
    Issue number: 10
    Publication status: Published - Oct 2011

    Keywords

    • Algorithms
    • Archaea/genetics
    • Bacteria/genetics
    • Bayes Theorem
    • Cluster Analysis
    • Computational Biology/methods
    • Computer Simulation
    • Databases, Genetic
    • Evolution, Molecular
    • Fungi/genetics
    • Genetic Markers
    • Genetic Variation
    • Phylogeny
    • Sequence Alignment
