Approaches to the evolution of intron positions have become increasingly sophisticated since the early comparisons of GenBank data (1). Yet the frequency with which new intron positions arise in evolution continues to be debated (2–5). At the root of the controversy are differences in methodological postulates, in the scope of phylogenetic sampling, and in the criteria for deciding intron positions.
Ancestral intron positions are inferred from a matrix of intron presence/absence built by projecting present-day positions onto automated multiple sequence alignments of genome-scale sets of orthologous proteins. Rogozin et al. (6) compiled 684 clusters of orthologous genes (KOGs) from eight model eukaryotes: one vertebrate (human), two arthropods (Drosophila melanogaster and Anopheles gambiae), one nematode (Caenorhabditis elegans), two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), one plant (Arabidopsis thaliana), and one protist (Plasmodium falciparum). The resulting 16,577 unique intron positions were reduced to 7,236 (≈44%) by retaining only those located within well-conserved tracts of alignment. The full and conserved matrices were analyzed by Dollo parsimony (6). The conserved matrix was subsequently reanalyzed by other authors. Roy and Gilbert (7) devised a local maximum-likelihood (ML) approach that corrects for the known bias of Dollo parsimony toward overestimating intron gain at peripheral branches, a bias that arises because intron losses that are not directly observed go undetected. However, when the number of target sites (i.e., observed plus unobserved intron positions) is taken into account explicitly in an ML simultaneous comparison of all species (8–10), the numbers of ancestral intron positions are lower than those obtained previously (7). The reason could be that the method of ref. 7 does not allow for homoplastic gains (i.e., introns arising more than once at the same homologous position) (8, 9, 11), but it also could be that homoplastic gains are overestimated by ML methods (e.g., because of sparse phylogenetic sampling). Homoplastic gains seem to have been greatly overestimated by Qiu et al. (12), who claim that the vast majority of intron positions are new, apparently because, in their Bayesian analysis of 10 gene families, the number of target sites is constrained to equal the number of observed intron positions (8, 9).
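To make the Dollo parsimony step concrete, the following is a minimal sketch, not the implementation used in ref. 6. It assumes a rooted tree written as nested tuples of leaf names and a single column of the presence/absence matrix given as the set of taxa that carry the intron. The function names (`leaves`, `dollo_events`) and the four-taxon tree with taxa A to D are hypothetical and chosen only for illustration; the tree is not the phylogeny of the eight species above. Under Dollo parsimony each intron position is gained exactly once, on the branch leading to the most recent common ancestor of all carriers, and every absence below that ancestor is explained by loss.

```python
def leaves(tree):
    """Return the set of leaf names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def dollo_events(tree, present):
    """Reconstruct one intron position under Dollo parsimony.

    Returns (gain, losses). `gain` is the leaf set of the clade on whose
    stem branch the single gain is placed; `losses` lists the leaf sets of
    the clades on whose stem branches a loss is inferred.
    """
    present = frozenset(present)
    if not present:
        return None, []

    # Descend from the root to the most recent common ancestor of all carriers.
    node = tree
    while not isinstance(node, str):
        holders = [child for child in node if leaves(child) & present]
        if len(holders) != 1:
            break
        node = holders[0]
    gain = leaves(node)

    # Below that ancestor, any child clade with no carrier implies one loss.
    losses = []

    def walk(n):
        if isinstance(n, str):
            return
        for child in n:
            if leaves(child) & present:
                walk(child)
            else:
                losses.append(leaves(child))

    walk(node)
    return gain, losses


# Hypothetical toy tree, for illustration only.
toy_tree = ((("A", "B"), "C"), "D")

# Intron seen in A and C: one gain on the stem of (A, B, C), one loss in B.
shared = dollo_events(toy_tree, {"A", "C"})

# Intron seen only in D: always scored as a gain on the terminal branch.
unique = dollo_events(toy_tree, {"D"})
```

The second call illustrates the bias noted above: a position observed in a single species is always scored as a terminal-branch gain, because the alternative (an ancestral intron lost in every other lineage) costs more events and is never preferred by the method.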