To the Editor:
The statistical selection of best-fit models of nucleotide substitution is routine in the phylogenetic analysis of DNA sequence alignments¹. With the advent of next-generation sequencing technologies, most researchers are moving from phylogenetics to phylogenomics, in which large sequence alignments typically include hundreds or thousands of loci. Phylogenetic resources therefore need to be adapted to a high-performance computing paradigm so as to allow demanding analyses at the genomic level. Here we introduce jModelTest 2, a program for nucleotide-substitution model selection that incorporates more models, new heuristics, efficient technical optimizations and parallel computing.
jModelTest 2 includes important features not present in the previous versions²,³ (Supplementary Table 1). We expanded the set of candidate models from 88 to 1,624, and we implemented two heuristics for model selection: a greedy, hill-climbing hierarchical clustering approach (Supplementary Note 1) and a filtering algorithm based on similarity among parameter estimates (Supplementary Note 2). jModelTest 2 is written in Java, and it can run on Windows, Macintosh and Linux platforms. Source code and binaries are freely available from https://code.google.com/p/jmodeltest2/. The package includes detailed documentation and examples, and a discussion group is available at https://groups.google.com/forum/#!forum/jmodeltest/.
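The jump from 88 to 1,624 candidate models can be reproduced with a short count. The sketch below rests on an assumption that is not spelled out in the text above: each candidate model is a substitution scheme (a way of grouping the six relative rates of a time-reversible model into classes that share a value) combined with three on/off options, namely unequal base frequencies (+F), a proportion of invariable sites (+I) and gamma-distributed rate variation among sites (+G). Under that assumption, all possible groupings of six rates yield 1,624 models, and a restricted set of 11 schemes yields 88. The function names are illustrative and are not part of jModelTest 2.

```python
from itertools import product

# The six relative substitution rates of a time-reversible nucleotide model.
RATES = ("AC", "AG", "AT", "CG", "CT", "GT")


def substitution_schemes(n_rates=len(RATES)):
    """Yield every grouping of n_rates into classes as a restricted-growth string.

    '000000' means all rates share one value, '010010' separates transitions
    from transversions, and '012345' gives every rate its own value.
    """
    def extend(prefix, n_classes):
        if len(prefix) == n_rates:
            yield "".join(map(str, prefix))
            return
        # A new position may reuse an existing class or open exactly one new class.
        for label in range(n_classes + 1):
            yield from extend(prefix + [label], max(n_classes, label + 1))

    yield from extend([], 0)


def count_models(n_schemes):
    """Combine each scheme with +F, +I and +G switched on or off (assumed options)."""
    options = list(product((False, True), repeat=3))
    return n_schemes * len(options)


schemes = list(substitution_schemes())
print(len(schemes))                # number of groupings of six rates
print(count_models(len(schemes)))  # full candidate set under the stated assumption
print(count_models(11))            # restricted set of 11 schemes
```

Enumerating the schemes explicitly, rather than only computing their number, makes it easy to inspect individual groupings such as `010010`.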
We evaluated the accuracy of jModelTest 2 using 10,000 data sets simulated under a large variety of conditions (Supplementary Note 3). Using the Bayesian information criterion⁴ for model selection, jModelTest 2 identified the generating model 89% of the time (Supplementary Table 2); in the remaining cases, jModelTest 2 selected a model similar to the generating one. Accordingly, model-averaged estimates of model parameters were highly precise (Supplementary Table 3). In these simulations, the two selection heuristics that we developed were accurate and efficient. Using the hierarchical clustering heuristic, we found the same best-fit model as the full search 95% of the time. With the similarity filtering approach, we reduced the number of models evaluated by 60% on average and found the global best-fit model 99% of the time (Fig. 1 and Supplementary Note 2).
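As a reminder of how criterion-based selection works, the following minimal sketch scores candidate models with AIC and BIC and converts the scores into weights of the kind used for model averaging. The formulas are the standard ones: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, with k free parameters and n the sample size. Taking n as the alignment length is an assumption made here, and the model names, log-likelihoods and parameter counts are invented for illustration; none of them comes from jModelTest 2 or from the simulations described above.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


def weights(scores):
    """Turn criterion scores into normalized weights (lower score, higher weight)."""
    best = min(scores.values())
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


# Hypothetical candidates: model name -> (log-likelihood, free parameters).
# k counts substitution-model parameters only; adding the same number of
# branch-length parameters to every model would not change the ranking.
candidates = {
    "JC": (-5120.4, 0),
    "HKY+G": (-5012.7, 5),
    "GTR+I+G": (-5006.9, 10),
}
n_sites = 1000  # assumed sample size (alignment length)

aic_scores = {m: aic(lnl, k) for m, (lnl, k) in candidates.items()}
bic_scores = {m: bic(lnl, k, n_sites) for m, (lnl, k) in candidates.items()}

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
bic_weights = weights(bic_scores)

print("best by AIC:", best_aic)
print("best by BIC:", best_bic)
for model, w in sorted(bic_weights.items(), key=lambda item: -item[1]):
    print(f"{model}: BIC weight {w:.4f}")
```

With these made-up numbers the two criteria disagree: AIC prefers the more heavily parameterized GTR+I+G, whereas BIC, whose penalty grows with ln n, prefers HKY+G.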
Figure 1 | The threshold of the filtering heuristic (Supplementary Note 2) is directly correlated with the probability of finding the true best-fit model (heuristic accuracy) and inversely related to the number of models for which we avoided the likelihood calculation (computational savings). AIC, Akaike information criterion⁵; BIC, Bayesian information criterion.
jModelTest 2 can be executed in high-performance computing environments as (i) a desktop version with a user-friendly interface for multicore processors, (ii) a cluster version that distributes the computational load among nodes, and (iii) a hybrid version that can take advantage of a cluster of multicore nodes. An experimental study with real and simulated data sets showed remarkable computational speedups compared to previous versions (Supplementary Note 4). For example, the hybrid approach executed on the Amazon EC2 cloud with 256 processes was 182–211 times faster. For relatively large alignments (138 sequences and 10,693 sites), this could be equivalent to a reduction of the running time from nearly 8 days to around 1 hour.
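The running-time claim can be checked with simple arithmetic. The sketch below takes the figures quoted above (a baseline run of nearly 8 days, speedups of 182 to 211 with 256 processes) and derives the corresponding parallel running time and the parallel efficiency, defined in the usual way as speedup divided by the number of processes. Treating "nearly 8 days" as exactly 8 days is an approximation made here, and the function names are illustrative.

```python
def parallel_time_hours(baseline_hours, speedup):
    """Running time after applying a given speedup to the baseline run."""
    return baseline_hours / speedup


def efficiency(speedup, processes):
    """Parallel efficiency: fraction of the ideal linear speedup achieved."""
    return speedup / processes


baseline_hours = 8 * 24  # "nearly 8 days", rounded up to exactly 8 days
processes = 256

for speedup in (182, 211):
    hours = parallel_time_hours(baseline_hours, speedup)
    eff = efficiency(speedup, processes)
    print(f"speedup {speedup}: about {hours:.2f} h, efficiency {eff:.0%}")
```

Both ends of the reported range give a running time close to one hour, consistent with the statement above, and correspond to an efficiency of roughly 70 to 80% of ideal linear scaling.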
References
1. Sullivan, J. & Joyce, P. Annu. Rev. Ecol. Evol. Syst. 36, 445–466 (2005).
2. Posada, D. & Crandall, K.A. Bioinformatics 14, 817–818 (1998).
3. Posada, D. Mol. Biol. Evol. 25, 1253–1256 (2008).
4. Schwarz, G. Ann. Stat. 6, 461–464 (1978).
5. Akaike, H. IEEE Trans. Automat. Contr. 19, 716–723 (1974).
Acknowledgements
This work was financially supported by the European Research Council (ERC-2007-Stg 203161-PHYGENOM to D.P.), Spanish Ministry of Science and Education (BFU2009-08611 to D.P.) and Xunta de Galicia (Galician Thematic Networks RGB 2010/90 to D.P. and GHPC2 2010/53 to R.D.).
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Tables 1–3 and Supplementary Notes 1–4 (PDF 667 kb)
Cite this article
Darriba, D., Taboada, G., Doallo, R. et al. jModelTest 2: more models, new heuristics and parallel computing. Nat Methods 9, 772 (2012). https://doi.org/10.1038/nmeth.2109
DOI: https://doi.org/10.1038/nmeth.2109