Review articles

By Mr. B P Niranjan Reddy
Corresponding Author Mr. B P Niranjan Reddy
School of Sciences in Biotechnology, Jiwaji University, - India
Submitting Author Mr. B.P.Niranjan Reddy

Phylogenetic tree, Model selection, Bootstrapping, Phylogeny free software

Niranjan Reddy BP. Basics for the Construction of Phylogenetic Trees. WebmedCentral BIOLOGY 2011;2(12):WMC002563
doi: 10.9754/journal.wmc.2011.002563
Submitted on: 03 Dec 2011 11:47:15 AM GMT
Published on: 03 Dec 2011 08:06:40 PM GMT


A phylogeny, a diagram of an evolutionary network, is used to infer the evolutionary relationships among species or genes. During the pre-genomic era, phylogenetic analyses based on morphological, biological, and bionomic characters, allozymes, and RFLP data were used extensively to infer the evolutionary relationships among species. With the advent of high-throughput sequencing technologies and the development of extensive statistical analysis tools, an increasing amount of sequence information has become available in the public domain. This has revolutionized the field of phylogenetics and has opened up opportunities for reconstructing phylogenetic relationships with more confidence and accuracy. Consequently, phylogenetics has become an integral part of any sequencing-associated research project. Although many publications on understanding phylogenetic trees are available, most of them are written either for experts in the field or for bioinformaticians. A beginner needs a document that brings together all the basics along with brief accounts of modern developments in phylogenetics. Considering the importance of phylogenetic analysis in modern science, this review attempts to simplify the understanding of phylogenetic tree construction and of the availability and usability of the different methods and software tools for inferring trees.


The field of phylogenetics has become an integral part of modern biological research. Constructing a phylogenetic tree has become such an easy task that even a novice can construct a reasonably good tree with little effort. This is largely due to the free availability of many tree construction, viewing, and editing tools that demand very little knowledge of the underlying procedures (i.e., it is not mandatory to know the basics of the models and algorithms that operate behind the scenes). Phylogenetic analysis can be performed to infer the evolutionary relationships among the members of a taxon, to understand the evolution of genomes and gene families, to classify genes into classes such as orthologs, paralogs, and in- or out-paralogs, and to understand the evolution of new functions through duplication, horizontal gene transfer, gene conversion, recombination, and co-evolution (Hafner and Nadler, 1988; Nei, 2003; Pagel, 2000). Phylogenetic analysis provides a powerful tool for comparative genomics (Pagel, 2000). Genome sequencing projects are providing valuable sequence information that is widely used to infer the evolutionary relationships between different species or genes. Species phylogenies are generally inferred from paleontological/geological information or morphological traits (Nei, 2003). These phylogenies act as a reference for assessing the veracity of a phylogenetic tree constructed from any phylogenetically informative marker. With the increased availability of whole genome sequences, the field of phylogenomics (i.e., the use of either whole genomes or a large number of genes for phylogenetic analysis) is becoming popular among evolutionary biologists (Fitz Gibbon and House, 1999; Korbel et al., 2002; Snel et al., 1999; Thornton and DeSalle, 2000).
Many phylogenomics-based reports have been published, and most of them faithfully reflect the reference species phylogenies inferred from paleontological and/or geological information (Kumar and Filipski, 2001). Furthermore, phylogenomic reconstruction helps in supplementing or correcting earlier working phylogenetic relationships (Kumar and Filipski, 2001). Phylogenetic trees can be drawn from genes (nucleotide or protein sequences), morphological, biological, and bionomic characters, restriction fragment polymorphisms, whole-genome orthologs, or geological records (Horner and Pesole, 2004; Klenk and Göker, 2010; Nikaido et al., 2001; Snel et al., 1999). Although it is very easy to construct phylogenetic trees using user-friendly software tools, a basic understanding of the processes that run behind the scenes greatly helps in improving the quality of the tree, because it allows better input values to be given to the programs. Thus, this review article centers on the basic concepts of phylogenetic tree construction using nucleotide or amino acid sequences.

Basic concepts and definitions

A phylogenetic tree, also known as an “evolutionary tree”, is the graphical representation of the evolutionary relationship between the taxa/genes in question. A dendrogram is a broad term for the diagrammatic representation of a phylogenetic tree. Different terms are used to describe the characteristics of a phylogenetic tree. A cladogram is a dendrogram that shows only the genealogy of the taxa and says nothing about branch lengths or times of divergence (Page and Holmes, 1998; Procter et al., 2010). A phylogram (additive tree) is a phylogenetic tree that explicitly represents the number of character changes (nucleotide or amino acid changes, or other character variations) through its branch lengths (Page and Holmes, 1998; Procter et al., 2010). In a phylogram, the evolutionary distance between any two taxa is given by the sum of the lengths of the branches connecting them. These trees may be rooted or unrooted, but they often lack a root. A chronogram (ultrametric tree) is a rooted phylogenetic tree that possesses all the characteristics of an additive tree; in addition, under the assumption of a molecular clock, it allows the divergence times between taxa to be estimated (Page and Holmes, 1998). The molecular clock hypothesis assumes that every site in a protein or coding nucleotide sequence evolves at a constant rate in all species (Zuckerkandl and Pauling, 1962). Furthermore, in a chronogram all taxa are placed equidistant from the common ancestor, which is not the case in a phylogram. Phenetics (taximetrics) infers the relationships between taxa, usually using morphology or other observable traits as phylogenetically informative markers (Duncan and Baum, 1981; Mayr, 1965; Page and Holmes, 1998).
A tree that shows the evolution of genes is known as a gene tree (Snel et al., 1999), while a tree that shows the evolution of species is known as a species tree. It is important to note that a gene tree does not necessarily follow the species tree. This is because different selection constraints can act on a gene, so that it may evolve at a rate distinct from that of other genes.
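The difference between an additive tree and an ultrametric tree can be illustrated with a small sketch. The tree below, its taxon names, and its branch lengths are hypothetical and chosen only for illustration; the code sums branch lengths along the path between two leaves and checks whether all leaves are equidistant from the root.

```python
# Hypothetical rooted tree: each node maps to (parent, branch length).
# Names and branch lengths are invented for illustration only.
PARENT = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.3),
    "n1": ("root", 0.2),
    "C": ("n2", 0.25),
    "D": ("n2", 0.25),
    "n2": ("root", 0.25),
}

def path_to_root(node):
    """Return {ancestor: cumulative distance from node}, including node itself."""
    dist = {node: 0.0}
    total = 0.0
    while node in PARENT:
        parent, length = PARENT[node]
        total += length
        dist[parent] = total
        node = parent
    return dist

def tree_distance(a, b):
    """Sum of the branch lengths on the path joining leaves a and b."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    # The path turns at the most recent common ancestor, which gives the smallest sum.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)

def is_ultrametric(leaves, tol=1e-9):
    """True if every listed leaf is equally distant from the root."""
    depths = [path_to_root(leaf)["root"] for leaf in leaves]
    return max(depths) - min(depths) <= tol

print(tree_distance("A", "B"))
print(is_ultrametric(["A", "B", "C", "D"]))
```

In this example the subtree containing C and D behaves like a chronogram, while the tree as a whole is only additive because A sits closer to the root than the other taxa.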
How to read a phylogenetic tree:
1. A monophyletic grouping is one in which all species share a common ancestor, and all species derived from that common ancestor are included. This is the only form of grouping accepted as valid by cladists.
2. A paraphyletic grouping is one in which all species share a common ancestor, but not all species derived from that common ancestor are included.
3. A polyphyletic grouping is one in which species that do not share an immediate common ancestor are lumped together, while excluding other members that would link them.
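As a rough illustration of the monophyly definition above, the following sketch tests whether a set of taxa corresponds exactly to all descendants of some node. The tree and taxon names are hypothetical; telling paraphyletic from polyphyletic groups needs more information than this simple check uses, so the sketch only reports the smallest clade that would have to be included.

```python
# Hypothetical rooted tree given as child lists; names are illustrative only.
CHILDREN = {
    "root": ["n1", "E"],
    "n1": ["n2", "n3"],
    "n2": ["A", "B"],
    "n3": ["C", "D"],
}

def all_nodes():
    """Every internal node and leaf in the tree."""
    return set(CHILDREN) | {c for kids in CHILDREN.values() for c in kids}

def leaves_below(node):
    """All terminal taxa that descend from node (a leaf returns itself)."""
    if node not in CHILDREN:
        return {node}
    result = set()
    for child in CHILDREN[node]:
        result |= leaves_below(child)
    return result

def is_monophyletic(group):
    """True if some node has exactly `group` as its set of descendant leaves."""
    group = set(group)
    return any(leaves_below(n) == group for n in all_nodes())

def smallest_clade(group):
    """Leaves of the smallest clade that contains every member of `group`."""
    group = set(group)
    candidates = [leaves_below(n) for n in all_nodes() if group <= leaves_below(n)]
    return min(candidates, key=len)

print(is_monophyletic({"A", "B"}))
print(sorted(smallest_clade({"A", "C"}) - {"A", "C"}))
```

A group that fails the test is missing the taxa printed by the second call; including them would make the grouping monophyletic.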
Phylogenetic trees may be rooted or unrooted (please see figure 1 for a typical phylogenetic tree with labeling). A rooted tree represents the divergence of a group of related species from their last common ancestor (the root) by successive branching events over time. In contrast, an unrooted phylogenetic tree reveals the relationships among species/taxa without identifying the most recent common ancestor or root. Rooted phylogenies are constructed by including in the reconstruction a species/gene that lies outside the group under study. The distantly related taxa used for rooting the tree are called the out-group, while the comparatively closely related taxa under study are called the in-group. The terminal nodes in a phylogenetic tree are called operational taxonomic units (OTUs). The internal nodes, which are joined to the terminal nodes/leaves/OTUs (fig. 1) by branches, are called “ancestral states” or “hypothetical taxonomic units”; they represent forms that might have appeared during evolution and cannot be observed at present (Page and Holmes, 1998; Pagel, 2000). The internal branch points in a species phylogenetic tree represent speciation events, while in a gene family phylogenetic tree they represent duplication events (Pagel, 2000). The internal branches may be bifurcating or multi-furcating. Analysis of gene families generally yields multi-furcating branches, and each of the smaller multi-furcating branches forms a sub-tree or clade (Kao et al., 1999; Nei et al., 1997; Nei and Rooney, 2005).
The whole process of construction of the phylogenetic tree is divided into five different steps, viz.
Step 1: Choosing appropriate markers for the phylogenetic analysis
Step 2: Multiple sequence alignments
Step 3: Selection of an evolutionary model
Step 4: Phylogenetic reconstruction
Step 5: Evaluation of the phylogenetic tree
Step 1: Choosing appropriate markers for the phylogenetic analysis
Any biological information that can be used to infer the evolutionary relationship among taxa is known as a phylogenetically informative marker. It can be DNA, RNA, protein, RFLP, AFLP, ISSR, allozymes, conserved intronic positions, etc. Identification of conserved genetic loci is the first step in analyzing phylogenetic relationships. Both coding regions (genes) and non-coding regions can be used. However, the selected sequence(s) must satisfy the following rules: (a) the sequence should have a long evolutionary history of conservation, as this feature firstly preserves the record of long episodes of evolution and selection, and secondly aids easy amplification of the target sequences from distant taxa; (b) conserved, slowly evolving genes may be used to resolve the evolutionary relationships between distantly related species, while fast-evolving genes should be chosen for recently evolved species or for intra-species comparisons; (c) amino acid sequences are more informative when inferring the evolutionary relationships among distantly related taxa, and conversely, nucleotide sequences are more informative for recently evolved/closely related species; (d) the sequences to be employed in the phylogenetic analysis should be tested for their usability in the given lineage (for instance, mitochondrial (cytochrome C oxidase subunit I & II (CoxI & II)), chloroplast (trnH-psbA, matK, rpoC, rpoB, rbcL), and ribosomal RNA (16S rRNA) genes are preferred for analyzing animal, plant, and microbial species, respectively, and are called “barcode genes”) (Chantangsi et al., 2007; Liu and Beckenbach, 1992; Raghavendra et al., 2009; Shneer, 2009); (e) finally, if the objective is to estimate the divergence times between taxa, the selected gene or protein sequences should follow the molecular clock hypothesis (Barton et al., 2007; Kumar and Filipski, 2001).
However, relaxed molecular clock models have also been proposed recently. Marker selection is followed by polymerase chain reaction amplification of the target gene, and then by sequencing and editing of the sequences for further analysis.
Step 2: Multiple sequence alignments
The second step in phylogenetic construction involves the alignment of the edited sequences. Aligning two sequences is known as pair-wise sequence alignment, while an alignment that includes more than two sequences is known as a multiple sequence alignment (MSA). Pair-wise sequence alignments can be classified into global and local. A global pair-wise alignment is an end-to-end alignment of the two given sequences irrespective of their sizes, while a local alignment finds the best alignment of short sequence segments. The main aim of multiple sequence alignment is to compare three or more nucleotide or protein sequences and to provide the basis for calculating the sequence diversities/divergences used to infer the evolutionary relationship among the taxa. Different models (discussed below) have been proposed, based on various assumptions, to calculate the divergences between sequences or taxa. Hence, a correct sequence alignment is mandatory in order to obtain a phylogeny that truly represents the evolutionary relationship among the taxa (Feng and Doolittle, 1987). Numerous algorithms have been proposed to perform the task of correct sequence alignment (Procter et al., 2010). Some algorithms are fast heuristics with compromised accuracy, others are slow but accurate, and a further group aims to be both fast and accurate (Edgar, 2004; Notredame et al., 2000). Some programs build the MSA by combining the results obtained from more than one program, and can thereby produce reasonably accurate multiple sequence alignments (Rice et al., 2000). Although many programs, both online and offline, are available to perform MSA, manual intervention is often warranted to achieve a correct MSA (Zvelebil and Baum, 2008).
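The distinction between global and local pair-wise alignment can be made concrete with a minimal dynamic-programming sketch. It follows the general Needleman-Wunsch (global) and Smith-Waterman (local) schemes; the scoring values (match +1, mismatch -1, linear gap penalty -2) and the example sequences are assumptions made here for illustration, and the function returns only the score, not the alignment itself.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best pair-wise alignment score by dynamic programming.

    local=False: global (end-to-end) alignment, Needleman-Wunsch style.
    local=True:  best-scoring pair of segments, Smith-Waterman style.
    The scoring values are illustrative defaults, not recommended settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]

# Two sequences that share only a short core segment.
print(alignment_score("TTTACGTTT", "GGACGGG", local=True))
print(alignment_score("TTTACGTTT", "GGACGGG", local=False))
```

The local score rewards the shared core only, whereas the global score is pulled down by the flanking mismatches and gaps. Multiple sequence alignment programs build on such pair-wise comparisons but use more elaborate scoring schemes.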
Step 3: Selection of an evolutionary model
Selection of an evolutionary model follows the multiple sequence alignment. According to the neutral theory of evolution, most mutations are neutral and can occur at a rate of 10^-6 to 10^-8. Given this, every site in a DNA sequence may have undergone multiple substitutions, in proportion to the evolutionary time elapsed. Some sequences may evolve at a faster rate than others, and some lineages may undergo faster evolution than others (Lio and Goldman, 1998). Every site in a sequence may evolve differently (Van de Peer and De Wachter, 1997) and may have a different tolerance for mutation. Nucleotide substitutions can be classified into transitions and transversions, while substitutions in coding sequences can be classified as synonymous or non-synonymous. Although transversions have twice as many possible routes as transitions, transitions are generally observed more frequently in nature. Thus, the ratio of transitions to transversions, denoted ‘R’, is an important parameter for inferring the correct phylogenetic relationships. The R-value may vary from sequence to sequence, and thus it needs to be estimated separately for every set of sequences. The simplest evolutionary models do not consider the R-value in their analysis.
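A simple way to see what the R-value measures is to count the two kinds of differences between a pair of aligned sequences. The sketch below is only a crude observed count (it ignores multiple substitutions at the same site, which model-based estimates account for), and the example sequences are invented.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_substitutions(seq1, seq2):
    """Count observed transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue  # identical sites, gaps, and ambiguity codes are skipped
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions

ts, tv = count_substitutions("AGCTAC-T", "GGTTCA-A")
print(ts, tv)
if tv:
    print(ts / tv)  # observed ratio for this invented pair
```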
The rate of substitution also varies from site to site within a given sequence (Van de Peer and De Wachter, 1997). This rate variation is represented by a gamma distribution whose shape parameter, alpha, is estimated from the data. This parameter is used to derive a gamma-corrected distance, referred to as the gamma distance. Thus, inclusion of the gamma parameter increases the probability of obtaining the correct phylogenetic tree. The actual number of substitutions that occurred during evolution to yield the present sequences is often considerably larger than the number of differences observed, because multiple substitutions at the same site hide one another. Hence, a correction of the evolutionary distance, using an appropriate best-fit model, is required to obtain a value close to the actual one.
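To show how such a correction works, the sketch below uses the Jukes-Cantor model, which is not named in the text above and is chosen here only because it is the simplest nucleotide model. It converts an observed proportion of differing sites, p, into a corrected distance, with and without gamma-distributed rates; the value of p and the alpha values are illustrative.

```python
import math

def jc69_distance(p):
    """Jukes-Cantor corrected distance: d = -(3/4) * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for this correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def jc69_gamma_distance(p, alpha):
    """Jukes-Cantor distance with gamma-distributed rates among sites.

    d = (3/4) * alpha * ((1 - 4p/3) ** (-1/alpha) - 1), alpha = gamma shape parameter.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for this correction")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)

p = 0.30  # illustrative observed proportion of differing sites
print(jc69_distance(p))             # larger than p: hidden substitutions are added back
print(jc69_gamma_distance(p, 0.5))  # strong rate variation gives a larger correction
```

A small alpha means strong rate variation among sites and a larger corrected distance; as alpha grows, the gamma distance approaches the uncorrected Jukes-Cantor value.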
All these facts complicate the situation and call for evolutionary models that can best estimate the actual rate of substitution for a given set of sequences. Every phylogenetic reconstruction method uses simple to complex models of evolution in order to obtain an evolutionary relationship that is as close to reality as possible. A number of different models have been proposed separately for nucleotide, codon, and protein sequences, differing in the assumptions made and the parameters used (Lio and Goldman, 1998; Yang, 2007). It is important to note that no single model incorporates all the possible information; thus, the best-fit model for the sequences under study should be chosen carefully before the analysis. The evolutionary model that best explains the observed sequence data can be inferred using the ModelTest or jModelTest software. These programs use three different criteria to infer the best-fit model, namely the hierarchical Likelihood Ratio Test (hLRT), the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). For more information on how these estimates are calculated and how parameter-rich models influence them, please refer to Posada (2008).
Technical details of the different available evolutionary models are beyond the scope of this review, and readers are referred to the further reading listed in the reference section (Barton et al., 2007; Delport et al., 2008; Lio and Goldman, 1998; Yang, 2007; Yang and Nielsen, 2002).
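The two information criteria can be computed directly once each candidate model has been fitted. The sketch below uses the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L; the log-likelihoods, parameter counts, and alignment length are invented for illustration, and model-selection programs may count parameters differently (for example, by including branch lengths).

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower is better."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion; penalizes parameters more as n_sites grows."""
    return n_params * math.log(n_sites) - 2 * log_likelihood

def best_model(fits, n_sites, criterion="AIC"):
    """Return the model name with the lowest criterion value.

    fits: dict mapping model name -> (log-likelihood, number of free parameters).
    """
    def score(name):
        lnl, k = fits[name]
        return aic(lnl, k) if criterion == "AIC" else bic(lnl, k, n_sites)
    return min(fits, key=score)

# Hypothetical fits for one alignment of 1000 sites (numbers are made up).
fits = {"JC69": (-2510.4, 0), "HKY85": (-2462.9, 4), "GTR+G": (-2459.8, 9)}
print(best_model(fits, n_sites=1000, criterion="AIC"))
print(best_model(fits, n_sites=1000, criterion="BIC"))
```

In this invented example the most parameter-rich model has the highest likelihood but is not selected, because its small gain in fit does not offset the penalty for the extra parameters.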
Step 4: Phylogenetic reconstruction
Two different methodologies are employed by the presently available programs to generate dendrograms: (a) clustering methods, in which the two most closely related taxa are placed under a single internal node, and a third taxon is then added by treating the taxa under that internal node as a single group. In this way, the program progressively adds the remaining taxa to yield the final phylogenetic tree; (b) search methods, which generate a number of candidate trees that grows with the number of taxa involved in the analysis, followed by the selection of the best-fit tree topology (the one with the highest likelihood or probability) for a given evolutionary model. Choosing the correct substitution model is crucial for inferring the most accurate phylogenetic relationship. Freely available software for model selection is listed in the popular software section at the end of this review (Table 1).
Phylogenetic tree construction methods can be classified into distance methods, minimum evolution, parsimony, probabilistic, and likelihood methods (Table 1). The distance-based methods are simple: the operational taxonomic units (OTUs) are clustered based on sequence divergences that are calculated using different evolutionary models. The Unweighted Pair Group Method with Arithmetic Mean (UPGMA), Neighbor Joining (NJ), Minimum Evolution, and Fitch-Margoliash are examples of distance-based methods (Saitou and Imanishi, 1989). These methods produce a single phylogenetic tree with branch lengths using clustering. Further, distance methods can handle a huge number of sequences, for example, to construct the “Tree of Life”. Distance-based methods derive the pair-wise distances from the MSA. Other methods work directly on the MSA and try to take every single site variation into account when deriving branch lengths.
The distance matrix is derived from measured distances or morphometric analyses. Various pair-wise distance formulae (for example, the Jaccard coefficient) can be applied to morphological characters or to genetic distance data that come from sequences, restriction site polymorphisms, different methods of marker analysis (for example, micro- or mini-satellites, RAPDs, etc.), or allozyme data. Distance-matrix methods generally depend on the MSA to calculate the pair-wise distances between OTUs. Gaps and missing data can be handled in different ways: (a) gapped sites (insertions/deletions) can be deleted either pair-wise or completely; (b) gaps can be included in the analysis as mutations. The resulting pair-wise distance matrix is used by the different phylogenetic reconstruction programs for clustering the taxa. An internal node is placed between the two most similar taxa, after which progressive clustering is carried out by treating each internal node as a single taxon.
The NJ method follows the minimum evolution principle. The concept of minimum evolution is to prefer the tree that requires the least amount of evolutionary change. Maximum parsimony follows a similar principle, but it operates directly on the alignment and minimizes the number of mutations required to obtain a given tree topology. Parsimony methods can be affected by long-branch attraction (fast-evolving species are inferred to be closely related because their phylogenetically informative sites are highly saturated), while likelihood methods are better suited to drawing correct phylogenies with strong statistical support in such cases (Zvelebil and Baum, 2008).
Among all these, the maximum likelihood and Bayesian methods are the most sophisticated; they depend on likelihood or probability models to infer the evolutionary relationships. These two methods are becoming increasingly popular for constructing phylogenies. However, they are computationally intensive, which limits the number of sequences that can be used for constructing larger phylogenies. Finally, every method available to date can produce a wrong phylogenetic relationship under certain conditions, and thus every method has its own supporters and critics (Nei, 2003).
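The progressive clustering used by distance methods can be sketched with UPGMA, the simplest of the methods named above. The code below is a minimal illustration, not the implementation used by any particular program; the four taxa and their pair-wise distances are invented, and the result is returned as nested tuples of the form (left, right, node height).

```python
def upgma(distances):
    """Cluster taxa with UPGMA.

    distances: dict mapping frozenset({a, b}) -> distance for every pair of taxa.
    Returns the rooted tree as nested tuples (left, right, height).
    """
    # Each current cluster maps to the number of leaves it contains.
    sizes = {taxon: 1 for pair in distances for taxon in pair}
    dist = dict(distances)
    while len(sizes) > 1:
        # Join the two closest clusters; sorting by str keeps the output stable.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair, key=str)
        merged = (a, b, dist[pair] / 2.0)
        new_size = sizes[a] + sizes[b]
        # Distance to every other cluster is the size-weighted average.
        for other in sizes:
            if other == a or other == b:
                continue
            d = (sizes[a] * dist[frozenset((a, other))]
                 + sizes[b] * dist[frozenset((b, other))]) / new_size
            dist[frozenset((merged, other))] = d
        # Drop all entries that still refer to the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del sizes[a], sizes[b]
        sizes[merged] = new_size
    return next(iter(sizes))

# Invented distances for four taxa.
distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree = upgma(distances)
print(tree)
```

UPGMA implicitly assumes a constant rate of change along all lineages, so it always produces a rooted, ultrametric tree; Neighbor Joining relaxes that assumption.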
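The two ways of deleting gapped sites can be compared on a toy alignment. The sketch below computes uncorrected p-distances (the proportion of differing sites) either with pair-wise deletion, where a column is skipped only for the pair in which it contains a gap, or with complete deletion, where any column containing a gap is removed for all sequences. The alignment and sequence names are invented.

```python
from itertools import combinations

def strip_gapped_columns(alignment, gap="-"):
    """Complete deletion: drop every column that has a gap in any sequence."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if gap not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}

def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites, skipping columns gapped in either sequence."""
    compared = differences = 0
    for x, y in zip(seq1, seq2):
        if x == gap or y == gap:
            continue  # pair-wise deletion
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared

def distance_matrix(alignment, complete_deletion=False):
    """Symmetric matrix of p-distances as a dict of dicts."""
    if complete_deletion:
        alignment = strip_gapped_columns(alignment)
    matrix = {name: {name: 0.0} for name in alignment}
    for a, b in combinations(alignment, 2):
        matrix[a][b] = matrix[b][a] = p_distance(alignment[a], alignment[b])
    return matrix

alignment = {"t1": "ACGTAC", "t2": "ACG-AT", "t3": "A-GTTC"}
print(distance_matrix(alignment)["t1"]["t2"])
print(distance_matrix(alignment, complete_deletion=True)["t1"]["t2"])
```

Pair-wise deletion keeps more sites per comparison, while complete deletion guarantees that every distance in the matrix is based on the same set of columns.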
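The parsimony criterion, the minimum number of changes needed on a given topology, can be computed with Fitch's algorithm, which is not named in the text above and is used here as one standard way of counting changes. The sketch scores two candidate rooted binary trees for a small invented alignment; searching over all possible topologies, which real programs must do, is left out.

```python
def fitch_score(tree, states):
    """Minimum number of changes for one site on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    states: dict mapping leaf name -> character state at this site.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            # The children agree on at least one state: no extra change needed.
            return common, left_cost + right_cost
        # Otherwise one change is required somewhere below this node.
        return left_set | right_set, left_cost + right_cost + 1
    return visit(tree)[1]

def parsimony_length(tree, alignment):
    """Total number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )

# Invented alignment and two candidate topologies.
alignment = {"A": "AGT", "B": "AGT", "C": "CGA", "D": "CGT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_length(tree1, alignment), parsimony_length(tree2, alignment))
```

The topology with the smaller total is preferred under maximum parsimony.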
Step 5: Evaluation of the phylogenetic tree
After successful construction of the phylogenetic tree, the next step is to evaluate the tree topology. This can be done using two evaluation methods, namely the bootstrap method and the interior-branch test. The basic concept of the bootstrap method is to evaluate the tree topology by constructing a phylogenetic tree from each of a given number of pseudo-replicate data sets. A pseudo-replicate is a complete data set with the same number of sites (columns) as the original alignment, built by sampling columns from the original data set at random with replacement, so that some columns appear more than once and others are left out. In this way, the user-defined number of pseudo-replicates is generated, and a phylogenetic tree is constructed from each. The number of times each node of the initial tree under evaluation recurs in the bootstrap trees is given as a percentage at that node, called the “bootstrap value” or “bootstrap percentage” (Felsenstein, 2004). Tree nodes with bootstrap values >70% are generally considered reliable. The computational time of bootstrap testing depends on the number of sequences, the length of the sequences, and the number of pseudo-replicates/bootstrap replicates requested. This general method is known as non-parametric bootstrapping. An alternative is parametric bootstrapping, in which the pseudo-replicate data sets are simulated under an evolutionary model. The rest of the procedure is the same as in non-parametric bootstrapping (Makarenkov et al., 2010). In the bootstrap interior-branch test, the data are resampled as in the bootstrap method, but the replicates are used to estimate the branch lengths on the given original phylogenetic tree.
In this test, the confidence that an interior branch length is non-zero is assessed, and the tree nodes are labeled with this confidence. This method is considered an improvement over the popular bootstrap method (Zvelebil and Baum, 2008).
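The column resampling behind non-parametric bootstrapping can be sketched as follows. To keep the example short, full tree construction is replaced by a much simpler stand-in: each pseudo-replicate is scored only by whether two chosen taxa are each other's closest relatives by p-distance. The alignment, taxon names, and number of replicates are assumptions for illustration, and ties are broken by name order, which a real analysis would handle more carefully.

```python
import random

def p_distance(seq1, seq2):
    """Proportion of differing sites between two ungapped aligned sequences."""
    return sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)

def resample_columns(alignment, rng):
    """One pseudo-replicate: draw columns with replacement, keeping the length."""
    n_sites = len(next(iter(alignment.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks) for name, seq in alignment.items()}

def closest_pair(alignment):
    """Pair of taxa with the smallest p-distance (first pair wins on ties)."""
    names = sorted(alignment)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return min(pairs, key=lambda p: p_distance(alignment[p[0]], alignment[p[1]]))

def bootstrap_support(alignment, group, replicates=200, seed=0):
    """Percentage of pseudo-replicates in which `group` is the closest pair."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        pseudo = resample_columns(alignment, rng)
        hits += set(closest_pair(pseudo)) == set(group)
    return 100.0 * hits / replicates

# Invented alignment: t1 and t2 differ at one site, t3 and t4 are more divergent.
alignment = {
    "t1": "ACGTACGTAC",
    "t2": "ACGTACGTAT",
    "t3": "ACTTACCTGC",
    "t4": "GCTAACCTGA",
}
print(bootstrap_support(alignment, ("t1", "t2")))
```

In a real analysis, a complete tree is built from every pseudo-replicate and the support of each node in the original tree is reported, rather than the support of a single pair.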


1. Barton, N.H., Briggs, D.E.G., Eisen, J.A., Goldstein, D.B., Patel, N.H. 2007. Evolution (New York, Cold Spring Harbor Laboratory Press).
2. Chantangsi, C., Lynn, D.H., Brandl, M.T., Cole, J.C., Hetrick, N., Ikonomi, P., 2007. Barcoding ciliates: a comprehensive study of 75 isolates of the genus Tetrahymena. Int. J. Syst. Evol. Microbiol. 57, 2412-2423.
3. Delport, W., Scheffler, K., Seoighe, C., 2008. Models of coding sequence evolution. Brief. Bioinform. 10, 97-109.
4. Duncan, T., Baum, B.R., 1981. Numerical phenetics: its uses in botanical systematics. Annu. Rev. Ecol. Syst. 12, 387-404.
5. Edgar, R.C., 2004. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113.
6. Felsenstein, J. 2004. Inferring phylogenies (Massachusetts, Sinauer Associates, Inc.), p. 644.
7. Felsenstein, J., 2005. PHYLIP (phylogeny inference package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle, 47-55.
8. Feng, D.F., Doolittle, R.F., 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25, 351-360.
9. Fitz Gibbon, S.T., House, C.H., 1999. Whole genome-based phylogenetic analysis of free-living microorganisms. Nucleic Acids Res. 27, 4218.
10. Guindon, S., Lethiec, F., Duroux, P., Gascuel, O., 2005. PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 33, W557.
11. Hafner, M.S., Nadler, S.A., 1988. Phylogenetic trees support the coevolution of parasites and their hosts. Nature 332, 258-259.
12. Horner, D.S., Pesole, G., 2004. Phylogenetic analyses: a brief introduction to methods and their application. Expert Rev. Mol. Diagn. 4, 339-350.
13. Huelsenbeck, J.P., Ronquist, F., 2001. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754-755.
14. Kao, H.T., Porton, B., Hilfiker, S., Stefani, G., Pieribone, V.A., DeSalle, R., Greengard, P., 1999. Molecular evolution of the synapsin gene family. J. Exp. Zool. 285, 360-377.
15. Klenk, H., Göker, M., 2010. En route to a genome-based classification of Archaea and Bacteria? Syst. Appl. Microbiol. 33, 175-182.
16. Korbel, J.O., Snel, B., Huynen, M.A., Bork, P., 2002. SHOT: a web server for the construction of genome phylogenies. Trends Genet. 18, 158-162.
17. Kumar, S., Filipski, A.J. 2001. Molecular phylogeny reconstruction. In Encyclopedia of life sciences (Macmillan Publishers Ltd, Nature Publishing Group).
18. Lio, P., Goldman, N., 1998. Models of molecular evolution and phylogeny. Genome Res. 8, 1233-1244.
19. Liu, H., Beckenbach, A.T., 1992. Evolution of the mitochondrial cytochrome oxidase II gene among 10 orders of insects. Mol. Phylogenet. Evol. 1, 41-52.
20. Makarenkov, V., Boc, A., Xie, J., Peres-Neto, P., Lapointe, F., Legendre, P., 2010. Weighted bootstrapping: a correction method for assessing the robustness of phylogenetic trees. BMC Evol. Biol. 10, 250.
21. Mayr, E., 1965. Numerical phenetics and taxonomic theory. Syst. Biol. 14, 73.
22. Nei, M., 2003. Phylogenetic analysis in molecular evolutionary genetics. Annu. Rev. Genet. 30, 371-403.
23. Nei, M., Gu, X., Sitnikova, T., 1997. Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proc. Natl. Acad. Sci. U. S. A. 94, 7799-7806.
24. Nei, M., Rooney, A.P., 2005. Concerted and birth and death evolution of multigene families. Annu. Rev. Genet. 39, 121-152.
25. Nikaido, M., Matsuno, F., Hamilton, H., Brownell, R.L., Cao, Y., Ding, W., Zuoyan, Z., Shedlock, A.M., Fordyce, R.E., Hasegawa, M., Okada, N., 2001. Retroposon analysis of major cetacean lineages: the monophyly of toothed whales and the paraphyly of river dolphins. Proc. Natl. Acad. Sci. U. S. A. 98, 7384-7389.
26. Notredame, C., Higgins, D.G., Heringa, J., 2000. T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205-217.
27. Page, R.D.M., Holmes, E.C., 1998. Molecular evolution: a phylogenetic approach. Wiley-Blackwell, 417 p.
28. Pagel, M., 2000. Phylogenetic-evolutionary approaches to bioinformatics. Brief. Bioinform. 1, 117.
29. Pond, S.L.K., Frost, S.D.W., Muse, S.V., 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21, 676-679.
30. Posada, D., 2008. jModelTest: Phylogenetic Model Averaging. Mol. Biol. Evol. 25, 1253-1256.
31. Procter, J.B., Thompson, J., Letunic, I., Creevey, C., Jossinet, F., Barton, G.J., 2010. Visualization of multiple alignments, phylogenies and gene family evolution. Nat. Meth. 7, S16-25.
32. Raghavendra, K., Cornel, A.J., Reddy, B.P.N., Collins, F.H., Nanda, N., Chandra, D., Verma, V., Dash, A.P., Subbarao, S.K., 2009. Multiplex PCR assay and phylogenetic analysis of sequences derived from D2 domain of 28S rDNA distinguished members of the Anopheles culicifacies complex into two groups, A/D and B/C/E. Infect. Genet. Evol. 9, 271-277.
33. Rice, P., Longden, I., Bleasby, A., 2000. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 16, 276-277.
34. Ronquist, F., Huelsenbeck, J.P., 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572.
35. Saitou, N., Imanishi, T., 1989. Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree. Mol. Biol. Evol. 6, 51.
36. Schmidt, H.A., Strimmer, K., Vingron, M., Von Haeseler, A., 2002. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18, 502.
37. Shneer, V.S., 2009. DNA barcoding is a new approach in comparative genomics of plants. Genetika 45, 1436-1448.
38. Simon, D.L., Larget, B., 1998. Bayesian analysis in molecular biology and evolution (BAMBE). Department of Mathematics and Computer Science, Duquesne University, Pittsburgh.
39. Snel, B., Bork, P., Huynen, M.A., 1999. Genome phylogeny based on gene content. Nat. Genet. 21, 108-110.
40. Tamura, K., Dudley, J., Nei, M., Kumar, S., 2007. MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol. Biol. Evol. 24, 1596-1599.
41. Thornton, J.W., DeSalle, R., 2000. Gene family evolution and homology: genomics meets phylogenetics. Annu. Rev. Genomics Hum. Genet. 1, 41-73.
42. Van de Peer, Y., De Wachter, R., 1997. Construction of evolutionary distance trees with TREECON for Windows: accounting for variation in nucleotide substitution rate among sites. Comput. Appl. Biosci. 13, 227-230.
43. Yang, Z., 2007. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586-1591.
44. Yang, Z., Nielsen, R., 2002. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol. Biol. Evol. 19, 908.
45. Zuckerkandl, E., Pauling, L.B. 1962. Molecular disease, evolution, and genetic heterogeneity. In Horizons in Biochemistry, Kasha, M., Pullman, B., eds. (New York, Academic Press), pp. 189-225.
46. Zvelebil, M., Baum, J.O. 2008. Understanding bioinformatics, Holdsworth, D., ed. (Garland Science, Taylor & Francis Group, LLC, an informa business).

