Phylip : protdist - Program to compute distance matrix from protein sequences (Felsenstein)
Some explanations about the options
Main parameters
- Enter either the name of a file or the actual data, but not both.
- If you are using Netscape 2.x or later, you can select a file by typing its name or, better, by selecting it with the Netscape file browser (Browse button).
- Alternatively, type your data in the next area, or cut and paste it from another application.
Categories model options
- Categorization of amino acids (A)
- All three categorizations share the groups (Glu Gln Asp Asn), (Lys Arg His), (Phe Tyr Trp), plus:
- George/Hunt/Barker: (Cys), (Met Val Leu Ileu), (Gly Ala Ser Thr Pro)
- Chemical: (Cys Met), (Val Leu Ileu Gly Ala Ser Thr), (Pro)
- Hall: (Cys), (Met Val Leu Ileu), (Gly Ala Ser Thr), (Pro)
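
The three categorizations listed above can be written down as lookup tables. The following is a minimal sketch, not protdist's own code: the names SHARED, SCHEMES, category_of and same_category are hypothetical, and the residue abbreviations (including "Ileu") are copied from the list above.

```python
# Groups common to all three categorizations, as listed above.
SHARED = [
    {"Glu", "Gln", "Asp", "Asn"},
    {"Lys", "Arg", "His"},
    {"Phe", "Tyr", "Trp"},
]

# Scheme-specific groups. The dictionary keys are illustrative labels,
# not option values accepted by protdist.
SCHEMES = {
    "george_hunt_barker": [
        {"Cys"},
        {"Met", "Val", "Leu", "Ileu"},
        {"Gly", "Ala", "Ser", "Thr", "Pro"},
    ],
    "chemical": [
        {"Cys", "Met"},
        {"Val", "Leu", "Ileu", "Gly", "Ala", "Ser", "Thr"},
        {"Pro"},
    ],
    "hall": [
        {"Cys"},
        {"Met", "Val", "Leu", "Ileu"},
        {"Gly", "Ala", "Ser", "Thr"},
        {"Pro"},
    ],
}


def category_of(residue, scheme):
    """Return the index of the group that contains `residue` under `scheme`."""
    for index, group in enumerate(SHARED + SCHEMES[scheme]):
        if residue in group:
            return index
    raise ValueError(f"unknown residue: {residue}")


def same_category(a, b, scheme):
    """True if residues `a` and `b` fall in the same group under `scheme`."""
    return category_of(a, scheme) == category_of(b, scheme)
```

For example, same_category("Pro", "Gly", "george_hunt_barker") is true, while the same pair under "hall" is false because Hall places Pro in its own group.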
Bootstrap options
- Perform a bootstrap before analysis
- If you select this option, the bootstrap is performed on your sequence file, so you do not need to run seqboot separately beforehand.
- Do not give the program a file that has already been bootstrapped; this will not work.
- Resampling methods
- 1. The bootstrap. Bootstrapping was invented by Bradley Efron in 1979, and its use in phylogeny estimation was introduced by Felsenstein (1985b). It involves creating a new data set by sampling N characters randomly with replacement, so that the resulting data set has the same size as the original, but some characters have been left out and others are duplicated. The random variation of the results from analyzing these bootstrapped data sets can be shown statistically to be typical of the variation that you would get from collecting new data sets. The method assumes that the characters evolve independently, an assumption that may not be realistic for many kinds of data.
- 2. Delete-half-jackknifing. This alternative to the bootstrap involves sampling a random half of the characters, and including them in the data but dropping the others. The resulting data sets are half the size of the original, and no characters are duplicated. The random variation from doing this should be very similar to that obtained from the bootstrap. The method is advocated by Wu (1986).
- 3. Permuting species within characters. This method of resampling (well, OK, it may not be best to call it resampling) was introduced by Archie (1989) and Faith (1990; see also Faith and Cranston, 1991). It involves permuting the columns of the data matrix separately. This produces data matrices that have the same number and kinds of characters but no taxonomic structure. It is used for different purposes than the bootstrap, as it tests not the variation around an estimated tree but the hypothesis that there is no taxonomic structure in the data: if a statistic such as number of steps is significantly smaller in the actual data than it is in replicates that are permuted, then we can argue that there is some taxonomic structure in the data (though perhaps it might be just a pair of sibling species).
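
The three resampling methods described above can be sketched as follows. This is an illustration under stated assumptions, not the code used by seqboot: the data matrix is assumed to be a list of columns (one per character), each column holding one state per species, and the function names are hypothetical.

```python
import random


def bootstrap(columns, rng):
    """Sample N columns with replacement, where N is the original count."""
    n = len(columns)
    return [columns[rng.randrange(n)] for _ in range(n)]


def delete_half_jackknife(columns, rng):
    """Keep a random half of the columns, with no column duplicated."""
    n = len(columns)
    kept = sorted(rng.sample(range(n), n // 2))  # preserve original order
    return [columns[i] for i in kept]


def permute_within_characters(columns, rng):
    """Shuffle the species order independently within each column."""
    permuted = []
    for column in columns:
        shuffled = list(column)  # copy so the input is left untouched
        rng.shuffle(shuffled)
        permuted.append(shuffled)
    return permuted


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed only to make the demo repeatable
    # Six characters scored for four species (toy data, invented for the demo).
    data = [list("AAGT"), list("CCCA"), list("GTGT"),
            list("TTAA"), list("ACGT"), list("GGGG")]
    print(len(bootstrap(data, rng)))              # same size as the original
    print(len(delete_half_jackknife(data, rng)))  # half the original size
    print(permute_within_characters(data, rng)[0])
```

The bootstrap and jackknife resample whole columns and leave each column intact, whereas the permutation method keeps every column but scrambles which species carries which state. That difference is why the first two measure variation around a tree and the third tests for the absence of taxonomic structure.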
- Sequence format
- The sequence will be automatically converted to the format needed by the program, provided you enter it either in plain (raw) sequence format or in one of the following known formats:
- IG, GenBank, NBRF, EMBL, GCG, DNAStrider, Fitch, fasta, Phylip, PIR, MSF, ASN, PAUP, CLUSTALW
- Instead of a sequence, you may enter a database entry code or an accession number in the text area, in this form:
database:entry_name
or:
database:accession.
References:
Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.
Felsenstein, J. 1989. PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.
Pise form generator version: 5.a (19 Oct 2006 12:42)