Phylip : dnadist - Compute distance matrix from nucleotide sequences (Felsenstein)



your e-mail




Alignment File : please enter either :
  1. the name of a file:
  2. or the actual data here:

(sequence format)



dnadist options

Categories options

Weight options

Bootstrap options

Output options


dnadist options

Distance (D)

Transition/transversion ratio (T)

Gamma distributed rates across sites (G) ? [default] No Yes Gamma+Invariant

Coefficient of variation of substitution rate among sites (must be positive) (if Gamma)

Fraction of invariant sites (if Gamma)

Use empirical base frequencies (F)

Base frequencies for A, C, G, T/U (if not empirical) (separated by commas)



[Return to the main part with your favorite browser's Back function]


Categories options

One category of substitution rates (C)

Number of categories (1 to 9)

Rate for each category (separated by commas)





Weight options

Use weights for sites (W)



Weights file : please enter either :
  1. the name of a file:
  2. or the actual data here:







Bootstrap options

Perform a bootstrap before analysis

Resampling methods

Random number seed (must be odd)

How many replicates





Output options

Lower-triangular distance matrix (L)

Print out the data at start of run (1)







Some explanations about the options



Main parameters
Enter either the name of a file or the actual data, but not both.
If you are using Netscape 2.x or later, you can select a file by typing its name or, better, by choosing it with the Netscape file browser (Browse button).
Alternatively, you can type your data into the text area, or cut and paste it from another application.


Categories options
The alignment file MUST contain a C on the first line and the description of the categories of each site on the following line. Here is a toy example of a file of 5 species with 12 sites, and 2 different categories (first 2 lines):
5 12 C
CATEGORIES 111111222222
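
The two header lines above can be checked before submitting a file. The following is a minimal sketch, not part of PHYLIP: the function name parse_categories is hypothetical, and it assumes whitespace-separated fields, a second line that starts with the word CATEGORIES, and one digit from 1 to 9 per site (the range allowed by the "Number of categories" field).

```python
def parse_categories(header_line, categories_line):
    """Validate the first two lines of an alignment file that uses categories.

    Returns (number_of_species, list_of_category_numbers_per_site).
    Assumption: fields are separated by whitespace, as in the toy example.
    """
    fields = header_line.split()
    n_species, n_sites = int(fields[0]), int(fields[1])

    # The first line must carry the C option letter after the two counts.
    if "C" not in "".join(fields[2:]):
        raise ValueError("first line must contain a C")

    parts = categories_line.split()
    if not parts or parts[0] != "CATEGORIES":
        raise ValueError("second line must start with CATEGORIES")

    # Join the remaining fields so that digits split by blanks are accepted.
    digits = "".join(parts[1:])
    if len(digits) != n_sites:
        raise ValueError(
            f"expected {n_sites} category digits, found {len(digits)}"
        )
    if not all(ch in "123456789" for ch in digits):
        raise ValueError("each category must be a digit from 1 to 9")

    return n_species, [int(ch) for ch in digits]


if __name__ == "__main__":
    species, cats = parse_categories("5 12 C", "CATEGORIES 111111222222")
    print(species, cats)
```

With the toy example, the function reports 5 species and assigns the first six sites to category 1 and the last six to category 2.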


dnadist options
Coefficient of variation of substitution rate among sites (must be positive) (if Gamma)
In terms of the gamma distribution's shape parameter alpha, this is 1/(square root of alpha).
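
If you know alpha rather than the coefficient of variation, the conversion is a one-liner in either direction. The helper names below are illustrative only and are not part of PHYLIP; the sketch simply applies the relation CV = 1/sqrt(alpha) stated above.

```python
import math


def cv_from_alpha(alpha):
    """Coefficient of variation for a gamma shape parameter alpha."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / math.sqrt(alpha)


def alpha_from_cv(cv):
    """Inverse relation: alpha = 1 / CV^2 (CV must be positive)."""
    if cv <= 0:
        raise ValueError("coefficient of variation must be positive")
    return 1.0 / (cv * cv)


if __name__ == "__main__":
    # alpha = 4 corresponds to a coefficient of variation of 0.5
    print(cv_from_alpha(4.0))
    print(alpha_from_cv(0.5))
```

For example, alpha = 4 corresponds to a coefficient of variation of 0.5, and alpha = 1 corresponds to a coefficient of variation of 1.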




Bootstrap options
Perform a bootstrap before analysis
If you select this option, the bootstrap is performed on your sequence file, so you do not need to run seqboot separately beforehand.
Do not give the program a file that has already been bootstrapped; this will not work.
Resampling methods
1. The bootstrap. Bootstrapping was invented by Bradley Efron in 1979, and its use in phylogeny estimation was introduced by Felsenstein (1985b). It involves creating a new data set by sampling N characters randomly with replacement, so that the resulting data set has the same size as the original, but some characters have been left out and others are duplicated. The random variation of the results from analyzing these bootstrapped data sets can be shown statistically to be typical of the variation that you would get from collecting new data sets. The method assumes that the characters evolve independently, an assumption that may not be realistic for many kinds of data.
2. Delete-half-jackknifing. This alternative to the bootstrap involves sampling a random half of the characters, and including them in the data but dropping the others. The resulting data sets are half the size of the original, and no characters are duplicated. The random variation from doing this should be very similar to that obtained from the bootstrap. The method is advocated by Wu (1986).
3. Permuting species within characters. This method of resampling (well, OK, it may not be best to call it resampling) was introduced by Archie (1989) and Faith (1990; see also Faith and Cranston, 1991). It involves permuting the columns of the data matrix separately. This produces data matrices that have the same number and kinds of characters but no taxonomic structure. It is used for different purposes than the bootstrap, as it tests not the variation around an estimated tree but the hypothesis that there is no taxonomic structure in the data: if a statistic such as number of steps is significantly smaller in the actual data than it is in replicates that are permuted, then we can argue that there is some taxonomic structure in the data (though perhaps it might be just a pair of sibling species).
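
The three resampling methods described above can be sketched in a few lines. This is an illustration under stated assumptions, not the code the program runs: the function names are hypothetical, the alignment is assumed to be given as a list of columns (one string per character, holding that character's state in each species), and Python's random module stands in for the program's own random number generator.

```python
import random


def bootstrap(columns, rng):
    """Sample N characters with replacement; the result has N characters."""
    n = len(columns)
    return [columns[rng.randrange(n)] for _ in range(n)]


def delete_half_jackknife(columns, rng):
    """Keep a random half of the characters, without duplication."""
    n = len(columns)
    keep = sorted(rng.sample(range(n), n // 2))
    return [columns[i] for i in keep]


def permute_species_within_characters(columns, rng):
    """Shuffle the species order separately within each character (column)."""
    permuted = []
    for column in columns:
        states = list(column)
        rng.shuffle(states)
        permuted.append("".join(states))
    return permuted


if __name__ == "__main__":
    # Four characters scored in three species (one string per character).
    columns = ["AAC", "GGT", "CTA", "TTT"]
    rng = random.Random(1)  # fixed seed so that a run can be repeated
    print(bootstrap(columns, rng))
    print(delete_half_jackknife(columns, rng))
    print(permute_species_within_characters(columns, rng))
```

The bootstrap replicate always has as many characters as the original, the jackknife replicate has half as many, and the permuted replicate keeps every character's set of states while breaking their association with particular species.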
Sequence format
The sequence will be automatically converted to the format needed by the program, provided you enter it either in plain (raw) sequence format or in one of the following known formats:
IG, GenBank, NBRF, EMBL, GCG, DNAStrider, Fitch, fasta, Phylip, PIR, MSF, ASN, PAUP, CLUSTALW
You may enter in the text area a database entry code, or an accession number, in this form:

database:entry_name

or:

database:accession.

References:

Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.

Felsenstein, J. 1989. PHYLIP -- Phylogeny Inference Package (Version 3.2). Cladistics 5: 164-166.

Pise form generator version: 5.a (20 Feb 2009 15:48)