Software Tools :: Format Conversions :: Sequences & Other Files
Many different formats have been devised for storing sequence data and
associated annotation data. There is no universal format because some
formats are better suited for certain uses than other formats. For
example, there are formats designed for data exchange, formats that
facilitate rapid searches of sequence data, and formats for storing
multiple sequence alignments and their associated annotation
data. Some sequence analysis programs can read many different formats,
but others read only one or a few; conversion programs are therefore
needed to put sequence files into a format that the target program
can read.
Commonly used formats are:
- FASTA (sequence data with minimal annotation)
- GenBank flat file (sequence data plus full annotation)
- GCG (can be used to "wrap" some other formats)
- MSF (Multiple Sequence Format, used for aligned sequences)
- PHYLIP (used by the PHYLIP phylogenetics programs)
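As a rough illustration of what a format converter does, the sketch below reads FASTA records and writes them in a strict sequential PHYLIP layout (a header line with the number of sequences and the alignment length, then one line per sequence with the name padded or truncated to 10 characters). The function names `read_fasta` and `to_phylip` are hypothetical and are not part of any tool listed on this page. The sketch assumes the input sequences are already aligned; real converters handle many more variants of both formats.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited word of the header as the name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def to_phylip(records):
    """Return the records as strict sequential PHYLIP text (assumed layout)."""
    records = list(records)
    if not records:
        raise ValueError("no sequences to convert")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("PHYLIP requires aligned sequences of equal length")
    lines = [f" {len(records)} {length}"]
    for name, seq in records:
        # Names are truncated or space-padded to exactly 10 characters.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = ">seq1 first sequence\nACGT\nAC\n>seq2\nAC-TAC\n"
    print(to_phylip(read_fasta(StringIO(example))), end="")
```

Note that truncating names to 10 characters can make two names collide; a production converter would need to detect and resolve that.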
Standard sequence formats for certain application areas include:
- NEXUS, supported by many phylogenetics packages and containing sequence data plus program commands [Maddison et al. (1997), "NEXUS: An extensible file format for systematic information," Syst. Biol. 46, 590-621]
- Stockholm, recently adopted as the standard format of the Pfam Consortium, providing extensible markup and annotation capabilities for multiple sequence alignments
- ASN.1, a data exchange format
The PISE web interface can read sequences in many different formats
and automatically converts them into the proper format needed by the
underlying analysis program. The tools on this page are for conversions
you want to run yourself and for occasions when the automatic conversion fails. Some are
strictly format converters; others have additional functionality for
formatting and manipulating sequences. Some are used for creating
databases in a particular format.
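Automatic conversion has to begin by recognizing the input format. The sketch below shows one simple way this could be done, by inspecting the first non-blank line of a file. It is only an assumption-laden illustration, not a description of how PISE detects formats; the function name `guess_format` is hypothetical, and the checks cover only a few of the formats mentioned above.

```python
import re


def guess_format(text):
    """Guess a sequence file format from its first non-blank line.

    Returns a lowercase format name, or None if nothing matches.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore leading blank lines
        if stripped.startswith(">"):
            return "fasta"      # FASTA header lines begin with '>'
        if stripped.startswith("LOCUS"):
            return "genbank"    # GenBank flat files begin with a LOCUS line
        if stripped.upper().startswith("#NEXUS"):
            return "nexus"      # NEXUS files begin with the #NEXUS token
        if stripped.startswith("# STOCKHOLM"):
            return "stockholm"  # e.g. "# STOCKHOLM 1.0"
        if re.fullmatch(r"\d+\s+\d+", stripped):
            return "phylip"     # two integers: sequence count and length
        return None             # first content line matched nothing
    return None                 # empty input


if __name__ == "__main__":
    print(guess_format(">seq1\nACGT\n"))
```

A check like this can misfire on unusual files, which is one reason a manual converter remains useful when automatic conversion fails.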