Format Conversions: Sequences & Other Files

Many different formats have been devised for storing sequence data and its annotation. No single universal format exists because each format is suited to particular uses: some are designed for data exchange, some facilitate rapid searches of sequence data, and some store multiple sequence alignments together with their annotation. Some sequence analysis programs can read many different formats, while others accept only one or a few, so conversion programs are needed to put a sequence file into a format that a given program can read.

Commonly used formats are:

  • FASTA (sequence data with minimal annotation)
  • GenBank flat file (sequence data plus full annotation)
  • GCG (can be used to "wrap" some other formats)
  • MSF (multiple sequence format for aligned sequences)
  • PHYLIP (used by the PHYLIP phylogenetics programs)
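As a rough illustration of what a format converter does, the sketch below reads FASTA text and writes it out as sequential PHYLIP. It assumes the sequences are already aligned (PHYLIP requires equal lengths) and follows the classic PHYLIP convention of padding or truncating each name to 10 characters. The function names `read_fasta` and `to_phylip` are hypothetical and not part of any tool listed on this page; real converters also handle interleaved PHYLIP, longer ("relaxed") names, and other variants.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # The identifier is the first word after '>'; the rest is a description.
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records):
    """Write aligned (name, sequence) records as sequential PHYLIP text."""
    if not records:
        raise ValueError("no sequences to write")
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("PHYLIP needs aligned sequences of equal length")
    # Header line: number of sequences, then alignment length.
    lines = [f"{len(records)} {lengths.pop()}"]
    for name, seq in records:
        # Classic PHYLIP reserves exactly 10 characters for each name.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


fasta = ">seq1 human\nACGT\nAC\n>seq2\nACGTAA\n"
phylip = to_phylip(read_fasta(fasta))
```

Here `phylip` holds a `2 6` header followed by one line per sequence, each name padded to 10 characters. Because names are cut to 10 characters, two identifiers that share their first 10 characters would collide; a fuller converter would check for that.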

Standard sequence formats for certain application areas include:

  • NEXUS, supported by many phylogenetics packages and containing sequence data plus program commands [Maddison et al. (1997), "NEXUS: An extensible file format for systematic information," Syst. Biol. 46, 590-621]
  • Stockholm, adopted as the standard format of the Pfam Consortium, providing extensible markup and annotation capabilities for multiple sequence alignments
  • ASN.1 (Abstract Syntax Notation One), a data exchange format used by NCBI
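Converting an alignment out of Stockholm format mostly means discarding its markup. The sketch below, a hypothetical helper named `stockholm_to_fasta`, keeps only the sequence lines of the first alignment (up to the `//` terminator), joins the blocks of an interleaved alignment, and emits aligned FASTA with gap characters preserved. It assumes each sequence line has the form `name sequence` and it drops all `#=GF`, `#=GS`, `#=GR`, and `#=GC` annotation, which is exactly the information Stockholm is designed to carry, so the conversion is lossy.

```python
def stockholm_to_fasta(text, width=60):
    """Convert the first alignment in Stockholm text to aligned FASTA."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# STOCKHOLM"):
        raise ValueError("missing '# STOCKHOLM 1.0' header line")
    seqs = {}  # dicts preserve insertion order, so output follows input order
    for line in lines[1:]:
        line = line.strip()
        if line == "//":
            break  # end-of-alignment marker
        if not line or line.startswith("#"):
            continue  # blank lines and #=GF/#=GS/#=GR/#=GC annotation
        parts = line.split()
        name, residues = parts[0], "".join(parts[1:])
        # In an interleaved alignment each name reappears once per block.
        seqs[name] = seqs.get(name, "") + residues
    out = []
    for name, seq in seqs.items():
        out.append(">" + name)
        # Wrap long sequences at a fixed line width, gaps included.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


sto = (
    "# STOCKHOLM 1.0\n#=GF ID demo\n"
    "seqA  AC-GU\nseqB  ACAGU\n\n"
    "seqA  UU\nseqB  U-\n#=GC SS_cons <<..>>.\n//\n"
)
fasta_text = stockholm_to_fasta(sto)
```

The `width` parameter only controls line wrapping in the output; it is an illustrative choice, not a requirement of FASTA.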

The PISE web interface can read sequences in many different formats and automatically converts them into the format required by the underlying analysis program. The tools on this page are for converting files yourself, or for cases where the automatic conversion fails. Some are strictly format converters; others can also format and manipulate sequences, and some are used to create databases in a particular format.
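This page does not say how PISE recognizes an input format. One common approach, sketched here as an assumption rather than as PISE's actual method, is to sniff the opening lines of a file for signature text: `>` for FASTA, `LOCUS` for a GenBank flat file, `#NEXUS`, the `# STOCKHOLM` header, an `MSF:` field for MSF, a GCG dividing line containing `Check:` and ending in `..`, or the two integers (sequence count and alignment length) that open a PHYLIP file. The `guess_format` function and its return labels are hypothetical, and the checks are heuristics that can misfire on unusual files.

```python
import re


def guess_format(text):
    """Guess a sequence file's format from its opening lines (heuristic only)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    first, head = lines[0], lines[:10]
    if first.startswith(">"):
        return "fasta"
    if first.startswith("LOCUS"):
        return "genbank"
    if first.upper().startswith("#NEXUS"):
        return "nexus"
    if first.startswith("# STOCKHOLM"):
        return "stockholm"
    # GCG MSF: an optional '!!NA_/!!AA_MULTIPLE_ALIGNMENT' line and an 'MSF:' field.
    if first.startswith(("!!NA_MULTIPLE_ALIGNMENT", "!!AA_MULTIPLE_ALIGNMENT")) or any(
        "MSF:" in ln for ln in head
    ):
        return "msf"
    # Single-sequence GCG: an optional '!!NA_/!!AA_SEQUENCE' line, then a
    # dividing line that contains 'Check:' and ends in '..'.
    if first.startswith(("!!NA_SEQUENCE", "!!AA_SEQUENCE")) or any(
        "Check:" in ln and ln.endswith("..") for ln in head
    ):
        return "gcg"
    # PHYLIP starts with the number of sequences and the alignment length.
    if re.fullmatch(r"\d+\s+\d+.*", first):
        return "phylip"
    return "unknown"
```

The order of the checks matters: MSF is tested before single-sequence GCG because an MSF header line also contains `Check:` and ends in `..`.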
