Software Tools :: DNA Sequence Assembly

Sequence assembly (also known as fragment assembly) is the process of joining overlapping short pieces of DNA to form aligned contiguous sections (contigs). Raw sequence data usually comes from the sequencer as graphical trace files. These can be viewed and converted into textual sequence files either by the assembly software package itself or by a stand-alone utility program. The sequence fragments should then be processed to remove any contaminating vector sequences, which can interfere with the assembly process; again, this can be done by a stand-alone program or as an option within many of the assembly systems.
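To make the vector-removal step concrete, here is a minimal sketch in Python. It is not how any of the packages mentioned on this page work internally; real vector-screening tools use approximate alignment and also check both ends and both strands. The function name trim_vector, the min_match parameter, and the example sequences are all hypothetical, and the sketch assumes the contamination is an exact copy of the end of the vector sitting at the start of the read.

```python
def trim_vector(read: str, vector: str, min_match: int = 8) -> str:
    """Strip leading vector sequence from a read (toy, exact-match only).

    Looks for the longest suffix of `vector` that is also a prefix of
    `read` and removes it. Matches shorter than `min_match` bases are
    ignored so that chance matches of a few bases are not trimmed.
    """
    read = read.upper()
    vector = vector.upper()
    max_len = min(len(read), len(vector))
    # Try the longest candidate first so the whole contaminating
    # stretch is removed in one step.
    for length in range(max_len, min_match - 1, -1):
        if read.startswith(vector[-length:]):
            return read[length:]
    return read  # no vector found; return the read unchanged


# Hypothetical example data: a made-up cloning-site sequence and insert.
vector = "GGATCCGAATTCAAGCTT"
insert = "ACGTTGCAACGT"
contaminated_read = vector[-10:] + insert

print(trim_vector(contaminated_read, vector))
print(trim_vector(insert, vector))
```

Both calls should print the insert sequence: the first after trimming ten bases of vector, the second because no vector is present.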

A sequence assembly system typically contains a database to store your fragments and an editor for revising them manually. It keeps track of the original data as well as the most recently edited version. The fragments are processed by a sequence assembly engine that computes overlaps between the sequence fragments and aligns the fragments into contiguous units. Some systems include a viewer that lets you inspect the quality of the alignment and check the read coverage for each contig; with others, you may have to print that information to see it.
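The overlap-and-merge idea behind an assembly engine can be illustrated with a small greedy sketch in Python. This is only a teaching example under strong assumptions: overlaps must be exact, all reads are on the same strand, there are no sequencing errors, and a fragment lying wholly in the middle of another is not handled. Production assemblers such as those named on this page use error-tolerant alignment and quality values instead. The names overlap_length and greedy_assemble and the example fragments are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    for length in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest exact overlap.

    Stops when no remaining pair overlaps by `min_overlap` bases or more,
    and returns whatever contigs are left.
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap_length(a, b, min_overlap)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs
        # Join the two sequences, keeping the overlapping bases only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical example: three reads that tile one 20-base region.
reads = ["ATGGCGTGCA", "GTGCAATTCC", "ATTCCGGAGT"]
print(greedy_assemble(reads))
```

With these three reads, each consecutive pair shares five bases, so the sketch should collapse them into a single contig. Adding a read that overlaps nothing leaves it as a separate contig, which mirrors how an assembler reports several contigs when coverage has gaps.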

Although several assembly systems are available on Sunflower (GCG, Staden, phred/phrap/consed, TIGR), only the GCG Fragment Assembly System is accessible through the web interface. This software is suitable for small, uncomplicated projects; for larger projects, contact the CGB for information about the software available for command-line use.

Programs available through the web interface can help you prepare your sequence data for assembly projects: trace viewers that convert trace information into sequence files, programs that remove vector sequences from sequence fragments, and other utilities for processing your fragments before you enter them into an assembler.
