Bioinformatics Support and Documentation :: Frequently Asked Questions

April 25, 2005 - Using FASTA-format Files

Question:


My fasta-format sequence file contains a LOT of sequences and I really don't want to create that many individual fasta-format sequence files in order to use them with GCG. What are my options?


Answer:

GCG supports a multiple-sequence format called RSF (Rich Sequence Format). Unfortunately, converting a fasta file into an RSF file with GCG's own tools is a multi-step process that creates many individual files as an intermediate step. Instead, you can use two scripts we've created that convert directly between fasta-format files and RSF files in a single step. The scripts are in the directory /bio/mb/tools: fasta2rsf.py converts a fasta-format file into an RSF file, and rsf2fasta.py converts an RSF file into a fasta-format file:

% /bio/mb/tools/fasta2rsf.py  mysequences.tfa  mysequences.rsf
% /bio/mb/tools/rsf2fasta.py  mysequences.rsf  mysequences.tfa
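For the curious, here is a minimal Python sketch of what a single-step fasta-to-RSF conversion involves. It is not the code of fasta2rsf.py. The function names (read_fasta, gcg_checksum, to_rsf, fasta_to_rsf) are hypothetical. All records are parsed in memory, so no per-sequence files are written. The RSF layout it emits is an assumed, simplified version of GCG's format: a `!!RICH_SEQUENCE 1.0` line, a `..` line, then one brace-delimited entry per sequence with name, descrip, type, checksum and sequence fields. Files written by GCG itself carry more fields, so compare the output with a GCG-produced RSF file before relying on it. The checksum field also assumes that RSF uses GCG's usual checksum (position weights cycling 1 to 57, summed modulo 10000).

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str, str]  # (name, description, sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (name, description, sequence) for each fasta record.

    `lines` can be an open file or any iterable of text lines. The name is
    the first word after '>', the description is the rest of the header.
    """
    name: Optional[str] = None
    descrip = ""
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, descrip, "".join(chunks)
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            descrip = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif name is not None and line:
            # Sequence line: drop any embedded spaces or tabs.
            chunks.append("".join(line.split()))
    if name is not None:
        yield name, descrip, "".join(chunks)


def gcg_checksum(seq: str) -> int:
    """GCG-style checksum: weights cycle 1..57, total taken modulo 10000."""
    return sum((i % 57 + 1) * ord(c.upper()) for i, c in enumerate(seq)) % 10000


def to_rsf(records: Iterable[Record], seq_type: str = "PROTEIN", width: int = 50) -> str:
    """Render records in a simplified, ASSUMED RSF layout (not GCG's full format)."""
    out: List[str] = ["!!RICH_SEQUENCE 1.0", ".."]
    for name, descrip, seq in records:
        out.append("{")
        out.append(f"name  {name}")
        if descrip:
            out.append(f"descrip  {descrip}")
        out.append(f"type  {seq_type}")  # one type applied to every entry
        out.append(f"checksum  {gcg_checksum(seq)}")
        out.append("sequence")
        # Residues wrapped into indented lines of `width` characters.
        out.extend(f"  {seq[i:i + width]}" for i in range(0, len(seq), width))
        out.append("}")
    return "\n".join(out) + "\n"


def fasta_to_rsf(fasta_path: str, rsf_path: str, seq_type: str = "PROTEIN") -> None:
    """Convert a multi-sequence fasta file into one RSF file, no intermediate files."""
    with open(fasta_path) as src:
        rsf_text = to_rsf(read_fasta(src), seq_type)
    with open(rsf_path, "w") as dst:
        dst.write(rsf_text)
```

For example, fasta_to_rsf("mysequences.tfa", "mysequences.rsf") does roughly the same job as the first command above. For nucleotide files you would pass seq_type="DNA", which is again an assumed type value.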

To use RSF files with GCG programs, add the {} notation to the file name to say which sequences to read. For example, to search through all of the sequences in an RSF file with a profile HMM, use {*}:

% hmmersearch  hsp70.hmm_g  myproteins.rsf{*}

To use just a single sequence named orf57 from the RSF file:

% framesearch  myorfs.rsf{orf57}  pir:*
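To pick out a single sequence you need to know the names stored in the RSF file. The Python sketch below lists them and builds file{name} specifications like the one in the framesearch example. It assumes, as a simplification, that each entry is bracketed by lines containing only { and }, and that the entry's name appears on a "name  <value>" field line before its "sequence" line. The helpers rsf_names and gcg_specs are hypothetical, not GCG tools.

```python
from typing import Iterable, List


def rsf_names(lines: Iterable[str]) -> List[str]:
    """Return the value of each entry's 'name' field, in file order.

    ASSUMED layout: entries are delimited by lines holding only '{' and '}',
    and header fields ('name', 'descrip', ...) precede a 'sequence' line.
    """
    names: List[str] = []
    in_entry = False
    in_sequence = False
    for raw in lines:
        line = raw.strip()
        if line == "{":
            in_entry, in_sequence = True, False
        elif line == "}":
            in_entry = in_sequence = False
        elif in_entry and not in_sequence:
            field, _, value = line.partition(" ")
            if field == "name":
                names.append(value.strip())
            elif field == "sequence":
                # Residue lines follow; ignore them so they are never
                # mistaken for header fields.
                in_sequence = True
    return names


def gcg_specs(rsf_path: str, names: Iterable[str]) -> List[str]:
    """Build one 'file{name}' specification per sequence name."""
    return [f"{rsf_path}{{{name}}}" for name in names]
```

For example, opening myorfs.rsf and passing the file object to rsf_names, then passing the result to gcg_specs("myorfs.rsf", ...), would give strings such as myorfs.rsf{orf57}, provided the file follows the assumed layout.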

