April 25, 2005 - Using FASTA-format Files
Question:
My fasta-format sequence file contains a LOT of sequences and I really don't want to create that many individual fasta-format sequence files in order to use them with GCG. What are my options?
Answer:
GCG supports a multiple-sequence format called RSF (Rich Sequence Format). Unfortunately, converting a fasta-format file into an RSF file with GCG's own tools is a multi-step process that creates many individual files as an intermediate step. Instead, you can use scripts we've created that convert directly between fasta-format files and RSF files in a single step. The scripts are in the directory /bio/mb/tools. The script fasta2rsf.py converts a fasta-format file into an RSF file, and the script rsf2fasta.py converts an RSF file back into a fasta-format file:
% /bio/mb/tools/fasta2rsf.py mysequences.tfa mysequences.rsf
% /bio/mb/tools/rsf2fasta.py mysequences.rsf mysequences.tfa
To use an RSF file with GCG programs, append a sequence specification in braces ({}) to the file name. For example, to search all of the sequences in an RSF file with a profile HMM, use {*}:
% hmmersearch hsp70.hmm_g myproteins.rsf{*}
To use just a single sequence named orf57 from the RSF file:
% framesearch myorfs.rsf{orf57} pir:*
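If you are not sure which names you can put inside the braces, it may help to list the sequence names in your original fasta-format file. The following is a minimal Python sketch, not one of the scripts in /bio/mb/tools. The function name fasta_names is made up for illustration, and it assumes that a sequence's name is the first whitespace-delimited word on each ">" header line and that fasta2rsf.py carries that name over to the RSF file unchanged. Check your own RSF file to confirm.

```python
import io


def fasta_names(handle):
    """Yield the name from each header line of a fasta-format file.

    Assumption: the name is the first whitespace-delimited word
    after the ">" character; the rest of the line is a description.
    """
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            # A bare ">" header has no name; report it as an empty string.
            yield fields[0] if fields else ""


# Small in-memory example standing in for a real file.
sample = io.StringIO(">orf57 hypothetical protein\nMKT\n>orf58\nMAL\n")
print(list(fasta_names(sample)))  # ['orf57', 'orf58']
```

To run it on a real file, replace the in-memory sample with an opened file, for example open("mysequences.tfa").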