Bioinformatics Support and Documentation :: Getting Started with EMBOSS
Using EMBOSS from the command line
EMBOSS has been developed with an extremely powerful command-line
syntax. Unfortunately this may be a little bewildering for those
who aren't used to such things so this page aims to introduce the
key concepts.
You will probably find it useful to log into research.cgb.indiana.edu
and try this out yourself as you follow through this introduction.
Key Concepts
All EMBOSS programs know which parameters they need to run. You can
get a list of a program's parameters by using the command
programname -help
where programname is the name of the program for which you
wish to find out information.
A parameter (also known as an option, switch, or qualifier) is a name
following a - sign, written after the program name.
- Like this:
-name
- It may or may not have a value associated with it, like this:
-name=value
- A value that contains spaces or UNIX wildcards such as * should be
enclosed in single quotes, like this:
-name='value that contains spaces or *'
- EMBOSS doesn't even mind if you leave out the '=', like this:
-name value
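The forms above can be sketched in Python, used here only for illustration since EMBOSS itself is run from the shell. The helper qualifier_args is hypothetical and not part of EMBOSS; it builds an argument list in the "-name value" form and lets the standard-library function shlex.join add the quotes a UNIX shell would need.

```python
import shlex

def qualifier_args(program, qualifiers):
    """Build an argument list using the '-name value' form.

    'qualifiers' maps a qualifier name to its value; a value of True
    means the qualifier is given on its own, with no value.
    """
    args = [program]
    for name, value in qualifiers.items():
        args.append(f"-{name}")
        if value is not True:
            args.append(str(value))
    return args

# 'programname' and 'name' are the placeholders used in the text above.
args = qualifier_args("programname", {
    "name": "value that contains spaces or *",
    "help": True,
})

# shlex.join quotes any argument that contains spaces or wildcards.
command_line = shlex.join(args)
print(command_line)
```

Only the value with spaces and a wildcard ends up in single quotes; the plain qualifier names are left untouched.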
An example:
research> transeq -help
Mandatory qualifiers:
[-sequence] seqall Sequence database USA
[-outseq] seqoutall Output sequence(s) USA
Optional qualifiers:
-frame list Frame(s) to translate
-table list Code to use
-regions range Regions to translate.
If this is left blank, then the complete
sequence is translated.
A set of regions is specified by a set of
pairs of positions.
The positions are integers.
They are separated by any non-digit,
non-alpha character.
Examples of region specifications are:
24-45, 56-78
1:45, 67=99;765..888
1,5,8,10,23,45,57,99
-trim bool
This removes all X and * characters from the
right end of the translation. The trimming
process starts at the end and continues
until the next character is not a X or a *
Advanced qualifiers: (none)
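The -regions entry in the help output above describes a small input format. As a rough sketch of that rule (written in Python for illustration, not taken from the EMBOSS source; the function name parse_regions is made up), the integers in the string can be collected and paired up in order:

```python
import re

def parse_regions(spec):
    """Return a list of (start, end) pairs from a region specification.

    Every run of digits is taken as a position and consecutive
    positions are paired up. Unlike the rule in the help text, this
    sketch does not reject letters; it simply skips over them.
    """
    positions = [int(p) for p in re.findall(r"\d+", spec)]
    if len(positions) % 2 != 0:
        raise ValueError(f"odd number of positions in {spec!r}")
    return list(zip(positions[0::2], positions[1::2]))

# The three example specifications from the help text.
for spec in ("24-45, 56-78", "1:45, 67=99;765..888", "1,5,8,10,23,45,57,99"):
    print(spec, "->", parse_regions(spec))
```

Note that the last example is read as four regions (1 to 5, 8 to 10, 23 to 45, 57 to 99), not as eight single positions.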
Let's work through the transeq help output above to see what it means.
- Each qualifier (the first item on a line, such as -frame) has a
type (the second item, such as list). The type tells you what
input the program is expecting. For the first parameter the program
is expecting a Uniform Sequence Address, or USA,
to tell it where to find a sequence on which to work.
- If the parameter name is in square
brackets like this [-name] then you don't have to
specify the parameter name, just the value. The values must be in
the right order, though; otherwise you will probably get an error or,
worse, the wrong output.
- Qualifiers (parameters that can be specified on the command
line) are split into five groups.
- Mandatory qualifiers are the
inputs the program needs in order to run. The program will always
ask you for these (except if you use the -auto
qualifier).
- Optional qualifiers allow you
finer control over how the program runs but will only be used by
more advanced users. EMBOSS programs will only prompt you for these
if you use the -options qualifier.
- Advanced qualifiers are for the
real expert user. They are not needed in normal use but
are there for the specialist user who needs them. EMBOSS will
NEVER prompt you for these; you have to know they are there.
- General qualifiers are those which
apply to all programs. They are listed in the table below.
- Associated qualifiers are
qualifiers associated with parameter types that allow you fine
control over a parameter. These are covered in more detail below.
General qualifiers apply to all EMBOSS programs.
| Option | Type | Description |
| -auto | bool | Do not prompt for input. The program will run without asking for any input, taking the values given on the command line or the default values if no value was given on the command line. If you haven't specified all mandatory values or specify a value incorrectly the program will exit with an error message. |
| -options | bool | Prompt for optional parameters as well as required parameters. |
| -help | bool | Print the command-line options for the program. |
| -verbose | bool | Print all the command-line options, including associated and general qualifiers. |
| -stdout | bool | Write the results to standard output (typically the screen) instead of to a file. |
| -filter | bool | Read input from standard input, write output to standard output. |
| -acdlog | bool | Write ACD processing log to program.acdlog. |
| -debug | bool | Write debug output to program.dbg. |
| -acdpretty | bool | Rewrite ACD file as program.acdpretty. |
| -acdtable | bool | Write HTML table of options. |
By now you should be comfortable with getting help on an EMBOSS program.
You should be familiar with the following points:
- All EMBOSS programs can be run interactively or completely from
the command line by specifying all the parameters as options.
- You can control the degree of prompting you get using
-auto for no prompting at all, no additional option for
prompting for just the essential parameters, or -options
for full prompting.
- You can get help using the option -help.
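To make the non-interactive case concrete, here is a minimal Python sketch that assembles a transeq command with the two mandatory values given positionally and -auto switched on. The function transeq_command and the file names myseq.fasta and myseq.pep are placeholders invented for this example; the command is only run if transeq is found on the system and the input file exists.

```python
import os
import shutil
import subprocess

def transeq_command(sequence, outseq, frame=None, auto=True):
    """Build a transeq argument list.

    The two mandatory values are given positionally, in the order
    shown in the help output: input sequence first, then output.
    """
    args = ["transeq", sequence, outseq]
    if frame is not None:
        args += ["-frame", str(frame)]
    if auto:
        args.append("-auto")  # no prompting at all
    return args

# Placeholder file names, not files that ship with EMBOSS.
args = transeq_command("myseq.fasta", "myseq.pep", frame=1)
print(args)

# Run the command only if transeq is installed and the input exists.
if shutil.which("transeq") and os.path.exists("myseq.fasta"):
    subprocess.run(args, check=True)
```

Leaving out -auto (auto=False here) gives the default behaviour of prompting for any mandatory value that is missing.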
Now we progress to getting better control over input and output
using associated qualifiers.
Associated qualifiers are qualifiers that are associated with a
parameter of a particular type, such as a sequence. To take a closer
look at associated qualifiers, let's return to our first example and
add the general qualifier -verbose to give a listing of the
associated and general qualifiers.
research> transeq -help -verbose
Mandatory qualifiers:
[-sequence] seqall Sequence database USA
[-outseq] seqoutall Output sequence(s) USA
Optional qualifiers:
-frame list Frame(s) to translate
-table list Code to use
-regions range Regions to translate.
If this is left blank, then the complete
sequence is translated.
A set of regions is specified by a set of
pairs of positions.
The positions are integers.
They are separated by any non-digit,
non-alpha character.
Examples of region specifications are:
24-45, 56-78
1:45, 67=99;765..888
1,5,8,10,23,45,57,99
-trim bool
This removes all X and * characters from the
right end of the translation. The trimming
process starts at the end and continues
until the next character is not a X or a *
Advanced qualifiers: (none)
Associated qualifiers:
"-sequence" related qualifiers
-sbegin1 integer first base used
-send1 integer last base used, def=seq length
-sreverse1 bool reverse (if DNA)
-sask1 bool ask for begin/end/reverse
-snucleotide1 bool sequence is nucleotide
-sprotein1 bool sequence is protein
-slower1 bool make lower case
-supper1 bool make upper case
-sformat1 string input sequence format
-sopenfile1 string input filename
-sdbname1 string database name
-sid1 string entryname
-ufo1 string UFO features
-fformat1 string features format
-fopenfile1 string features file name
"-outseq" related qualifiers
-osformat2 string output seq format
-osextension2 string file name extension
-osname2 string base file name
-osdbname2 string database name to add
-ossingle2 bool separate file for each entry
-oufo2 string UFO features
-offormat2 string features format
-ofname2 string features file name
General qualifiers:
-debug bool write debug output to program.dbg
-auto bool turn off prompts
-stdout bool write standard output
-filter bool read standard input, write standard output
-options bool prompt for required and optional values
-verbose bool report some/full command-line options
-help bool report command-line options
-acdlog bool write ACD processing log to program.acdlog
-acdpretty bool rewrite ACD file as program.acdpretty
-acdtable bool write HTML table of options
There are a lot more options listed here. The first part of the
output you have seen before. The second part, the associated
qualifiers, will be explained now.
Look for the line:
"-sequence" related qualifiers
The qualifiers following this line are qualifiers associated
with the qualifier -sequence. These include options such as
which portion of the sequence should be analysed (with
-sbegin1 and -send1) and an option to prompt you
for start, end and reverse (-sask1). Other associated
qualifiers include filenames, database access, formats and so
on.
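As a rough illustration of what -sbegin1 and -send1 select, the following Python sketch (not EMBOSS code; the function name select_range is made up for this example) assumes that positions count from 1 and that both ends are included, which is what the wording "first base used" and "last base used" in the listing suggests.

```python
def select_range(sequence, sbegin=1, send=None):
    """Return the part of 'sequence' from sbegin to send.

    Assumes positions count from 1 and that both ends are included;
    send defaults to the sequence length, as in the listing above.
    """
    if send is None:
        send = len(sequence)
    if not 1 <= sbegin <= send <= len(sequence):
        raise ValueError("need 1 <= sbegin <= send <= sequence length")
    # Convert the 1-based inclusive range to a Python slice.
    return sequence[sbegin - 1:send]

seq = "actggtaccgattgtaacaccgatatatcg"
print(select_range(seq, sbegin=4, send=9))
print(select_range(seq) == seq)
```

With no values given, the whole sequence is used, matching the default described for -send1.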
A list of associated qualifiers can be found in a handy
quick reference guide (PDF format).
This section will explain a little more about the Uniform
Sequence Address. The USA is a concept allowing many different
entries and databases to be specified in a common form. This makes
life easier for everyone. EMBOSS can read and write almost all the
common sequence and feature file formats, but occasionally needs a
little hint from the user.
The general form of a USA is:
format::database:entry ID
where:
- format is a
sequence format that EMBOSS knows how to read
- database is a
database defined for EMBOSS or a filename of a sequence file
- entry ID is the
entry name or accession number that uniquely identifies the
sequence in the database.
The following are all valid USAs:
| embl::mysequence.fil | Reads a sequence from the file mysequence.fil and expects it to be in EMBL format. |
| sw:tf_human | Reads the sequence with the ID tf_human from the database sw (Swissprot at EMBnet Norway). |
| asis::actggtaccgattgtaacaccgatatatcg | Reads in the sequence actggtaccgattgtaacaccgatatatcg and works with that. |
| myseq.pep | Reads the file myseq.pep and guesses the sequence format. |
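The general form format::database:entry can be taken apart with a few lines of Python. This is only a sketch of the layout described above, not the parser EMBOSS uses; the names USA and parse_usa are invented for the example, and file names that themselves contain a colon are not handled.

```python
from typing import NamedTuple, Optional

class USA(NamedTuple):
    format: Optional[str]   # None when no "format::" prefix is given
    database: str           # database name, file name, or literal sequence
    entry: Optional[str]    # None when no ":entry" part is given

def parse_usa(usa):
    """Split a USA of the form format::database:entry into its parts."""
    fmt, sep, rest = usa.partition("::")
    if not sep:                      # no "format::" prefix
        fmt, rest = None, usa
    database, sep, entry = rest.partition(":")
    return USA(fmt, database, entry if sep else None)

# The four example USAs from the table above.
for usa in ("embl::mysequence.fil", "sw:tf_human",
            "asis::actggtaccgattgtaacaccgatatatcg", "myseq.pep"):
    print(usa, "->", parse_usa(usa))
```

For the asis example the middle part holds the sequence itself rather than a database or file name.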
EMBOSS is generally very good at guessing sequence formats and the
type of a sequence. However, if you have a protein sequence rich in
A, C, T, and G, or a very ambiguous nucleotide
sequence, you probably want to use the associated qualifiers
-sprot or -snuc (short forms of the -sprotein1 and -snucleotide1
qualifiers listed earlier) to ensure that EMBOSS gets it
right.
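To see why such sequences are hard to classify, consider a deliberately naive guess based on letter composition alone. This Python sketch is not how EMBOSS detects sequence types; the function names and the 0.85 threshold are arbitrary choices made for illustration.

```python
def nucleotide_fraction(sequence):
    """Fraction of letters that are A, C, G, T or U (case-insensitive)."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c in "ACGTU" for c in letters) / len(letters)

def naive_guess(sequence, threshold=0.85):
    """Guess 'nucleotide' or 'protein' from composition alone."""
    if nucleotide_fraction(sequence) >= threshold:
        return "nucleotide"
    return "protein"

# A DNA fragment and a typical peptide are told apart easily...
print(naive_guess("actggtaccgattgtaacaccgatatatcg"))
print(naive_guess("MKWVTFISLLLLFSSAYS"))
# ...but a peptide made only of Gly, Ala, Thr and Cys residues
# looks exactly like DNA to a composition-based check.
print(naive_guess("GATTACAGCATCGACTAGCAT"))
```

In a case like the last one, stating the type explicitly on the command line removes the guesswork.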
EMBOSS can output to a variety of graphical devices. One would
typically specify the output choice using -graph
choice. When you are prompted for a graphical
device, typing some rubbish will cause EMBOSS to give you a list of
devices available on the system. Not all systems support all
devices. EMBnet Norway supports all available EMBOSS graphics
devices.
Example output:
research> pepwheel sw:tf_human
Shows protein sequences as helices
Graph type [x11]: hfjka
ERROR: option -graph: Invalid graph value 'hfjka'
Devices allowed are:-
postscript
ps
hpgl
hp7470
hp7580
meta
colourps
cps
xwindows
x11
tektronics
tekt
tek4107t
tek
none
null
text
data
xterm
png
Graph type [x11]:
In this example you can see a list of all the options. More detail
on each of these graphics types can be found in the table below.
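The check shown in the pepwheel example can be mimicked in a script before a program is launched. The Python sketch below copies the device names from the example output above; the list on any given installation may differ, and check_graph_device is a made-up helper, not an EMBOSS function.

```python
# Device names copied from the example output above; the devices
# actually available depend on how EMBOSS was installed.
KNOWN_DEVICES = [
    "postscript", "ps", "hpgl", "hp7470", "hp7580", "meta",
    "colourps", "cps", "xwindows", "x11", "tektronics", "tekt",
    "tek4107t", "tek", "none", "null", "text", "data", "xterm", "png",
]

def check_graph_device(choice, devices=KNOWN_DEVICES):
    """Return 'choice' if it is a known device, else raise ValueError."""
    if choice not in devices:
        raise ValueError(
            f"Invalid graph value {choice!r}; devices allowed are: "
            + ", ".join(devices)
        )
    return choice

print(check_graph_device("png"))
try:
    check_graph_device("hfjka")
except ValueError as err:
    print(err)
```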
If you select a graphics output format that writes to a file, the
default setting is for the program to write a file
programname.format where format is the
graphics format. This name can be changed using the associated
qualifier -goutfile filename.
Associated qualifiers for -graph parameters
| Graph associated qualifier | Type | Allowed values | Default | Description |
| -graph | Graph type | EMBOSS has a list of known devices, including postscript, ps, hpgl, hp7470, hp7580, meta, colourps, cps, xwindows, x11, tektronics, tekt, tek4107t, tek, none, null, text, data, xterm, png | EMBOSS_GRAPHICS value, or x11 | The graphics device or format to which images should be output |
| -gprompt | bool value | Yes/No | | Prompt for graph associated qualifiers |
| -gtitle | string value | Any string is accepted | An empty string is accepted | The graph title |
| -gsubtitle | string value | Any string is accepted | An empty string is accepted | The graph subtitle |
| -gxtitle | string value | Any string is accepted | An empty string is accepted | The x axis title |
| -gytitle | string value | Any string is accepted | An empty string is accepted | The y axis title |
| -grtitle | string value | Any string is accepted | An empty string is accepted | The graph running title |
| -gpages | integer value | Any integer value | 0 | Number of pages over which to print the graph |
| -goutfile | string value | Any string is accepted | An empty string is accepted | The filename to which graphics output should be saved |
Graph types
| On-screen options |
| xwindows, x11, xterm | Draw output on an X-windows terminal screen. This option requires X-windows either natively (Linux/Unix) or using an X-emulator. |
| tektronics, tekt, tek4107t, tek | Draw output on a Tektronix terminal screen. This option requires a Tektronix terminal either natively or using an emulator. |
| Graphics file formats |
| png | Write the output to a file in PNG format (good for web pages and for importing into presentations). |
| postscript, ps | Write output in black and white postscript to a file. |
| hpgl, hp7470, hp7580 | Write output in Hewlett-Packard Graphics Language to a file (can then be printed on an HP LaserJet). |
| meta | Write output in meta format to a file. |
| colourps, cps | Write output in colour postscript to a file. |
| Text file formats |
| text | Write an ASCII representation of the output in a text file. |
| data | Write the data for an xygraph to a file for import into other plotting programs (e.g. gnuplot). |
| Other formats |
| none, null | Suppress graphical output. |