Getting Started with EMBOSS

Using EMBOSS from the command line

EMBOSS has a powerful and flexible command-line syntax. It can be a little bewildering if you are not used to such things, so this page introduces the key concepts.

You will probably find it useful to log into research.cgb.indiana.edu and try this out yourself as you follow through this introduction.

Key Concepts

Command-line parameters

All EMBOSS programs know which parameters they need to run. You can get a list of a program's parameters with the command:

 programname -help

where programname is the name of the program for which you wish to find out information.
A parameter (also known as an option, switch or qualifier) is a name preceded by a - sign, written after the program name.

  • A parameter on its own looks like this:
    -name
  • It may or may not have a value associated with it, like this:
    -name=value
  • If the value contains spaces or UNIX wildcards such as *, enclose it in single quotes, like this:
    -name='value that contains spaces or *'
  • The '=' is optional; EMBOSS also accepts a space between the name and the value:
    -name value
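If you drive EMBOSS from a script, the same rules apply when you assemble a command line for the shell. The following is a minimal Python sketch, not part of EMBOSS: the helper name `qualifier` is hypothetical. It produces the -name or -name=value form and uses `shlex.quote` so that values containing spaces or wildcards such as * are wrapped in single quotes, matching the quoting rule above.

```python
import shlex
from typing import Optional


def qualifier(name: str, value: Optional[str] = None) -> str:
    """Format one EMBOSS qualifier for a shell command line (hypothetical helper).

    A bare flag becomes -name; a valued qualifier becomes -name=value.
    shlex.quote adds single quotes only when the value contains characters
    the shell would interpret, such as spaces or *.
    """
    if value is None:
        return f"-{name}"
    return f"-{name}={shlex.quote(value)}"


cmd = " ".join([
    "transeq",
    qualifier("sequence", "sw:tf_human"),
    qualifier("outseq", "tf_human.pep"),
    qualifier("regions", "24-45, 56-78"),
    qualifier("trim"),
])
print(cmd)
# transeq -sequence=sw:tf_human -outseq=tf_human.pep -regions='24-45, 56-78' -trim
```

If you pass the arguments as a list to a process-launching call instead of going through a shell, no quoting is needed at all; the quoting only matters when the command is typed or run as a single shell string.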

An example:

research> transeq -help
 Mandatory qualifiers:
  [-sequence]      seqall     Sequence database USA
  [-outseq]        seqoutall  Output sequence(s) USA

 Optional qualifiers:
   -frame          list       Frame(s) to translate
   -table          list       Code to use
   -regions        range      Regions to translate.
              If this is left blank, then the complete
              sequence is translated.
              A set of regions is specified by a set of
              pairs of positions.
              The positions are integers.
              They are separated by any non-digit,
              non-alpha character.
              Examples of region specifications are:
              24-45, 56-78
              1:45, 67=99;765..888
              1,5,8,10,23,45,57,99
   -trim            bool      
              This removes all X and * characters from the
              right end of the translation. The trimming
              process starts at the end and continues
              until the next character is not a X or a *

Advanced qualifiers: (none)
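The -regions and -trim descriptions in this help text are precise enough to sketch in a few lines. The Python below illustrates the rules exactly as the help states them (integer positions separated by any character that is neither a digit nor a letter, read in pairs; trimming removes X and * from the right-hand end). It is not EMBOSS's own code, and the function names are hypothetical.

```python
import re
from typing import List, Tuple


def parse_regions(spec: str) -> List[Tuple[int, int]]:
    """Split a -regions value into (start, end) pairs (illustrative sketch).

    Any run of characters that is neither a digit nor a letter acts as a
    separator. An empty spec returns [], meaning "translate the whole
    sequence". Letters are not separators, so int() rejects them here.
    """
    positions = [int(p) for p in re.split(r"[^0-9A-Za-z]+", spec) if p]
    if len(positions) % 2:
        raise ValueError(f"odd number of positions in {spec!r}")
    return list(zip(positions[0::2], positions[1::2]))


def trim_translation(protein: str) -> str:
    """Mimic -trim: strip trailing X and * characters from a translation."""
    return protein.rstrip("X*")


print(parse_regions("1:45, 67=99;765..888"))
# [(1, 45), (67, 99), (765, 888)]
print(trim_translation("MKVLA*X*"))
# MKVLA
```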

Let's work through this table to see what it means.

  • Each qualifier name (first column) has a type (second column), which tells you what kind of input the program expects. For the first parameter, -sequence, the program expects a Uniform Sequence Address (USA) telling it where to find the sequence to work on. If the parameter name is in square brackets, like [-name], you can omit the name and give just the value. The values must then be given in the order listed, though; otherwise you will probably get an error or, worse, the wrong output.
  • Qualifiers (parameters that can be specified on the command line) are split into five groups.
    • Mandatory qualifiers are the inputs the program needs in order to run. The program will ask you for any of these that you have not given on the command line (unless you use the -auto qualifier).
    • Optional qualifiers give you finer control over how the program runs and are generally used by more experienced users. EMBOSS programs only prompt you for these if you use the -options qualifier.
    • Advanced qualifiers are for expert users. They are not needed for routine use but are available for specialists who need them. EMBOSS will NEVER prompt you for these; you have to know they are there.
    • General qualifiers apply to all programs. They are listed in the table below.
    • Associated qualifiers are linked to a particular parameter type and give you fine control over that parameter. These are covered in more detail below.
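The prompting rules for the first three groups can be summarised as a small decision function. This Python sketch models only the behaviour described in the list above, plus one stated assumption: a value already supplied on the command line is not asked for again. The names `Group` and `will_prompt` are hypothetical.

```python
from enum import Enum


class Group(Enum):
    MANDATORY = "mandatory"
    OPTIONAL = "optional"
    ADVANCED = "advanced"


def will_prompt(group: Group, given_on_command_line: bool,
                auto: bool = False, options: bool = False) -> bool:
    """Return True if the program would ask interactively for this qualifier.

    Assumption of this sketch: values given on the command line are never
    prompted for, and -auto switches off all prompting.
    """
    if auto or given_on_command_line:
        return False
    if group is Group.MANDATORY:
        return True
    if group is Group.OPTIONAL:
        return options  # only with -options
    return False  # advanced qualifiers are never prompted for


print(will_prompt(Group.OPTIONAL, given_on_command_line=False, options=True))
# True
```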

General Qualifiers

General qualifiers apply to all EMBOSS programs.

Option       Type  Description
-auto        bool  Do not prompt for input. The program runs without asking for anything, using the values given on the command line or, where no value was given, the default values. If a mandatory value is missing or a value is invalid, the program exits with an error message.
-options     bool  Prompt for optional parameters as well as required parameters.
-help        bool  Print the command-line options for the program.
-verbose     bool  Print all the command-line options, including associated and general qualifiers.
-stdout      bool  Write the results to standard output (typically the screen) instead of to a file.
-filter      bool  Read input from standard input and write output to standard output.
-acdlog      bool  Write the ACD processing log to program.acdlog.
-debug       bool  Write debug output to program.dbg.
-acdpretty   bool  Rewrite the ACD file as program.acdpretty.
-acdtable    bool  Write an HTML table of the options.

By now you should be comfortable getting help on an EMBOSS program, and familiar with the following points:

  • All EMBOSS programs can be run interactively or completely from the command line by specifying all the parameters as options.
  • You can control how much you are prompted: use -auto for no prompting at all, no extra qualifier to be prompted for the mandatory parameters only, or -options to be prompted for the optional parameters as well.
  • You can get help using the option -help
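When a program is called from a script rather than typed at the prompt, -auto is what stops it waiting for keyboard input, and -stdout sends the results to standard output where the script can capture them. Below is a minimal Python sketch, assuming the EMBOSS program (here transeq) is installed and on your PATH; `build_argv` and `run_emboss` are hypothetical helper names, not part of EMBOSS.

```python
import shutil
import subprocess
from typing import Dict, List, Optional


def build_argv(program: str, params: Dict[str, Optional[str]]) -> List[str]:
    """Build a non-interactive argument list (hypothetical helper).

    Each entry becomes -name=value, or just -name when the value is None,
    and -auto is appended so the program never stops to prompt.
    """
    argv = [program]
    for name, value in params.items():
        argv.append(f"-{name}" if value is None else f"-{name}={value}")
    argv.append("-auto")
    return argv


def run_emboss(argv: List[str]) -> str:
    """Run the command and return its standard output (needs EMBOSS installed)."""
    if shutil.which(argv[0]) is None:
        raise FileNotFoundError(f"{argv[0]} is not on PATH")
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


argv = build_argv("transeq", {"sequence": "sw:tf_human", "stdout": None})
print(argv)
# ['transeq', '-sequence=sw:tf_human', '-stdout', '-auto']
```

Because the arguments are passed as a list, no shell is involved and values containing spaces need no quoting. On a machine with EMBOSS, `run_emboss(argv)` would return the translation as text.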

Now we progress to getting better control over input and output using associated qualifiers.


Associated qualifiers

Associated qualifiers are qualifiers attached to a particular parameter type, such as an input or output sequence, that give you finer control over that parameter. To take a closer look at them, let's return to our first example and add the general qualifier -verbose, which also lists the associated and general qualifiers.

research> transeq -help -verbose
 Mandatory qualifiers:
  [-sequence]      seqall     Sequence database USA
  [-outseq]        seqoutall  Output sequence(s) USA

Optional qualifiers:
  -frame           list       Frame(s) to translate
  -table           list       Code to use
  -regions         range      Regions to translate.
              If this is left blank, then the complete
              sequence is translated.
              A set of regions is specified by a set of
              pairs of positions.
              The positions are integers.
              They are separated by any non-digit,
              non-alpha character.
              Examples of region specifications are:
              24-45, 56-78
              1:45, 67=99;765..888
              1,5,8,10,23,45,57,99
 -trim             bool
              This removes all X and * characters from the
              right end of the translation. The trimming
              process starts at the end and continues
              until the next character is not a X or a *

Advanced qualifiers: (none)

  Associated qualifiers:
"-sequence" related qualifiers
-sbegin1              integer       first base used
-send1                integer    last base used, def=seq length
-sreverse1            bool       reverse (if DNA)
-sask1                bool       ask for begin/end/reverse
-snucleotide1         bool       sequence is nucleotide
-sprotein1            bool       sequence is protein
-slower1              bool       make lower case
-supper1              bool       make upper case
-sformat1             string     input sequence format
-sopenfile1           string     input filename
-sdbname1             string     database name
-sid1                 string     entryname
-ufo1                 string     UFO features
-fformat1             string     features format
-fopenfile1           string     features file name
"-outseq" related qualifiers
-osformat2            string     output seq format
-osextension2         string     file name extension
-osname2              string     base file name
-osdbname2            string     database name to add
-ossingle2            bool       separate file for each entry
-oufo2                string     UFO features
-offormat2            string     features format
-ofname2              string     features file name

General qualifiers:
-debug                bool       write debug output to program.dbg
-auto                 bool       turn off prompts
-stdout               bool       write standard output
-filter               bool       read standard input, write standard output
-options              bool       prompt for required and optional values
-verbose              bool       report some/full command-line options
-help                 bool       report command-line options
-acdlog               bool       write ACD processing log to program.acdlog
-acdpretty            bool       rewrite ACD file as program.acdpretty
-acdtable             bool       write HTML table of options

Many more options are listed this time. The first part of the output is the same as before; the second part, the associated qualifiers, is explained next.

Look for the line

"-sequence" related qualifiers

The qualifiers following this line are associated with the qualifier -sequence. They include options to choose which portion of the sequence is analysed (-sbegin1 and -send1) and an option to be prompted for the start, end and reverse settings (-sask1). Other associated qualifiers cover file names, database access, formats and so on.
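To make the effect of -sbegin1, -send1 and -sreverse1 concrete, here is a Python sketch of the selection they describe. It assumes 1-based, inclusive positions, an end that defaults to the sequence length, and a reverse complement of the selected region when reversing DNA. It handles only A, C, G, T and N and ignores other EMBOSS position conventions; `select_region` is a hypothetical name.

```python
from typing import Optional

# Complement table for unambiguous bases plus N (a simplification of this
# sketch; real tools also handle IUPAC ambiguity codes).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def select_region(seq: str, begin: int = 1, end: Optional[int] = None,
                  reverse: bool = False) -> str:
    """Return seq between 1-based positions begin..end inclusive.

    end defaults to the sequence length (like -send1); with reverse=True
    the selected region is reverse-complemented (like -sreverse1 on DNA).
    """
    if end is None:
        end = len(seq)
    if not 1 <= begin <= end <= len(seq):
        raise ValueError(f"invalid region {begin}-{end} for length {len(seq)}")
    region = seq[begin - 1:end]
    if reverse:
        region = region.translate(_COMPLEMENT)[::-1]
    return region


print(select_region("ACTGGTACCG", 3, 6))                # TGGT
print(select_region("ACTGGTACCG", 1, 4, reverse=True))  # CAGT
```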

A list of associated qualifiers can be found in a handy quick reference guide (PDF format).

Uniform Sequence Address

This section explains a little more about the Uniform Sequence Address. The USA lets many different entries and databases be specified in a common form, which makes life easier for everyone. EMBOSS can read and write almost all the common sequence and feature file formats, but occasionally needs a little hint from the user.

The general form of a USA is:

format::database:entry ID

where:

  • format is a sequence format that EMBOSS knows how to read
  • database is a database defined for EMBOSS or a filename of a sequence file
  • entry ID is the entry name or accession number that uniquely identifies the sequence in the database.

The following are all valid USAs:

  • embl::mysequence.fil reads a sequence from the file mysequence.fil and expects it to be in EMBL format.
  • sw:tf_human reads the sequence with the ID tf_human from the database sw (Swissprot at EMBnet Norway).
  • asis::actggtaccgattgtaacaccgatatatcg reads the sequence actggtaccgattgtaacaccgatatatcg "as is" and works with that.
  • myseq.pep reads the file myseq.pep and guesses the sequence format.
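The examples above can be split into their parts with a few lines of Python. This is a deliberately simplified sketch of the format::database:entry pattern only: it ignores the other USA forms EMBOSS accepts and would misread file paths that themselves contain a colon. The `USA` class and `parse_usa` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class USA:
    fmt: Optional[str]     # text before "::", if given
    source: str            # database name, file name, or (for asis) the sequence
    entry: Optional[str]   # entry ID within the database, if given


def parse_usa(usa: str) -> USA:
    """Split a simple USA of the form format::database:entry (sketch)."""
    fmt = None
    rest = usa
    if "::" in rest:
        fmt, rest = rest.split("::", 1)
    if ":" in rest:
        source, entry = rest.split(":", 1)
        return USA(fmt, source, entry)
    return USA(fmt, rest, None)


print(parse_usa("sw:tf_human"))
# USA(fmt=None, source='sw', entry='tf_human')
```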

EMBOSS is generally very good at guessing sequence formats and sequence types. However, if you have a protein sequence rich in A, C, T and G, or a very ambiguous nucleotide sequence, you should use the associated qualifier -sprotein1 or -snucleotide1 to make sure EMBOSS gets it right.
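Why can such sequences fool the guess? The hypothetical heuristic below (EMBOSS's actual rules are not described on this page) calls a sequence nucleotide when nearly all of its letters are nucleotide codes. A, C, G and T are also valid amino-acid letters (alanine, cysteine, glycine, threonine), so a protein made mostly of them passes the same test. Conversely, a nucleotide sequence full of ambiguity codes such as R or Y can fall below the threshold and be taken for protein.

```python
def looks_like_nucleotide(seq: str, threshold: float = 0.9) -> bool:
    """Guess the sequence type from its composition (illustrative only).

    Counts the fraction of letters that are A, C, G, T, U or N; the 0.9
    threshold is an arbitrary choice for this sketch.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return False
    nucleotide = sum(c in "ACGTUN" for c in letters)
    return nucleotide / len(letters) >= threshold


print(looks_like_nucleotide("actggtaccgattgtaacaccgatatatcg"))  # True
print(looks_like_nucleotide("MKWVTFISLLFLFSSAYS"))              # False
print(looks_like_nucleotide("GATTACAGATCAGT"))                  # True, even if meant as a peptide
```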

Graphical output

EMBOSS can output to a variety of graphical devices. You normally choose the device with -graph devicename. If you are prompted for a graphics device and type an invalid name, EMBOSS lists the devices available on the system. Not all systems support all devices; EMBnet Norway supports all available EMBOSS graphics devices.

Example output:

research> pepwheel sw:tf_human
Shows protein sequences as helices
Graph type [x11]: hfjka
ERROR: option -graph: Invalid graph value 'hfjka'
Devices allowed are:-
postscript
ps
hpgl
hp7470
hp7580
meta
colourps
cps
xwindows
x11
tektronics
tekt
tek4107t
tek
none
null
text
data
xterm
png
Graph type [x11]:

In this example you can see the full list of devices. More detail on each graph type is given in the table below.

If you select a graphics output format that writes to a file, the default setting is for the program to write a file programname.format where format is the graphics format. This name can be changed using the associated qualifier -goutfile filename.
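The device check and the default file name described above can be sketched in Python. The device list is copied from the pepwheel error message; the file devices are those listed under "Graphics file formats" and "Text file formats" in the graph types table. The naming rule simply follows the text (programname.format unless -goutfile is given), so the exact name EMBOSS writes may differ. `graph_output` is a hypothetical name.

```python
from typing import Optional

# Devices listed by pepwheel above; availability varies between systems.
DEVICES = {
    "postscript", "ps", "hpgl", "hp7470", "hp7580", "meta", "colourps",
    "cps", "xwindows", "x11", "tektronics", "tekt", "tek4107t", "tek",
    "none", "null", "text", "data", "xterm", "png",
}
# Devices that write to a file rather than drawing on a screen.
FILE_DEVICES = {
    "png", "postscript", "ps", "hpgl", "hp7470", "hp7580", "meta",
    "colourps", "cps", "text", "data",
}


def graph_output(program: str, device: str,
                 goutfile: Optional[str] = None) -> Optional[str]:
    """Validate a -graph value and return the output file name.

    Returns None for screen devices and for none/null. An empty or missing
    goutfile falls back to programname.format, as the text describes.
    """
    if device not in DEVICES:
        allowed = "\n".join(sorted(DEVICES))
        raise ValueError(f"Invalid graph value {device!r}\nDevices allowed are:-\n{allowed}")
    if device not in FILE_DEVICES:
        return None
    return goutfile or f"{program}.{device}"


print(graph_output("pepwheel", "png"))  # pepwheel.png
print(graph_output("pepwheel", "x11"))  # None
```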

Associated qualifiers for -graph parameters

Qualifier   Type     Allowed values                          Default                         Description
-graph      graph    Any known device (see the list above)   EMBOSS_GRAPHICS value, or x11   The graphics device or format to which images are output
-gprompt    bool     Yes/No                                  -                               Prompt for graph associated qualifiers
-gtitle     string   Any string                              (empty)                         The graph title
-gsubtitle  string   Any string                              (empty)                         The graph subtitle
-gxtitle    string   Any string                              (empty)                         The x axis title
-gytitle    string   Any string                              (empty)                         The y axis title
-grtitle    string   Any string                              (empty)                         The graph running title
-gpages     integer  Any integer                             0                               Number of pages over which to print the graph
-goutfile   string   Any string                              (empty)                         The file name to which graphics output is saved

Graph types

On-screen devices
  xwindows, x11, xterm
      Draw output on an X Window System screen. This requires X, either natively (Linux/Unix) or through an X emulator.
  tektronics, tekt, tek4107t, tek
      Draw output on a Tektronix terminal screen. This requires a Tektronix terminal, either native or emulated.

Graphics file formats
  png
      Write the output to a file in PNG format (good for web pages and for importing into presentations).
  postscript, ps
      Write the output to a file in black-and-white PostScript.
  hpgl, hp7470, hp7580
      Write the output to a file in Hewlett-Packard Graphics Language (which can then be printed on an HP LaserJet).
  meta
      Write the output to a file in meta format.
  colourps, cps
      Write the output to a file in colour PostScript.

Text file formats
  text
      Write an ASCII representation of the output to a text file.
  data
      Write the data for an xygraph to a file, for import into other plotting programs (e.g. gnuplot).

Other
  none, null
      Suppress graphical output.
