Tutorial


Centre National de la Recherche Scientifique

Cédric Notredame
www.tcoffee.org

T-Coffee:
 Tutorial and FAQ

 


T-Coffee Tutorial
T-Coffee
3D-Coffee
M-Coffee
APDB and iRMSD

© Cédric Notredame and Centre National de la Recherche Scientifique, France


Before You Start…

Foreword

Pre-Requisite

What Is T-COFFEE?

What is T-Coffee?

What does it do?

What can it align?

How can I use it?

Is T-Coffee different from ClustalW?

What T-Coffee Can and Cannot do for you

(NOT) Fetching Sequences

Aligning Sequences

Combining Alignments

Evaluating Alignments

Combining Sequences and Structures

Identifying Occurrences of a Motif: Mocca

How Does T-Coffee Work?

Preparing Your Data: Reformatting and Trimming

Reformatting your data

Changing MSA formats

Removing the gaps from an alignment

Changing the case of your sequences

Protecting Important Sequence Names

Extracting Portions of Dataset

Extracting Sequences According to a Pattern

Extracting Sequences by Names

Removing Sequences by Names

Extracting Blocks Within Alignment

Concatenating Alignments

Reducing and Improving your dataset

Extracting the N most informative sequences

Extracting all the sequences less than X% identical

Forcing Specific Sequences to be kept

Identifying and Removing Outliers

Chaining Important Sequences

Manipulating DNA sequences

Translating DNA sequences into Proteins

Back-Translation With the Bona-Fide DNA sequences

Finding the Bona-Fide Sequences for the Back-Translation

Guessing Your Back Translation

Fetching a Structure

Fetching a PDB structure

Fetching The Sequence of a PDB structure

Building Multiple Sequence Alignments

How to generate The Alignment You Need?

What is a Good Alignment?

The Main Methods and their Scope

Choosing The Right Package

Computing Multiple Sequence Alignments With T-Coffee

A Simple Multiple Sequence Alignment

Controlling the Output Format

Computing a Phylogenetic tree

Using Several Datasets

How Good is Your Alignment

Doing it over the WWW

Aligning Nucleic Acids With T-Coffee

Aligning Many Sequences

Modifying the default parameters of T-Coffee

Changing the Substitution Matrix

Comparing Two Alternative Alignments

Changing Gap Penalties

Can You Guess The Optimal Parameters?

Using Many Methods at once

Using All the Methods at the Same Time: M-Coffee

Using Selected Methods to Compute your MSA

Combining pre-Computed Alignments

Aligning Profiles

Aligning One sequence to a Profile

Aligning Many Sequences to a Profile

Accurate/Slow Profile to Profile Alignment

Aligning Other Types of Sequences

Splicing variants

Noisy Coding DNA Sequences…

Aligning DNA sequences

Aligning RNA sequences

Combining Sequences and Structures

If you are in a Hurry: Expresso

What is Expresso

Using Expresso

Aligning Sequences and Structures

Mixing Sequences and Structures

Using Sequences only

Aligning Profile using Structural Information

Post Processing Your Alignment

Removing unwanted portions

Coloring Features in your alignment

Evaluating Reliability of Your Alignment

Evaluating Alignments with The CORE index

Computing The Local CORE Index

Computing the CORE index of any alignment

Filtering Bad Residues

Filtering Gap Columns

Evaluating an Alignment Using Structural Information: iRMSD

What is the iRMSD?

How to Efficiently Use Structural Information

Evaluating an Alignment With the irmsd Package

Evaluating Alternative Alignments

Evaluating an Alignment according to your own Criterion

Integrating Your Own Methods In T-Coffee

Using New and Existing Methods as Plugins For T-Coffee

Plug-In: Using Methods Integrated in T-Coffee

Integrating External Methods

Managing a collection of method files

Advanced Method Integration

The Mother of All method files…

Weighting your Method

Plug-Out: Using T-Coffee as a Plug-In

Creating Your Own T-Coffee Libraries

Using Pre-Computed Alignments

Customizing the Weighting Scheme

Generating Your Own Libraries

Frequently Asked Questions

Abnormal Terminations and Wrong Results

Q: The program keeps crashing when I give my sequences

Q: The default alignment is not good enough

Q: The alignment contains obvious mistakes

Q: The program is crashing

Q: I am running out of memory

Input/Output Control

Q: How many Sequences can t_coffee handle

Q: Can I prevent the Output of all the warnings?

Q: How many ways to pass parameters to t_coffee?

Q: How can I change the default output format?

Q: My sequences are slightly different between all the alignments

Q: Is it possible to pipe stuff OUT of t_coffee?

Q: Is it possible to pipe stuff INTO t_coffee?

Q: Can I read my parameters from a file?

Q: I want to decide myself on the name of the output files!!!

Q: I want to use the sequences in an alignment file

Q: I only want to produce a library

Q: I want to turn an alignment into a library

Q: I want to concatenate two libraries

Q: What happens to the gaps when an alignment is fed to T-Coffee

Q: I cannot print the html graphic display!!!

Q: I want to output an html file and a regular file

Q: I would like to output more than one alignment format at the same time

Alignment Computation

Q: Is T-Coffee the best? Why Not Using Muscle, or Mafft, or ProbCons???

Q: Can t_coffee align Nucleic Acids ???

Q: I do not want to compute the alignment

Q: I would like to force some residues to be aligned

Q: I would like to use structural alignments

Q: I want to build my own libraries

Q: I want to use my own tree

Q: I want to align coding DNA

Q: I do not want to use all the possible pairs when computing the library

Q: I only want to use specific pairs to compute the library

Q: There are duplicates or quasi-duplicates in my set

Using Structures and Profiles

Q: Can I align sequences to a profile with T-Coffee?

Q: Can I align sequences Two or More Profiles?

Q: Can I align two profiles according to the structures they contain?

Q: T-Coffee becomes very slow when combining sequences and structures

Q: Can I use a local installation of PDB?

Alignment Evaluation

Q: How good is my alignment?

Q: What is that color index?

Q: Can I evaluate alignments NOT produced with T-Coffee?

Q: Can I Compare Two Alignments?

Q: I am aligning sequences with long regions of very good overlap

Q: Why is T-Coffee changing the names of my sequences!!!!

Addresses and Contacts

Contributors

Addresses

References

T-Coffee

Mocca

CORE

Other Contributions

Bug Reports and Feedback

Before You Start…

Foreword

Much of the material presented here comes from two summer schools, tentatively called the "Prosite Workshops", that were held in Marseille in 2001 and 2002. These workshops were mostly an excuse to go rambling and swimming in the calanques. Yet, when we got tired of lazing in the sun, we eventually did a bit of work to chill out. Most of our experiments revolved around the development of sequence analysis tools. Many of the most advanced ideas in T-Coffee were launched during these fruitful sessions. Participants included Phillip Bucher, Laurent Falquet, Marco Pagni, Alexandre Gattiker, Nicolas Hulo, Christian Siegfried, Anne-Lise Veuthey, Virginie Leseau, Lorenzo Ceruti and Cedric Notredame.

This document contains two main sections. The first one is a tutorial, where we go from simple things to more complicated ones and show you how to use all the subtleties of T-Coffee. The second is a list of frequently asked questions. We have tried to put as many of these functionalities as possible on the web (www.tcoffee.org), but if you need to do something special and highly reproducible, the command line is the only way.

Pre-Requisite

This tutorial assumes that you have installed T-Coffee, version 4.30 or higher. T-Coffee is free, open-source software that runs on all Unix-like platforms, including Mac OS X and Cygwin. All the relevant information for installing T-Coffee is contained in the Technical Documentation (tcoffee_technical.doc in the doc directory).

T-Coffee cannot run in the Microsoft Windows shell. If you need to run T-Coffee on Windows, start by installing Cygwin (www.cygwin.com). Cygwin is free, open-source software that lets you run a Unix-like command line on your Microsoft Windows PC without having to reboot. It is very easy to install. Yet, as the first installation requires downloading substantial amounts of data, you should make sure you have access to a broadband connection.

In the course of this tutorial, we expect you to use a Unix-like command line shell. If you work on Cygwin, this means clicking on the Cygwin icon and typing commands in the window that appears. If you don't want to bother with command line stuff, try using the online T-Coffee web server at www.tcoffee.org.

Getting The Example Files of The Tutorial

We encourage you to try all the following examples with your own sequences/structures. If you want to try with ours, you can get the material from the example directory of the distribution. If you do not know where this directory lives or if you do not have access to it, the simplest thing to do is to:

1-      Go to www.tcoffee.org and follow the link to the T-Coffee Home Page

2-      Download the latest distribution

3-      gunzip <distrib>.tar.gz

4-      tar -xvf <distrib>.tar

5-      go into <distrib>/example

This is all you need to do to run ALL the examples provided in this tutorial.
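If you prefer to script the unpacking steps listed above, the following is a minimal sketch rather than an official installer. The function name unpack_distribution is ours, and it assumes, as the steps suggest, that the archive unpacks into a directory named after the archive itself.

```shell
# Sketch only: unpack a downloaded T-Coffee archive and print the path of
# its example directory. unpack_distribution is a hypothetical helper name.
# Assumption: <distrib>.tar.gz unpacks into a directory called <distrib>.
unpack_distribution() {
    archive=$1                                # e.g. <distrib>.tar.gz
    distrib=$(basename "$archive" .tar.gz)    # strip the .tar.gz extension
    # gunzip and tar in one pipeline; unlike plain gunzip, -c keeps the .gz
    gunzip -c "$archive" | tar -xf - || return 1
    if [ -d "$distrib/example" ]; then
        printf '%s\n' "$distrib/example"      # the directory to go into
    else
        echo "no example directory found in $distrib" >&2
        return 1
    fi
}
```

Once the function is defined, running cd "$(unpack_distribution <distrib>.tar.gz)" with the name of your own download leaves you in the example directory.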

What Is T-COFFEE?

What is T-Coffee?

Before going deep into the core of the matter, here are a few words to quickly explain some of the things T-Coffee will do for you.

What does it do?

T-Coffee is a multiple sequence alignment program: given a set of sequences previously gathered using database search programs like BLAST, FASTA or Smith and Waterman, T-Coffee will produce a multiple sequence alignment. To use T-Coffee you must already have your sequences ready.

T-Coffee can also be used to compare alignments, reformat them or evaluate them using structural information.

What can it align?

T-Coffee will align nucleic and protein sequences alike, although it does better at aligning proteins than nucleic acids. It will be able to use structural information for protein sequences with a known structure. We recently introduced a new mode that makes T-Coffee able to accurately align large datasets.

How can I use it?

T-Coffee is not an interactive program. It runs from your UNIX or Linux command line and you must provide it with the correct parameters. If you do not like typing commands, here is the simplest available mode where T-Coffee only needs the name of the sequence file:

         PROMPT: t_coffee sample_seq1.fasta

Installing and using T-Coffee requires a minimum acquaintance with the Linux/Unix operating system. If you feel this is beyond your computer skills, we suggest you use one of the available online servers.
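Because T-Coffee is not interactive, a missing installation or a mistyped file name only shows up when the command runs. As a minimal sketch, the simple call shown above can be wrapped in a small shell function that checks both first. The name run_simple_alignment is ours and is not part of T-Coffee; the only T-Coffee call it makes is the one given above.

```shell
# Sketch: run the simplest T-Coffee mode after two sanity checks.
# run_simple_alignment is a hypothetical helper, not a T-Coffee command.
run_simple_alignment() {
    seq_file=$1
    # Is t_coffee installed and reachable from this shell?
    if ! command -v t_coffee >/dev/null 2>&1; then
        echo "t_coffee not found on PATH" >&2
        return 127
    fi
    # Does the sequence file exist and can we read it?
    if [ ! -r "$seq_file" ]; then
        echo "cannot read sequence file: $seq_file" >&2
        return 1
    fi
    t_coffee "$seq_file"    # same call as: t_coffee sample_seq1.fasta
}
```

From the example directory, run_simple_alignment sample_seq1.fasta then behaves like the command above, except that it stops with a message when something is missing.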

Is There an Online Server?

Yes, at www.tcoffee.org.

Is T-Coffee different from ClustalW?

According to several benchmarks, T-Coffee appears to be more accurate than ClustalW. Yet, this increased accuracy comes at a price: T-Coffee is slower than ClustalW (about N times slower for N sequences).

If you are familiar with ClustalW, or if you run a ClustalW server, you will find that we have made some efforts to ensure as much compatibility as possible between ClustalW and T-Coffee. Whenever it was relevant, we have kept the flag names and the flag syntax of ClustalW. Yet, you will find that T-Coffee also has many extra possibilities…

If you want to align closely related sequences, T-Coffee can also be used in a fast mode, much faster than ClustalW and about as accurate (T-Coffee -very_fast). This mode is especially useful for aligning long sequences.

        

What T-Coffee Can and Cannot do for you

IMPORTANT: All the files mentioned here (sample_seq...) can be found in the example directory of the distribution.

(NOT) Fetching Sequences

T-Coffee will NOT fetch sequences for you: you must select the sequences you want to align beforehand. We suggest you use any BLAST server and format your sequences in FASTA so that T-Coffee can use them easily. The ExPASy BLAST server (www.expasy.ch) provides a nice interface for integrating database searches.
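Since the sequences have to be gathered by hand, it can be worth checking a FASTA file before handing it over. The sketch below relies only on the fact that every FASTA record starts with a header line beginning with ">". The helper names count_fasta_sequences and check_fasta are ours, not T-Coffee commands, and the check is deliberately crude (it does not validate the residues).

```shell
# Sketch: count FASTA records by counting header lines (those starting with '>').
# Both function names are hypothetical helpers, not part of T-Coffee.
count_fasta_sequences() {
    grep -c '^>' "$1"
}

# Refuse files that are unreadable or hold fewer than two sequences,
# since an alignment needs at least two.
check_fasta() {
    if [ ! -r "$1" ]; then
        echo "$1: cannot read file" >&2
        return 1
    fi
    # grep exits non-zero when nothing matches; treat that as zero records
    n=$(count_fasta_sequences "$1") || n=0
    if [ "$n" -lt 2 ]; then
        echo "$1: found $n sequence(s), need at least 2 to align" >&2
        return 1
    fi
    echo "$1: $n sequences"
}
```

For instance, check_fasta sample_seq1.fasta reports how many sequences the example file contains.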

Aligning Sequences

T-Coffee will compute (or at least try to compute!) accurate multiple alignments of DNA, RNA or Protein sequences.

Combining Alignments

T-Coffee allows you to combine results obtained with several alignment methods. For instance, if you have an alignment coming from ClustalW, another alignment coming from Dialign, and a structural alignment of some of your sequences, T-Coffee will combine all that information and produce a new multiple sequence alignment having the best agreement with all these methods (see the FAQ for more details).

PROMPT: t_coffee -aln=sproteases_small.cw_aln,sproteases_small.muscle,sproteases_small.tc_aln -outfile=combined_aln.aln
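When many alignments are to be combined, typing the comma-separated list by hand becomes error-prone. Here is a minimal sketch that builds the list from separate file names and passes it to the -aln and -outfile flags used in the command above. The helper names join_with_commas and combine_alignments are ours and are not part of T-Coffee.

```shell
# Sketch: join file names into the comma-separated value expected by -aln.
# join_with_commas and combine_alignments are hypothetical helper names.
join_with_commas() {
    list=""
    for f in "$@"; do
        list="${list:+$list,}$f"    # add a comma only between items
    done
    printf '%s\n' "$list"
}

# First argument: output file; remaining arguments: alignments to combine.
combine_alignments() {
    outfile=$1
    shift
    t_coffee -aln="$(join_with_commas "$@")" -outfile="$outfile"
}
```

With these definitions, combine_alignments combined_aln.aln sproteases_small.cw_aln sproteases_small.muscle sproteases_small.tc_aln issues the same command as above.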

Evaluating Alignments

You can use T-Coffee to measure the reliability of your multiple sequence alignment. If you want to find out more, read the FAQ or the documentation for the -output flag.

PROMPT: t_coffee -infile=sproteases_small.aln -special_mode=evaluate

Combining Sequences and Structures

One of the latest improvements of T-Coffee is to let you combine sequences and structures, so that your alignments are of higher quality. You need to have the sap package installed to fully benefit from this facility.

PROMPT: t_coffee 3d.fasta -special_mode=3dcoffee

Using this mode will cause T-Coffee to automatically identify the target corresponding to your sequence as indicated