Centre National de la Recherche Scientifique
Cédric Notredame
www.tcoffee.org
T-Coffee:
Tutorial and FAQ
T-Coffee Tutorial
T-Coffee
3D-Coffee
M-Coffee
APDB and iRMSD
© Cédric Notredame and Centre National de la Recherche Scientifique, France
Is T-Coffee different from ClustalW?
What T-Coffee Can and Cannot do for you …
Combining Sequences and Structures
Identifying Occurrences of a Motif: Mocca
Preparing Your Data: Reformatting and Trimming
Removing the gaps from an alignment
Changing the case of your sequences
Protecting Important Sequence Names
Extracting Portions of Dataset
Extracting Sequences According to a Pattern
Extracting Blocks Within Alignment
Reducing and Improving your dataset
Extracting the N most informative sequences
Extracting all the sequences less than X% identical
Forcing Specific Sequences to be kept
Identifying and Removing Outliers
Translating DNA sequences into Proteins
Back-Translation With the Bona-Fide DNA sequences
Finding the Bona-Fide Sequences for the Back-Translation
Guessing Your Back Translation
Fetching The Sequence of a PDB structure
Building Multiple Sequence Alignments
How to generate The Alignment You Need?
The Main Methods and their Scope
Computing Multiple Sequence Alignments With T-Coffee
A Simple Multiple Sequence Alignment
Aligning Nucleic Acids With T-Coffee
Modifying the default parameters of T-Coffee
Changing the Substitution Matrix
Comparing Two Alternative Alignments
Can You Guess The Optimal Parameters?
Using All the Methods at the Same Time: M-Coffee
Using Selected Methods to Compute your MSA
Combining pre-Computed Alignments
Aligning One sequence to a Profile
Aligning Many Sequences to a Profile
Accurate/Slow Profile to Profile Alignment
Aligning Other Types of Sequences
Combining Sequences and Structures
If you are in a Hurry: Expresso
Aligning Sequences and Structures
Mixing Sequences and Structures
Aligning Profile using Structural Information
Post Processing Your Alignment
Coloring Features in your alignment
Evaluating Reliability of Your Alignment
Evaluating Alignments with The CORE index
Computing The Local CORE Index
Computing the CORE index of any alignment
Evaluating an Alignment Using Structural Information: iRMSD
How to Efficiently Use Structural Information
Evaluating an Alignment With the irmsd Package
Evaluating Alternative Alignments
Evaluating an Alignment according to your own Criterion
Integrating Your Own Methods In T-Coffee
Using New and Existing Methods as Plugins For T-Coffee
Plug-In: Using Methods Integrated in T-Coffee
Managing a collection of method files
The Mother of All method files…
Plug-Out: Using T-Coffee as a Plug-In
Creating Your Own T-Coffee Libraries
Customizing the Weighting Scheme
Abnormal Terminations and Wrong Results
Q: The program keeps crashing when I give my sequences
Q: The default alignment is not good enough
Q: The alignment contains obvious mistakes
Q: How many Sequences can t_coffee handle
Q: Can I prevent the Output of all the warnings?
Q: How many ways to pass parameters to t_coffee?
Q: How can I change the default output format?
Q: My sequences are slightly different between all the alignments.
Q: Is it possible to pipe stuff OUT of t_coffee?
Q: Is it possible to pipe stuff INTO t_coffee?
Q: Can I read my parameters from a file?
Q: I want to decide myself on the name of the output files!!!
Q: I want to use the sequences in an alignment file
Q: I only want to produce a library
Q: I want to turn an alignment into a library
Q: I want to concatenate two libraries
Q: What happens to the gaps when an alignment is fed to T-Coffee
Q: I cannot print the html graphic display!!!
Q: I want to output an html file and a regular file
Q: I would like to output more than one alignment format at the same time
Q: Is T-Coffee the best? Why Not Use Muscle, or Mafft, or ProbCons???
Q: Can t_coffee align Nucleic Acids ???
Q: I do not want to compute the alignment.
Q: I would like to force some residues to be aligned.
Q: I would like to use structural alignments.
Q: I want to build my own libraries.
Q: I do not want to use all the possible pairs when computing the library
Q: I only want to use specific pairs to compute the library
Q: There are duplicates or quasi-duplicates in my set
Q: Can I align sequences to a profile with T-Coffee?
Q: Can I align sequences to Two or More Profiles?
Q: Can I align two profiles according to the structures they contain?
Q: T-Coffee becomes very slow when combining sequences and structures
Q: Can I use a local installation of PDB?
Q: Can I evaluate alignments NOT produced with T-Coffee?
Q: Can I Compare Two Alignments?
Q: I am aligning sequences with long regions of very good overlap
Q: Why is T-Coffee changing the names of my sequences!!!!
A lot of the material presented here emanates from two summer schools that were tentatively called the "Prosite Workshops" and were held in Marseille, in 2001 and 2002. These workshops were mostly an excuse to go rambling and swimming in the calanques. Yet, when we got tired of lazing in the sun, we eventually did a bit of work to chill out. Most of our experiments revolved around the development of sequence analysis tools. Many of the most advanced ideas in T-Coffee were launched during these fruitful sessions. Participants included Phillip Bucher, Laurent Falquet, Marco Pagni, Alexandre Gattiker, Nicolas Hulo, Christian Siegfried, Anne-Lise Veuthey, Virginie Leseau, Lorenzo Ceruti and Cedric Notredame.
This document contains two main sections. The first is a tutorial, which goes from simple things to more complicated ones and shows you how to use all the subtleties of T-Coffee; the second is the FAQ. We have tried to put as many of these functionalities as possible on the web (www.tcoffee.org), but if you need to do something special and highly reproducible, the command line is the only way.
This tutorial assumes that you have installed T-Coffee, version 4.30 or higher. T-Coffee is free, open-source software that runs on all Unix-like platforms, including Mac OS X and Cygwin. All the relevant information for installing T-Coffee is contained in the Technical Documentation (tcoffee_technical.doc in the doc directory).
T-Coffee cannot run in the Microsoft Windows shell. If you need to run T-Coffee on Windows, start by installing Cygwin (www.cygwin.com). Cygwin is free, open-source software that makes it possible to run a Unix-like command line on your Microsoft Windows PC without having to reboot. It is very easy to install. Yet, as the first installation requires downloading substantial amounts of data, you should make sure you have access to a broadband connection.
In the course of this tutorial, we expect you to use a Unix-like command line shell. If you work on Cygwin, this means clicking on the Cygwin icon and typing commands in the window that appears. If you don't want to bother with the command line, try using the online T-Coffee web server at www.tcoffee.org.
We encourage you to try all the following examples with your own sequences/structures. If you want to try with ours, you can get the material from the example directory of the distribution. If you do not know where this directory lives or if you do not have access to it, the simplest thing to do is to:
1- Download T-Coffee's latest version from www.tcoffee.org (follow the link to the T-Coffee Home Page)
2- Download the latest distribution
3- gunzip <distrib>.tar.gz
4- tar -xvf <distrib>.tar
5- Go into <distrib>/example
This is all you need to do to run ALL the examples provided in this tutorial.
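For readers who prefer to script it, the unpacking steps above can be sketched as a small shell function. This is only an illustration: unpack_distrib is a hypothetical helper name, not something shipped with T-Coffee, and it assumes the archive contains a single top-level directory.

```shell
#!/bin/sh
# Sketch only: unpack_distrib is a hypothetical helper, not part of T-Coffee.
# It unpacks a downloaded <distrib>.tar.gz in the current directory and
# prints the path of the example directory inside it.
unpack_distrib() {
    archive=$1
    if [ ! -f "$archive" ]; then
        echo "unpack_distrib: no such file: $archive" >&2
        return 1
    fi
    # Steps 3 and 4 in one pipeline; the .tar.gz itself is left untouched.
    gunzip -c "$archive" | tar -xf - || return 1
    # Assumption: the archive holds a single top-level directory, whose
    # name is read from the first entry of the archive listing.
    top=$(gunzip -c "$archive" | tar -tf - | sed -e 's|^\./||' -e 's|/.*||' | head -n 1)
    echo "$top/example"
}
```

After defining the function, cd "$(unpack_distrib <distrib>.tar.gz)" would take you straight to the example files, with <distrib> standing for whatever archive you downloaded.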
Before going deep into the core of the matter, here are a few words to quickly explain some of the things T-Coffee will do for you.
T-Coffee is a multiple sequence alignment program: given a set of sequences previously gathered using database search programs like BLAST, FASTA or Smith and Waterman, T-Coffee will produce a multiple sequence alignment. To use T-Coffee you must already have your sequences ready.
T-Coffee can also be used to compare alignments, reformat them or evaluate them using structural information.
T-Coffee will align nucleic and protein sequences alike, although it does better at aligning proteins than nucleic acids. It will be able to use structural information for protein sequences with a known structure. We recently introduced a new mode that makes T-Coffee able to accurately align large datasets.
T-Coffee is not an interactive program. It runs from your UNIX or Linux command line and you must provide it with the correct parameters. If you do not like typing commands, here is the simplest available mode where T-Coffee only needs the name of the sequence file:
PROMPT: t_coffee sample_seq1.fasta
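If you are unsure whether your installation is in place, a thin wrapper around this default mode can make failures easier to read. The sketch below is an assumption-laden illustration: run_default_alignment is a hypothetical name, and the only T-Coffee invocation it uses is the one shown above.

```shell
#!/bin/sh
# Sketch only: run_default_alignment is a hypothetical wrapper.
# It runs the default mode (t_coffee <sequence file>) after two basic checks.
run_default_alignment() {
    seqfile=$1
    # Check that the t_coffee executable can be found on the PATH.
    if ! command -v t_coffee >/dev/null 2>&1; then
        echo "t_coffee not found on PATH" >&2
        return 127
    fi
    # Check that the sequence file exists and is not empty.
    if [ ! -s "$seqfile" ]; then
        echo "missing or empty sequence file: $seqfile" >&2
        return 1
    fi
    t_coffee "$seqfile"
}
```

For instance, run_default_alignment sample_seq1.fasta behaves like the command above, but reports a missing executable or input file before T-Coffee is started.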
Installing and using T-Coffee requires a minimum acquaintance with the Linux/Unix operating system. If you feel this is beyond your computer skills, we suggest you use one of the available online servers.
Yes, an online server is available at www.tcoffee.org
According to several benchmarks, T-Coffee appears to be more accurate than ClustalW. Yet, this increased accuracy comes at a price: T-Coffee is slower than ClustalW (about N times slower for N sequences).
If you are familiar with ClustalW, or if you run a ClustalW server, you will find that we have made some efforts to ensure as much compatibility as possible between ClustalW and T-Coffee. Whenever it was relevant, we have kept the flag names and the flag syntax of ClustalW. Yet, you will find that T-Coffee also has many extra possibilities…
If you want to align closely related sequences, T-Coffee can also be used in a fast mode, much faster than ClustalW and about as accurate (T-Coffee -very_fast). This mode is especially useful for aligning long sequences.
IMPORTANT: All the files mentioned here (sample_seq...) can be found in the example directory of the distribution.
T-Coffee will NOT fetch sequences for you: you must select the sequences you want to align beforehand. We suggest you use any BLAST server and format your sequences in FASTA so that T-Coffee can use them easily. The ExPASy BLAST server (www.expasy.ch) provides a nice interface for integrating database searches.
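Before handing a FASTA file to any aligner, it can help to check how many sequences it holds and whether any name appears twice. The two helpers below are hypothetical illustrations built from standard Unix tools; they assume that each sequence starts with a '>' header line and that the name is the first word of that header.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for a quick look at a FASTA file.

# Number of sequences = number of header lines (lines starting with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Sequence names that occur more than once. The name is taken to be the
# first whitespace-delimited word after '>'.
duplicate_fasta_names() {
    sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$1" | sort | uniq -d
}
```

Running count_fasta_records sample_seq1.fasta prints the number of sequences in the file, and duplicate_fasta_names prints nothing when every name is unique.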
T-Coffee will compute (or at least try to compute!) accurate multiple alignments of DNA, RNA or Protein sequences.
T-Coffee allows you to combine results obtained with several alignment methods. For instance, if you have an alignment coming from ClustalW, another alignment coming from Dialign, and a structural alignment of some of your sequences, T-Coffee will combine all that information and produce a new multiple sequence alignment having the best agreement with all these methods (see the FAQ for more details).
PROMPT: t_coffee -aln=sproteases_small.cw_aln, sproteases_small.muscle, sproteases_small.tc_aln -outfile=combined_aln.aln
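When the list of alignments to combine is long or produced by another script, it may be convenient to build the comma-separated list automatically. The sketch below is a hypothetical wrapper (combine_alignments is not a T-Coffee command); it uses only the -aln and -outfile flags shown above, and it writes the list without spaces after the commas so that it stays a single shell word, on the assumption that T-Coffee reads it the same way.

```shell
#!/bin/sh
# Sketch only: combine_alignments is a hypothetical wrapper around the
# command shown above. Usage: combine_alignments OUTFILE ALN1 [ALN2 ...]
combine_alignments() {
    outfile=$1
    shift
    if [ "$#" -lt 1 ]; then
        echo "usage: combine_alignments OUTFILE ALN1 [ALN2 ...]" >&2
        return 1
    fi
    # Refuse to start if one of the input alignments is missing.
    for aln in "$@"; do
        [ -f "$aln" ] || { echo "missing alignment: $aln" >&2; return 1; }
    done
    # Join the remaining arguments with commas: "$*" uses the first
    # character of IFS as the separator.
    list=$(IFS=,; echo "$*")
    t_coffee -aln="$list" -outfile="$outfile"
}
```

With the example files, combine_alignments combined_aln.aln sproteases_small.cw_aln sproteases_small.muscle sproteases_small.tc_aln would issue the same request as the command above.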
You can use T-Coffee to measure the reliability of your multiple sequence alignment. If you want to find out more about that, read the FAQ or the documentation for the -output flag.
PROMPT: t_coffee -infile=sproteases_small.aln -special_mode=evaluate
One of the latest improvements of T-Coffee is to let you combine sequences and structures, so that your alignments are of higher quality. You need to have the sap package installed to benefit fully from this facility.
PROMPT: t_coffee 3d.fasta -special_mode=3dcoffee
Using this mode will cause T-Coffee to automatically identify the target corresponding to your sequence as indicated