General help for CLUSTAL W
Main Menu


Clustal W is a general purpose multiple alignment program for DNA or proteins.

SEQUENCE INPUT : all sequences must be in 1 file, one after another. 7 formats are automatically recognised: NBRF/PIR, EMBL/SWISSPROT, Pearson (Fasta), Clustal (*.aln), GCG/MSF (Pileup), GCG9/RSF and GDE flat file. All non-alphabetic characters (spaces, digits, punctuation marks) are ignored except "-" which is used to indicate a GAP ("." in GCG/MSF).

To do a MULTIPLE ALIGNMENT on a set of sequences, use item 1 from the main menu to INPUT them; go to menu item 2 to do the multiple alignment.

PROFILE ALIGNMENTS (menu item 3) are used to align 2 alignments. Use this to add a new sequence to an old alignment, or to use secondary structure to guide the alignment process. GAPS in the old alignments are indicated using the "-" character. PROFILES can be input in ANY of the allowed formats; just use "-" (or "." for MSF/RSF) for each gap position.

PHYLOGENETIC TREES (menu item 4) can be calculated from old alignments (read in with "-" characters to indicate gaps) OR after a multiple alignment while the alignment is still in memory.


The program tries to automatically recognise the different file formats used and to guess whether the sequences are amino acid or nucleotide. This is not always foolproof.

FASTA and NBRF/PIR formats are recognised by having a ">" as the first character in the file.

EMBL/Swiss Prot formats are recognised by the letters ID at the start of the file (the token for the entry name field).

CLUSTAL format is recognised by the word CLUSTAL at the beginning of the file.

GCG/MSF format is recognised by one of the following: Note from the htmlizer (sorry): This is not the best way to input sequences from GCG. For more details see this additional note.

If 85% or more of the characters in the sequence are from A,C,G,T,U or N, the sequence will be assumed to be nucleotide. This works in 97.3% of cases but watch out!