choice. Nucleic Acids Research, 22:4673-4680.

----------------------------------------------------------------------

What's New (March 1996) in Version 1.6 (since version 1.5).

1) Improved handling of sequences of unequal length. Previously, we
increased the gap extension penalties for both sequences if the two
sequences (or groups of previously aligned sequences) were of
different lengths. Now, we increase the gap opening and extension
penalties for the shorter sequence only. This helps prevent short
sequences being stretched out along longer ones.

2) Added the Gonnet series of weight matrices (from Gaston Gonnet and
co-workers at the ETH in Zurich). Fixed a bug in the matrix choice
menu; PAM matrices can now be selected correctly.

3) Added secondary structure/gap penalty masks. These allow you to
include, in an alignment, a position specific set of gap penalties.
You can either set a gap opening penalty at each position or specify
the secondary structure (if protein; alpha helix, beta strand or loop)
and have gap penalties set automatically. This is used to make gaps
harder to open inside helices or strands. These masks are only used in
the profile alignment menu. They may be read in as part of an alignment
in a special format (see the on-line help for details) or associated
with each sequence, if the sequences are in Swiss Prot format and
secondary structure information is given. All of the mask parameters
can be set from the profile alignment menu. The mask is made up of a
series of numbers between 1 and 9, one per position. The gap opening
penalty at a position is calculated as the starting penalty multiplied
by the mask value at that site.

4) Added command line options /profile and /sequences. These allow
users to choose between normal profile alignment, where the two
profiles (pre-existing alignments specified in the files /profile1=
and /profile2=) are merged/aligned with each other (/profile), and the
case where the individual sequences in /profile2 are aligned
sequentially with the alignment in /profile1 (/sequences).

5) Fixed bug in modified Myers and Miller algorithm - gap penalty score
was not always calculated properly for type
2 midpoints. This is the core alignment algorithm.

6) Only allows one output file format to be selected from the command
line, i.e. multiple output alignment files are not allowed.

7) Fixed "bad calls to ckfree" error during calculation of phylip
distance matrix.

8) Fixed command line options /gapopen /gapext
/type=protein /negative.

9) Allowed user to change command line separator on UNIX from / to -.
This allows unix users to use the more conventional symbol for
separating command line options. / can then be used in unix file names
on the command line. The symbol that is used is specified in the file
clustalw.h, which must be edited if you wish to change it (and the
program must then be recompiled).

Find the block of code in clustalw.h that corresponds to the operating
system you are using. These blocks are started by one of the following:

#ifdef VMS
#elif MAC
#elif MSDOS
#elif UNIX

On the next line after each is the line:

#define COMMANDSEP '/'

Change this in the appropriate block of code (e.g. the UNIX block) to

#define COMMANDSEP '-'

if you wish to use the - character as command separator.

----------------------------------------------------------------------

What's New (April 1995) in Version 1.5 (since version 1.3).

1) Ported to MAC and PC. These versions are quite slow unless you have
a nice beefy machine. On a Power Mac or a Pentium box it is nice and
fast. Two precompiled versions are supplied for Macs (Power Mac and old
Mac versions).

        Mac        1500 residues by 100 sequences
        Power Mac  3000
        PC         1500

2) Alignment of new sequences to an alignment. Fixed a serious bug
which assigned weights to the wrong sequences. Now also weights
sequences according to distance from the incoming sequence. The new
weights are: tree weights * similarity to incoming sequence. The tree
weights are the old weights that we derive from the tree connecting all
the sequences in the existing alignment.

3) For all platforms, output linelength = 60.

4) Bootstrap files (*.phb): the final node (arbitrary trichotomy at the
end of the neighbor-joining process) is labelled as TRICHOTOMY in the
bootstrap output files. This is to help link bootstrap figures with
nodes when you reroot the tree.

5) Command line /bootstrap option now more
robust.


INTRODUCTION

This document gives some BRIEF notes about usage of the Clustal W
multiple alignment program for UNIX and VMS machines. Clustal W is a
major update and rewrite of the Clustal V program which was described
in:

Higgins, D.G., Bleasby, A.J. and Fuchs, R. (1992)
CLUSTAL V: improved software for multiple sequence alignment.
Computer Applications in the Biosciences (CABIOS), 8(2):189-191.

The main new features are a greatly improved (more sensitive) multiple
alignment procedure for proteins and improved support for different
file formats. This software was described in:

Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994)
CLUSTAL W: improving the sensitivity of progressive multiple
sequence alignment through sequence weighting, position specific
gap penalties and weight matrix choice.
Nucleic Acids Research, 22(22):4673-4680.

The usage of Clustal W is largely the same as for Clustal V, details of
which are described in clustalv.doc. Details of the new alignment
algorithms are described in the manuscript by Thompson et al. above, an
ascii/text version of which is included (clustalw.ms). This file lists
some of the details not covered by either of the above documents.

There are brief notes on the following topics:

1) Installation for VMS and UNIX and MAC and
PC
2) File input
3) File output
4) Changes to the alignment algorithms
5) Minor modifications to the phylogenetic tree and bootstrapping
   methods
6) Summary of the command line usage.

----------------------------------------------------------------------

1) INSTALLATION (for Unix, VAX/VMS, PC and MAC)

*IMPORTANT*

If you wish to recompile the program (or compile it for the first time;
you will have to do this with UNIX or VAX): first check the file
CLUSTALW.H, which needs to be changed if you move the code between unix
and vms machines. At the top of the file are four lines which define
one of VMS, MSDOS, MAC or UNIX to be 1. All of these EXCEPT one must be
commented out by enclosing them in /* ... */.

Unix
----

Make files are supplied for unix machines. The code was compiled and
tested using Decstation (Ultrix), SUN (Gnu C compiler/gcc), Silicon
Graphics (IRIX) and DEC/Alpha (OSF1). We have not tested the code on
any other systems. Just use makefile to make on most systems. For Sun
, you need to have the Gnu C (gcc) compiler installed; use the file
makefile.sun in this case. You make the program with:

make    (or make -f makefile.sun)

This produces the file clustalw, which can be run by typing clustalw
and pressing return. The help file is called clustalw_help

VMS
---

There is a small DCL
command file (VMSLINK.COM) to compile and link the code for VMS
machines (vax or alpha). This procedure just compiles the source files
and links using default settings. Run it using:

$ vmslink

This produces Clustalw.exe which can be run using the run command:

$ run clustalw

The intermediate object files
can be deleted with:

$ del *.obj;

There is an extensive command line facility. To use this, you must
create a symbol to run the program (and put this in your file).
e.g.

$ clustalw := $drive:dir.dirclustalw

where $drive is the drive on which the executable file is stored
(clustalw.exe) and dir.dir is the full directory specification. NOTE
THE EXTRA DOLLAR SIGN. Then the program can be run using the command:

$ clustalw

The help file is called clustalw.hlp; this must be defined to be
clustalw_help using the command:

$ define clustalw_help $drive:dir.dirclustalw.hlp

where $drive is the drive name and dir.dir is the name of the directory
where the help file is stored.

PC
--

We supply two executable files (Clustalw.exe and Clwbig.exe) which will
run using MSDOS. They will also run under windows (as a DOS
application) *IF you have a maths coprocessor*. If you do not have a
maths chip (e.g. 80387), the program can only be run under MSDOS. In
the latter case, you must have the file EMU387.exe in the same
directory as CLUSTALW.EXE. This file emulates a maths chip if you do
not have one. We generated these executable files using gnu c for
MSDOS. It will also compile (with about 10,000 warning messages) using
Microsoft C but we have not tested it and there appear to be problems
with the executable. You will need to use a memory extender to allow
the program to get at more than 640kb of memory.

Clustalw.exe: up to 100 sequences of max. length 1500 residues (including GAPS)
Clwbig.exe:   up to 150 sequences of max. length 2600 residues (including GAPS)

MAC
---

The code comp
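

Illustrative sketch (not part of the Clustal W distribution): the two
arithmetic rules stated in the What's New notes above can be written
out in a few lines of Python. The first function applies the version
1.6 mask rule (gap opening penalty at a position = starting penalty
multiplied by the mask value, 1 to 9, at that site). The second applies
the version 1.5 rule for aligning new sequences to an alignment (new
weight = tree weight * similarity to the incoming sequence). The
function names, the use of plain lists, and the example numbers are
assumptions made for illustration only; they do not reflect how the
program stores or computes these values internally.

```python
def masked_gap_open_penalties(start_penalty, mask):
    """Return one gap opening penalty per alignment position.

    start_penalty: the starting gap opening penalty.
    mask: list of integers between 1 and 9, one per position
          (a hypothetical in-memory form of the mask).
    """
    penalties = []
    for position, value in enumerate(mask):
        if not 1 <= value <= 9:
            raise ValueError(
                f"mask value at position {position} must be 1-9, got {value}"
            )
        # Rule from the notes: starting penalty multiplied by the mask value.
        penalties.append(start_penalty * value)
    return penalties


def incoming_sequence_weights(tree_weights, similarities):
    """Return new weights: tree weight * similarity to the incoming sequence.

    tree_weights: weights derived from the tree of the existing alignment.
    similarities: similarity of each aligned sequence to the incoming
                  sequence (hypothetical values, same order as tree_weights).
    """
    if len(tree_weights) != len(similarities):
        raise ValueError("need one similarity per tree weight")
    return [w * s for w, s in zip(tree_weights, similarities)]


if __name__ == "__main__":
    # A mask value of 4 makes a gap four times as costly to open at that site.
    print(masked_gap_open_penalties(10.0, [1, 1, 4, 1]))
    print(incoming_sequence_weights([0.5, 1.0], [0.8, 0.5]))
```

With a starting penalty of 10.0, a mask value of 4 at one position
gives a gap opening penalty of 40.0 there, which is the sense in which
the masks make gaps harder to open inside helices or strands.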