Note that a new beta version of Gambit is available; it has no expiration date.
AN INTRODUCTION TO BOOTSTRAPPERS GAMBIT
(Copyright 2006 J.A. Lake, All Rights Reserved)
It's not generally appreciated that molecular sequence analysis is a field in its infancy. Thus it
is an inexact science, in which there are few analytical tools that are based on general
mathematical principles. As a result many, perhaps most, phylogenetic trees reconstructed from
molecular sequences are incorrect, because they make mathematical assumptions that are not met
by the data being analyzed. Frequently these incorrect assumptions lead to long branch attraction.
Long Branch Attraction
Long branch attraction can be caused by one or more of three pitfalls of sequence analysis.
For all three the effects are the same. In trees artifactually produced by long branch attraction,
rapidly evolving sequences (represented by long branches on unrooted phylogenetic trees) will be
placed with other rapidly evolving sequences, even if the sequences are only distantly related. In
comparison with most problems in molecular biology, which can be solved by acquiring more
data, long branch attractions are diabolical. When long branch attractions are present, if longer
sequences are used, the incorrect solution will be even more strongly supported.
Specifically, the mathematical steps in sequence analysis that produce this pitfall are: (i) incorrect
sequence alignments, caused by inadequate mathematical models and often related specifically to
biases created by progressive alignment algorithms when they are used to align more than three
taxa (organisms); (ii) the failure to account properly for site to site variation (sites within a
sequence can evolve at different rates, yet most algorithms assume they all evolve at the same rate);
and (iii) unequal rate effects (the inability of most tree building algorithms to produce good
phylogenetic trees when genes from different taxa in the tree evolve at different rates). Of the
three pitfalls, alignment artifacts are potentially the most serious, because even if one solves the
second and third problems, misalignments can still produce incorrect trees. General
algorithms are available for pitfalls two (site to site variation) and three (unequal rate effects) and
are incorporated in the Gambit program, but none are available for the alignment problem.
Gambit contains algorithms not significantly affected by site to site variation or by unequal rate
effects. Specifically, the paralinear (LogDet) distance is a truly additive measure of the
distance between sequences. Because paralinear distances are based on a very general Markov
model, they are not significantly affected by unequal rate effects. Pattern Filtering, in turn, is a
demonstrably optimal method for estimating the variation of rates at different sequence sites and,
as such, is not significantly affected by site to site variation effects. Both of these methods are
available in Gambit.
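The paralinear distance described above can be illustrated with a short sketch. This is not Gambit's code; it is a minimal Python example that assumes one common formulation, d = -ln[det J / sqrt(det Dx * det Dy)], where J is the 4 x 4 matrix of aligned base-pair counts and Dx, Dy are diagonal matrices of its row and column sums. The function names are hypothetical, and some published variants rescale the result (for example, dividing by the number of states).

```python
import math

BASES = "ACGT"

def divergence_matrix(seq_x, seq_y):
    """Count aligned pairs: J[i][j] = sites with base i in x and base j in y."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    index = {base: k for k, base in enumerate(BASES)}
    J = [[0.0] * 4 for _ in range(4)]
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a in index and b in index:  # skip gaps and ambiguity codes
            J[index[a]][index[b]] += 1.0
    return J

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def paralinear_distance(seq_x, seq_y):
    """Assumed formulation: -ln(det J / sqrt(det Dx * det Dy))."""
    J = divergence_matrix(seq_x, seq_y)
    row_sums = [sum(row) for row in J]                               # base counts in x
    col_sums = [sum(J[i][j] for i in range(4)) for j in range(4)]    # base counts in y
    det_j = determinant(J)
    det_dx = math.prod(row_sums)  # determinant of a diagonal matrix
    det_dy = math.prod(col_sums)
    if det_j <= 0.0 or det_dx <= 0.0 or det_dy <= 0.0:
        raise ValueError("distance undefined for these sequences")
    return -math.log(det_j / math.sqrt(det_dx * det_dy))
```

Because the formula uses the full matrix of base-pair counts rather than a single proportion of differences, it does not assume equal base frequencies or a single substitution rate along every branch. The distance is undefined when a base is absent from either sequence or when the sequences are so diverged that det J is not positive.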
Tree Reconstruction
Determining globally optimal, multi-taxon phylogenetic trees is also computationally intensive
because the number of possible trees increases rapidly with increasing taxa. (For four taxa, three
unrooted trees must be compared, whereas for thirteen taxa, 13,749,310,575 trees
must be compared.) Given such large numbers, it is difficult to search exhaustively trees of more than
12 to 13 taxa, even using the branch and bound algorithm (4). Gambit approaches this problem
in a unique way. Once Gambit finds a solution (using heuristic methods), it uses the data to
estimate the probability that a better solution exists (5). Gambit then accepts only solutions for
which better solutions are unlikely (at either the 95% or 99% confidence levels). With these
methods it is possible to calculate "best" trees in reasonable times for 15 to 30 taxa, depending upon
the sequence data.
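The tree counts quoted above follow from the standard formula for the number of unrooted, fully bifurcating trees on n labeled taxa, (2n - 5)!! = 1 x 3 x 5 x ... x (2n - 5). The short Python sketch below (the function name is hypothetical) reproduces the figures for four and thirteen taxa.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees on n labeled taxa: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed for an unrooted tree")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (the factor 1 is implicit).
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

for n in (4, 13):
    print(n, count_unrooted_trees(n))
```

Each added taxon multiplies the count by the next odd number, which is why exhaustive searches become impractical so quickly.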
An additional difficulty in constructing multi-taxon trees is that many different
optimality criteria are used for evaluating the "best" multi-taxon trees. For example, distance trees
can be reconstructed by searching for local minima using least-squares criteria, or by the criterion
of minimum distance, whereas parsimony methods minimize the number of nucleotide changes
often using global searches. Bootstrappers Gambit is a multi-taxon tree reconstruction algorithm
designed so that it can be used with most, if not all, phylogenetic methods. It uses a probability
criterion as a common basis for comparing trees derived using diverse methods.
Bootstrappers Gambit combines various algorithms for phylogenetic analysis into a single
package. The program is designed for personal computers and runs on Windows operating systems.
In addition to paralinear distances, the phylogenetic reconstruction methods accommodated in
Gambit include Jukes-Cantor distances, Kimura two parameter distances, a 6 parameter distance
method based on the evolutionary parsimony assumptions (Lake, unpublished), maximum parsimony,
evolutionary parsimony, and a symmetric transversion parsimony. Other algorithms, such as
maximum likelihood, are being added.
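For readers unfamiliar with the simpler distance corrections named above, the following Python sketch shows the textbook Jukes-Cantor and Kimura two parameter formulas. It is an illustration only, not Gambit's implementation, and the function names are hypothetical. Here p is the proportion of differing sites, P the proportion of transitions, and Q the proportion of transversions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_changes(seq_x, seq_y):
    """Return (sites compared, transitions, transversions) for aligned sequences."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1    # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions

def jukes_cantor_distance(seq_x, seq_y):
    """d = -(3/4) ln(1 - 4p/3)."""
    sites, ts, tv = count_changes(seq_x, seq_y)
    p = (ts + tv) / sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def kimura_two_parameter_distance(seq_x, seq_y):
    """d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q)."""
    sites, ts, tv = count_changes(seq_x, seq_y)
    P = ts / sites
    Q = tv / sites
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

Both corrections assume that every site evolves at the same rate and that base composition is uniform, which are exactly the assumptions that paralinear distances and Pattern Filtering are intended to relax.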
Installing Gambit onto your Windows personal computer is simple. You can download
the four Gambit files as a single zipped file and unzip them; alternatively, you may choose to
download the four files individually.
(If you receive Gambit on disk, install it directly from the disk.)
The four files of Gambit are:
ReadMe.pdf, which can be read with a variety of applications including
Adobe Acrobat Reader;
LOPH1294.CUT, a sample metazoan data set (slightly modified from Halanych, Bacheller, Aguinaldo, Hillis, and Lake,
Science, 267, 1641-43, 1995);
GAM95.xyz, the Gambit program for phylogenetic analysis; and
SWAPC.xyz, which is useful for manipulating sequence files.
You will need to replace the .xyz of the last two files with .exe after
they are downloaded. You will then be able to run Gambit by
double-clicking its icon.
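The renaming can be done by hand in Windows Explorer. For those who prefer a script, the Python sketch below does the same thing; the function name is hypothetical, and it assumes the two files were saved under their original names in a single folder.

```python
from pathlib import Path

def restore_exe_extension(folder, names=("GAM95.xyz", "SWAPC.xyz")):
    """Rename the downloaded .xyz program files to .exe; return the new paths."""
    renamed = []
    for name in names:
        source = Path(folder) / name
        target = source.with_suffix(".exe")
        # Skip files that are missing or that have already been renamed.
        if source.is_file() and not target.exists():
            source.rename(target)
            renamed.append(target)
    return renamed
```

Calling restore_exe_extension(".") from the folder that holds the downloads renames whichever of the two files are present and leaves everything else untouched.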
To obtain Gambit, please choose the appropriate option below.