This book examines the mathematical foundations of the models currently used in bioinformatics. Understanding these foundations is crucial for interpreting the models' outputs correctly. A bioinformatician should be able not only to use software packages but also to understand the mathematics behind them. From this point of view, mathematics departments throughout the world have a major role to play in bioinformatics education by teaching courses on the mathematical foundations of the subject. Based on courses taught by the author, the book combines several topics in biological sequence analysis with the mathematical and statistical material required for such analysis.
part i sequence analysis
1 introduction: biological sequences
2 sequence alignment
2.1 sequence similarity
2.2 dynamic programming: global alignment
2.3 dynamic programming: local alignment
2.4 alignment with affine gap model
2.5 heuristic alignment algorithms
2.5.1 fasta
2.5.2 blast
2.6 significance of scores
2.7 multiple alignment
2.7.1 msa
2.7.2 progressive alignment
exercises
3 markov chains and hidden markov models
3.1 markov chains
3.2 hidden markov models
3.3 the viterbi algorithm
3.4 the forward algorithm
3.5 the backward algorithm and posterior decoding
3.6 parameter estimation for hmms
3.6.1 estimation when paths are known
3.6.2 estimation when paths are unknown
3.7 hmms with silent states
3.8 profile hmms
3.9 multiple sequence alignment by profile hmms
exercises
4 protein folding
4.1 levels of protein structure
4.2 prediction by profile hmms
4.3 threading
4.4 molecular modeling
4.5 lattice hp-model
exercises
5 phylogenetic reconstruction
5.1 phylogenetic trees
5.2 parsimony methods
5.3 distance methods
5.4 evolutionary models
5.4.1 the jukes-cantor model
5.4.2 the kimura model
5.4.3 the felsenstein model
5.4.4 the hasegawa-kishino-yano (hky) model
5.5 maximum likelihood method
5.6 model comparison
exercises
part ii mathematical background for sequence analysis
6 elements of probability theory
6.1 sample spaces and events
6.2 probability measure
6.3 conditional probability
6.4 random variables
6.5 integration of random variables
6.6 monotone functions on the real line
6.7 distribution functions
6.8 common types of random variables
6.8.1 the discrete type
6.8.2 the continuous type
6.9 common discrete and continuous distributions
6.9.1 the discrete case
6.9.2 the continuous case
6.10 vector-valued random variables
6.11 sequences of random variables
exercises
7 significance of sequence alignment scores
7.1 the problem
7.2 random walks
7.3 significance of scores
exercises
8 elements of statistics
8.1 statistical modeling
8.2 parameter estimation
8.3 hypothesis testing
8.4 significance of scores for global alignments
exercises
9 substitution matrices
9.1 the general form of a substitution matrix
9.2 pam substitution matrices
9.3 blosum substitution matrices
exercises
references
index