Our purpose in writing this monograph is to provide an applied documentation source, as well as an introduction to a collection of associated computer programs, that would be of interest to applied statisticians and data analysts and also accessible to users who are notationally sophisticated but otherwise substantively focused.
The content of the monograph itself, and how its various parts are organized, can be discussed under a number of headings that characterize both the type of object arrangements to be identified and the form of the data on which the identification is based.
Combinatorial data analysis (CDA) refers to a wide class of methods for studying data sets in which the arrangement of a collection of objects is central. Combinatorial Data Analysis: Optimization by Dynamic Programming focuses on identifying such arrangements, restricted further to cases where the combinatorial search is carried out by a recursive optimization process based on the general principles of dynamic programming (DP).
The authors provide a comprehensive and self-contained review of a very general DP paradigm, or schema, that serves two functions. First, the paradigm can be applied in various special forms to encompass all the applications previously proposed in the classification literature. Second, it can lead directly to many novel uses. An appendix serves as a user's manual for a collection of programs available as freeware.
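To give a concrete sense of the kind of recursion involved, the sketch below applies a dynamic programming recursion over subsets to linear assignment, the task listed in the contents as the introductory example. It assumes the task is posed as placing n objects into n ordered positions so as to maximize a summed merit `cost[i][p]`. The function name `linear_assignment_dp` and the bitmask encoding of subsets are hypothetical illustrative choices, not the interface of the programs described in the appendix. Each state is the set S of objects already placed in the first |S| positions, and F(S) = max over i in S of F(S − {i}) + cost[i][|S| − 1].

```python
def linear_assignment_dp(cost):
    """Hypothetical sketch: maximize sum(cost[obj][pos]) over one-to-one
    assignments of n objects to n positions via a DP over subsets.

    State: bitmask S of objects already placed in positions 0..|S|-1.
    Recursion: F(S) = max_{i in S} F(S - {i}) + cost[i][|S| - 1].
    """
    n = len(cost)
    size = 1 << n
    best = [float("-inf")] * size   # F(S) for every subset S
    choice = [-1] * size            # object placed last in the optimum for S
    best[0] = 0.0                   # empty set: nothing placed yet

    # Increasing mask order guarantees every S - {i} is solved before S.
    for mask in range(1, size):
        pos = bin(mask).count("1") - 1  # position taken by the last-added object
        for i in range(n):
            if mask & (1 << i):
                val = best[mask ^ (1 << i)] + cost[i][pos]
                if val > best[mask]:
                    best[mask] = val
                    choice[mask] = i

    # Backtrack from the full set to recover which object fills each position.
    assignment = [None] * n
    mask = size - 1
    while mask:
        i = choice[mask]
        assignment[bin(mask).count("1") - 1] = i
        mask ^= 1 << i
    return best[size - 1], assignment


value, order = linear_assignment_dp([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
print(value, order)  # 11.0 [0, 2, 1]
```

The number of stored states grows as 2^n, so recursions of this form are practical only for object sets of moderate size. Linear assignment itself can also be solved in polynomial time by the Hungarian method; the point here is only the shape of the subset recursion.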
The incorporation of a wide variety of CDA tasks under one common optimization framework based on DP is one of this book's strongest points. The authors include verifiably optimal solutions to nontrivially sized problems across the range of data analysis tasks discussed.
Preface
1 Introduction
2 General Dynamic Programming Paradigm
2.1 An Introductory Example: Linear Assignment
2.2 The GDPP
3 Cluster Analysis
3.1 Partitioning
3.1.1 Admissibility Restrictions on Partitions
3.1.2 Partitioning Based on Two-Mode Proximity Matrices
3.2 Hierarchical Clustering
3.2.1 Hierarchical Clustering and the Optimal Fitting of Ultrametrics
3.2.2 Constrained Hierarchical Clustering
4 Object Sequencing and Seriation
4.1 Optimal Sequencing of a Single Object Set
4.1.1 Symmetric One-Mode Proximity Matrices
4.1.2 Skew-Symmetric One-Mode Proximity Matrices
4.1.3 Two-Mode Proximity Matrices
4.1.4 Object Sequencing for Symmetric One-Mode Proximity Matrices Based on the Construction of Optimal Paths
4.2 Sequencing an Object Set Subject to Precedence Constraints
4.3 Construction of Optimal Ordered Partitions
5 Heuristic Applications of the GDPP
5.1 Cluster Analysis
5.2 Object Sequencing and Seriation
6 Extensions and Generalizations
6.1 Introduction
6.1.1 Multiple Data Sources
6.1.2 Multiple Structures
6.1.3 Uses for the Information in the Sets Ω₁, ..., Ωₖ
6.1.4 A Priori Weights for Objects and/or Proximities
6.2 Prospects
Appendix: Available Programs
Bibliography
Author Index
Subject Index