Cluster Analysis
By Brian S. Everitt, Sabine Landau, Morven Leese and Daniel Stahl
- Hardcover
Other customers were also interested in
- Peter J. Huber: Data Analysis (149,99 €)
- Methodology of Longitudinal Surveys (151,99 €)
- Michael J. Panik: Growth Curve Modeling (159,99 €)
- Leonard Kaufman: Finding Groups in Data (154,99 €)
- Geoffrey McLachlan: Finite Mixture Models (217,99 €)
- P. M. Kroonenberg: Applied Multiway Data Analysis (192,99 €)
- Eric J. Beh: Correspondence Analysis (120,99 €)
Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organizing multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques have proven useful in a wide range of areas such as medicine, psychology, market research and bioinformatics.
This fifth edition of the highly successful Cluster Analysis includes coverage of the latest developments in the field and a new chapter dealing with finite mixture models for structured data.
Real-life examples are used throughout to demonstrate the application of the theory, and figures are used extensively to illustrate graphical techniques. The book is comprehensive yet relatively non-mathematical, focusing on the practical aspects of cluster analysis.
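The book itself prescribes no particular software; purely as an illustration of the kind of partitioning described above, the following minimal Python sketch (the choice of numpy and scikit-learn is an assumption, not the authors' tooling) splits synthetic multivariate data into subgroups with k-means, one of the optimization methods covered in Chapter 5:

```python
# Minimal sketch: partitioning multivariate data into subgroups with k-means.
# Library choice (numpy, scikit-learn) is illustrative; the book prescribes no software.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data: three well-separated Gaussian blobs in two dimensions.
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(50, 2)),
])

# k-means minimizes the within-cluster sum of squares, i.e. trace(W)
# in the notation of Chapter 5's clustering criteria.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
print("cluster means:\n", km.cluster_centers_)
```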
Key Features:
- Presents a comprehensive guide to clustering techniques, with a focus on the practical aspects of cluster analysis.
- Provides a thorough revision of the fourth edition, including new developments in clustering longitudinal data and examples from bioinformatics and gene studies.
- Updates the chapter on mixture models to include recent developments and presents a new chapter on mixture modelling for structured data (see the sketch after this list).
Practitioners and researchers working in cluster analysis and data analysis will benefit from this book.
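The finite mixture models highlighted above treat each cluster as a component density, with the number of components chosen by information criteria (Section 6.5.2 of the book). A hedged Python sketch of this model-based approach, again assuming scikit-learn purely for illustration:

```python
# Illustrative sketch of model-based clustering via a finite Gaussian mixture,
# with the number of components chosen by BIC (cf. Sections 6.2 and 6.5.2).
# scikit-learn is an assumed, not prescribed, implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two-component synthetic mixture in two dimensions.
X = np.vstack([
    rng.normal(loc=(0, 0), scale=1.0, size=(100, 2)),
    rng.normal(loc=(4, 4), scale=0.5, size=(100, 2)),
])

# Fit mixtures with 1..5 components; a lower BIC indicates a better
# trade-off between fit and model complexity.
models = [
    GaussianMixture(n_components=k, covariance_type="full", random_state=0).fit(X)
    for k in range(1, 6)
]
best = min(models, key=lambda m: m.bic(X))
print("chosen number of components:", best.n_components)
print("component weights:", best.weights_.round(3))
labels = best.predict(X)  # cluster assignment via maximum posterior probability
```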
Product Details
- Wiley Series in Probability and Statistics
- Publisher: Wiley & Sons
- Publisher's article no.: 14574991000
- 5th edition
- Number of pages: 352
- Publication date: 21 February 2011
- Language: English
- Dimensions: 235mm x 157mm x 23mm
- Weight: 640g
- ISBN-13: 9780470749913
- ISBN-10: 0470749911
- Article no.: 30589054
Brian S. Everitt, Head of the Biostatistics and Computing Department and Professor of Behavioural Statistics, King's College London. He has authored or co-authored over 50 books on statistics and approximately 100 papers and other articles, and is joint editor of Statistical Methods in Medical Research. Dr Sabine Landau, Head of the Department of Biostatistics, Institute of Psychiatry, King's College London. Dr Morven Leese, Health Service and Population Research, Institute of Psychiatry, King's College London. Dr Daniel Stahl, Department of Biostatistics & Computing, Institute of Psychiatry, King's College London.
Table of Contents

Preface
Acknowledgement

1 An Introduction to classification and clustering
1.1 Introduction
1.2 Reasons for classifying
1.3 Numerical methods of classification - cluster analysis
1.4 What is a cluster?
1.5 Examples of the use of clustering
1.5.1 Market research
1.5.2 Astronomy
1.5.3 Psychiatry
1.5.4 Weather classification
1.5.5 Archaeology
1.5.6 Bioinformatics and genetics
1.6 Summary

2 Detecting clusters graphically
2.1 Introduction
2.2 Detecting clusters with univariate and bivariate plots of data
2.2.1 Histograms
2.2.2 Scatterplots
2.2.3 Density estimation
2.2.4 Scatterplot matrices
2.3 Using lower-dimensional projections of multivariate data for graphical representations
2.3.1 Principal components analysis of multivariate data
2.3.2 Exploratory projection pursuit
2.3.3 Multidimensional scaling
2.4 Three-dimensional plots and trellis graphics
2.5 Summary

3 Measurement of proximity
3.1 Introduction
3.2 Similarity measures for categorical data
3.2.1 Similarity measures for binary data
3.2.2 Similarity measures for categorical data with more than two levels
3.3 Dissimilarity and distance measures for continuous data
3.4 Similarity measures for data containing both continuous and categorical variables
3.5 Proximity measures for structured data
3.6 Inter-group proximity measures
3.6.1 Inter-group proximity derived from the proximity matrix
3.6.2 Inter-group proximity based on group summaries for continuous data
3.6.3 Inter-group proximity based on group summaries for categorical data
3.7 Weighting variables
3.8 Standardization
3.9 Choice of proximity measure
3.10 Summary

4 Hierarchical clustering
4.1 Introduction
4.2 Agglomerative methods
4.2.1 Illustrative examples of agglomerative methods
4.2.2 The standard agglomerative methods
4.2.3 Recurrence formula for agglomerative methods
4.2.4 Problems of agglomerative hierarchical methods
4.2.5 Empirical studies of hierarchical agglomerative methods
4.3 Divisive methods
4.3.1 Monothetic divisive methods
4.3.2 Polythetic divisive methods
4.4 Applying the hierarchical clustering process
4.4.1 Dendrograms and other tree representations
4.4.2 Comparing dendrograms and measuring their distortion
4.4.3 Mathematical properties of hierarchical methods
4.4.4 Choice of partition - the problem of the number of groups
4.4.5 Hierarchical algorithms
4.4.6 Methods for large data sets
4.5 Applications of hierarchical methods
4.5.1 Dolphin whistles - agglomerative clustering
4.5.2 Needs of psychiatric patients - monothetic divisive clustering
4.5.3 Globalization of cities - polythetic divisive method
4.5.4 Women's life histories - divisive clustering of sequence data
4.5.5 Composition of mammals' milk - exemplars, dendrogram seriation and choice of partition
4.6 Summary

5 Optimization clustering techniques
5.1 Introduction
5.2 Clustering criteria derived from the dissimilarity matrix
5.3 Clustering criteria derived from continuous data
5.3.1 Minimization of trace(W)
5.3.2 Minimization of det(W)
5.3.3 Maximization of trace(BW⁻¹)
5.3.4 Properties of the clustering criteria
5.3.5 Alternative criteria for clusters of different shapes and sizes
5.4 Optimization algorithms
5.4.1 Numerical example
5.4.2 More on k-means
5.4.3 Software implementations of optimization clustering
5.5 Choosing the number of clusters
5.6 Applications of optimization methods
5.6.1 Survey of student attitudes towards video games
5.6.2 Air pollution indicators for US cities
5.6.3 Aesthetic judgement of painters
5.6.4 Classification of 'nonspecific' back pain
5.7 Summary

6 Finite mixture densities as models for cluster analysis
6.1 Introduction
6.2 Finite mixture densities
6.2.1 Maximum likelihood estimation
6.2.2 Maximum likelihood estimation of mixtures of multivariate normal densities
6.2.3 Problems with maximum likelihood estimation of finite mixture models using the EM algorithm
6.3 Other finite mixture densities
6.3.1 Mixtures of multivariate t-distributions
6.3.2 Mixtures for categorical data - latent class analysis
6.3.3 Mixture models for mixed-mode data
6.4 Bayesian analysis of mixtures
6.4.1 Choosing a prior distribution
6.4.2 Label switching
6.4.3 Markov chain Monte Carlo samplers
6.5 Inference for mixture models with unknown number of components and model structure
6.5.1 Log-likelihood ratio test statistics
6.5.2 Information criteria
6.5.3 Bayes factors
6.5.4 Markov chain Monte Carlo methods
6.6 Dimension reduction - variable selection in finite mixture modelling
6.7 Finite regression mixtures
6.8 Software for finite mixture modelling
6.9 Some examples of the application of finite mixture densities
6.9.1 Finite mixture densities with univariate Gaussian components
6.9.2 Finite mixture densities with multivariate Gaussian components
6.9.3 Applications of latent class analysis
6.9.4 Application of a mixture model with different component densities
6.10 Summary

7 Model-based cluster analysis for structured data
7.1 Introduction
7.2 Finite mixture models for structured data
7.3 Finite mixtures of factor models
7.4 Finite mixtures of longitudinal models
7.5 Applications of finite mixture models for structured data
7.5.1 Application of finite mixture factor analysis to the 'categorical versus dimensional representation' debate
7.5.2 Application of finite mixture confirmatory factor analysis to cluster genes using replicated microarray experiments
7.5.3 Application of finite mixture exploratory factor analysis to cluster Italian wines
7.5.4 Application of growth mixture modelling to identify distinct developmental trajectories
7.5.5 Application of growth mixture modelling to identify trajectories of perinatal depressive symptomatology
7.6 Summary

8 Miscellaneous clustering methods
8.1 Introduction
8.2 Density search clustering techniques
8.2.1 Mode analysis
8.2.2 Nearest-neighbour clustering procedures
8.3 Density-based spatial clustering of applications with noise
8.4 Techniques which allow overlapping clusters
8.4.1 Clumping and related techniques
8.4.2 Additive clustering
8.4.3 Application of MAPCLUS to data on social relations in a monastery
8.4.4 Pyramids
8.4.5 Application of pyramid clustering to gene sequences of yeasts
8.5 Simultaneous clustering of objects and variables
8.5.1 Hierarchical classes
8.5.2 Application of hierarchical classes to psychiatric symptoms
8.5.3 The error variance technique
8.5.4 Application of the error variance technique to appropriateness of behaviour data
8.6 Clustering with constraints
8.6.1 Contiguity constraints
8.6.2 Application of contiguity-constrained clustering
8.7 Fuzzy clustering
8.7.1 Methods for fuzzy cluster analysis
8.7.2 The assessment of fuzzy clustering
8.7.3 Application of fuzzy cluster analysis to Roman glass composition
8.8 Clustering and artificial neural networks
8.8.1 Components of a neural network
8.8.2 The Kohonen self-organizing map
8.8.3 Application of neural nets to brainstorming sessions
8.9 Summary

9 Some final comments and guidelines
9.1 Introduction
9.2 Using clustering techniques in practice
9.3 Testing for absence of structure
9.4 Methods for comparing cluster solutions
9.4.1 Comparing partitions
9.4.2 Comparing dendrograms
9.4.3 Comparing proximity matrices
9.5 Internal cluster quality, influence and robustness
9.5.1 Internal cluster quality
9.5.2 Robustness - split-sample validation and consensus trees
9.5.3 Influence of individual points
9.6 Displaying cluster solutions graphically
9.7 Illustrative examples
9.7.1 Indo-European languages - a consensus tree in linguistics
9.7.2 Scotch whisky tasting - cophenetic matrices for comparing clusterings
9.7.3 Chemical compounds in the pharmaceutical industry
9.7.4 Evaluating clustering algorithms for gene expression data
9.8 Summary

Bibliography
Index