2nd Edition. — Wiley, 2009. — 249 p. — ISBN: 0470058870, 978-0470058879.
The increasing availability of data in our current, information-overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent, application-oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications.
1) Introduces data mining methods and applications.
2) Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods.
3) Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining.
4) Features detailed case studies based on applied projects within industry.
5) Incorporates discussion of data mining software, with case studies analysed using R.
6) Is accessible to anyone with a basic knowledge of statistics or data analysis.
7) Includes an extensive bibliography and pointers to further reading within the text.
Applied Data Mining for Business and Industry, 2nd edition, is aimed at advanced undergraduate and graduate students of data mining, applied statistics, database management, computer science and economics. The case studies will provide guidance to professionals working in industry on projects involving large volumes of data, such as customer relationship management, web design, risk management, marketing, economics and finance.
Part I. Methodology
Organisation of the data
Statistical units and statistical variables
Data matrices and their transformations
Complex data structures
Summary statistics
Univariate exploratory analysis: measures of location, variability, heterogeneity, concentration, asymmetry, kurtosis
Bivariate exploratory analysis of quantitative data
Multivariate exploratory analysis of quantitative data
Multivariate exploratory analysis of qualitative data: independence and association, distance measures, dependency measures, model-based measures
Reduction of dimensionality
Interpretation of the principal components
Model specification
Measures of distance: Euclidean distance, similarity measures, multidimensional scaling
Cluster analysis: hierarchical methods, evaluation of hierarchical methods, non-hierarchical methods
Linear regression: bivariate linear regression, properties of the residuals, goodness of fit, multiple linear regression
Logistic regression: interpretation of logistic regression, discriminant analysis
Tree models: division criteria, pruning
Neural networks: architecture of a neural network, multilayer perceptron, Kohonen networks
Nearest-neighbour models
Local models: association rules, retrieval by content
Uncertainty measures and inference: probability, statistical models, statistical inference
Non-parametric modelling
The normal linear model
Main inferential results
Generalised linear models: exponential family, logistic regression model
Log-linear models: construction, interpretation, comparison, graphical log-linear models
Graphical models: symmetric graphical models, recursive graphical models, graphical models and neural networks
Survival analysis models
Model evaluation
Criteria based on statistical tests: distance between statistical models, discrepancy of a statistical model, Kullback–Leibler discrepancy
Criteria based on scoring functions
Bayesian criteria
Computational criteria
Criteria based on loss functions
Part II. Business case studies
Describing website visitors
Objectives of the analysis
Description of the data
Exploratory analysis
Model building: cluster analysis, Kohonen networks
Model comparison
Market basket analysis
Objectives of the analysis
Description of the data
Exploratory data analysis
Model building: log-linear models, association rules
Model comparison
Describing customer satisfaction
Objectives of the analysis
Description of the data
Exploratory data analysis
Model building
Predicting credit risk of small businesses
Objectives of the analysis
Description of the data
Exploratory data analysis
Model building
Model comparison
Predicting e-learning student performance
Objectives of the analysis
Description of the data
Exploratory data analysis
Model specification
Model comparison
Predicting customer lifetime value
Objectives of the analysis
Description of the data
Exploratory data analysis
Model specification
Model comparison
Operational risk management
Context and objectives of the analysis
Exploratory data analysis
Model building
Model comparison
Summary conclusions