Galit Shmueli, Nitin R. Patel, Peter Gedeck, Peter C. Bruce
Machine Learning for Business Analytics
Concepts, Techniques, and Applications in Python
- Hardcover
Other customers were also interested in
- Intelligent Techniques for Predictive Data Analytics, €134.99
- Artificial Intelligence and Data Analytics for Energy Exploration and Production (Fred Aminzadeh), €222.99
- The Data Science Handbook (Field Cady, Allen Institute for Artificial Intelligence), €62.99
- Computational Intelligence for Network Structure Analytics (Maoguo Gong), €75.99
- Extending R (John M. Chambers), €62.99
- Statistical Applications from Clinical Trials and Personalized Medicine to Finance and Business Analytics, €164.99
- Integral: Theory and Application in a variable (Du Yong Pak), €64.99
Machine Learning for Business Analytics: Concepts, Techniques, and Applications in Python is a comprehensive introduction to and overview of the methods that underlie modern AI. This best-selling textbook covers both statistical and machine learning (AI) algorithms for prediction, classification, visualization, dimension reduction, rule mining, recommendations, clustering, text mining, experimentation, network analytics, and generative AI. Along with hands-on exercises and real-life case studies, it also discusses managerial and ethical issues for the responsible use of machine learning techniques.

This is the second Python edition of Machine Learning for Business Analytics. This edition also includes:
- A new chapter on generative AI (large language models, or LLMs, and image generation)
- An expanded chapter on deep learning
- A new chapter on experimental feedback techniques, including A/B testing, uplift modeling, and reinforcement learning
- A new chapter on responsible data science
- Updates and new material based on feedback from instructors teaching MBA, Master's in Business Analytics and related programs, undergraduate, diploma, and executive courses, and from their students
- A full chapter of cases demonstrating applications of the machine learning techniques
- End-of-chapter exercises with data
- A companion website with more than two dozen data sets, and instructor materials including exercise solutions, slides, and case solutions

This textbook is an ideal resource for upper-level undergraduate and graduate-level courses in AI, data science, predictive analytics, and business analytics. It is also an excellent reference for analysts, researchers, and data science practitioners working with quantitative data in management, finance, marketing, operations management, information systems, computer science, and information technology.
Product details
- Publisher: John Wiley & Sons Inc
- Edition: 2nd
- Pages: 720
- Publication date: 13 May 2025
- Language: English
- Dimensions: 259 mm x 183 mm x 41 mm
- Weight: 1656 g
- ISBN-13: 9781394286799
- ISBN-10: 1394286791
- Item no.: 73582284
- Manufacturer identification
- Libri GmbH
- Europaallee 1
- 36244 Bad Hersfeld
- gpsr@libri.de
Galit Shmueli, PhD, is Chair Professor at National Tsing Hua University's Institute of Service Science, College of Technology Management. She has designed and taught business analytics courses since 2004 at the University of Maryland, Statistics.com, the Indian School of Business, and National Tsing Hua University, Taiwan.

Peter C. Bruce is the founder and former President of the Institute for Statistics Education at Statistics.com.

Peter Gedeck, PhD, is Senior Data Scientist at Collaborative Drug Discovery and Lecturer at the UVA School of Data Science. His specialty is the development of machine learning algorithms to predict biological and physicochemical properties of drug candidates.

Nitin R. Patel, PhD, is co-founder and lead researcher at Cytel Inc. He was also a co-founder of Tata Consultancy Services. A Fellow of the American Statistical Association, Dr. Patel has served as a visiting professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years.
Foreword by Gareth James xxi
Preface to the Second Python Edition xxiii
Acknowledgments xxvii
Part I Preliminaries
Chapter 1 Introduction 3
1.1 What Is Business Analytics? 3
1.2 What Is Machine Learning? 5
1.3 Machine Learning, AI, and Related Terms 5
1.4 Big Data 7
1.5 Data Science 8
1.6 Why Are There So Many Different Methods? 8
1.7 Terminology and Notation 9
1.8 Road Maps to This Book 12
Order of Topics 13
Chapter 2 Overview of the Machine Learning Process 17
2.1 Introduction 18
2.2 Core Ideas in Machine Learning 18
2.3 The Steps in a Machine Learning Project 22
2.4 Preliminary Steps 23
2.5 Predictive Power and Overfitting 37
2.6 Building a Predictive Model 43
2.7 Using Python for Machine Learning on a Local Machine 49
2.8 Automating Machine Learning Solutions 49
2.9 Ethical Practice in Machine Learning 54
Problems 55
Part II Data Exploration and Dimension Reduction
Chapter 3 Data Visualization 61
3.1 Uses of Data Visualization 62
3.2 Data Examples 64
3.3 Basic Charts: Bar Charts, Line Charts, and Scatter Plots 66
3.4 Multidimensional Visualization 75
3.5 Specialized Visualizations 90
Problems 98
Chapter 4 Dimension Reduction 101
4.1 Introduction 102
4.2 Curse of Dimensionality 102
4.3 Practical Considerations 103
4.4 Data Summaries 103
4.5 Correlation Analysis 108
4.6 Reducing the Number of Categories in Categorical Variables 109
4.7 Converting a Categorical Variable to a Numerical Variable 109
4.8 Principal Component Analysis 111
4.9 Dimension Reduction Using Regression Models 121
4.10 Dimension Reduction Using Classification and Regression Trees 121
Problems 123
Part III Performance Evaluation
Chapter 5 Evaluating Predictive Performance 129
5.1 Introduction 130
5.2 Evaluating Predictive Performance 131
5.3 Judging Classifier Performance 137
5.4 Judging Ranking Performance 150
5.5 Oversampling 156
Problems 162
Part IV Prediction and Classification Methods
Chapter 6 Multiple Linear Regression 167
6.1 Introduction 168
6.2 Explanatory vs. Predictive Modeling 168
6.3 Estimating the Regression Equation and Prediction 170
6.4 Variable Selection in Linear Regression 176
Problems 188
Chapter 7 k-Nearest Neighbors (k-NN) 193
7.1 The k-NN Classifier (Categorical Outcome) 194
7.2 k-NN for a Numerical Outcome 203
7.3 Advantages and Shortcomings of k-NN Algorithms 205
Problems 207
Chapter 8 The Naive Bayes Classifier 209
8.1 Introduction 209
8.2 Applying the Full (Exact) Bayesian Classifier 212
8.3 Solution: Naive Bayes 213
8.4 Advantages and Shortcomings of the Naive Bayes Classifier 224
Problems 226
Chapter 9 Classification and Regression Trees 229
9.1 Introduction 230
9.2 Classification Trees 232
9.3 Evaluating the Performance of a Classification Tree 241
9.4 Avoiding Overfitting 246
9.5 Classification Rules from Trees 252
9.6 Classification Trees for More Than Two Classes 252
9.7 Regression Trees 253
9.8 Advantages and Weaknesses of a Tree 256
9.9 Improving Prediction: Random Forests and Boosted Trees 258
Problems 264
Chapter 10 Logistic Regression 267
10.1 Introduction 268
10.2 The Logistic Regression Model 269
10.3 Example: Acceptance of Personal Loan 272
10.4 Evaluating Classification Performance 277
10.5 Variable Selection 280
10.6 Logistic Regression for Multi-Class Classification 281
10.7 Example of Complete Analysis: Predicting Delayed Flights 285
Problems 298
Chapter 11 Neural Nets 301
11.1 Introduction 302
11.2 Concept and Structure of a Neural Network 302
11.3 Fitting a Network to Data 303
11.4 Required User Input 316
11.5 Exploring the Relationship Between Predictors and Outcome 317
11.6 Deep Learning 318
11.7 Advantages and Weaknesses of Neural Networks 329
Problems 331
Chapter 12 Discriminant Analysis 333
12.1 Introduction 334
12.2 Distance of a Record from a Class 336
12.3 Fisher's Linear Classification Functions 337
12.4 Classification Performance of Discriminant Analysis 341
12.5 Prior Probabilities 342
12.6 Unequal Misclassification Costs 342
12.7 Classifying More Than Two Classes 344
12.8 Advantages and Weaknesses 347
Problems 348
Chapter 13 Generating, Comparing, and Combining Multiple Models 351
13.1 Ensembles 352
13.2 Automated Machine Learning (AutoML) 359
13.3 Explaining Model Predictions 365
13.4 Summary 366
Problems 368
Chapter 14 Experiments, Uplift Models, and Reinforcement Learning 371
14.1 A/B Testing 372
14.2 Uplift (Persuasion) Modeling 377
14.3 Reinforcement Learning 384
14.4 Summary 393
Problems 395
Part V Mining Relationships Among Records
Chapter 15 Association Rules and Collaborative Filtering 399
15.1 Association Rules 400
15.2 Collaborative Filtering 413
15.3 Summary 427
Problems 429
Chapter 16 Cluster Analysis 433
16.1 Introduction 434
16.2 Measuring Distance Between Two Records 437
16.3 Measuring Distance Between Two Clusters 443
16.4 Hierarchical (Agglomerative) Clustering 445
16.5 Non-Hierarchical Clustering: The k-Means Algorithm 453
Problems 459
Part VI Forecasting Time Series
Chapter 17 Handling Time Series 463
17.1 Introduction 464
17.2 Descriptive vs. Predictive Modeling 465
17.3 Popular Forecasting Methods in Business 465
17.4 Time Series Components 466
17.5 Data Partitioning and Performance Evaluation 470
Problems 474
Chapter 18 Regression-Based Forecasting 477
18.1 A Model with Trend 478
18.2 A Model with Seasonality 484
18.3 A Model with Trend and Seasonality 486
18.4 Autocorrelation and ARIMA Models 488
Problems 498
Chapter 19 Smoothing and Deep Learning Methods for Forecasting 509
19.1 Smoothing Methods: Introduction 510
19.2 Moving Average 510
19.3 Simple Exponential Smoothing 515
19.4 Advanced Exponential Smoothing 518
19.5 Deep Learning for Forecasting 521
Problems 527
Part VII Data Analytics
Chapter 20 Social Network Analytics 537
20.1 Introduction 538
20.2 Directed vs. Undirected Networks 538
20.3 Visualizing and Analyzing Networks 539
20.4 Social Data Metrics and Taxonomy 544
20.5 Using Network Metrics in Prediction and Classification 550
20.6 Business Uses of Social Network Analysis 556
20.7 Summary 557
Problems 559
Chapter 21 Text Mining 561
21.1 Introduction 562
21.2 The Tabular Representation of Text 562
21.3 Bag-of-Words vs. Meaning Extraction at Document Level 563
21.4 Preprocessing the Text 564
21.5 Implementing Machine Learning Methods 573
21.6 Example: Online Discussions on Autos and Electronics 573
21.7 Deep Learning Approaches 577
21.8 Example: Sentiment Analysis of Movie Reviews 578
21.9 Summary 581
Problems 584
Chapter 22 Responsible Data Science 587
22.1 Introduction 588
22.2 Unintentional Harm 589
22.3 Legal Considerations 591
22.4 Principles of Responsible Data Science 592
22.5 A Responsible Data Science Framework 595
22.6 Documentation Tools 599
22.7 Example: Applying the RDS Framework to the COMPAS Example 603
22.8 Summary 613
Problems 614
Chapter 23 Generative AI 617
23.1 The Transformative Power of Generative AI 617
23.2 What Is Generative AI? 619
23.3 Data and Infrastructure Requirements 621
23.4 Adapting Models for Specific Purposes 623
23.5 Prompt Engineering 624
23.6 Uses of Generative AI 625
23.7 Caveats and Concerns 629
23.8 Summary 631
Problems 633
Part VIII Cases
Chapter 24 Cases 639
24.1 Charles Book Club 639
24.2 German Credit 646
24.3 Tayko Software Cataloger 651
24.4 Political Persuasion 655
24.5 Taxi Cancellations 659
24.7 Direct-Mail Fundraising 665
24.8 Catalog Cross-Selling 668
24.9 Time-Series Case: Forecasting Public Transportation Demand 670
24.10 Loan Approval 672
References 675
Index 677