Resve A. Saleh, Sohaib Majzoub, A. K. Md. Ehsanes Saleh
Fundamentals of Robust Machine Learning
Handling Outliers and Anomalies in Data Science
- Hardcover
Other customers were also interested in
- Marco Cremonini: Data Science Fundamentals with R, Python, and Open Data (€119.99)
- Guoqing Chen: Fuzzy Logic in Data Modeling (€112.99)
- Roy L. Streit: Poisson Point Processes (€112.99)
- Paulo Romero Martins Maciel: Performance, Reliability, and Availability Evaluation of Computational Systems, Volume I (€164.99)
- Christoph Adami: Introduction to Artificial Life (€38.99)
- Combinatorial Optimization and Graph Algorithms (€75.99)
- Jeffrey McConnell: Analysis of Algorithms (€192.99)
An essential guide for tackling outliers and anomalies in machine learning and data science.

In recent years, machine learning (ML) has transformed virtually every area of research and technology, becoming one of the key tools for data scientists. Robust machine learning is a new approach to handling outliers in datasets, an often-overlooked aspect of data science. Ignoring outliers can lead to bad business decisions, wrong medical diagnoses, faulty conclusions, or an incorrect assessment of feature importance, to name just a few consequences.

Fundamentals of Robust Machine Learning offers a thorough but accessible overview of this subject by focusing on how to properly handle outliers and anomalies in datasets. The book describes two main approaches: using outlier-tolerant ML tools, or removing outliers before applying conventional tools. Balancing theoretical foundations with practical Python code, it provides all the skills necessary to enhance the accuracy, stability, and reliability of ML models.

Readers of Fundamentals of Robust Machine Learning will also find:
- A blend of robust statistics and machine learning principles
- Detailed discussion of a wide range of robust machine learning methodologies, from robust clustering, regression, and classification to neural networks and anomaly detection
- Python code with immediate application to data science problems

Fundamentals of Robust Machine Learning is ideal for undergraduate or graduate students in data science, machine learning, and related fields, as well as for professionals looking to enhance their understanding of building models in the presence of outliers.
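Both strategies can be sketched in a few lines of Python. The example below is illustrative only and is not code from the book: scikit-learn's HuberRegressor stands in for an outlier-tolerant estimator in the spirit of Huber's method (Section 2.10), and a 1.5-IQR filter on residuals (Section 4.4.3) stands in for removing outliers before refitting a conventional least-squares model.

```python
# Illustrative sketch of the two robustness strategies described above.
# Assumptions: scikit-learn's HuberRegressor as the outlier-tolerant tool
# and a 1.5-IQR residual filter as the removal step; neither is claimed
# to reproduce the book's own code.
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X.ravel() + 1.0 + rng.normal(0, 1, 200)
y[:10] += 40.0  # inject a few gross outliers

# Strategy 1: fit with an outlier-tolerant estimator (Huber's method).
huber = HuberRegressor().fit(X, y)

# Strategy 2: flag outliers with the 1.5-IQR rule on least-squares
# residuals, remove them, then refit a conventional model.
resid = y - LinearRegression().fit(X, y).predict(X)
q1, q3 = np.percentile(resid, [25, 75])
iqr = q3 - q1
keep = (resid >= q1 - 1.5 * iqr) & (resid <= q3 + 1.5 * iqr)
ols_clean = LinearRegression().fit(X[keep], y[keep])

print(huber.coef_, ols_clean.coef_)  # both slopes should land near 3
```

Either route recovers a slope close to the true value of 3, whereas a plain least-squares fit on the contaminated data would be pulled toward the outliers.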
Product details
- Publisher: Wiley
- Number of pages: 416
- Publication date: 13 May 2025
- Language: English
- Dimensions: 234mm x 188mm x 28mm
- Weight: 839g
- ISBN-13: 9781394294374
- ISBN-10: 1394294379
- Item no.: 72118828
- Manufacturer information:
- Libri GmbH
- Europaallee 1
- 36244 Bad Hersfeld
- gpsr@libri.de
Resve Saleh (PhD, UC Berkeley) is a Professor Emeritus at the University of British Columbia. He worked for a decade as a professor at the University of Illinois and as a visiting professor at Stanford University. He was Founder and Chairman of Simplex Solutions, Inc., which went public in 2001. He is an IEEE Fellow and a Fellow of the Canadian Academy of Engineering.

Sohaib Majzoub (PhD, University of British Columbia) is an Associate Professor at the University of Sharjah, UAE. He has also taught at the American University in Dubai, UAE, and at King Saud University, KSA, and was a visiting professor at Delft University of Technology in the Netherlands. He is a Senior Member of the IEEE.

A. K. Md. Ehsanes Saleh (PhD, University of Western Ontario) is a Professor Emeritus and Distinguished Professor in the School of Mathematics and Statistics, Carleton University, Ottawa, Canada. He has also taught at Simon Fraser University, the University of Toronto, and Stanford University. He is a Fellow of the IMS and the ASA, and an Honorary Member of the SSC, Canada.
Preface xv
About the Companion Website xix
1 Introduction 1
1.1 Defining Outliers 2
1.2 Overview of the Book 3
1.3 What Is Robust Machine Learning? 3
1.3.1 Machine Learning Basics 4
1.3.2 Effect of Outliers 6
1.3.3 What Is Robust Data Science? 7
1.3.4 Noise in Datasets 7
1.3.5 Training and Testing Flows 8
1.4 Robustness of the Median 9
1.4.1 Mean vs. Median 9
1.4.2 Effect on Standard Deviation 10
1.5 ℓ1 and ℓ2 Norms 11
1.6 Review of Gaussian Distribution 12
1.7 Unsupervised Learning Case Study 13
1.7.1 Clustering Example 14
1.7.2 Clustering Problem Specification 14
1.8 Creating Synthetic Data for Clustering 16
1.8.1 One-Dimensional Datasets 16
1.8.2 Multidimensional Datasets 17
1.9 Clustering Algorithms 19
1.9.1 k-Means Clustering 19
1.9.2 k-Medians Clustering 21
1.10 Importance of Robust Clustering 22
1.10.1 Clustering with No Outliers 22
1.10.2 Clustering with Outliers 23
1.10.3 Detection and Removal of Outliers 25
1.11 Summary 27
Problems 28
References 34
2 Robust Linear Regression 35
2.1 Introduction 35
2.2 Supervised Learning 35
2.3 Linear Regression 36
2.4 Importance of Residuals 38
2.4.1 Defining Errors and Residuals 38
2.4.2 Residuals in Loss Functions 39
2.4.3 Distribution of Residuals 40
2.5 Estimation Background 42
2.5.1 Linear Models 42
2.5.2 Desirable Properties of Estimators 43
2.5.3 Maximum-Likelihood Estimation 44
2.5.4 Gradient Descent 47
2.6 M-Estimation 49
2.7 Least Squares Estimation (LSE) 52
2.8 Least Absolute Deviation (LAD) 54
2.9 Comparison of LSE and LAD 55
2.9.1 Simple Linear Model 55
2.9.2 Location Problem 56
2.10 Huber's Method 58
2.10.1 Huber Loss Function 58
2.10.2 Comparison with LSE and LAD 63
2.11 Summary 64
Problems 64
References 67
3 The Log-Cosh Loss Function 69
3.1 Introduction 69
3.2 An Intuitive View of Log-Cosh 69
3.3 Hyperbolic Functions 71
3.4 M-Estimation 71
3.4.1 Asymptotic Behavior 72
3.4.2 Linear Regression Using Log-Cosh 74
3.5 Deriving the Distribution for Log-Cosh 75
3.6 Standard Errors for Robust Estimators 79
3.6.1 Example: Swiss Fertility Dataset 81
3.6.2 Example: Boston Housing Dataset 82
3.7 Statistical Properties of Log-Cosh Loss 83
3.7.1 Maximum-Likelihood Estimation 83
3.8 A General Log-Cosh Loss Function 84
3.9 Summary 88
Problems 88
References 93
4 Outlier Detection, Metrics, and Standardization 95
4.1 Introduction 95
4.2 Effect of Outliers 95
4.3 Outlier Diagnosis 97
4.3.1 Boxplots 98
4.3.2 Histogram Plots 100
4.3.3 Exploratory Data Analysis 101
4.4 Outlier Detection 102
4.4.1 3-Sigma Edit Rule 102
4.4.2 4.5-MAD Edit Rule 104
4.4.3 1.5-IQR Edit Rule 105
4.5 Outlier Removal 105
4.5.1 Trimming Methods 105
4.5.2 Winsorization 105
4.5.3 Anomaly Detection Method 106
4.6 Regression-Based Outlier Detection 107
4.6.1 LS vs. LC Residuals 108
4.6.2 Comparison of Detection Methods 109
4.6.3 Ordered Absolute Residuals (OARs) 110
4.6.4 Quantile-Quantile Plot 111
4.6.5 Quad-Plots for Outlier Diagnosis 113
4.7 Regression-Based Outlier Removal 114
4.7.1 Iterative Boxplot Method 114
4.8 Regression Metrics with Outliers 116
4.8.1 Mean Square Error (MSE) 117
4.8.2 Median Absolute Error (MAE) 118
4.8.3 MSE vs. MAE on Realistic Data 119
4.8.4 Selecting Hyperparameters for Robust Regression 120
4.9 Dataset Standardization 121
4.9.1 Robust Standardization 122
4.10 Summary 126
Problems 126
References 131
5 Robustness of Penalty Estimators 133
5.1 Introduction 133
5.2 Penalty Functions 133
5.2.1 Multicollinearity 133
5.2.2 Penalized Loss Functions 135
5.3 Ridge Penalty 136
5.4 LASSO Penalty 137
5.5 Effect of Penalty Functions 138
5.6 Penalty Functions with Outliers 139
5.7 Ridge Traces 142
5.8 Elastic Net (Enet) Penalty 143
5.9 Adaptive LASSO (aLASSO) Penalty 145
5.10 Penalty Effects on Variance and Bias 146
5.10.1 Effect on Variance 146
5.10.2 Geometric Interpretation of Bias 148
5.11 Variable Importance 151
5.11.1 The t-Statistic 151
5.11.2 LASSO and aLASSO Traces 153
5.12 Summary 155
Problems 156
References 159
6 Robust Regularized Models 161
6.1 Introduction 161
6.2 Overfitting and Underfitting 161
6.3 The Bias-Variance Trade-Off 162
6.4 Regularization with Ridge 164
6.4.1 Selection of Hyperparameter λ 165
6.4.2 Example: Diabetes Dataset 167
6.5 Generalization Using Robust Estimators 169
6.5.1 Training and Test Sets 169
6.5.2 k-Fold Cross-Validation 171
6.6 Robust Generalization and Regularization 173
6.6.1 Regularization with LC-Ridge 174
6.7 Model Complexity 175
6.7.1 Variable Selection Using LS-LASSO 176
6.7.2 Variable Ordering Using LC-aLASSO 176
6.7.3 Building a Compact Model 179
6.8 Summary 182
Problems 182
References 186
7 Quantile Regression Using Log-Cosh 187
7.1 Introduction 187
7.2 Understanding Quantile Regression 188
7.3 The Crossing Problem 189
7.4 Standard Quantile Loss Function 190
7.5 Smooth Regression Quantiles (SMRQ) 192
7.6 Evaluation of Quantile Methods 195
7.6.1 Qualitative Assessment 196
7.6.2 Quantitative Assessment 198
7.7 Selection of Robustness Coefficient 200
7.8 Maximum-Likelihood Procedure for SMRQ 202
7.9 Standard Error Computation 204
7.10 Summary 206
Problems 207
References 209
8 Robust Binary Classification 211
8.1 Introduction 211
8.2 Binary Classification Problem 212
8.2.1 Why Linear Regression Fails 212
8.2.2 Outliers in Binary Classification 213
8.3 The Cross-Entropy (CE) Loss 215
8.3.1 Deriving the Cross-Entropy Loss 216
8.3.2 Understanding Logistic Regression 218
8.3.3 Gradient Descent 221
8.4 The Log-Cosh (LC) Loss Function 221
8.4.1 General Formulation 223
8.5 Algorithms for Logistic Regression 224
8.6 Example: Motor Trend Cars 226
8.7 Regularization of Logistic Regression 227
8.7.1 Overfitting and Underfitting 228
8.7.2 k-Fold Cross-Validation 229
8.7.3 Penalty Functions 229
8.7.4 Effect of Outliers 230
8.8 Example: Circular Dataset 231
8.9 Outlier Detection 234
8.10 Robustness of Binary Classifiers 235
8.10.1 Support Vector Classifier (SVC) 235
8.10.2 Support Vector Machines (SVMs) 238
8.10.3 k-Nearest Neighbors (k-NN) 241
8.10.4 Decision Trees and Random Forest 243
8.11 Summary 244
Problems 244
Reference 249
9 Neural Networks Using Log-Cosh 251
9.1 Introduction 251
9.2 A Brief History of Neural Networks 251
9.3 Defining Neural Networks 252
9.3.1 Basic Computational Unit 253
9.3.2 Four-Layer Neural Network 254
9.3.3 Activation Functions 255
9.4 Training of Neural Networks 257
9.5 Forward and Backward Propagation 258
9.5.1 Forward Propagation 259
9.5.2 Backward Propagation 260
9.5.3 Log-Cosh Gradients 263
9.6 Cross-Entropy and Log-Cosh Algorithms 264
9.7 Example: Circular Dataset 266
9.8 Classification Metrics and Outliers 269
9.8.1 Precision, Recall, F1 Score 269
9.8.2 Receiver Operating Characteristics (ROCs) 271
9.9 Summary 273
Problems 273
References 280
10 Multi-class Classification and Adam Optimization 281
10.1 Introduction 281
10.2 Multi-class Classification 281
10.2.1 Multi-class Loss Functions 282
10.2.2 Softmax Activation Function 284
10.3 Example: MNIST Dataset 288
10.3.1 Neural Network Architecture 289
10.3.2 Comparing Cross-Entropy with Log-Cosh Losses 289
10.3.3 Outliers in MNIST 291
10.4 Optimization of Neural Networks 291
10.4.1 Momentum 293
10.4.2 RMSprop Approach 294
10.4.3 Optimizer Warm-Up Phase 295
10.4.4 Adam Optimizer 296
10.5 Summary 297
Problems 297
References 302
11 Anomaly Detection and Evaluation Metrics 303
11.1 Introduction 303
11.2 Anomaly Detection Methods 303
11.2.1 k-Nearest Neighbors 304
11.2.2 DBSCAN 308
11.2.3 Isolation Forest 311
11.3 Anomaly Detection Using MADmax 316
11.3.1 Robust Standardization 317
11.3.2 k-Medians Clustering 317
11.3.3 Selecting MADmax 319
11.3.4 k-Nearest Neighbors (k-NN) 319
11.3.5 k-Nearest Medians (k-NM) 320
11.4 Qualitative Evaluation Methods 323
11.5 Quantitative Evaluation Methods 326
11.6 Summary 330
Problems 330
Reference 336
12 Case Studies in Data Science 337
12.1 Introduction 337
12.2 Example: Boston Housing Dataset 337
12.2.1 Exploratory Data Analysis 338
12.2.2 Neural Network Architecture 339
12.2.3 Comparison of LSNN and LCNN 342
12.2.4 Predicting Housing Prices 344
12.2.5 RMSE vs. MAE 344
12.2.6 Correlation Coefficients 345
12.3 Example: Titanic Dataset 346
12.3.1 Exploratory Data Analysis 346
12.3.2 LCLR vs. CELR 351
12.3.3 Outlier Detection and Removal 353
12.3.4 Robustness Coefficient for Log-Cosh 355
12.3.5 The Implications of Robustness 356
12.3.6 Ridge and aLASSO 357
12.4 Application to Explainable Artificial Intelligence (XAI) 359
12.4.1 Case Study: Logistic Regression 360
12.4.2 Case Study: Neural Networks 365
12.5 Time Series Example: Climate Change 366
12.5.1 Autoregressive Model 367
12.5.2 Forecasting Using AR(p) 369
12.5.3 Stationary Time Series 371
12.5.4 Moving Average 374
12.5.5 Finding Outliers in Time Series 375
12.6 Summary and Conclusions 376
Problems 376
References 382
Index 383