Many real-world applications face class imbalance, where the class of greatest interest is rare. This imbalance degrades the performance of classification models, especially in sensitive domains such as medical diagnosis, finance, and software defect prediction. In software systems, early defect prediction is essential for reducing cost and improving reliability, yet many machine learning models perform poorly on skewed datasets. To address this, we propose three novel algorithms: IDROS (oversampling), IDRUS (undersampling), and a hybrid approach, IDROSUS. IDROS applies k-nearest neighbors (KNN) around the minority-class centroid to generate synthetic samples, while IDRUS removes less relevant majority-class samples based on their distance from the mean. The hybrid IDROSUS adjusts both classes simultaneously, reducing the risk of overfitting and underfitting. We evaluated these methods on 40 datasets from the PROMISE repository across eight classifiers. Accuracy, recall, precision, and F-measure showed that IDROSUS outperformed existing techniques.
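The resampling ideas summarized above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact procedures, so the code assumes that synthetic minority samples are interpolated between the minority centroid and its k nearest minority samples, that "less relevant" majority samples are those farthest from the majority mean, and that the hybrid step brings both classes to the midpoint of their original sizes. All function names and parameters are hypothetical.

```python
import numpy as np


def oversample_near_centroid(X_min, n_new, k=5, rng=None):
    """Sketch of centroid-based oversampling (assumed reading of IDROS).

    Draws n_new synthetic points on the segments between the minority
    centroid and its k nearest minority samples.
    """
    rng = np.random.default_rng(rng)
    centroid = X_min.mean(axis=0)
    dist = np.linalg.norm(X_min - centroid, axis=1)
    k = min(k, len(X_min))
    neighbors = X_min[np.argsort(dist)[:k]]            # k samples closest to the centroid
    picks = neighbors[rng.integers(0, k, size=n_new)]  # one random neighbor per new point
    t = rng.random((n_new, 1))                         # interpolation factor in [0, 1)
    return centroid + t * (picks - centroid)


def undersample_by_mean_distance(X_maj, n_keep):
    """Sketch of distance-based undersampling (assumed reading of IDRUS).

    Keeps the n_keep majority samples closest to the majority mean;
    treating the farthest samples as "less relevant" is an assumption.
    """
    mean = X_maj.mean(axis=0)
    dist = np.linalg.norm(X_maj - mean, axis=1)
    return X_maj[np.argsort(dist)[:n_keep]]


def hybrid_resample(X_min, X_maj, k=5, rng=None):
    """Sketch of a hybrid step (assumed reading of IDROSUS).

    Oversamples the minority class and undersamples the majority class
    until both reach the midpoint of the two original class sizes.
    """
    target = (len(X_min) + len(X_maj)) // 2
    n_new = max(target - len(X_min), 0)
    synthetic = oversample_near_centroid(X_min, n_new, k=k, rng=rng)
    X_min_bal = np.vstack([X_min, synthetic])
    X_maj_bal = undersample_by_mean_distance(X_maj, min(target, len(X_maj)))
    return X_min_bal, X_maj_bal
```

With 10 minority and 90 majority samples, `hybrid_resample` would return 50 samples per class under these assumptions. Meeting at the midpoint is one plausible design choice: it limits both the number of synthetic points (less risk of overfitting to the minority class) and the number of discarded majority points (less information loss).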