BAGGING BASED ENSEMBLE ANALYSIS IN HANDLING UNBALANCED DATA ON CLASSIFICATION MODELING
Abstract
The purpose of this study is to identify, based on a literature review, the algorithm behind each bagging-based method for handling class imbalance. The methods considered are UnderBagging, OverBagging, UnderOverBagging, SMOTEBagging, Roughly Balanced Bagging, and Bagging Ensemble Variation. The data are taken from the UCI Repository and comprise 16 data sets, eight of which have a low class imbalance and eight of which have a high class imbalance. Every data set is treated as a two-class problem: the class with the fewest observations becomes the minority class, and the remaining observations form the majority class. The results show that the bagging-based methods give better results than classical methods such as the classification tree.
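As an illustration of the resampling idea behind two of the methods named above, the following is a minimal sketch, not the study's implementation. It assumes the commonly described variants in which each bag is drawn with replacement: UnderBagging shrinks the majority class to the minority size, and OverBagging grows the minority class to the majority size. The function names (split_by_class, under_bag, over_bag, majority_vote) are hypothetical and chosen only for this example.

```python
import random
from collections import Counter


def split_by_class(labels, minority):
    """Return (minority_indices, majority_indices) for a two-class label list."""
    min_idx = [i for i, y in enumerate(labels) if y == minority]
    maj_idx = [i for i, y in enumerate(labels) if y != minority]
    return min_idx, maj_idx


def under_bag(labels, minority, rng):
    """One UnderBagging bag (assumed variant): both classes are sampled
    with replacement down to the minority-class size."""
    min_idx, maj_idx = split_by_class(labels, minority)
    n = len(min_idx)
    return rng.choices(min_idx, k=n) + rng.choices(maj_idx, k=n)


def over_bag(labels, minority, rng):
    """One OverBagging bag (assumed variant): both classes are sampled
    with replacement up to the majority-class size."""
    min_idx, maj_idx = split_by_class(labels, minority)
    n = len(maj_idx)
    return rng.choices(min_idx, k=n) + rng.choices(maj_idx, k=n)


def majority_vote(predictions):
    """Combine the base classifiers' predictions for one observation."""
    return Counter(predictions).most_common(1)[0][0]


# Toy two-class data: 90 majority (0) and 10 minority (1) observations.
labels = [0] * 90 + [1] * 10
rng = random.Random(0)

under = under_bag(labels, minority=1, rng=rng)
over = over_bag(labels, minority=1, rng=rng)

print(Counter(labels[i] for i in under))  # 10 observations of each class
print(Counter(labels[i] for i in over))   # 90 observations of each class
```

In a full ensemble, one base classifier (for example a classification tree) would be trained on each balanced bag, and the per-observation predictions would be combined with majority_vote.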

This work is licensed under a Creative Commons Attribution 4.0 International License.