Ensemble Analysis of the Students Length of Study at University of Klabat Manado Indonesia
Abstract
The purpose of this study is to classify students' length of study as graduating on time or not on time, based on several observed independent variables: gender, Grade Point Average (GPA), place of residence, type of parents' occupation, and school of origin. The study uses non-parametric statistics with a classification analysis method. Classification analysis builds a model from a training set that assigns records to the appropriate categories or classes. The method used is classification with ensemble techniques. The basic principle of the ensemble method is to develop a set of models from the training data and combine them to determine the final classification, which is the class that receives the most votes across the set of models. To get the best combination of models, the ensemble method allows several different classification models to be used. The ensemble methods used in this study are bagging and boosting.
Keywords: Ensemble Analysis, Classification, Bagging, Boosting, Students Length of Study, Indonesia.
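The voting principle described in the abstract can be illustrated with a minimal bagging sketch. This is not the study's implementation: the base learner (a one-feature decision stump), the function names, and the toy records are all assumptions made for illustration, and the records are invented rather than taken from the University of Klabat data. Each model is trained on a bootstrap resample of the training set, and the final class is the one with the most votes. Boosting, which reweights records between rounds, is not shown.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Return the label that received the most votes."""
    return Counter(predictions).most_common(1)[0][0]


def train_stump(rows, labels):
    """Fit a decision stump: one feature, one threshold, fewest training errors.

    Hypothetical base learner chosen for brevity; labels are 0 or 1.
    """
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            # Try both orientations: (label at or above t, label below t).
            for hi, lo in ((1, 0), (0, 1)):
                errors = sum(
                    (hi if r[f] >= t else lo) != y for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, hi, lo)
    _, f, t, hi, lo = best
    return lambda row: hi if row[f] >= t else lo


def bagging(rows, labels, n_models=25, seed=0):
    """Train n_models stumps on bootstrap resamples and combine them by voting."""
    rng = random.Random(seed)
    n = len(rows)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n record indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx])
        )
    return lambda row: majority_vote([m(row) for m in models])


# Invented toy records (NOT from the study): (GPA, gender code).
# Label 1 = graduated on time, 0 = not on time.
rows = [(3.8, 0), (3.5, 1), (3.6, 0), (3.2, 1),
        (2.4, 0), (2.7, 1), (2.1, 0), (2.9, 1)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

predict = bagging(rows, labels)
print(predict((3.9, 0)), predict((2.0, 1)))
```

In this toy set the two classes are separated by GPA alone, so nearly every bootstrap stump splits on GPA and the vote is close to unanimous. With real data the individual models would disagree more often, which is where combining votes is expected to help.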

This work is licensed under a Creative Commons Attribution 4.0 International License.