Data science is becoming increasingly important for software engineering problems. Software defect prediction is a critical area that can help a development team allocate testing resources efficiently and better understand the root causes of defects. It can also help explain why a component, or even an entire project, is failure-prone. This paper addresses the binary classification problem of predicting whether a software component contains a bug, using three widely used machine learning algorithms: Random Forest (RF), Neural Networks (NN), and Support Vector Machine (SVM). It combines code metrics and process metrics as defect indicators for the Eclipse environment and applies the three algorithms to a sample of weekly Eclipse features. Feature reduction using a General Linear Model (GLM) is also adopted to save computational time. The results confirm the predictive capability of just two features, NBD-max and Pre-defects, which yield results comparable to those obtained with all 61 features. Additionally, this paper evaluates the performance of the three algorithms; NN and RF turn out to have the best fit.
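
The feature-reduction idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the GLM is a logistic (binomial) model, fits it by plain gradient descent, and ranks features by absolute coefficient size, none of which the abstract specifies. The data are synthetic, and the names `nbd_max`, `pre_defects`, and `noise` are hypothetical labels chosen only to echo the abstract.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_glm(rows, labels, lr=0.1, epochs=300):
    """Fit a logistic GLM by batch gradient descent; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = pred - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias

# Synthetic stand-in data (assumption): two informative metrics and one
# pure-noise metric, all drawn with unit variance so coefficients are comparable.
random.seed(0)
FEATURES = ["nbd_max", "pre_defects", "noise"]  # hypothetical names
rows, labels = [], []
for _ in range(400):
    x = [random.gauss(0, 1) for _ in FEATURES]
    score = 1.5 * x[0] + 2.0 * x[1] + random.gauss(0, 0.5)
    rows.append(x)
    labels.append(1 if score > 0 else 0)  # 1 = component is defective

weights, bias = fit_logistic_glm(rows, labels)

# Rank features by absolute coefficient and keep the top two.
ranked = sorted(zip(FEATURES, weights), key=lambda fw: abs(fw[1]), reverse=True)
selected = [name for name, _ in ranked[:2]]
print(ranked)
print(selected)
```

Ranking by coefficient magnitude is only meaningful when features share a common scale, so real metrics would need standardizing first. The reduced feature set could then be passed to any of the three classifiers in place of the full set.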

Additional Metadata
Keywords Data analysis, Eclipse, Machine learning techniques, Software Defect Prediction
Persistent URL dx.doi.org/10.1109/BigMM.2016.36
Conference 2nd IEEE International Conference on Multimedia Big Data, BigMM 2016
Citation
Han, W. (Wenjing), Lung, C.H., & Ajila, S. (2016). Empirical investigation of code and process metrics for defect prediction. In Proceedings - 2016 IEEE 2nd International Conference on Multimedia Big Data, BigMM 2016 (pp. 436–439). doi:10.1109/BigMM.2016.36