PREDICTIVE SOFTWARE QUALITY ANALYSIS USING TARGETED METRICS AND MACHINE LEARNING MODELS
DOI:
https://doi.org/10.29121/shodhkosh.v5.i6.2024.5569
Keywords:
Software Quality, Machine Learning, Parameter Selection, Predictive Analytics, Defect Density
Abstract [English]
Our study proposes a comprehensive approach to assessing software quality by applying advanced machine learning methods to parameters carefully selected from large code repositories. To identify the parameters with the greatest predictive power, our method focuses on key software metrics such as defect density, code churn, test coverage, cyclomatic complexity, and maintainability indices. In experiments on 75 open-source projects with more than 1.2 million lines of code, we found that using selected parameters improves classification accuracy by 26% compared to models trained on full, unfiltered feature sets. To reduce dimensionality, we used feature ranking and correlation analysis. This showed that only 20% of the original metrics have a substantial effect on quality predictions, which greatly reduces overfitting and processing load. Five machine learning models were evaluated: Random Forest, XGBoost, Support Vector Machines, LightGBM, and a shallow Neural Network. Random Forest achieved the best F1-score of 0.88, outperforming XGBoost by 14% and showing 92% reliability across cross-validation scenarios. AUC values of 0.91 across a wide range of project domains indicate strong generalisability. Additionally, hyperparameter tuning cut model training time by 30%. Statistical significance tests (p < 0.01) confirm that selected-parameter models outperform standard methods, which underlines the importance of focused feature engineering for accurate predictions. As shown by the mean correlation coefficient of 0.78 between the chosen metrics and final quality scores, focusing on a reduced parameter set not only saves computing resources but also improves interpretability. These results suggest that real-time, data-driven quality assessment can be integrated into existing DevOps processes in a scalable and robust way.
Future work will study ensemble-based interpretability methods and real-time anomaly detection in more detail, paving the way for proactive quality assurance in evolving software development environments.
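The abstract does not spell out how metrics were ranked, so the following is only a minimal sketch of one plausible reading: rank each metric by the absolute value of its Pearson correlation with a quality score and keep the top 20%. The function names, the choice of Pearson correlation, and the toy data are all assumptions made for illustration; none of them come from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def select_top_metrics(metrics, quality, keep_fraction=0.2):
    """Rank metrics by |correlation| with quality and keep the top fraction."""
    ranked = sorted(
        metrics,
        key=lambda name: abs(pearson(metrics[name], quality)),
        reverse=True,
    )
    keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:keep]


# Hypothetical toy data (one value per module); not taken from the study.
quality = [20, 40, 60, 80, 90]  # assumed quality score on a 0-100 scale
metrics = {
    "defect_density": [8.0, 6.0, 4.0, 2.0, 1.0],  # defects per KLOC
    "code_churn": [120, 80, 95, 30, 20],
    "test_coverage": [0.35, 0.50, 0.62, 0.80, 0.91],
    "cyclomatic_complexity": [18, 9, 15, 7, 11],
    "comment_ratio": [0.10, 0.30, 0.12, 0.28, 0.15],
}

selected = select_top_metrics(metrics, quality, keep_fraction=0.2)
print(selected)
```

In a full pipeline the retained metrics would then feed a classifier such as Random Forest. A correlation filter of this kind only captures linear, one-metric-at-a-time relationships, so it should be read as a simple baseline rather than the selection procedure used in the paper.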
License
Copyright (c) 2024 Rakhi Singh, Mamta Bansal

This work is licensed under a Creative Commons Attribution 4.0 International License.
With the licence CC-BY, authors retain the copyright, allowing anyone to download, reuse, re-print, modify, distribute, and/or copy their contribution. The work must be properly attributed to its author.
It is not necessary to ask for further permission from the author or journal board.
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.