PREDICTIVE SOFTWARE QUALITY ANALYSIS USING TARGETED METRICS AND MACHINE LEARNING MODELS

Authors

  • Rakhi Singh, Department of Computer Science & Engineering, Shobhit Institute of Engineering & Technology (Deemed-to-be University), Meerut, India; Department of IT, Delhi Institute of Higher Education, Greater Noida West, India
  • Mamta Bansal, Department of Computer Science & Engineering, Shobhit Institute of Engineering & Technology (Deemed-to-be University), Meerut, India

DOI:

https://doi.org/10.29121/shodhkosh.v5.i6.2024.5569

Keywords:

Software Quality, Machine Learning, Parameter Selection, Predictive Analytics, Defect Density

Abstract [English]

This study proposes a comprehensive approach to assessing software quality by applying machine learning methods to parameters carefully selected from large code bases. To identify the most predictive factors, the method focusses on key software metrics such as defect density, code churn, test coverage, cyclomatic complexity, and maintainability indices. In experiments on 75 open-source projects totalling more than 1.2 million lines of code, models trained on the selected parameters improved classification accuracy by 26% compared with models trained on full, unfiltered feature sets. To reduce dimensionality, we applied feature ranking and correlation analysis. This showed that only 20% of the original metrics have a substantial effect on quality predictions; restricting the models to these metrics greatly reduces overfitting and computational load. Five machine learning models were evaluated: Random Forest, XGBoost, Support Vector Machines, LightGBM, and a shallow Neural Network. Random Forest achieved the best F1-score, 0.88, outperforming XGBoost by 14% and showing 92% reliability across cross-validation scenarios. AUC values of 0.91 across a wide range of project domains indicate strong generalisability. In addition, hyperparameter tuning reduced model training time by 30%. Statistical significance tests (p < 0.01) confirm that the selected-parameter models outperform standard methods, which underlines the importance of focused feature engineering for accurate prediction. As the mean correlation coefficient of 0.78 between the selected metrics and the final quality scores indicates, focussing on a reduced parameter set not only saves computing resources but also improves interpretability. These results suggest that real-time, data-driven quality assessment can be integrated readily into existing DevOps processes, making them scalable and robust.
Future work will examine ensemble-based interpretability methods and real-time anomaly detection in more detail, paving the way for proactive quality assurance in software development environments that change over time.
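
The abstract does not say how the dimensionality-reduction step was carried out. As a minimal sketch only, the idea can be illustrated by ranking metrics on their absolute Pearson correlation with a quality score and keeping the top 20% (the share reported in the abstract). The function names, the synthetic data, and the choice of Pearson correlation are assumptions made for illustration; this is not the authors' pipeline.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_top_metrics(metrics, quality, keep_fraction=0.2):
    """Rank metrics by |correlation| with the quality score; keep the top share.

    metrics: dict mapping a metric name to a list of per-project values.
    quality: list of per-project quality scores, in the same project order.
    Returns (selected_names, mean_abs_correlation_of_selected).
    """
    scores = {name: abs(pearson(vals, quality)) for name, vals in metrics.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, round(keep_fraction * len(ranked)))
    selected = ranked[:k]
    mean_corr = sum(scores[name] for name in selected) / k
    return selected, mean_corr


# Synthetic data (hypothetical, NOT the paper's dataset): two informative
# metrics plus eight pure-noise metrics for 75 imaginary projects.
rng = random.Random(0)
n_projects = 75
defect_density = [rng.uniform(0, 10) for _ in range(n_projects)]
test_coverage = [rng.uniform(0, 100) for _ in range(n_projects)]
quality = [
    100 - 5 * d + 0.5 * c + rng.gauss(0, 3)
    for d, c in zip(defect_density, test_coverage)
]
metrics = {"defect_density": defect_density, "test_coverage": test_coverage}
for i in range(8):
    metrics[f"noise_{i}"] = [rng.gauss(0, 1) for _ in range(n_projects)]

selected, mean_corr = select_top_metrics(metrics, quality)
print("selected metrics:", selected)
print("mean |correlation| of selected metrics:", round(mean_corr, 2))
```

A correlation filter of this kind is cheap and easy to interpret, but it only captures linear, one-metric-at-a-time relationships. Model-based rankings (for example, tree-ensemble feature importances) are a common alternative when interactions between metrics matter.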

Published

2024-06-30

How to Cite

Singh, R., & Bansal, M. (2024). PREDICTIVE SOFTWARE QUALITY ANALYSIS USING TARGETED METRICS AND MACHINE LEARNING MODELS. ShodhKosh: Journal of Visual and Performing Arts, 5(6), 2450–2466. https://doi.org/10.29121/shodhkosh.v5.i6.2024.5569