PERFORMANCE MEASURE OF VARIOUS MACHINE LEARNING OPTIMIZERS FOR DIABETES PREDICTION IN INDIAN WOMEN
DOI: https://doi.org/10.29121/shodhkosh.v4.i2.2023.5206

Keywords: Optimizers, Learning Rate, Gradient, Iteration, Momentum

Abstract [English]
This research paper presents a comprehensive comparative analysis of gradient descent optimization algorithms on a diabetes prediction dataset. The study explores their strengths, weaknesses, and performance characteristics under two conditions: with and without feature engineering. The objective is to gain clear insight into the effectiveness and efficiency of these algorithms in predicting diabetes. The analysis covers widely used algorithms, including stochastic gradient descent (SGD) and advanced variants such as Nesterov accelerated gradient and adaptive learning rate techniques (e.g., Adam, AdaGrad, AdaMax, and AdaDelta). Evaluating the algorithms on the same dataset under both scenarios allows a direct comparison of their behaviour. The obtained results show that the SGD variants (classic SGD, momentum, and Nesterov), RMSProp, Adam, AdaMax, and Nadam outperformed AdaGrad and AdaDelta in minimizing error (lower MAE values) in both scenarios.
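The comparison described in the abstract can be sketched as follows, assuming a Keras-style setup. The file name diabetes.csv, the network architecture, the learning rates, the train/test split, and the use of standard scaling as a stand-in for the feature-engineering step are all illustrative assumptions, not the authors' exact configuration; each optimizer is trained on the same small classifier and compared by test MAE.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Load the Kaggle Pima Indians diabetes data (local file name is an assumption).
df = pd.read_csv("diabetes.csv")
X = df.drop(columns="Outcome").values
y = df["Outcome"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardization stands in here for the paper's feature-engineering scenario.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

optimizers = {
    "SGD": keras.optimizers.SGD(learning_rate=0.01),
    "Momentum": keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "Nesterov": keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
    "RMSProp": keras.optimizers.RMSprop(),
    "AdaGrad": keras.optimizers.Adagrad(),
    "AdaDelta": keras.optimizers.Adadelta(),
    "Adam": keras.optimizers.Adam(),
    "AdaMax": keras.optimizers.Adamax(),
    "Nadam": keras.optimizers.Nadam(),
}

for name, opt in optimizers.items():
    # A small binary classifier; the architecture is illustrative only.
    model = keras.Sequential([
        keras.layers.Input(shape=(X_train.shape[1],)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["mae"])
    model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
    loss, mae = model.evaluate(X_test, y_test, verbose=0)
    print(f"{name:10s} test MAE = {mae:.4f}")

Running the loop twice, once on the raw features and once on the engineered features, reproduces the two scenarios the paper compares.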
References
Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT'2010, 177-186. DOI: https://doi.org/10.1007/978-3-7908-2604-3_16
Sutskever, I., Martens, J., Dahl, G. E., & Hinton, G. E. (2013). On the importance of initialization and momentum in deep learning. Proceedings of ICML'2013, 1139-1147.
Tieleman, T., & Hinton, G. (2012). Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121-2159.
Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701.
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR'2015).
Kingma, D. P., & Ba, J. L. (2017). AdaMax: A variant of the Adam optimizer. arXiv preprint arXiv:1412.6980.
Tao, H., & Lu, X. (2018). On comparing six optimization algorithms for network-based wind speed forecasting. 2018 37th Chinese Control Conference (CCC), Wuhan, China, 8843-8850. DOI: https://doi.org/10.23919/ChiCC.2018.8482567
Mustapha, A., et al. (2021). Journal of Physics: Conference Series, 1743, 012002. DOI: https://doi.org/10.1088/1742-6596/1743/1/012002
Polyak, B. T. (1964). Some methods of speeding up the convergence of iteration methods. USSR computational mathematics and mathematical physics, 4(5), 1-17. DOI: https://doi.org/10.1016/0041-5553(64)90137-5
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. DOI: https://doi.org/10.1038/323533a0
Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
Reddi, S. J., Kale, S., & Kumar, S. (2018). On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237.
Dozat, T. (2016). Incorporating Nesterov Momentum into Adam. ICLR Workshop.
Akturk, M. Diabetes Data Set. Kaggle. Accessed 22 June 2023. https://www.kaggle.com/datasets/mathchi/diabetes-data-set
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113. DOI: https://doi.org/10.1145/1327452.1327492
Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. In Montavon, G., Orr, G. B., & Müller, K.-R. (Eds.), Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, vol. 7700. Springer, Berlin, Heidelberg. DOI: https://doi.org/10.1007/978-3-642-35289-8_26
Mukkamala, M. C., & Hein, M. (2017). Variants of RMSProp and Adagrad with logarithmic regret bounds. arXiv preprint arXiv:1706.05507.
Yazan, E., & Talu, M. F. (2017). Comparison of the stochastic gradient descent based optimization techniques. 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey. DOI: https://doi.org/10.1109/IDAP.2017.8090299
Voronov, S., Voronov, I., & Kovalenko, R. (2018). Comparative analysis of stochastic optimization algorithms for image registration. 123-130. DOI: https://doi.org/10.18287/1613-0073-2018-2210-123-130
Hassan, E., Shams, M. Y., Hikal, N. A., & Elmougy, S. (2023). The effect of choosing optimizer algorithms to improve computer vision tasks: A comparative study. Multimedia Tools and Applications, 82, 16591-16633. DOI: https://doi.org/10.1007/s11042-022-13820-0
License
Copyright (c) 2023 Surendra Goura, Md. Tabrez Nafis, Suraiya Parveena

This work is licensed under a Creative Commons Attribution 4.0 International License.