PERFORMANCE MEASURE OF VARIOUS MACHINE LEARNING OPTIMIZERS FOR DIABETES PREDICTION IN INDIAN WOMEN
DOI: https://doi.org/10.29121/shodhkosh.v4.i2.2023.5206

Keywords: Optimizers, Learning Rate, Gradient, Iteration, Momentum

Abstract [English]
This research paper presents a comprehensive comparative analysis of gradient descent optimization algorithms on a diabetes prediction dataset. The study explores their strengths, weaknesses, and performance characteristics under two conditions: with and without feature engineering. The objective is to gain clear insight into the effectiveness and efficiency of these algorithms in predicting diabetes. The analysis covers widely used algorithms, including stochastic gradient descent (SGD) and advanced variants such as Nesterov accelerated gradient and adaptive learning rate techniques (e.g., Adam, AdaGrad, AdaMax, and AdaDelta). Evaluating the algorithms on the same dataset under both scenarios allows a direct comparison of their behaviour. The obtained results show that the SGD variants (classic SGD, momentum, and Nesterov), RMSProp, Adam, AdaMax, and Nadam outperformed AdaGrad and AdaDelta in minimizing error (lower MAE values) in both scenarios.
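The comparison described in the abstract can be sketched as follows, assuming a Keras-style setup. The file name diabetes.csv, the network architecture, the learning rates, the train/test split, and the use of standard scaling as a stand-in for the feature-engineering step are all illustrative assumptions, not the authors' exact configuration; each optimizer is trained on the same small classifier and compared by test MAE.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Load the Kaggle Pima Indians diabetes data (local file name is an assumption).
df = pd.read_csv("diabetes.csv")
X = df.drop(columns="Outcome").values
y = df["Outcome"].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardization stands in here for the paper's feature-engineering scenario.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

optimizers = {
    "SGD": keras.optimizers.SGD(learning_rate=0.01),
    "Momentum": keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "Nesterov": keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True),
    "RMSProp": keras.optimizers.RMSprop(),
    "AdaGrad": keras.optimizers.Adagrad(),
    "AdaDelta": keras.optimizers.Adadelta(),
    "Adam": keras.optimizers.Adam(),
    "AdaMax": keras.optimizers.Adamax(),
    "Nadam": keras.optimizers.Nadam(),
}

for name, opt in optimizers.items():
    # A small binary classifier; the architecture is illustrative only.
    model = keras.Sequential([
        keras.layers.Input(shape=(X_train.shape[1],)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["mae"])
    model.fit(X_train, y_train, epochs=50, batch_size=32, verbose=0)
    loss, mae = model.evaluate(X_test, y_test, verbose=0)
    print(f"{name:10s} test MAE = {mae:.4f}")

Running the loop twice, once on the raw features and once on the engineered features, reproduces the two scenarios the paper compares.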
References
Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT'2010, 177-186. DOI: https://doi.org/10.1007/978-3-7908-2604-3_16
Sutskever, I., Martens, J., Dahl, G. E., & Hinton, G. E. (2013). On the importance of initialization and momentum in deep learning. Proceedings of ICML'2013, 1139-1147.
Tieleman, T., & Hinton, G. (2012). Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121-2159.
Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701.
Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR'2015).
Kingma, D. P., & Ba, J. L. (2017). AdaMax: A variant of the Adam optimizer. arXiv preprint arXiv:1412.6980.
Tao, H., & Lu, X. (2018). On comparing six optimization algorithms for network-based wind speed forecasting. 2018 37th Chinese Control Conference (CCC), Wuhan, China, 8843-8850. DOI: https://doi.org/10.23919/ChiCC.2018.8482567
Mustapha, A., et al. (2021). Journal of Physics: Conference Series, 1743, 012002. DOI: https://doi.org/10.1088/1742-6596/1743/1/012002
Polyak, B. T. (1964). Some methods of speeding up the convergence of iteration methods. USSR computational mathematics and mathematical physics, 4(5), 1-17. DOI: https://doi.org/10.1016/0041-5553(64)90137-5
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. DOI: https://doi.org/10.1038/323533a0
Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
Reddi, S. J., Kale, S., & Kumar, S. (2018). On the convergence of Adam and beyond. arXiv preprint arXiv:1904.09237.
Dozat, T. (2016). Incorporating Nesterov Momentum into Adam. ICLR Workshop.
Akturk, M. Diabetes Data Set. Kaggle. Accessed 22 June 2023. https://www.kaggle.com/datasets/mathchi/diabetes-data-set
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113. DOI: https://doi.org/10.1145/1327452.1327492
Bengio, Y. (2012). Practical recommendations for gradient-based training of deep architectures. In Montavon, G., Orr, G. B., & Müller, K.-R. (Eds.), Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, vol. 7700. Springer, Berlin, Heidelberg. DOI: https://doi.org/10.1007/978-3-642-35289-8_26
Mukkamala, M. C., & Hein, M. (2017). Variants of RMSProp and Adagrad with logarithmic regret bounds. arXiv preprint arXiv:1706.05507.
Yazan, E., & Talu, M. F. (2017). Comparison of the stochastic gradient descent based optimization techniques. 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey. DOI: https://doi.org/10.1109/IDAP.2017.8090299
Voronov, S., Voronov, I., & Kovalenko, R. (2018). Comparative analysis of stochastic optimization algorithms for image registration. 123-130. DOI: https://doi.org/10.18287/1613-0073-2018-2210-123-130
Hassan, E., Shams, M. Y., Hikal, N. A., & Elmougy, S. (2023). The effect of choosing optimizer algorithms to improve computer vision tasks: A comparative study. Multimedia Tools and Applications, 82, 16591-16633. DOI: https://doi.org/10.1007/s11042-022-13820-0
License
Copyright (c) 2023 Surendra Goura, Md. Tabrez Nafis, Suraiya Parveena

This work is licensed under a Creative Commons Attribution 4.0 International License.