ADVANCES IN COMPUTER VISION: NEW HORIZONS AND ONGOING CHALLENGES
DOI:
https://doi.org/10.29121/shodhkosh.v5.i5.2024.1893
Keywords:
Computer Vision, Deep Learning, Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Image Recognition, Explainable AI, Semantic Segmentation
Abstract [English]
Computer vision, a field at the intersection of computer science and artificial intelligence, has grown rapidly in recent years. This review paper surveys the advancements and challenges in computer vision, synthesizing recent research findings, methodologies, and applications. We trace the historical evolution of computer vision and discuss recent advances in algorithms and techniques, including deep learning models such as convolutional neural networks (CNNs) and generative adversarial networks (GANs). We also examine applications of computer vision across domains such as healthcare, autonomous vehicles, surveillance, and augmented reality. Despite remarkable progress, computer vision faces significant challenges, including robustness to adversarial attacks, interpretability, ethical considerations, and regulatory compliance. We discuss these challenges in depth and highlight the importance of interdisciplinary collaboration in addressing them. We also identify recent trends and future directions in computer vision research, such as self-supervised learning and explainable AI. By synthesizing insights from academic research and industrial developments, this review aims to give a comprehensive picture of the current landscape of computer vision and to guide future research.
License
Copyright (c) 2024 Rahul Kumar Majhi, Akhilesh A. Waoo

This work is licensed under a Creative Commons Attribution 4.0 International License.
Under the CC-BY license, authors retain the copyright, and anyone may download, reuse, reprint, modify, distribute, and/or copy their contribution, provided the work is properly attributed to its author.
No further permission from the author or the journal board is required.
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.