ADVANCES IN COMPUTER VISION: NEW HORIZONS AND ONGOING CHALLENGES

Authors

  • Rahul Kumar Majhi, Department of Computer Science and Information Technology, AKS University, Satna, India
  • Akhilesh A. Waoo, Department of Computer Science and Information Technology, AKS University, Satna, India

DOI:

https://doi.org/10.29121/shodhkosh.v5.i5.2024.1893

Keywords:

Computer Vision, Deep Learning, Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Image Recognition, Explainable AI, Semantic Segmentation

Abstract [English]

Computer vision, a rapidly evolving field at the intersection of computer science and artificial intelligence, has grown substantially in recent years. This review surveys the advancements and challenges in computer vision, synthesizing recent research findings, methodologies, and applications. We trace the historical evolution of the field and discuss recent advances in algorithms and techniques, including deep learning models such as convolutional neural networks (CNNs) and generative adversarial networks (GANs). We also examine applications of computer vision across domains such as healthcare, autonomous vehicles, surveillance, and augmented reality. Despite this progress, computer vision faces significant challenges, including robustness to adversarial attacks, interpretability, ethical considerations, and regulatory compliance. We discuss these challenges in depth and highlight the importance of interdisciplinary collaboration in addressing them. We further identify recent trends and future directions in computer vision research, such as self-supervised learning and explainable AI. By synthesizing insights from academic research and industrial developments, this review aims to give a clear picture of the current landscape of computer vision and to guide future research.

Published

2024-05-31

How to Cite

Majhi, R. K., & Waoo, A. A. (2024). ADVANCES IN COMPUTER VISION: NEW HORIZONS AND ONGOING CHALLENGES. ShodhKosh: Journal of Visual and Performing Arts, 5(5), 431–438. https://doi.org/10.29121/shodhkosh.v5.i5.2024.1893