AI-BASED EDUCATIONAL VIDEO SUMMARIZATION

Authors

  • Dr. Satish Choudhury, Associate Professor, Department of Electrical and Electronics Engineering, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
  • Mani Nandini Sharma, Assistant Professor, School of Fine Arts & Design, Noida International University, Noida, Uttar Pradesh, India
  • Rajeev Sharma, Centre of Research Impact and Outcome, Chitkara University, Rajpura 140417, Punjab, India
  • Ganesh Rambhau Gandal, Smt. Kashibai Navale College of Engineering, Pune, Maharashtra, India
  • Avni Garg, Chitkara Centre for Research and Development, Chitkara University, Solan 174103, Himachal Pradesh, India

DOI:

https://doi.org/10.29121/shodhkosh.v6.i2s.2025.6747

Keywords:

AI-Based Summarization, Educational Videos, Deep Learning, Natural Language Processing, Computer Vision, Multimodal Analysis, Adaptive Learning, Content Extraction, Automatic Speech Recognition, Personalized Education

Abstract [English]

The exponential growth of digital educational content has created an urgent need for efficient summarization methods that condense long instructional videos into concise, meaningful summaries. AI-based educational video summarization applies machine learning, natural language processing, and computer vision to produce short, context-rich summaries that make content more accessible and easier for learners to understand. The approach combines multimodal analysis of speech recognition output, text transcription, and visual scene understanding to identify the most important instructional points and remove superfluous information. Transformer-based deep learning architectures learn semantic associations among spoken words, visual content, and instructional gestures, and the models extract pedagogically coherent summaries aligned with learning objectives. The proposed framework proceeds in four stages: video segmentation, feature extraction, content ranking, and summary generation. In parallel, visual attention models analyze frames to identify slides, demonstrations, and the instructor's points of focus, so that the most important educational elements are retained. The summary can be delivered as text, video, or a combination of both, which supports adaptive learning systems and personalized learning. AI summarization has been shown to reduce cognitive overload, improve content discoverability, and support efficient learning by letting students concentrate on key information. It also helps teachers and learning institutions produce highlight reels, course previews, and searchable knowledge bases. Consequently, the technology supports an inclusive learning environment in which diverse learners receive personalized learning experiences. Future directions include integrating affective computing and learner feedback to further refine summary relevance and pedagogical impact.
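
The four stages named in the abstract (video segmentation, feature extraction, content ranking, summary generation) can be illustrated with a minimal, text-only sketch. It is not the authors' implementation. It assumes a timestamped transcript is already available from speech recognition, and it swaps the transformer and visual attention models for a simple word-frequency score. The names `segment`, `extract_features`, `rank_segments`, `summarize`, and `STOPWORDS` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list, for illustration only.
STOPWORDS = {"the", "to", "so", "for", "a", "of", "and", "is", "in"}


def segment(transcript, window=1):
    """Stage 1: group (start_sec, sentence) pairs into fixed-size segments."""
    return [transcript[i:i + window] for i in range(0, len(transcript), window)]


def extract_features(seg):
    """Stage 2: reduce a segment to its lowercase content words."""
    text = " ".join(sentence for _, sentence in seg)
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def rank_segments(segments):
    """Stage 3: score each segment by the mean video-wide frequency of its words.

    Words that recur across the whole video are treated as key topics.
    This is a stand-in for a learned importance model.
    """
    features = [extract_features(seg) for seg in segments]
    global_freq = Counter(w for words in features for w in words)
    return [
        sum(global_freq[w] for w in words) / len(words) if words else 0.0
        for words in features
    ]


def summarize(transcript, window=1, top_k=2):
    """Stage 4: keep the top_k highest-scoring segments in chronological order."""
    segments = segment(transcript, window)
    scores = rank_segments(segments)
    best = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [segments[i] for i in sorted(best)]


if __name__ == "__main__":
    demo = [
        (0, "Welcome to the course"),
        (5, "Gradient descent minimizes the loss"),
        (12, "The gradient points uphill so descent steps downhill"),
        (20, "Please subscribe to the channel"),
    ]
    for seg in summarize(demo, window=1, top_k=2):
        for start, sentence in seg:
            print(f"[{start:>3}s] {sentence}")
```

Because each selected segment keeps its start time, the same selection could drive either a text summary or a cut list for a video summary. A real system would replace `rank_segments` with a model that also scores visual features.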

Published

2025-12-16

How to Cite

Choudhury, S., Sharma, M. N., Sharma, R., Gandal, G. R., Choudhury, S., & Garg, A. (2025). AI-BASED EDUCATIONAL VIDEO SUMMARIZATION. ShodhKosh: Journal of Visual and Performing Arts, 6(2s), 272–280. https://doi.org/10.29121/shodhkosh.v6.i2s.2025.6747