ML AND RAG-BASED INTELLIGENT SYSTEM FOR YOGA POSE RECOGNITION AND CORRECTIVE GUIDANCE

Authors

  • Dr. Harish Barapatre, Associate Professor, Department of Computer Engineering, Yadavrao Tasgaonkar Institute of Engineering and Technology, Bhivpuri Road, Karjat, Maharashtra 410201, India
  • Pratik Malgunde, Student, Department of Computer Engineering, Yadavrao Tasgaonkar Institute of Engineering and Technology, Bhivpuri Road, Karjat, Maharashtra 410201, India
  • Atharva Pratap, Student, Department of Computer Engineering, Yadavrao Tasgaonkar Institute of Engineering and Technology, Bhivpuri Road, Karjat, Maharashtra 410201, India
  • Rayan Shaikh, Student, Department of Computer Engineering, Yadavrao Tasgaonkar Institute of Engineering and Technology, Bhivpuri Road, Karjat, Maharashtra 410201, India

DOI:

https://doi.org/10.29121/ijetmr.v13.i4.2026.1768

Keywords:

Yoga Pose Recognition, Machine Learning, Retrieval-Augmented Generation, Computer Vision, Human Pose Estimation, Digital Health, AI-Based Fitness Systems

Abstract

Yoga pose recognition has gained significant importance in digital health and fitness systems, where accurate posture assessment and corrective feedback are critical for safe practice. Traditional computer vision–based approaches rely on pose estimation models but often lack contextual understanding and personalized guidance. To address this limitation, this paper proposes a hybrid framework that integrates Machine Learning (ML)–based pose recognition with Retrieval-Augmented Generation (RAG) for intelligent feedback generation. The system utilizes human pose estimation techniques to extract skeletal keypoints and classify yoga poses using supervised learning models. Subsequently, a RAG module retrieves relevant expert knowledge from a curated yoga knowledge base and generates context-aware corrective suggestions. This dual-layer architecture ensures both high recognition accuracy and meaningful interpretability of results. The proposed approach aims to bridge the gap between static classification systems and interactive AI-driven coaching by enabling real-time feedback and adaptive recommendations. The framework is designed as a conceptual model with potential applicability in mobile health applications, smart fitness systems, and remote yoga training platforms. By combining data-driven learning with knowledge retrieval mechanisms, the system enhances both usability and reliability in real-world scenarios.
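The pipeline described above (keypoints → joint angles → pose classification → knowledge retrieval → corrective suggestion) can be illustrated with a minimal sketch. All names, angle values, and knowledge-base entries below are illustrative assumptions, not the paper's actual implementation: real keypoints would come from a pose estimator such as MediaPipe or OpenPose, the classifier would be a trained supervised model rather than a nearest-template lookup, and the RAG module would query an embedded document store rather than a hard-coded dictionary.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by keypoints a-b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos = max(-1.0, min(1.0, dot / (math.hypot(*v1) * math.hypot(*v2))))
    return math.degrees(math.acos(cos))

# Reference joint-angle templates per pose (illustrative values only).
POSE_TEMPLATES = {
    "warrior_ii": {"front_knee": 90.0, "back_knee": 180.0},
    "mountain":   {"front_knee": 180.0, "back_knee": 180.0},
}

# Stand-in for the curated yoga knowledge base queried by the RAG module,
# keyed here by (pose, deviating joint) for simplicity.
KNOWLEDGE_BASE = {
    ("warrior_ii", "front_knee"):
        "Bend the front knee toward 90 degrees, stacked over the ankle.",
    ("warrior_ii", "back_knee"):
        "Keep the back leg straight and strong.",
}

def classify(angles):
    """Nearest-template classification over the joint-angle feature vector."""
    def dist(template):
        return sum((angles[j] - t) ** 2 for j, t in template.items())
    return min(POSE_TEMPLATES, key=lambda p: dist(POSE_TEMPLATES[p]))

def corrective_feedback(angles, pose, tolerance=15.0):
    """Retrieve a corrective tip for each joint deviating beyond tolerance."""
    tips = []
    for joint, target in POSE_TEMPLATES[pose].items():
        if abs(angles[joint] - target) > tolerance:
            tips.append(KNOWLEDGE_BASE.get((pose, joint), f"Adjust {joint}."))
    return tips

# Example: angles measured from a practitioner's extracted keypoints.
angles = {"front_knee": 120.0, "back_knee": 178.0}
pose = classify(angles)
print(pose, corrective_feedback(angles, pose))
```

The sketch shows the division of labour the abstract describes: the classifier only names the pose, while the retrieval step turns the measured deviation into an actionable, pose-specific instruction.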




Published

2026-04-30

How to Cite

Barapatre, H., Malgunde, P., Pratap, A., & Shaikh, R. (2026). ML and RAG-based intelligent system for yoga pose recognition and corrective guidance. International Journal of Engineering Technologies and Management Research, 13(4), 79–90. https://doi.org/10.29121/ijetmr.v13.i4.2026.1768