EMOTION RECOGNITION IN AI-GENERATED MUSIC
DOI: https://doi.org/10.29121/shodhkosh.v6.i3s.2025.6763

Keywords: Emotion Recognition, AI-Generated Music, Affective Computing, Music Cognition, Valence–Arousal, Human–AI Interaction, Music Therapy

Abstract
This paper investigates emotion recognition in AI-generated music, drawing on cognitive psychology, affective neuroscience, and computational creativity. It combines valence-arousal modeling, deep neural architectures, and a listener-based feedback loop to interpret, generate, and critically analyze emotional signals in AI-produced music. The system uses hybrid CNN-RNN and Transformer models to extract temporal-spectral features and project them onto affective dimensions including valence, arousal, and tension. A multi-stage evaluation integrating perceptual ratings, physiological sensing, and behavioral analysis confirms the system's emotional fidelity, yielding a classification accuracy of 92.4% and a correlation of r = 0.89 between predicted and human-perceived emotional levels. The findings indicate that although AI can efficiently reproduce the syntax of emotion, its grasp of emotion is representational rather than experiential. Ethical issues of authenticity, authorship, and privacy are considered, with emphasis on transparent and culturally inclusive model design. Overall, the study contributes to computational affective musicology, positioning AI as a creative partner that recreates and enhances emotional expression rather than replacing it.
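The article publishes no implementation, so the following is a minimal illustrative sketch, not the authors' code: a hybrid CNN-RNN regressor in PyTorch that maps a mel-spectrogram onto the three affective dimensions the abstract names (valence, arousal, tension). All layer sizes, hyperparameters, and names here are assumptions chosen for brevity.

import torch
import torch.nn as nn

class EmotionRegressor(nn.Module):
    """Hypothetical CNN-RNN regressor onto valence, arousal, and tension."""
    def __init__(self, n_mels: int = 128, hidden: int = 256):
        super().__init__()
        # CNN front end: local temporal-spectral feature extraction.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # halve frequency and time
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # RNN back end: longer-range temporal structure across frames.
        self.gru = nn.GRU(input_size=64 * (n_mels // 4),
                          hidden_size=hidden, batch_first=True)
        # Regression head: three continuous affective scores.
        self.head = nn.Linear(hidden, 3)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # spec: (batch, 1, n_mels, n_frames)
        x = self.cnn(spec)                    # (batch, 64, n_mels//4, T')
        x = x.permute(0, 3, 1, 2).flatten(2)  # (batch, T', features)
        _, h = self.gru(x)                    # h: (1, batch, hidden)
        return self.head(h[-1])               # (batch, 3)

model = EmotionRegressor()
clips = torch.randn(4, 1, 128, 256)  # four dummy spectrogram clips
print(model(clips).shape)            # torch.Size([4, 3])

A fidelity figure like the reported r = 0.89 would then follow from correlating the model's predicted scores against listener ratings, for example via scipy.stats.pearsonr.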
License
Copyright (c) 2025 Deepak Prasad, Prateek Garg, Tanveer Ahmad Wani, Sangeet Saroha, Dr. Varalakshmi S, Simran Kalra, Avinash Somatkar

This work is licensed under a Creative Commons Attribution 4.0 International License.
Under the CC-BY license, authors retain copyright while allowing anyone to download, reuse, reprint, modify, distribute, and/or copy their contribution, provided the work is properly attributed to its authors.
No further permission from the authors or the journal board is required.
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.