EMOTION RECOGNITION IN AI-GENERATED MUSIC

Authors

  • Deepak Prasad, Assistant Professor, Department of Journalism and Mass Communication, Vivekananda Global University, Jaipur, India
  • Prateek Garg, Chitkara Centre for Research and Development, Chitkara University, Himachal Pradesh, Solan, 174103, India
  • Tanveer Ahmad Wani, Professor, School of Sciences, Noida International University, 203201, Greater Noida, Uttar Pradesh, India
  • Sangeet Saroha, Greater Noida, Uttar Pradesh 201306, India
  • Dr. Varalakshmi S, Associate Professor, Department of Management Studies, JAIN (Deemed-to-be University), Bengaluru, Karnataka, India
  • Simran Kalra, Centre of Research Impact and Outcome, Chitkara University, Rajpura-140417, Punjab, India
  • Avinash Somatkar, Department of Mechanical Engineering, Vishwakarma Institute of Technology, Pune, Maharashtra, 411037, India

DOI:

https://doi.org/10.29121/shodhkosh.v6.i3s.2025.6763

Keywords:

Emotion Recognition, AI-Generated Music, Affective Computing, Music Cognition, Valence–Arousal, Human–AI Interaction, Music Therapy

Abstract [English]

This paper addresses emotion recognition in AI-generated music, drawing on cognitive psychology, affective neuroscience, and computational creativity. It combines valence–arousal modeling, deep neural architectures, and a listener-based feedback loop to interpret, generate, and critically analyze emotional signals in AI-produced music. The system uses hybrid CNN–RNN and Transformer models to extract temporal–spectral features and project them onto affective dimensions including valence, arousal, and tension. A multi-stage evaluation integrating perceptual ratings, physiological sensing, and behavioral analysis confirms the system's emotional fidelity, yielding a classification accuracy of 92.4% and a correlation of r = 0.89 between predicted and human-perceived emotional levels. The findings indicate that although AI efficiently reproduces the syntax of emotion, its perception is representational rather than experiential. Ethical issues of authenticity, authorship, and privacy are considered, with emphasis on transparent and culturally inclusive model design. Altogether, the study contributes to computational affective musicology, positioning AI as a creative partner that recreates and enhances emotional expression rather than replacing it.
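To make the abstract's pipeline concrete, the sketch below is a purely hypothetical NumPy illustration (not the authors' implementation): toy embeddings stand in for CNN–RNN/Transformer features, a linear head projects them onto the valence–arousal plane, and the two reported metric types, classification accuracy and Pearson's r between predicted and perceived ratings, are computed on simulated listener data. All names, shapes, and the quadrant-based class scheme are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "embeddings" standing in for learned temporal-spectral features.
emb = rng.normal(size=(100, 16))   # 100 clips, 16-dim features (assumed)
W = rng.normal(size=(16, 2))       # projection head to (valence, arousal)
pred = np.tanh(emb @ W)            # bounded affective coordinates in [-1, 1]

# Perceived listener ratings, simulated here as noisy copies of predictions.
perceived = pred + 0.1 * rng.normal(size=pred.shape)

def pearson_r(a, b):
    """Pearson correlation between two flattened arrays."""
    a, b = np.ravel(a), np.ravel(b)
    return float(np.corrcoef(a, b)[0, 1])

def quadrant(va):
    """Map valence-arousal points to 4 discrete emotion classes (quadrants)."""
    return (va[:, 0] > 0).astype(int) * 2 + (va[:, 1] > 0).astype(int)

acc = float(np.mean(quadrant(pred) == quadrant(perceived)))
r = pearson_r(pred, perceived)
print(f"quadrant accuracy = {acc:.3f}, Pearson r = {r:.3f}")
```

With low rating noise, both metrics come out high; on real listener data the paper reports 92.4% accuracy and r = 0.89.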

References

Albornoz, E. M., Sánchez-Gutiérrez, M., Martinez-Licona, F., Rufiner, H. L., and Goddard, J. (2014). Spoken Emotion Recognition Using Deep Learning. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (104–111). Springer. https://doi.org/10.1007/978-3-319-12568-8_13

Chen, N., and Wang, S. (2017). High-Level Music Descriptor Extraction Algorithm Based on Combination of Multi-Channel CNNs and LSTM. In Proceedings of the 18th International Society for Music Information Retrieval Conference (509–514).

Eerola, T., and Vuoskoski, J. K. (2011). A Comparison of the Discrete and Dimensional Models of Emotion in Music. Psychology of Music, 39(1), 18–49. https://doi.org/10.1177/0305735610362821

Florence, M., and Uma, M. (2020). Emotional Detection and Music Recommendation System Based on User Facial Expression. IOP Conference Series: Materials Science and Engineering, 912, Article 062007. https://doi.org/10.1088/1757-899X/912/6/062007

Gómez-Cañón, J. S., Cano, E., Herrera, P., and Gómez, E. (2021). Transfer Learning from Speech to Music: Towards Language-Sensitive Emotion Recognition Models. In Proceedings of the 28th European Signal Processing Conference (136–140). https://doi.org/10.23919/Eusipco47968.2020.9287548

Han, B. J., Rho, S., Dannenberg, R. B., and Hwang, E. (2009). SMERS: Music Emotion Recognition Using Support Vector Regression. In Proceedings of the 10th International Society for Music Information Retrieval Conference (651–656).

Hizlisoy, S., Yildirim, S., and Tufekci, Z. (2021). Music Emotion Recognition Using Convolutional Long Short-Term Memory Deep Neural Networks. Engineering Science and Technology, an International Journal, 24, 760–767. https://doi.org/10.1016/j.jestch.2020.10.009

Kılıç, B., and Aydın, S. (2022). Classification of Contrasting Discrete Emotional States Indicated by EEG-Based Graph Theoretical Network Measures. Neuroinformatics, 20, 863–877. https://doi.org/10.1007/s12021-022-09579-2

Koh, E., and Dubnov, S. (2021). Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition. arXiv. https://arxiv.org/abs/2104.06517

Kuppens, P., Tuerlinckx, F., Yik, M., Koval, P., Coosemans, J., Zeng, K. J., and Russell, J. A. (2017). The Relation Between Valence and Arousal in Subjective Experience Varies with Personality and Culture. Journal of Personality, 85, 530–542. https://doi.org/10.1111/jopy.12258

McFee, B., Raffel, C., Liang, D., Ellis, D. P. W., McVicar, M., Battenberg, E., and Nieto, O. (2015). librosa: Audio and Music Signal Analysis in Python. In Proceedings of the 14th Python in Science Conference (18–25). https://doi.org/10.25080/Majora-7b98e3ed-003

Miller, J. D. (2017). Statistics for Data Science: Leverage the Power of Statistics for Data Analysis, Classification, Regression, Machine Learning, and Neural Networks. Packt Publishing.

Olson, D., Russell, C. S., and Sprenkle, D. H. (2014). Circumplex Model: Systemic Assessment and Treatment of Families. Routledge. https://doi.org/10.4324/9781315804132

Park, J., Lee, J., Park, J., Ha, J.-W., and Nam, J. (2018). Representation Learning of Music Using Artist Labels. In Proceedings of the 19th International Society for Music Information Retrieval Conference (717–724).

Pons, J., Nieto, O., Prockup, M., Schmidt, E. M., Ehmann, A. F., and Serra, X. (2018). End-To-End Learning for Music Audio Tagging at Scale. In Proceedings of the 19th International Society for Music Information Retrieval Conference (637–644).

Rachman, F. H., Sarno, R., and Fatichah, C. (2020). Music Emotion Detection Using Weighted Audio and Lyric Features. In Proceedings of the 6th Information Technology International Seminar (229–233). https://doi.org/10.1109/ITIS50118.2020.9321046

Saari, P., Eerola, T., and Lartillot, O. (2011). Generalizability and Simplicity as Criteria in Feature Selection: Application to Mood Classification in Music. IEEE Transactions on Audio, Speech, and Language Processing, 19(6), 1802–1812. https://doi.org/10.1109/TASL.2010.2101596

Sangnark, S., Lertwatechakul, M., and Benjangkaprasert, C. (2018). Thai Music Emotion Recognition by Linear Regression. In Proceedings of the 2nd International Conference on Automation, Control and Robots (62–66). https://doi.org/10.1145/3293688.3293696

Satayarak, N., and Benjangkaprasert, C. (2022). On the Study of Thai Music Emotion Recognition Based on Western Music Model. Journal of Physics: Conference Series, 2261(1), Article 012018. https://doi.org/10.1088/1742-6596/2261/1/012018

Sharma, A. K., Purohit, N., Joshi, S., Lakkewar, I. U., and Khobragade, P. (2026). Securing IoT Environments: Deep Learning-Based Intrusion Detection. In F. S. Masoodi and A. Bamhdi (Eds.), Deep Learning for Intrusion Detection. Wiley. https://doi.org/10.1002/9781394285198.ch12

Yang, Y. H., Lin, Y. C., Su, Y. F., and Chen, H. H. (2008). A Regression Approach to Music Emotion Recognition. IEEE Transactions on Audio, Speech, and Language Processing, 16(2), 448–457. https://doi.org/10.1109/TASL.2007.911513

Published

2025-12-20

How to Cite

Prasad, D., Garg, P., Wani, T. A., Saroha, S., S, V., Kalra, S., & Somatkar, A. (2025). EMOTION RECOGNITION IN AI-GENERATED MUSIC. ShodhKosh: Journal of Visual and Performing Arts, 6(3s), 427–435. https://doi.org/10.29121/shodhkosh.v6.i3s.2025.6763