SOUND EMOTION MAPPING USING DEEP LEARNING
DOI: https://doi.org/10.29121/shodhkosh.v6.i1s.2025.6625

Keywords: Affective Computing, Emotion Recognition, CNN–LSTM, Audio Processing, Deep Learning, Speech Analysis, Temporal Modeling, Log-Mel Features

Abstract [English]
Emotion recognition from sound is a key area of affective computing, enabling machines to infer human emotions from vocal cues and support empathic, adaptive interactions. Traditional methods based on handcrafted acoustic features such as MFCCs and LPC struggle to capture nonlinear, context-dependent emotional dynamics and are sensitive to variations in speaker and recording conditions. To address these issues, this study proposes a deep learning-based sound emotion mapping framework that combines Convolutional Neural Networks (CNNs) for spatial feature extraction with Long Short-Term Memory (LSTM) networks for temporal modeling. The CNN layers capture spectral patterns and prosodic cues in log-mel spectrograms, while the LSTM layers track emotional transitions over time, yielding an end-to-end system that requires no manual feature design. On the RAVDESS and Berlin EMO-DB test sets, the proposed CNN-LSTM model achieved accuracies of 93.2% and 91.4%, respectively, outperforming SVM and CNN-only baselines. Attention-weight visualization showed that the model concentrates on mid-frequency regions, consistent with psychoacoustic theories of emotional prosody.
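The abstract describes a three-stage pipeline: log-mel spectrogram extraction, CNN feature maps over the spectrogram, and an LSTM over the time axis. The paper's actual layer sizes and hyperparameters are not reproduced here; the following NumPy-only sketch illustrates the pipeline's shape with assumed toy settings (16 kHz audio, 512-point FFT, 40 mel bands, 4 convolution kernels, hidden size 16, and 8 output classes, matching RAVDESS's eight emotion categories). It is an illustration of the data flow, not the authors' implementation.

```python
import numpy as np

# ---- Stage 1: log-mel spectrogram (the input representation named in the abstract) ----
def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Frame the waveform, take the power spectrum, pool into mel bands, take the log."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2          # (n_frames, n_fft//2+1)
    # Triangular mel filterbank between 0 Hz and Nyquist.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv(np.linspace(mel(0.0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return np.log(power @ fbank.T + 1e-10)                   # (n_frames, n_mels)

# ---- Stage 2: CNN feature maps over the spectrogram ----
def conv_relu(x, kernels):
    """'Valid' 2-D convolution of one input channel with C kernels, then ReLU."""
    C, kh, kw = kernels.shape
    H, W = x.shape
    out = np.zeros((C, H - kh + 1, W - kw + 1))
    for c in range(C):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[c, i, j] = np.sum(x[i:i + kh, j:j + kw] * kernels[c])
    return np.maximum(out, 0.0)

# ---- Stage 3: LSTM over the time axis of the CNN features ----
def lstm_last_hidden(seq, Wx, Wh, b):
    """Run one LSTM layer over the sequence and return the final hidden state."""
    Hsz = Wh.shape[0]
    h, c = np.zeros(Hsz), np.zeros(Hsz)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x_t in seq:                                          # gates packed [i, f, g, o]
        i, f, g, o = np.split(x_t @ Wx + h @ Wh + b, 4)
        c = sig(f) * c + sig(i) * np.tanh(g)
        h = sig(o) * np.tanh(c)
    return h

# ---- Toy end-to-end pass on one second of synthetic audio ----
rng = np.random.default_rng(0)
sr = 16000
wave = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)          # stand-in utterance
spec = log_mel_spectrogram(wave, sr=sr)                      # (61, 40)
feat = conv_relu(spec, 0.1 * rng.standard_normal((4, 3, 3)))  # (4, 59, 38)
seq = feat.transpose(1, 0, 2).reshape(feat.shape[1], -1)     # time-major (59, 4*38)
Hsz, n_emotions = 16, 8
h = lstm_last_hidden(seq,
                     0.1 * rng.standard_normal((seq.shape[1], 4 * Hsz)),
                     0.1 * rng.standard_normal((Hsz, 4 * Hsz)),
                     np.zeros(4 * Hsz))
logits = h @ rng.standard_normal((Hsz, n_emotions))
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                         # softmax over emotion classes
print(spec.shape, probs.shape)
```

With random weights the output probabilities are of course meaningless; in the trained system the same forward pass is preceded by gradient-based training of the convolution kernels, LSTM weights, and output layer.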
References
Al-Talabani, A. A., and Al-Jubouri, M. A. (2021). Emotion Recognition from Speech Signals Using Machine Learning Techniques: A Review. Biomedical Signal Processing and Control, 69, Article 102936.
Balaji, A., Balanjali, D., Subbaiah, G., Reddy, A. A., and Karthik, D. (2025). Federated Deep Learning for Robust Multi-Modal Biometric Authentication Based on Facial and Eye-Blink Cues. International Journal of Advanced Computer Engineering and Communication Technology, 14(1), 17–24. DOI: https://doi.org/10.65521/ijacect.v14i1.167
Chaturvedi, I., Noel, T., and Satapathy, R. (2022). Speech Emotion Recognition Using Audio Matching. Electronics, 11(23), Article 3943. https://doi.org/10.3390/electronics11233943
Chintalapudi, K. S., Patan, I. A. K., Sontineni, H. V., Muvvala, V. S. K., Gangashetty, S. V., and Dubey, A. K. (2023). Speech Emotion Recognition Using Deep Learning. In Proceedings of the 2023 International Conference on Computer Communication and Informatics (ICCCI) (pp. 1–5). IEEE. https://doi.org/10.1109/ICCCI56745.2023.10128612
Cummins, N., Sethu, V., Kundu, S., and McKeown, G. (2013). The PASCAL Affective Audio-Visual Database. In Proceedings of the 21st ACM International Conference on Multimedia (pp. 1025–1028).
De Silva, U., Madanian, S., Templeton, J. M., Poellabauer, C., Schneider, S., and Narayanan, A. (2023). Design Concept of a Mental Health Monitoring Application with Explainable Assessments [Conference paper]. In ACIS 2023 Proceedings (Paper 28). AIS Electronic Library.
Gupta, and Mishra, D. (2023). Sentimental Voice Recognition: An Approach to Analyse the Emotion by Voice. In Proceedings of the 2023 International Conference on Electrical, Electronics, Communication and Computers (ELEXCOM) (pp. 1–6). IEEE. https://doi.org/10.1109/ELEXCOM58812.2023.10370064
Hook, J., Noroozi, F., Toygar, O., and Anbarjafari, G. (2019). Automatic Speech-Based Emotion Recognition Using Paralinguistic Features. Bulletin of the Polish Academy of Sciences: Technical Sciences, 67(3), 1–10. https://doi.org/10.24425/bpasts.2019.129647
Kusal, S., Patil, S., Kotecha, K., Aluvalu, R., and Varadarajan, V. (2020). AI-Based Emotion Detection for Textual Big Data: Techniques and Contribution. Big Data and Cognitive Computing, 5(3), Article 43. https://doi.org/10.3390/bdcc5030043
Li, H., Zhang, X., and Wang, M.-J. (2021). Research on Speech Emotion Recognition Based on Deep Neural Network. In Proceedings of the 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP) (pp. 795–799). IEEE. https://doi.org/10.1109/ICSIP52628.2021.9689043
Livingstone, S. R., and Russo, F. A. (2018). The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). Zenodo. https://doi.org/10.5281/zenodo.1188976
Mittal, A., Arora, V., and Kaur, H. (2021). Speech Emotion Recognition Using HuBERT Features and Convolutional Neural Networks. In Proceedings of the 2021 6th International Conference on Computing, Communication and Security (ICCCS) (pp. 1–5). IEEE. https://doi.org/10.1109/ICCCS51487.2021.9776325
Scherer, K. R. (2003). Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, 40(1–2), 227–256. https://doi.org/10.1016/S0167-6393(02)00084-5
Trigeorgis, G., et al. (2016). Adieu Features? End-to-End Speech Emotion Recognition Using a Deep Convolutional Recurrent Network. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5200–5204). IEEE. https://doi.org/10.1109/ICASSP.2016.7472669
Zhang, Y., Yang, Y., Li, Y., Li, W., and Zhao, J. (2021). Speech Emotion Recognition Based on HuBERT and Attention Mechanism. In Proceedings of the 2021 6th International Conference on Automation, Control and Robotics Engineering (CACRE) (pp. 277–280). IEEE.
Zhao, J., Mao, X., and Chen, L. (2019). Speech Emotion Recognition Using Deep 1D and 2D CNN LSTM Networks. Biomedical Signal Processing and Control, 47, 312–323. https://doi.org/10.1016/j.bspc.2018.08.035
License
Copyright (c) 2025 Dr. Sachin Vasant Chaudhari, Sadhana Sargam, Harsimrat Kandhari, Madhur Taneja, Mr. Sourav Panda, Dr. L.Sujihelen

This work is licensed under a Creative Commons Attribution 4.0 International License.
Under the CC-BY license, authors retain copyright, and anyone may download, reuse, reprint, modify, distribute, and/or copy their contribution, provided the work is properly attributed to its author. No further permission from the author or journal board is required.
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.