MUSIC SENTIMENT ANALYTICS: UNDERSTANDING AUDIENCE REACTIONS USING MULTI-MODAL DATA FROM STREAMING PLATFORMS
DOI: https://doi.org/10.29121/shodhkosh.v6.i4s.2025.6947

Keywords: Music Sentiment Analysis, Multi-Modal Data, Audience Reactions, Streaming Platforms, Machine Learning, Natural Language Processing

Abstract [English]
Streaming services host an ever-growing volume of user-generated content, which offers valuable insight into how audiences feel and how engaged they are. Understanding how people feel about and react to music can make music recommendation systems, marketing strategies, and artist outreach efforts far more effective. This paper investigates how multi-modal information from streaming platforms, spanning textual, audio, and visual data, can be used for sentiment (mood) evaluation in music. It presents a complete methodology for mood analysis that integrates song lyric data, audio signals such as tempo, rhythm, and pitch, and visual material such as album covers and music videos. The system analyses songs using established machine learning techniques: natural language processing (NLP) for lyrics, audio signal processing to extract musical characteristics, and computer vision models to assess how audiences respond to what they see. Combining these different types of information deepens our understanding of how the various elements shape emotional responses and makes mood classification algorithms more consistent. Performance metrics such as recall, accuracy, precision, and F1-score are compared across several models to examine how well multi-modal approaches perform relative to single-modal baselines. The findings indicate that combining textual, audio, and visual data produces better results than relying solely on conventional sentiment analysis models, enabling more precise and thorough mood predictions. This research illustrates how sophisticated mood analytics can not only improve the listening experience but also support marketing decisions and artist strategies in the competitive music sector.
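To make the workflow described in the abstract concrete, the sketch below illustrates one common way to combine lyric, audio, and visual features through feature-level (early) fusion and to report the accuracy, precision, recall, and F1 metrics the study compares. It is a minimal illustration only: the synthetic feature blocks, the four hypothetical mood classes, and the random-forest classifier are assumptions for demonstration, not the authors' actual pipeline.

# Illustrative sketch only: synthetic features and a random-forest classifier
# stand in for the real lyric (NLP), audio, and album-art (vision) extractors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tracks = 500

# Placeholder feature blocks standing in for real extractors:
#   text_feats   - e.g. TF-IDF or transformer embeddings of song lyrics (NLP)
#   audio_feats  - e.g. tempo, rhythm, and pitch statistics from signal processing
#   visual_feats - e.g. CNN embeddings of album covers or music-video frames
text_feats = rng.normal(size=(n_tracks, 64))
audio_feats = rng.normal(size=(n_tracks, 16))
visual_feats = rng.normal(size=(n_tracks, 32))
labels = rng.integers(0, 4, size=n_tracks)  # hypothetical mood classes, e.g. happy/sad/calm/energetic

# Feature-level (early) fusion: concatenate the per-modality feature vectors.
fused = np.hstack([text_feats, audio_feats, visual_feats])

X_train, X_test, y_train, y_test = train_test_split(
    fused, labels, test_size=0.2, random_state=0, stratify=labels
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# The same metrics the abstract compares across multi-modal and single-modal models.
acc = accuracy_score(y_test, pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_test, pred, average="macro", zero_division=0)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")

A real system would replace the random blocks with genuine lyric embeddings, audio descriptors, and visual embeddings, and a single-modal baseline would simply train the same classifier on one block at a time; the evaluation step stays the same either way.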
License
Copyright (c) 2025 Atanu Dutta, Mahesh Kurulekar, Ramneek Kelsang Bawa, Dr. Sharyu Ikhar, Rajendra V. Patil, Anureet Kaur

This work is licensed under a Creative Commons Attribution 4.0 International License.
Under the CC-BY licence, authors retain copyright while allowing anyone to download, reuse, reprint, modify, distribute, and/or copy their contribution, provided the work is properly attributed to its author.
It is not necessary to ask for further permission from the author or journal board.
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.