|
ShodhKosh: Journal of Visual and Performing ArtsISSN (Online): 2582-7472
Emotion Recognition in Contemporary Art Installations Swati Chaudhary 1 1 Assistant
Professor, School of Business Management, Noida International University, India
2 Assistant
Professor, Department of Design, Vivekananda Global University, Jaipur, India 3 Associate Professor, Department of Computer Applications, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha,
India 4 Department of Artificial Intelligence and Data science Vishwakarma
Institute of Technology, Pune, Maharashtra, 411037, India 5 Centre of Research Impact and Outcome, Chitkara University, Rajpura-
140417, Punjab, India 6 Associate Professor, Department of Management, ARKA JAIN University
Jamshedpur, Jharkhand, India
1. INTRODUCTION With the developing modern art perspective, emotion has re-introduced as a primary factor linking human experience with technology mediation. Now artists no longer have to be restricted to unimovable visual objects; they are more and more employing interactive systems, which are able to detect, process, and act in response to human feelings, to turn spectators into players in dynamic and immersive spaces. Artificial intelligence (AI), Affective computing, and cognitive science-based emotion recognition technologies allow art pieces to change dynamically in response to the affective moods of the audience. This combination of art and machine intelligence does not merely widen the expressive possibilities of installations, but also disrupts the traditional concept of authorship, perception and aesthetic experience. The study of emotion in art has philosophical and psychological foundations. Classical aesthetic theory put emotion as a reaction to beauty and sublimity whereas 20 th century modernism subjected emotion to the subjectivity of abstraction and conceptualism. However, in the 21 st century, convergence of neuroscience and computation has provided new methods of analysis and syntheses of emotional phenomena Wang et al. (2022). Other models like the six fundamental emotions described by Ekman, the wheel of emotion described by Plutchik and the dimensional valence-arousal model offer scientific frameworks through which one can understand affective states. Modern AI systems, which are able to decode human emotion on the basis of facial expression, vocal tone, gesture, and physiological indicators, are based on these models. These computational systems are used in art installations to interpret human affect into artistic response, to change light, sound, projection or spatial arrangement in response to the perceived emotional state. These emotion sensitive systems make the gallery a living organism which sees and responds thus passive observation becomes a two-way communication. The combination of human feelings with intelligent systems makes it possible to have emergent kinds of aesthetic experience where the reaction of each visitor brings about the creation of the artwork Ting et al. (2023). This individualization of the experience of art is an indication of paradigm shift to the participatory and empathetic art. Technologically, the emergence of deep learning, the multimodal signal processing, and sensor fusion has increased the speed of emotion recognition. Convolutional Neural Networks (CNNs) are universally regarded as the face and visual recognition, whereas Long Short-Term Memory (LSTM) networks are able to capture the temporal features of speech, gesture, and physiological-related changes like heart rate or EEG activity. These models are used in real-time in art installations as they process data, which form feedback loops where the system continuously reacts to human affective cues Song et al. (2020). Not only does it add to the feeling of immersion, but also provides novel information on the psychological and perceptual aspects of the audience engagement. 2. Background and Literature Review 2.1. Emotion theories and models (Ekman, Plutchik, dimensional models) Emotional studies have developed out of philosophical arguments to organized scientific models, which define the way emotional conditions can be shaped, quantified, and understood in psychology and artificial intelligence. The six fundamental emotions, including happiness, sadness, anger, fear, disgust, and surprise, proposed by Paul Ekman formed the basis of the universal, facially expressive foundation that played the key role in computer-based vision-based emotion recognition. Robert Plutchik extended this taxonomy with his dozens-based taxonomy the wheel of emotions by adding eight basic emotions and their degrees of variations in intensity that were organized in a circle to demonstrate the oppositional relationships and the complex combinations Yang et al. (2021). These categorical methods were subsequently supplemented by dimensional models, including Russell in his circumplex model of affect, and charting the emotions in two or three continuous dimensions: valence (pleasantness), arousal (activation) and dominance (control). The dimensional perspective allows a more fluid description of the state of affect, including the details not included in discrete categories. The theories have the conceptual framework of transferring emotion to responsive artwork systems in the context of modern art installations Yang et al. (2021). They enable encoding of the affective parameters into visual, auditory, and spatial dynamics, which enable installations to tune behavior on the basis of emotional valence and degree of intensity. 2.2. Computational Emotion Recognition: Techniques and Datasets Computational emotion recognition is a psychological, signal processing, and machine learning method that can be used to determine affective states through multimodal data. The methods are generally based on the study of facial expressions, tone of voice, body language, and such physiological reactions as heartbeat, galvanic skin response, or EEG. Convolutional Neural Networks (CNNs) are popular in the analysis of both static and dynamic facial images, and the Long Short-Term Memory (LSTM) networks are applicable to analyze time variations in sequential data (speech or biosignals) Yang et al. (2018). Transformer-based architectures and multimodal fusion models, a combination of visual, auditory, and physiological cues, have been developed recently as part of an effort to make them more robust. The datasets like FER2013, AffectNet, DEAP, and SEED all offer standardized benchmarks of training and testing such models, in which emotional expressions are disseminated in various cultural and environmental settings. Normalization, feature (e.g. action units, Mel-frequency cepstral coefficients) and noise reduction can be considered preprocessing steps to provide consistency and reliability Xu et al. (2022). Emotion recognition models in art settings are made to work in real-time, allowing installations to respond dynamically to affective cues of viewers. Such systems transform the form of interaction among the audience in an ever-changing performance based on data, whereby the input in terms of emotion modulates the visual or auditory output. With the maturation of computational emotion recognition, there is deeper artistic exploration, namely, the synthesis of human psychology and machine perception as well as the creation of affectively intelligent environments that are responsive and accurate to human emotion Yang et al. (2023). 2.3. Evolution of Interactive and Immersive Art Installations Interactive art has kept up with technological development, starting with kinetic sculptures and sensor-controlled spaces, and currently being dominated by AI-based, multimodal installations. During the mid 20 th century, Nam June Paik, Myron Krueger, and Rafael Lozano-Hemmer were pioneers who started to experiment with audience participation as a part of the artistic expression. As the digital media emerged and computational interactivity grew, art installations became responsive ecosystems with the ability to adjust to real-time Rombach et al. (2022). The invention of computer vision, motion tracking, and biofeedback systems also turned viewers into an active participant as opposed to a passive one. Projection mapping, virtual reality (VR), and augmented reality (AR) technologies were also part of immersion to add to the experience dimension by enabling audiences to interact in digital-physical hybrid spaces. Modern trends combine the elements of artificial intelligence and emotion recognition, which allows installations to sense and comprehend the emotional conditions of their audience Ramesh et al. (2022). Table 1 provides a summary of datasets, methods, issues, and important contextual commentaries in brief. Such systems make use of a visual, auditory, and physiological input to control artistic output: to increase or decrease the intensity of light, or the design of sound or visual patterns to reflect or compare human emotion. Table 1
3. Conceptual Framework 3.1. Human–AI interaction in emotional expression The meeting point of human feeling and artificial intelligence in art installations changes the nature of creative authorship and the audience involvement. The human-AI interaction in expression of emotion is anchored on the notion that technology will not only detect but also imitate feelings, where machines will be able to have an empathetic conversation with humans. In emotive art, this connection can be seen in the form of a loop: the user records their emotions with the help of sensors, such as facial recognition and voice tone or physiological measurements, which are processed by artificial intelligence models that recognize the affective state Xu et al. (2023). The installation is responsive and undergoes dynamic visual, auditory or kinetic changes to create an individualized emotional experience. This two-way communication enables art to develop on the fly, both under the influence of human feeling and under the influence of the algorithm. Human factor brings in authenticity and spontaneity whereas AI brings precision, adaptability, and complexity. Figure 1 explains that human beings and AI engage in a collaborative interaction to decode emotion expression. The two are united so as to have a symbiotic system in which feeling is the mutual language of communication. Ideologically, this framework makes AI more of a computational mechanism rather than a collaborative maker of interpretative and reflective human affect. Figure 1 |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Table 2 Quantitative Performance of Emotion Recognition Models |
|||||
|
Model Type |
Input Modality |
Accuracy (%) |
Precision (%) |
Recall (%) |
F1-Score |
|
CNN (Visual) |
Facial Expression |
90.3 |
89.6 |
88.7 |
89.1 |
|
LSTM (Physiological) |
EEG, HRV |
87.9 |
86.5 |
85.8 |
86.1 |
|
CNN + LSTM (Hybrid) |
Visual + Auditory + EEG |
92.4 |
91.8 |
91.1 |
91.4 |
|
Transformer-Based |
Multimodal (Fusion) |
94.2 |
93.5 |
93.1 |
93.3 |
|
SVM Baseline |
Visual (Static) |
81.7 |
80.1 |
79.4 |
79.7 |
Table 2 provides a comparative study of emotion recognition models used in the context of modern art installations. The findings show that deep learning architectures significantly outperform the traditional machine learning models. The CNN model, which was trained on the data of facial expressions, had an accuracy of 90.3 percent, which was quite effective in detecting spatial emotional clues including micro-expression and at the eye-region dynamics. The differences between the performance of emotion-recognition models of various evaluation metrics are compared in Figure 3.
Figure 3

Figure 3 Comparative Performance of Emotion Recognition
Models Across Key Evaluation Metrics
LSTM networks, which paid attention to physiological measures, such as EEG and HRV, they made 87.9% accuracy, with a high sensitivity to internal affective pattern but a little lower robustness, associated with signal noise and variations between individuals. Figure 4 indicates that there is a progressive increase in the performance of the lowest-performing and the best-performing emotion models.
Figure 4

Figure 4 Visualization of Average Performance Gain from
Lowest to Best Emotion Model
The hybrid CNN + LSTM model gave the optimal trade off between accuracy and recall and real-time flexibility with an accuracy of 92.4%. This illustrates the robustness of multimodal fusion to represent the temporalization of emotions as well as the static ones. Transformer-based fusion models also surged accuracy to 94.2, which underscores the importance of using complex and cross-modal dependencies in a model with high accuracy.
7. Conclusion
The paper on Emotion Recognition in Contemporary Art Installations helps us understand the potential of using artificial intelligence to fill the gap between humans and art by creating real-time emotion-responsive systems. The installation makes human emotion dynamic experiences by combining multimodal sensing technologies, including EEG, facial recognition and vocal analysis with the help of advanced neural networks: CNNs and LSTMs. This blending remakes the viewer into a more active participant in the process rather than a passive viewer with every emotional reaction affecting the changing aesthetic story of the art piece. The findings confirm that emotional sensitive systems can successfully read out complicated affective stimuli and convert them into adaptive artistic responses. The significant expansion in the levels of engagement, empathy and self-reflection among the participants highlight the transformative nature of affective computing in the creative field. These emotive spaces in contrast to the traditional fixed installations are also learning and evolving and create a kind of two-way communication between human and machine. Emotion is thus the input, and result through this dialogue: a changing medium of artistic co-creation. As a concept, the study can be seen as a part of the further discussion on human-AI collaboration, with emotional intelligence as a crucial element of digital aesthetics. It provides a conceptual framework that can be scaled up to future artistic systems that will be sensitive and contextually aware in perceiving, interpreting and expressing emotion.
CONFLICT OF INTERESTS
None.
ACKNOWLEDGMENTS
None.
REFERENCES
Fan, S., Shen, Z., Jiang, M., Koenig, B. L., Kankanhalli, M. S., and Zhao, Q. (2022). Emotional Attention: From Eye Tracking to Computational Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2), 1682–1699. https://doi.org/10.1109/TPAMI.2022.3169234
Lu, J., Goswami, V., Rohrbach, M., Parikh, D., and Lee, S. (2020). 12-in-1: Multi-Task Vision and Language Representation Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) ( 10434–10443). https://doi.org/10.1109/CVPR42600.2020.01045
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., and Chen, M. (2022). Hierarchical Text-Conditional Image Generation with CLIP latents (arXiv:2204.06125).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (10674–10685). https://doi.org/10.1109/CVPR52688.2022.01042
Sharma, P., Ding, N., Goodman, S., and Soricut, R. (2018). Conceptual Captions: A Cleaned, Hypernymed, Image Alt-Text Dataset for Automatic Image Captioning. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL) (2556–2565). https://doi.org/10.18653/v1/P18-1238
Song, T., Zheng, W., Song, P., and Cui, Z. (2020). EEG Emotion Recognition Using Dynamical Graph Convolutional Neural Networks. IEEE Transactions on Affective Computing, 11(3), 532–541. https://doi.org/10.1109/TAFFC.2018.2817622
Ting, Z., Zipeng, Q., Weiwei, G., Cheng, Z., and Dingli, J. (2023). Research on the Measurement and Characteristics of Museum Visitors’ Emotions Under Digital Technology Environment. Frontiers in Human Neuroscience, 17, Article 1251241. https://doi.org/10.3389/fnhum.2023.1251241
Wang, Y., Song, W., Tao, W., Liotta, A., Yang, D., Li, X., Gao, S., Sun, Y., Ge, W., Zhang, W., et al. (2022). A Systematic Review on Affective Computing: Emotion Models, Databases, and Recent Advances. Information Fusion, 83–84, 19–52. https://doi.org/10.1016/j.inffus.2022.03.009
Xu, L., Huang, M. H., Shang, X., Yuan, Z., Sun, Y., and Liu, J. (2023). Meta Compositional Referring Expression Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (19478–19487). https://doi.org/10.1109/CVPR52729.2023.01866
Xu, L., Wang, Z., Wu, B., and Lui, S. S. Y. (2022). MDAN: Multi-Level Dependent Attention Network for Visual Emotion Analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (9469–9478). https://doi.org/10.1109/CVPR52688.2022.00926
Yang, J., Gao, X., Li, L., Wang, X., and Ding, J. (2021). SOLVER: Scene-Object Interrelated Visual Emotion Reasoning Network. IEEE Transactions on Image Processing, 30, 8686–8701. https://doi.org/10.1109/TIP.2021.3118983
Yang, J., Huang, Q., Ding, T., Lischinski, D., Cohen-Or, D., and Huang, H. (2023). EmoSet: A Large-Scale Visual Emotion Dataset with Rich Attributes. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (20326–20337). https://doi.org/10.1109/ICCV51070.2023.01864
Yang, J., Li, J., Wang, X., Ding, Y., and Gao, X. (2021). Stimuli-Aware Visual Emotion Analysis. IEEE Transactions on Image Processing, 30, 7432–7445. https://doi.org/10.1109/TIP.2021.3106813
Yang, J., She, D., Lai, Y.-K., Rosin, P. L., and Yang, M.-H. (2018). Weakly Supervised Coupled Networks for Visual Sentiment Analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (7584–7592). https://doi.org/10.1109/CVPR.2018.00791
Zhang, P., Li, X., Hu, X., Yang, J., Zhang, L., Wang, L., Choi, Y., and Gao, J. (2021). VinVL: Revisiting Visual Representations in Vision-Language Models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (5575–5584). https://doi.org/10.1109/CVPR46437.2021.00553
|
|
This work is licensed under a: Creative Commons Attribution 4.0 International License
© ShodhKosh 2024. All Rights Reserved.