ShodhKosh: Journal of Visual and Performing Arts
ISSN (Online): 2582-7472

Machine Learning for Predicting Audience Preferences in Dance

Faizan Anwar Khan 1

1 Assistant Professor, Department of Computer Science and Engineering, Presidency University, Bangalore, Karnataka, India
2 Associate Professor, Centre for Internet of Things, Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
3 Assistant Professor, Department of Computer Science and Engineering, Faculty of Engineering and Technology, Parul Institute of Engineering and Technology, Parul University, Vadodara, Gujarat, India
4 Centre of Research Impact and Outcome, Chitkara University, Rajpura–140417, Punjab, India
5 Assistant Professor, Department of Computer Science, Panimalar Engineering College, Chennai, Tamil Nadu, India
6 Department of DESH, Vishwakarma Institute of Technology, Pune, Maharashtra, 411037, India
1. Introduction

Dance is a highly expressive art form, combining rhythm, movement, emotion, and cultural identity. It crosses linguistic boundaries, stirring psychological and emotional reactions that differ among viewers according to genre, tempo, choreography, and personal perception. In recent years, the performing arts and artificial intelligence (AI) have increasingly overlapped, providing new possibilities to study and model audience behavior in computational terms Ajili et al. (2019). The merger of
machine learning (ML) and dance performance analytics opens the possibilities
of quantifying subjective appreciation, determining the underlying emotional
motivation and making more accurate predictions concerning what the audience
will like. The convergence is indicative of an overall movement towards
data-based cultural analytics, in which creative experience is augmented with
technological meaning. The trends behind the use of machine learning in the
prediction of audience tastes are the growing digitalization of performing arts
and the accessibility of multimodal data Advani and Gokhale (2023). Advanced sensors, motion capture systems, and online streaming platforms now allow huge volumes of data, spanning both the movement trajectories of dancers and the immediate responses of the audience, to be analyzed. Traditional methods of audience research (focus groups or surveys) relied on qualitative approaches that, despite their informative nature, lacked scalability and objectivity. ML techniques, by contrast, can handle diverse inputs such as movement dynamics, musical qualities, visual signals and indicators of audience engagement (e.g., applause intensity, gaze tracking, heart rate variability) to reveal hidden elements of preference. This quantitative layer extends artistic interpretation by building a bridge between emotional resonance and algorithmic analysis. The fundamental aim of this research is to create and test a
machine learning model that will help predict the choice of the audience in
dance performances based on multimodal data sources. Combining the visual (pose
and choreography), auditory (music tempo, rhythm) and affective (emotion
recognition) elements, the system is designed to learn correlations between the elements of a performance and the reactions of the audience Baía Reis et al. (2025). Several ML models are examined, namely Support Vector Machines (SVM), Random Forests (RF), and deep neural networks such as CNNs and LSTMs, in order to compare predictive accuracy and interpretability. Further, the study examines how cultural background and diversity of performance genre affect audience perception, thereby providing a more inclusive perspective on the interaction with art. What
makes this research important besides advancing the subject of computational
aesthetics is the fact that it would give artists, choreographers and cultural
institutions valuable, actionable information. Predictive models can inform choreography refinement, stage design optimization and customized audience experiences, in both live and digital spaces. In addition, the research focuses on the ethical application of AI, keeping artistic freedom and cultural integrity at the center of technological enhancement Braccini et al. (2025). Overall, this study contributes to the emerging direction of AI-assisted performance studies, treating machine learning as an assistant to, not a substitute for, human creativity and cultural interpretation.

2. Related Work

Machine
learning, affective computing, and the performing arts are converging, and this convergence makes it possible to think about audience perception and aesthetic experience in new ways. Although audience engagement is not a new concept in
music, theater, or visual arts, the area of predictive analytics in dance
performance has not been explored extensively yet Feng et al. (2022). The initial
research was largely based on qualitative research techniques, which included
surveys and interviews, and the studies focused on the embodied and emotional
components of the audience reaction with the help of observational and
physiological approaches. Though these methods were informative, they lacked scalability and predictive power. Recent developments in machine learning and deep learning have shown that it is possible to model aesthetic judgment using data-driven methods that capture temporal dynamics. Existing approaches, however, mostly address single modalities and individual reactions, and pay little attention to multimodal fusion, where movement, music, and physiological feedback are combined. Moreover, the issues of interpretability, cultural diversity and the ethical use of data have not been adequately addressed Hardy et al. (2023). This paper addresses these gaps by presenting an interpretable and comprehensive multimodal model for predicting collective audience preferences in dance, with a focus on holistic engagement models and cultural inclusivity Kim et al. (2022).

3. Theoretical Framework

Dance,
as a multimodal form of human expression, communicates through movement, rhythm, emotion, and spatial arrangement. In contrast to verbal or written
artworks, it conveys the meaning via embodied movement, where physical
gestures, rhythm, and coordination can trigger the emotions and cognition of
the spectator Lei et al. (2022). Theoretical
underpinnings of predicting audience preferences in dance therefore lie at the crossroads of aesthetic psychology, affective computing and multimodal
perception theory. All these frameworks describe the way the audience perceives
dance performances and how the interpretations could be reproduced by
computational systems based on measurable signals. Communicatively, dance is a dynamic exchange between dancer and spectator that creates a feedback process of emotion, energy, and empathy. According to embodied cognition theories, audiences do not merely see the dance but also feel it through their own bodies as they internally simulate the movements of the dancer Li et al. (2021). This mirror-neuron effect, termed neural mirroring, grounds the experience of dance in sensorimotor synchronization coupled with affective resonance. Hence, audience preference rests not only on the visual appeal of the choreography but also on its ability to evoke emotional empathy and establish rhythmic, bodily harmony. In the context of machine learning, this emotional experience can be converted into measurable data Liu et al. (2022).

Figure 1: Multimodal Data Fusion Architecture

The auditory modality encodes the rhythm, tempo and intensity of the musical accompaniment, whereas the physiological modality encodes audience cues including galvanic skin response (GSR), heart rate variability (HRV) and gaze movement Sanders (2021). These represent the external and internalized sides of audience response, as shown in Figure 1. In order to
operationalize this theoretical basis, the proposed framework frames audience preference prediction as a multimodal fusion problem in which each modality adds complementary information. The feature space is built at hierarchical levels:

1) Low-level sensory characteristics (e.g., motion velocity, spectral energy, heart rate),
2) Mid-level semantic characteristics (e.g., movement patterns, music-emotion correspondence), and
3) High-level cognitive-affective features (e.g., perceived joy, tension, flow).

The
interpretive power of the system enables it to determine which choreographic or
auditory signals are most consistently associated with positive audience
reactions, and this bridges the gap between human artistic intuition and algorithmic analysis. Theoretical constructs such as arousal-valence modeling further elucidate the emotional polarity of responses: energetic and harmonious sequences tend to evoke high arousal and positive valence, resulting in higher preference scores Sumi (2025). The model
therefore integrates emotional psychology with computational learning in a single structure through which choreographic features and performance energy generate quantifiable viewer responses.

4. Dataset Development and Feature Engineering

The
success of machine learning models as predictors in the performing arts sector
is highly influenced by the diversity, richness, and integrity of the dataset.
For predicting audience preferences in dance, the dataset combines multimodal sources of information covering the visual, auditory, physiological and behavioral aspects of performance and audience response Tang et al. (2018).

4.1. Data Sources and Collection Framework

The
data employed in the study were obtained from sixty dance performances spanning classical, contemporary, folk, and fusion genres, with four hundred and eighty audience members participating. The recordings were made under controlled conditions to guarantee temporal alignment and consistent quality. To capture the specifics of choreography, body dynamics, and gesture expressiveness, visual data were collected with high-definition frontal and lateral cameras at 30 frames per second Tsuchida et al. (2019). The sound was
captured using a multi-microphone array that preserved both the music and the reactions of the audience, including applause and verbal exclamations. Physiological measurements were recorded through wearable biometric devices capturing heart rate variability, galvanic skin response and skin temperature, while a gaze-tracking system mapped attention distribution over the course of the performance.

Table 1
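To make the coordinated, per-second record concrete, the sketch below shows one way such a synchronized observation could be represented in Python; the class and field names are illustrative assumptions rather than structures reported in the study.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MultimodalFrame:
    """One synchronized, one-second observation of a performance.

    Field names are hypothetical; they mirror the modalities described
    above (visual, auditory, physiological, behavioral)."""
    performance_id: str
    timestamp_s: int                       # seconds from the start of the performance
    pose_keypoints: List[float]            # flattened skeletal joint coordinates
    mfcc: List[float]                      # audio features for this window
    heart_rate_variability: float          # HRV summary for this window
    gsr: float                             # mean galvanic skin response
    skin_temperature: float                # degrees Celsius
    gaze_xy: List[float] = field(default_factory=list)   # gaze fixation points
    rating: Optional[float] = None         # post-performance rating, if available
```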
Post-performance behavioral data complemented these sensor readings in the form of ratings, likes and comments retrieved through digital interfaces. Combined, these modalities offered comprehensive and coordinated information about the artistic expression as well as the cognitive-emotional reactions of the audience.

4.2. Data Preprocessing and Synchronization

Preprocessing
and synchronization were required to ensure the temporal and semantic consistency of all data streams. Raw video data were segmented and pose-estimated using deep learning-based architectures, including OpenPose and MediaPipe, which allowed determination of skeletal joint coordinates and movement patterns for every frame. Short-Time Fourier Transform (STFT) and Mel-Frequency
Cepstral Coefficients (MFCC) were used to process audio data in order to store
rhythmic, tonal, and spectral characteristics Wallace et al. (2024). Physiological
and gaze tracking data were cleaned with Kalman filters in order to reduce
sensor noise and interpolated so as to match with video and audio samples. Each
of the modalities was synchronized at the temporal resolution of one second,
which resulted in a single set of data in which a time frame was a correlated
visual, auditory and physiological event. This cross-modal correspondence supported cross-modal learning and the analysis of latency between expressive performance signals and audience reactions Wang et al. (2020). The preprocessing phase thus established the basis for extracting meaningful and comparable features across data types.
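As an illustration of the audio side of this pipeline, the sketch below extracts MFCCs with Librosa and averages them over one-second windows so that they can be aligned with the per-second visual and physiological streams; the sampling rate, hop length and file path are assumptions made for the example.

```python
import librosa
import numpy as np

def mfcc_per_second(audio_path: str, n_mfcc: int = 13) -> np.ndarray:
    """Return an (n_seconds, n_mfcc) matrix of MFCCs averaged per second."""
    y, sr = librosa.load(audio_path, sr=22050)           # load mono audio
    hop = 512
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=hop)
    frames_per_sec = int(round(sr / hop))                 # ~43 analysis frames per second
    n_seconds = mfcc.shape[1] // frames_per_sec
    # Average the MFCC frames inside each one-second window for alignment
    per_second = [
        mfcc[:, i * frames_per_sec:(i + 1) * frames_per_sec].mean(axis=1)
        for i in range(n_seconds)
    ]
    return np.stack(per_second)                           # shape: (n_seconds, n_mfcc)

# Example usage (hypothetical file name):
# audio_features = mfcc_per_second("performance_01.wav")
```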
The synchronized dataset can be expressed as

$$\mathcal{D} = \{(V_t, A_t, P_t, B_t)\}_{t=1}^{T}$$

where $V_t$, $A_t$, $P_t$, and $B_t$ are the visual, auditory, physiological, and behavioral feature vectors at time $t$, and $T$ is the total sequence length.

4.3. Feature Extraction and Engineering

The
extraction and engineering of features was carried out hierarchically in order to derive insights at the sensory, semantic and affective layers. Visual representations were formed from motion velocity, angular displacement, spatial symmetry, and limb-specific coordination measures computed from skeleton keypoints, and convolutional neural networks fine-tuned on dance-specific datasets such as AIST++ produced choreographic embeddings. The audio characteristics included rhythm, tempo, spectral and chroma distributions that reflected musical style and energy. Physiological cues were transformed into quantitative affective measures comprising mean HRV, GSR peak frequency and thermal variation in order to determine emotional states of excitement, calmness or tension.

Table 2
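As a rough illustration of how such physiological cues can be reduced to scalar affective measures, the sketch below computes a simple time-domain HRV summary (RMSSD) and a GSR peak rate with NumPy and SciPy; the sampling rate and peak threshold are illustrative assumptions, not values reported in the paper.

```python
import numpy as np
from scipy.signal import find_peaks

def mean_hrv_rmssd(rr_intervals_ms: np.ndarray) -> float:
    """RMSSD, a common time-domain HRV summary, from RR intervals in milliseconds."""
    diffs = np.diff(rr_intervals_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

def gsr_peak_rate(gsr: np.ndarray, fs: float = 4.0, min_height: float = 0.02) -> float:
    """Number of skin-conductance peaks per minute (fs and threshold are assumed)."""
    peaks, _ = find_peaks(gsr, height=min_height, distance=int(fs))  # peaks >= 1 s apart
    duration_min = len(gsr) / fs / 60.0
    return float(len(peaks) / duration_min) if duration_min > 0 else 0.0

# Example with small synthetic signals:
# hrv = mean_hrv_rmssd(np.array([812.0, 790.0, 805.0, 830.0, 801.0]))
# peaks_per_min = gsr_peak_rate(np.random.rand(2400) * 0.05, fs=4.0)
```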
Behavioral feedback in textual form was vectorized with BERT embeddings to capture sentiment, and engagement indicators such as the number of clicks and dwell time were introduced to reflect audience attention. Z-score scaling was used to normalize the features and make them comparable, and principal component analysis (PCA) reduced the dimensionality while retaining the most predictive attributes. This detailed feature set enabled the model to record both the objective composition of the performance and the subjective emotional response of the audience.

4.4. Feature Fusion Strategy

A
hybrid fusion strategy combining early, late, and attention-based fusion mechanisms was used to integrate these heterogeneous features. Early fusion combined low-level features across modalities, enabling the model to identify synchronous interactions between motion, rhythm and physiological arousal. Late fusion ensemble-averaged the predictions of modality-specific submodels, preserving the unique contribution of each source of information. The attention-based hybrid fusion employed a transformer mechanism which dynamically varied modality weights based on context and prioritized the most relevant cues at any point in time, for example giving priority to motion features during expressive sequences and to physiological data during emotionally charged moments. In mathematical terms, the multimodal fusion procedure
can be expressed as:

$$F_{\text{fusion}} = \sum_{m=1}^{M} \alpha_m F_m$$

where $F_m$ is the feature representation of modality $m$ and $\alpha_m$ is the attention weight obtained by softmax normalization. The final preference prediction is obtained as:

$$\hat{y} = g_\theta(F_{\text{fusion}})$$

where $g_\theta$ refers to the trained neural model and $F_{\text{fusion}}$ is the fused multimodal representation. This combination method yielded a more predictive model as well as an increase in interpretability, as it revealed the relative importance of the various sensory inputs in influencing perceived artistic preference. Thus, the feature engineering process and the dataset formed the
foundation of a strong, explainable, and culturally adaptive predictive system
that has the power to close the gap between artistic creativity and the predictive analytics of audience response.

5. Proposed Machine Learning Framework

The
proposed Machine Learning (ML) framework for predicting audience preferences in dance incorporates multimodal data streams, feature fusion, and predictive modeling in a hierarchical architecture. This section presents the conceptual design, mathematical formulation, and workflow of the system, showing how visual, auditory, and physiological inputs are processed, learned from, and assessed to provide an interpretable estimate of audience preferences.

At the top level, the system consists of four main layers: (1) Data Input and Preprocessing Layer, (2) Feature Encoding and Fusion Layer, (3) Predictive Modeling Layer and (4) Evaluation and Interpretation Layer, as illustrated in Figure 2. The system first consumes synchronized multimodal data (dance videos, music tracks, physiological signals, and behavioral responses) drawn from the dataset described in Section 4. These inputs are then converted into high-dimensional representations by visual, audio, and physiological encoders based on CNN, spectrogram and LSTM architectures, respectively. These
embeddings are then integrated in the fusion layer via attention-weighted mechanisms, forming a single vector that combines sensory and affective information. The predictive layer then applies hybrid deep learning models that combine Convolutional Neural Networks (CNNs) for spatial feature learning with Long Short-Term Memory (LSTM) networks for temporal modeling to estimate the audience preference score, denoted $\hat{y}_t$. Lastly, performance is assessed by the evaluation layer, which reports the Mean Squared Error (MSE), Mean Absolute Error (MAE) and Coefficient of Determination ($R^2$), and which supports interpretation through post-hoc methods such as SHAP values and attention visualization maps. The
mathematical model of the multimodal learning process can be put in the
following form. The input data at time $t$ may be expressed as a tuple:

$$X_t = (V_t, A_t, P_t)$$

where $V_t$, $A_t$ and $P_t$ are the visual, auditory and physiological feature vectors at time $t$. Each modality is initially encoded by a modality-specific transformation function:

$$h_v = f_v(V_t), \quad h_a = f_a(A_t), \quad h_p = f_p(P_t)$$

where $f_v$, $f_a$, and $f_p$ are the CNN, audio spectrogram encoder and LSTM-based physiological feature extractor, respectively. The modality representations are concatenated into a joint feature space:

$$H = [h_v; h_a; h_p]$$

Context-dependent relevance is emphasized by an attention mechanism which calculates adaptive modality weights:

$$\alpha_m = \frac{\exp(W_m h_m)}{\sum_{j=1}^{M} \exp(W_j h_j)}, \qquad H_{\text{fusion}} = \sum_{m=1}^{M} \alpha_m h_m$$

where $M$ is the number of modalities and $W_m$ are learnable attention parameters. The fused representation $H_{\text{fusion}}$ is in turn fed into the predictive model:

$$\hat{y}_t = g_\theta(H_{\text{fusion}})$$

The aim of the model is to reduce the gap between predicted and actual audience preferences through the loss:

$$\mathcal{L} = \frac{1}{T}\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2 + \lambda \lVert \theta \rVert^2$$

where $\lambda$ is the regularization coefficient that prevents overfitting. Backpropagation and the Adam optimizer are used to train the network until the loss reaches a minimum.
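A minimal Keras sketch of the encoder-plus-attention-fusion design formalized above is given below. It is not the authors' implementation: the input shapes, layer sizes, and the simple softmax gating over modality embeddings are illustrative assumptions, intended only to show how $h_v$, $h_a$, $h_p$, $\alpha_m$ and $H_{\text{fusion}}$ could map onto TensorFlow 2.x code.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_fusion_model(t_steps=30, pose_dim=66, audio_dim=13, physio_dim=4, d=64):
    # Modality inputs: sequences of per-second features (shapes are assumptions)
    v_in = layers.Input((t_steps, pose_dim), name="visual")
    a_in = layers.Input((t_steps, audio_dim), name="audio")
    p_in = layers.Input((t_steps, physio_dim), name="physio")

    # Modality-specific encoders: f_v (CNN), f_a (CNN over spectral features), f_p (LSTM)
    h_v = layers.GlobalAveragePooling1D()(layers.Conv1D(d, 3, activation="relu")(v_in))
    h_a = layers.GlobalAveragePooling1D()(layers.Conv1D(d, 3, activation="relu")(a_in))
    h_p = layers.LSTM(d)(p_in)

    # Stack modality embeddings into shape (batch, 3, d)
    H = layers.Lambda(lambda hs: tf.stack(hs, axis=1))([h_v, h_a, h_p])

    # Attention weights alpha_m: one score per modality, softmax-normalized over modalities
    scores = layers.Dense(1)(H)                       # (batch, 3, 1)
    alpha = layers.Softmax(axis=1)(scores)            # weights sum to 1 across modalities
    h_fusion = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([alpha, H])

    # Regression head g_theta producing the preference score y_hat
    y_hat = layers.Dense(1, activation="sigmoid", name="preference")(
        layers.Dense(64, activation="relu")(h_fusion))
    return Model([v_in, a_in, p_in], y_hat)

model = build_fusion_model()
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
```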
Figure 2: Deployment Architecture of the ML Framework

6. Experimental Setup and Evaluation

The
effectiveness, interpretability and robustness of the proposed machine learning framework in predicting audience preferences in dance are validated in the experimental phase of this study. The experiments were designed to be both computationally efficient and ecologically valid, so that the predictive values of the model reflect actual performance conditions. This part explains the experimental design, evaluation metrics, comparative benchmarks and interpretation of results, which together determine the efficacy of the system in mapping multimodal artistic expression onto audience engagement outcomes.

Phase 1: Experimental Environment and Configuration

All
experiments were implemented on an NVIDIA RTX A6000 graphics card (48 GB VRAM), an AMD Ryzen Threadripper 3990X processor, and 256 GB of RAM, running Ubuntu 22.04 LTS and Python 3.10 with TensorFlow 2.15 as the primary deep learning framework. Supporting libraries included OpenCV for video preprocessing, Librosa for audio feature extraction and NeuroKit2 for physiological signal analysis, all incorporated into the implementation pipeline. The data were stored in a PostgreSQL database indexed by timestamp-based primary keys, enabling fast retrieval of the multimodal records. The dataset was split into 70 percent training, 15 percent validation and 15 percent testing, with stratification by dance genre to prevent bias toward certain
styles. Mini-batches of 64 samples were used in each training epoch, and the learning rate was varied dynamically with a cosine decay scheduler initialized at $\eta_0 = 10^{-3}$. The hybrid model merged the multimodal streams through convolutional filters (spatial and visual processing), LSTM units (temporal sequence understanding) and a transformer-based fusion block. It was trained for 120 epochs with early stopping based on the validation loss.
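A rough TensorFlow sketch of this training configuration is shown below, reusing the hypothetical `build_fusion_model` from the Section 5 sketch; the `decay_steps` and early-stopping `patience` values are assumptions, since the paper only reports the initial learning rate, batch size, epoch budget and the use of early stopping.

```python
import tensorflow as tf

# Cosine decay of the learning rate, starting from eta_0 = 1e-3 as reported.
schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=1e-3,
    decay_steps=10_000,            # assumption: total number of optimizer steps
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)

# Early stopping monitored on the validation loss, as described above.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

model = build_fusion_model()       # hypothetical model factory from the earlier sketch
model.compile(optimizer=optimizer, loss="mse", metrics=["mae"])
# model.fit([V_train, A_train, P_train], y_train,
#           validation_data=([V_val, A_val, P_val], y_val),
#           epochs=120, batch_size=64, callbacks=[early_stop])
```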
Model performance was evaluated using the Mean Squared Error (MSE), Mean Absolute Error (MAE) and coefficient of determination ($R^2$):

$$\text{MSE} = \frac{1}{T}\sum_{t=1}^{T}(y_t - \hat{y}_t)^2, \qquad \text{MAE} = \frac{1}{T}\sum_{t=1}^{T}\lvert y_t - \hat{y}_t \rvert, \qquad R^2 = 1 - \frac{\sum_{t}(y_t - \hat{y}_t)^2}{\sum_{t}(y_t - \bar{y})^2}$$

where $y_t$ and $\hat{y}_t$ denote the observed and predicted audience preference scores and $\bar{y}$ is the mean observed score.
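For reference, the evaluation quantities defined above, together with the Pearson correlation discussed next, can be computed directly with scikit-learn and SciPy; `y_true` and `y_pred` below are placeholder arrays, not results from the study.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([0.72, 0.55, 0.91, 0.40, 0.66])   # placeholder observed scores
y_pred = np.array([0.70, 0.60, 0.88, 0.45, 0.63])   # placeholder model outputs

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
r, p_value = pearsonr(y_true, y_pred)

print(f"MSE={mse:.3f}  MAE={mae:.3f}  R2={r2:.3f}  r={r:.3f}")
```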
Besides these measures, the Pearson correlation (r) between predicted and observed audience preference scores was also calculated to determine linear concordance. For qualitative assessment, SHAP (SHapley Additive Explanations) and Grad-CAM visualizations were used to understand which modalities (visual, auditory, or physiological) contributed most to each prediction. All baselines were trained under the same conditions and dataset splits. The proposed model performed better than the baselines on all metrics, as indicated in Table 3, with higher predictive accuracy and interpretability.

Table 3
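The modality-attribution step could, for instance, be sketched as follows with the SHAP library, treating a fused feature matrix as the model input; the stand-in predictor, the column grouping, and the aggregation into modality shares are hypothetical and only indicate how per-feature attributions might be rolled up into modality-level contributions.

```python
import numpy as np
import shap

rng = np.random.default_rng(0)
X_background = rng.random((50, 160))    # placeholder fused feature matrix
X_test = rng.random((20, 160))
w = rng.random(160)                     # fixed weights for the stand-in predictor

def predict_fn(X: np.ndarray) -> np.ndarray:
    # Stand-in for the trained fused-feature predictor (replace with model.predict).
    return X @ w / 160.0

explainer = shap.KernelExplainer(predict_fn, X_background)
shap_values = explainer.shap_values(X_test, nsamples=100)

# Aggregate absolute attributions per modality (column ranges are assumptions)
modality_cols = {"visual": slice(0, 64), "audio": slice(64, 128), "physio": slice(128, 160)}
totals = {m: float(np.abs(shap_values[:, cols]).sum()) for m, cols in modality_cols.items()}
share = {m: round(v / sum(totals.values()), 3) for m, v in totals.items()}
print(share)   # relative contribution of each modality
```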
The findings substantiate the claim that integrating temporal (LSTM), spatial (CNN) and contextual (Transformer) features significantly enhances multimodal learning power. The hybrid model had a lower MSE and higher correlation than the best baseline (Transformer-only), indicating better generalization and sensitivity to affective cues.

7. Results and Discussion

The
findings of this study demonstrate that the proposed CNN-LSTM-Transformer fusion model can successfully predict audience preferences in dance by incorporating multimodal data streams of visual, auditory, and physiological cues. Quantitative and qualitative analyses demonstrate the high accuracy, interpretability and adaptability of the framework in examining the dynamic relationship between choreography, emotion and audience involvement. The comparative analysis (as presented in Table 4 and depicted in Figure 4) confirms that the proposed hybrid model is markedly better on the core assessment measures: Mean Squared Error (MSE), Mean Absolute Error (MAE), Coefficient of Determination ($R^2$) and Pearson correlation (r). The model attained an MSE of 0.061, an MAE of 0.159, an $R^2$ of 0.94 and a correlation coefficient of 0.97, representing about 15 percent lower error and 18 percent higher correlation compared to the strongest baseline (Transformer-only). This improvement stems from the combination of CNN filters for spatial motion encoding, LSTM units for temporal dependencies and Transformer attention for contextual feature fusion. This three-component framework gave the model the capacity to capture both the progression of performance energy and the momentary resonance of the audience.
Figure 3: Radar Chart – Performance Metric Comparison

The radar chart in Figure 3 shows that the proposed CNN-LSTM-Transformer fusion model has the most balanced profile across all assessment measures, forming an almost symmetrical performance polygon that reflects stability and generalization. Figure 4 presents the comparative results as a bar graph. The proposed model consistently registers the lowest error (MSE/MAE) and the highest accuracy indicators ($R^2$ and r), confirming the statistical strength of the multimodal fusion model. Combined, these numbers reinforce
the fact that fusion-based learning not only reduces error but also strengthens correlation, positioning the model as an effective instrument for predicting aesthetic preference.

Figure 4: Model Performance Across Metrics

In
addition to the predictive accuracy, interpretability is also vital to the
validation of AI-assisted analysis in creative fields. SHAP and Grad-CAM analyses provided distinct modality-level insights into audience preference prediction, identifying indicators of physiological arousal (HRV, GSR, and gaze patterns) as the most significant features, contributing around 38% to the model decisions. Movement entropy, spatial symmetry, and gesture velocity contributed 33%, and beat synchronization, tonal brightness, and tempo variation accounted for the remaining 29%. These results emphasize the embodied, physical nature of the dance experience, in which bodily resonance and rhythmic conformity strongly shape audience engagement. SHAP analyses also revealed that peaks of rhythmic synchrony and expressive gesture were consistently correlated with higher preference scores, as has been theorized in the psychology of dance. The model showed strong generalization across audiences and dance styles ($R^2$ difference < 0.02), as well as low-latency inference suitable for real-time applications. In practice, the model can provide data-driven insight for choreographic optimization, adaptive programming by cultural institutions and empirical exploration of aesthetic experience, connecting quantitative AI modelling with the qualitative depth of the performing arts.

8. Conclusion and Future Work

The
paper has developed a complete machine learning framework for predicting audience preferences in dance through a unified multimodal system that combines visual, audio and physiological signals. The proposed CNN-LSTM-Transformer fusion model achieved strong results, closing the divide between artistic and computational intelligence with excellent predictive accuracy (MSE = 0.061, $R^2$ = 0.94, r = 0.97) and high interpretability. Through a systematic combination of spatial, temporal and contextual learning processes, the model captured the dynamic emotional and perceptual structures that shape the aesthetic experience of a dance performance. The results support the idea that quantitative modeling of audience engagement is possible without undermining cultural or artistic values. The SHAP and Grad-CAM interpretive analyses confirmed that physiological and choreographic characteristics are the main factors influencing audience preference. The framework also has practical implications for choreographers, cultural institutions, and digital performance platforms, enabling real-time preference analytics, personalized recommendations, and adaptive stage design that responds to audience sentiment. Future research will extend this study with larger cross-cultural datasets to examine regional differences in aesthetic response and to combine multilingual emotion modeling across diverse audiences. Adaptive systems for live performance will also enable real-time audience feedback through edge computing. Continued attention will be given to ethical AI mechanisms and privacy-preserving architectures. Finally, this research is a step towards the creation of an emotionally intelligent and culturally sensitive AI capable of
improving creativity, inclusiveness, and interest in the performing arts.

CONFLICT OF INTERESTS

None.

ACKNOWLEDGMENTS

None.

REFERENCES

Advani, M., and Gokhale, N. (2023). Influence of Brand-Related User-Generated Content and Brand Engagement on Instagram. AIP Conference Proceedings, 2523, 020105. https://doi.org/10.1063/5.0139347

Ajili, I., Mallem, M., and Didier, J.-Y. (2019). Human Motions and Emotions Recognition Inspired by LMA Qualities. The Visual Computer, 35, 1411–1426. https://doi.org/10.1007/s00371-018-1569-6

Baía Reis, A., Vašků, P., and Solmošiová, S. (2025). Artificial Intelligence in Dance Choreography: A Practice-as-Research Exploration of Human–AI Co-Creation Using ChatGPT-4. International Journal of Performance Arts and Digital Media, 1–21. https://doi.org/10.1080/14794713.2024.2437365

Braccini, M., De Filippo, A., Lombardi, M., and Milano, M. (2025). Dance Choreography Driven by Swarm Intelligence in Extended Reality Scenarios: Perspectives and Implications. In Proceedings of the IEEE International Conference on Artificial Intelligence and Extended and Virtual Reality (AIxVR), 348–354. IEEE.

Feng, H., Zhao, X., and Zhang, X. (2022). Automatic Arrangement of Sports Dance Movement Based on Deep Learning. Computational Intelligence and Neuroscience, 2022, Article 9722558. https://doi.org/10.1155/2022/9722558

Hardy, W., Paliński, M., Rożynek, S., and Gaenssle, S. (2023). Promoting Music Through User-Generated Content: TikTok Effect on Music Streaming. In Proceedings of the 98th Annual Conference.

Kim, H. J., Neff, M., and Lee, S.-H. (2022). The Perceptual Consistency and Association of the LMA Effort Elements. ACM Transactions on Applied Perception, 19(4), 1–17. https://doi.org/10.1145/3550453

Lei, Y., Li, X., and Chen, Y. J. (2022). Dance Evaluation Based on Movement and Neural Network. Journal of Mathematics, 2022, 1–7. https://doi.org/10.1155/2022/8147356

Li, R., Yang, S., Ross, D. A., and Kanazawa, A. (2021). AI Choreographer: Music-Conditioned 3D Dance Generation with AIST++. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 13401–13412. https://doi.org/10.1109/ICCV48922.2021.01317

Liu, A.-A., Wang, X., Xu, N., Guo, J., Jin, G., Zhang, Q., Tang, Y., and Zhang, S. (2022). A Review of Feature Fusion-Based Media Popularity Prediction Methods. Visual Informatics, 6, 78–89. https://doi.org/10.1016/j.visinf.2022.03.003

Sanders, C. D., Jr. (2021). An Exploration Into Digital Technology and Applications for the Advancement of Dance Education (Master's thesis). University of California, Irvine.

Sumi, M. (2025). Simulation of Artificial Intelligence Robots in Dance Action Recognition and Interaction Process Based on Machine Vision. Entertainment Computing, 52, 100773. https://doi.org/10.1016/j.entcom.2024.100773

Tang, T., Mao, H., and Jia, J. (2018). AniDance: Real-Time Dance Motion Synthesis to the Song. In Proceedings of the ACM International Conference on Multimedia, 1237–1239. https://doi.org/10.1145/3240508.3240586

Tsuchida, S., Fukayama, S., Hamasaki, M., and Goto, M. (2019). AIST Dance Video Database: Multi-Genre, Multi-Dancer, and Multi-Camera Database for Dance Information Processing. In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 1–6.

Wallace, B., Nymoen, K., Torresen, J., and Martin, C. P. (2024). Breaking from Realism: Exploring the Potential of Glitch in AI-Generated Dance. Digital Creativity, 35, 125–142. https://doi.org/10.1080/14626268.2023.2286415

Wang, S., Li, J., Cao, T., Wang, H., Tu, P., and Li, Y. (2020). Dance Emotion Recognition Based on Laban Motion Analysis Using Convolutional Neural Network and Long Short-Term Memory. IEEE Access, 8, 124928–124938. https://doi.org/10.1109/ACCESS.2020.3007274
© ShodhKosh 2025. All Rights Reserved.