ShodhKosh: Journal of Visual and Performing Arts
ISSN (Online): 2582-7472
Predicting Visual Appeal in Advertising Photography
Dr. S L Jany Shabu 1
1 Associate Professor, Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India
2 Centre of Research Impact and Outcome, Chitkara University, Rajpura-140417, Punjab, India
3 Assistant Professor, Department of Mechanical Engineering, ARKA JAIN University, Jamshedpur, Jharkhand, India
4 Assistant Professor, School of Business Management, Noida International University, India
5 Department of Electronics and Telecommunication Engineering, Vishwakarma Institute of Technology, Pune, Maharashtra, 411037, India
6 Associate Professor, Department of Mechanical Engineering, Siksha 'O' Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
1. INTRODUCTION
Visual communication has come to the fore in modern advertising: photographs serve not only as representational elements but also as persuasive ones, shaping consumer perception, engagement, and buying habits. In an increasingly saturated digital marketplace spanning social media, e-commerce platforms, and immersive advertising spaces, the aesthetic value of visual content has become a key factor in campaign success. Because advertisers use photography strategically to tell stories, differentiate products, and build brand identity, the ability to assess and maximize visual appeal has never been more important. Historically, aesthetic evaluation in advertising photography has depended largely on professional photographers, creative directors, and marketing departments. Although effective, such human assessment is inherently subjective, labor-intensive, and variable across cultures and contexts Yu (2022). With the emergence of artificial intelligence and machine learning, it is becoming possible to automate and scale visual quality assessment and to bring data-driven choices into creative processes. Computational predictors of visual appeal are models that estimate how appealing an image is from measurements of its attributes. Unlike technical measures of image quality, visual appeal in advertising depends on compositional balance, lighting style, color harmony, emotional resonance, clarity of the subject, and the meaning the photograph suggests. These dimensions interact closely, which makes them difficult to quantify manually and calls for more sophisticated computational methods Guo (2021).
Recent progress in computer vision, especially deep learning, enables machines to learn hierarchical representations of visual content, from low-level features (edges, textures) to high-level ones (mood, style). Convolutional Neural Networks (CNNs) perform well in image classification and aesthetic scoring tasks, whereas transformer-based architectures are better at modeling long-range dependencies and global relationships in images. These developments make it possible to build a predictive system designed specifically for advertising photography, where both aesthetic value and effect on the consumer are strategic concerns Wang and Park (2023). Nevertheless, visual appeal prediction in a marketing setting cannot rest on technical feature analysis alone; it must also account for psychological and behavioral responses. Consumers interpret advertising images through a blend of visual preferences, cultural expectations, emotional arousal, and brand associations. A computational model therefore has to combine several cues, such as spatial composition, color semantics, lighting gradients, subject prominence, and stylistic cues, to simulate human aesthetic judgment. In addition, advertising datasets are difficult to work with because they vary in genre, brand image, and visual orientation Zhang and Huang (2024). An effective approach should rely on well-annotated datasets that capture actual consumer perceptions rather than mere binary aesthetic labels. The proposed research addresses these gaps with a systematic process for predicting visual appeal that combines handcrafted aesthetic descriptors with learned deep representations from CNN and transformer models.
2. Background Work
Research on visual aesthetics prediction has developed rapidly over the past decade, drawing on computer vision, psychology, and advertising science. Early work in computational aesthetics concentrated on handcrafted image descriptors derived from classical rules of photography such as balance, contrast, color harmony, and the rule of thirds. Datta et al. (2006) pioneered the use of low-level visual features with machine learning to predict aesthetic scores, opening the way to quantitative modeling of beauty. Later researchers extended these methods with texture features, edge density, and color histograms to approximate higher-level aesthetics Ramdani and Belgiawan (2023). These models, however, struggled with subjectivity and context dependence, a major problem for advertising imagery, where emotional tone and branding intent strongly shape perceived appeal. The advent of deep learning was a breakthrough in aesthetic evaluation. Convolutional neural networks (CNNs) trained on large-scale image datasets, including AVA (Aesthetic Visual Analysis), began to surpass traditional handcrafted methods by automatically learning hierarchical representations of composition, object salience, and spatial harmony Kim and Yoon (2021). Lu et al. introduced multi-patch CNNs that evaluate multiple regions of an image, making the model more sensitive to localized design features. More recent transformer-based architectures, including Vision Transformers (ViT) and Swin Transformers, enable the modeling of long-range dependencies, semantic coherence, and contextual relations across an entire image, which is important for understanding the holistic effect of advertising images.
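The handcrafted descriptors discussed above can be made concrete with a small sketch. The function below is an illustration only, not the feature set used in this study or in the cited works: the function name, the Rec. 601 luminance weights, the mirror-based symmetry score, and the 0.1 gradient threshold are all assumptions chosen for the example.

```python
import numpy as np

def handcrafted_descriptors(rgb):
    """Compute a few illustrative low-level descriptors.

    rgb: H x W x 3 array with values in [0, 1] (hypothetical input format).
    """
    rgb = np.asarray(rgb, dtype=np.float64)

    # Luminance from RGB using Rec. 601 weights (an assumed choice).
    lum = rgb @ np.array([0.299, 0.587, 0.114])

    # RMS contrast: standard deviation of luminance.
    contrast = float(lum.std())

    # HSV-style saturation: (max - min) / max, defined as 0 for black pixels.
    mx = rgb.max(axis=2)
    mn = rgb.min(axis=2)
    sat = np.where(mx > 0, (mx - mn) / np.where(mx > 0, mx, 1.0), 0.0)
    saturation = float(sat.mean())

    # Left-right symmetry: 1 minus the mean absolute difference
    # between the luminance image and its horizontal mirror.
    symmetry = float(1.0 - np.abs(lum - lum[:, ::-1]).mean())

    # Edge density: fraction of pixels whose gradient magnitude
    # exceeds an arbitrary threshold of 0.1.
    gy, gx = np.gradient(lum)
    edge_density = float((np.hypot(gx, gy) > 0.1).mean())

    return {
        "contrast": contrast,
        "saturation": saturation,
        "symmetry": symmetry,
        "edge_density": edge_density,
    }
```

A vector of such values could then be fed to a conventional regressor, which is the general pattern the early handcrafted approaches followed.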
Meanwhile, aesthetic prediction has also been applied to domain-specific tasks such as fashion photography, social media imagery, and product design visualization Sheng et al. (2020). Table 1 synthesizes related research on visual aesthetics and advertising image analysis. Advertising photography, however, remains under-researched, especially with respect to affective-semantic aspects such as emotion, narrative, and consumer involvement. The current literature tends to treat beauty prediction generically, without attending to persuasive intent or to how perception varies among viewers Jacobs et al. (2024).
Table 1
3. Conceptual Framework
3.1. Definition of visual appeal in advertising photography
Visual appeal in advertising photography is the degree to which a photograph is perceived as attractive, harmonious, and communicative, drawing attention and inspiring positive emotional and cognitive responses in the viewer. It is not confined to aesthetic beauty; it also covers the strategic alignment of visual elements with marketing goals such as evoking desire, trust, and brand recall. In advertising contexts, visual appeal is thus both an artistic and a psychological phenomenon that determines how effectively a photograph conveys a message and persuades a consumer to act Jiang et al. (2024). Whereas general aesthetic assessment focuses on beauty, advertising photography focuses on purposeful appeal: how form, color, lighting, and composition combine to increase product desirability and narrative integrity. Together, these dimensions shape the viewer's aesthetic experience, duration of engagement, retention, and purchase intention. In modern computational aesthetics, visual attraction is characterized by quantifiable parameters, including symmetry, contrast, saturation, and spatial structure Chan and Septianto (2024). Deep learning models can now approximate these perceptual judgments by analyzing complex visual hierarchies.
3.2. Influencing Factors: Composition, Lighting, Color Harmony, Subject, motion, and Style
The visual appeal of an advertising photograph arises from a set of interdependent aesthetic and perceptual elements. Composition governs the structural harmony and spatial arrangement of visual objects through devices such as the rule of thirds, leading lines, and focal points that direct the viewer's attention.
Effective composition makes the message clearer and more engaging, so that the audience can intuitively process the main visual information. Lighting sets mood, depth, and realism, and thereby shapes the perception of product texture and quality Fang et al. (2023). Figure 1 presents the determinants of visual appeal in advertising photography. Depending on the brand message, high-key lighting can convey freshness or luxury, whereas low-key arrangements create a sense of intimacy or mystery.
Figure 1
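As a rough illustration of the high-key versus low-key distinction described above, the sketch below labels an image by its mean luminance. This is a deliberately simple heuristic and not a method from this study: the function name and the 0.65 and 0.35 cut-offs are hypothetical values picked for the example, and real lighting analysis would also consider contrast and the shape of the luminance histogram.

```python
import numpy as np

def lighting_key(rgb, high=0.65, low=0.35):
    """Label an image as high-key, low-key, or mid-key.

    rgb: H x W x 3 array with values in [0, 1] (hypothetical input format).
    high, low: assumed mean-luminance cut-offs, not taken from the source.
    """
    # Luminance from RGB using Rec. 601 weights (an assumed choice).
    lum = np.asarray(rgb, dtype=np.float64) @ np.array([0.299, 0.587, 0.114])
    mean_lum = float(lum.mean())

    if mean_lum >= high:
        label = "high-key"
    elif mean_lum <= low:
        label = "low-key"
    else:
        label = "mid-key"
    return label, mean_lum
```

For example, `lighting_key(image)[0]` could serve as one categorical input among the handcrafted descriptors, alongside composition and color measures.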
Table 2 Model Performance Comparison on Visual Appeal Prediction
| Model Type | MAE ↓ | RMSE ↓ | R² Score ↑ | Pearson Correlation (r) ↑ |
| Handcrafted Features (ML) | 0.412 | 0.528 | 0.72 | 0.81 |
| Deep CNN | 0.301 | 0.402 | 0.84 | 0.89 |
| Deep CNN | 0.279 | 0.374 | 0.86 | 0.90 |
| Transformer | 0.244 | 0.341 | 0.89 | 0.91 |
The findings in Table 2 show that prediction accuracy improves steadily as models progress from handcrafted features to deep learning. The handcrafted-feature model performed moderately, with an R² of 0.72 and a Pearson correlation of 0.81, indicating a limited ability to capture intricate aesthetic dependencies. Figure 4 compares the models in terms of MAE and RMSE.
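The four metrics reported in Table 2 can be computed as in the following sketch. It uses the standard textbook definitions and is not taken from the authors' evaluation code; the function name and the toy ratings in the usage note are assumptions for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R^2, and Pearson r for predicted appeal scores."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    err = y_pred - y_true

    # Mean absolute error and root mean squared error (lower is better).
    mae = float(np.mean(np.abs(err)))
    rmse = float(np.sqrt(np.mean(err ** 2)))

    # Coefficient of determination: 1 - SS_res / SS_tot (higher is better).
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    # Pearson correlation between ground truth and predictions.
    r = float(np.corrcoef(y_true, y_pred)[0, 1])

    return {"mae": mae, "rmse": rmse, "r2": r2, "r": r}
```

Calling `regression_metrics([1, 2, 3, 4], [2, 3, 4, 5])` gives an MAE and RMSE of 1, an R² of 0.2, and an r of 1: a constant offset leaves the correlation perfect while lowering R², which is why the two are reported separately. With these definitions, R² cannot exceed r² on the same sample; the source does not state which definitions or evaluation splits underlie Table 2.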
Figure 4 Comparative Analysis of Model Performance Using MAE and RMSE
This result reflects the limitation of handcrafted features, which capture color balance and composition but offer no insight into semantic depth. In contrast, deep CNN models, including VGG19 and ResNet50, improved substantially, reaching R² values of 0.84–0.86.
Figure 5 Comparative Performance of ML, CNN, and Transformer Models Using Correlation Metrics
These models represented well the hierarchical visual features (edges, textures, and object relationships) that underlie human perception of beauty and balance. Figure 5 compares the ML, CNN, and transformer models on correlation metrics.
8. Conclusion
This study has presented a comprehensive model for predicting visual appeal in advertising photography that combines concepts from computational aesthetics, machine learning, and visual psychology. Using a hybrid approach that incorporates Convolutional Neural Networks (CNNs) and Vision Transformers (ViT), the study showed how local compositional features and global contextual relationships can be captured to predict perceived aesthetic quality. Combining handcrafted descriptors with rich visual embeddings also aided interpretation, allowing quantifiable visual characteristics such as color harmony, lighting uniformity, and compositional balance to be related to human aesthetic judgments. The results indicate that deep transformer-based models capture emotional tone and spatial coherence, both important to an advertisement's success, better than conventional methods. The proposed system agreed closely with expert and consumer ratings, which supports its ability to model aesthetic reasoning through data-driven processes. Beyond technical performance, the framework contributes to a broader understanding of how design elements affect viewer engagement, affective response, and brand perception. Limitations of the study include dataset diversity and the subjective variability of aesthetic labeling. These observations open avenues for further work that incorporates multimodal data, including textual, auditory, and contextual cues, to capture the full sensory impact of advertising.
CONFLICT OF INTERESTS
None.
ACKNOWLEDGMENTS
None.
REFERENCES
Bouwman, E. P., Bolderdijk, J. W., Onwezen, M. C., and Taufik, D. (2022). “Do you Consider Animal Welfare to be Important?” Activating Cognitive Dissonance Via Value Activation Can Promote Vegetarian Choices. Journal of Environmental Psychology, 83, 101871. https://doi.org/10.1016/j.jenvp.2022.101871
Chan, E. Y., and Septianto, F. (2024). Self-Construals and Health Communications: The Persuasive Roles of Guilt and Shame. Journal of Business Research, 170, 114357. https://doi.org/10.1016/j.jbusres.2023.114357
Fang, J., Wen, Z., and He, Z. (2023). Moderated Mediation Model Analysis of Common Categorical Variables. Applied Psychology, 29, 291–299.
Fechner, D., and Isbanner, S. (2025). Understanding the Intention–Behaviour Gap in Meat Reduction: The Role of Cognitive Dissonance in Dietary Change. Appetite, 214, 108204. https://doi.org/10.1016/j.appet.2025.108204
Gradidge, S., Zawisza, M., Harvey, A. J., and McDermott, D. T. (2021). A Structured Literature Review of the Meat Paradox. Social Psychological Bulletin, 16, e5953. https://doi.org/10.32872/spb.5953
Guo, L. (2021). Application of Animal Images in Food Packaging Design: Taking Traditional Tibetan Auspicious Patterns as an Example. Green Packaging, 6, 96–99.
Hurst, K. F., and Sintov, N. D. (2022). Guilt Consistently Motivates Pro-Environmental Outcomes While Pride Depends on Context. Journal of Environmental Psychology, 80, 101776. https://doi.org/10.1016/j.jenvp.2022.101776
Ioannidou, M., Lesk, V., Stewart-Knox, B., and Francis, K. B. (2023). Moral Emotions and Justifying Beliefs about Meat, Fish, Dairy and Egg Consumption: A Comparative Study of Dietary Groups. Appetite, 186, 106544. https://doi.org/10.1016/j.appet.2023.106544
Jacobs, T. P., Wang, M., Leach, S., Siu, H. L., Khanna, M., Chan, K. W., Chau, H. T., Tam, K. Y. Y., and Feldman, G. (2024). Revisiting the Motivated Denial of Mind to Animals Used for Food: Replication Registered Report of Bastian et al. (2012). International Review of Social Psychology, 37, 6. https://doi.org/10.5334/irsp.932
Jiang, L. A., Feng, Y., Zhou, W., Yang, Z., and Su, X. (2024). Too Anthropomorphized to Keep Distance: The Role of Social Psychological Distance on Meat Inclinations. Appetite, 196, 107272. https://doi.org/10.1016/j.appet.2024.107272
Kim, D. J. M., and Yoon, S. (2021). Guilt of the Meat-Eating Consumer: When Animal Anthropomorphism Leads to Healthy Meat Dish Choices. Journal of Consumer Psychology, 31, 665–683. https://doi.org/10.1002/jcpy.1215
Nielsen, R. S., Gamborg, C., and Lund, T. B. (2024). Eco-guilt and Eco-Shame in Everyday Life: An Exploratory Study of the Experiences, Triggers, and Reactions. Frontiers in Sustainability, 5, 1357656. https://doi.org/10.3389/frsus.2024.1357656
Ramdani, M. A., and Belgiawan, P. F. (2023). Designing Instagram Advertisement Content: What Design Elements Influence Customer Attitude and Purchase Behavior? Contemporary Management Research, 19, 1–26. https://doi.org/10.7903/cmr.23023
Sheng, G., Xia, Q., and Yue, B. (2020). Effectiveness of Green Advertising from the Perspective of Image Proximity. Xinwen Yu Chuanbo Pinglun, 73, 59–69.
Wang, Z., and Park, J. (2023). “Human-like” is Powerful: The Effect of Anthropomorphism on Psychological Closeness and Purchase Intention in Insect Food Marketing. Food Quality and Preference, 109, 104901. https://doi.org/10.1016/j.foodqual.2023.104901
Yu, H. (2022). Application of Animal Anthropomorphic Images Mixed with Graffiti Style in Packaging Design. Xin Mei Yu, 10, 99–101. https://doi.org/10.18282/l-e.v10i5.2687
Zhang, Y., and Huang, S. (2024). The Influence of Visual Marketing on Consumers’ Purchase Intention of Fast Fashion Brands in China: An Exploration Based on the fsQCA Method. Frontiers in Psychology, 15, 1190571. https://doi.org/10.3389/fpsyg.2024.1190571
This work is licensed under a Creative Commons Attribution 4.0 International License
© ShodhKosh 2025. All Rights Reserved.