GENERATIVE ART PHOTOGRAPHY USING DIFFUSION MODELS

Authors

  • Nidhi Tewatia Assistant Professor, School of Business Management, Noida International University, India
  • Lalit Khanna Chitkara Centre for Research and Development, Chitkara University, Himachal Pradesh, Solan, India
  • Dr. Jeberson Retna Raj Professor, Department of Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, Tamil Nadu, India
  • Pavas Saini Centre of Research Impact and Outcome, Chitkara University, Rajpura, Punjab, India
  • Dr. Kunal Meher Assistant Professor, UGDX School of Technology, ATLAS Skill Tech University, Mumbai, Maharashtra, India
  • Dr. Bichitrananda Patra Professor, Department of Computer Applications, Institute of Technical Education and Research, Siksha 'O' Anusandhan (Deemed to be University) Bhubaneswar, Odisha, India

DOI:

https://doi.org/10.29121/shodhkosh.v6.i1s.2025.6645

Keywords:

Diffusion Models, Hybrid Diffusion Architecture, Generative Art Photography, Latent Denoising, Prompt Guidance, Photography-Aware Conditioning

Abstract [English]

This work introduces a hybrid diffusion model designed to improve generative art photography by combining latent-space denoising, dual guidance, and photography-aware conditioning. By using text-based semantic control in conjunction with exposure, depth-of-field, and color-harmony cues, the system generates images that are more aesthetically consistent and more photographically realistic. Experimental results demonstrate that the proposed model achieves higher visual clarity, stronger prompt alignment, and more stable lighting than baseline diffusion frameworks, while requiring far fewer sampling steps thanks to an auxiliary consistency-refinement module. Further analyses, including aesthetic-score distributions, exposure heatmaps, the structure-creativity trade-off, and texture-sharpness comparisons, further validate the model.
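
The dual-guidance idea described in the abstract can be illustrated with a short sketch. The Python fragment below shows one plausible way to combine classifier-free text guidance with an additional photography-aware guidance term inside a latent denoising loop; the guidance weights, the toy denoiser, and all tensor shapes are illustrative assumptions, not the authors' implementation.

import torch

# Hypothetical guidance strengths: W_TEXT steers semantics from the
# prompt, W_PHOTO steers exposure/depth-of-field/color-harmony cues.
W_TEXT, W_PHOTO = 7.5, 2.0

def toy_denoiser(z_t, t, cond):
    # Stand-in for a conditioned latent-diffusion U-Net that predicts
    # the noise in z_t; a fixed linear map keeps the sketch runnable.
    return 0.1 * z_t + 0.01 * cond.mean() * torch.ones_like(z_t)

def dual_guided_noise(z_t, t, text_emb, photo_emb, null_emb):
    # Classifier-free guidance with two conditions: each term pushes
    # the estimate from the unconditional branch toward its condition.
    eps_uncond = toy_denoiser(z_t, t, null_emb)
    eps_text = toy_denoiser(z_t, t, text_emb)
    eps_photo = toy_denoiser(z_t, t, photo_emb)
    return (eps_uncond
            + W_TEXT * (eps_text - eps_uncond)
            + W_PHOTO * (eps_photo - eps_uncond))

# Minimal latent denoising loop (crude update rule; a real sampler
# would apply the noise schedule at each step).
z = torch.randn(1, 4, 64, 64)        # VAE latent, not pixels
text_emb = torch.randn(77, 768)      # e.g. a CLIP text embedding
photo_emb = torch.randn(8)           # exposure / DoF / color cues
null_emb = torch.zeros(77, 768)      # unconditional embedding
for t in reversed(range(10)):        # few steps, as in the fast regime
    z = z - 0.1 * dual_guided_noise(z, t, text_emb, photo_emb, null_emb)

The auxiliary consistency-refinement module, which is what the abstract credits for the reduced step count, is not shown here; in this sketch it would sit after the loop as an extra correction pass over z.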

References

Borawake, M., Patil, A., Raut, K., Shelke, K., and Yadav, S. (2025). Deep Fake Audio Recognition Using Deep Learning. International Journal of Research in Advanced Engineering and Technology (IJRAET), 14(1), 108–113. https://doi.org/10.55041/ISJEM03689

Huang, X., Zou, D., Dong, H., Ma, Y.-A., and Zhang, T. (2024). Faster Sampling without Isoperimetry via Diffusion-Based Monte Carlo. In Proceedings of the 37th Conference on Learning Theory (COLT 2024) (pp. 2438–2493).

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. (2022). High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022) (pp. 10684–10695). IEEE. https://doi.org/10.1109/CVPR52688.2022.01042

Ruiz, N., Li, Y., Jampani, V., Pritch, Y., Rubinstein, M., and Aberman, K. (2023). DreamBooth: Fine-Tuning Text-To-Image Diffusion Models for Subject-Driven Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023) (pp. 22500–22510). IEEE. https://doi.org/10.1109/CVPR52729.2023.02155

Salimans, T., and Ho, J. (2022). Progressive Distillation for Fast Sampling of Diffusion Models. arXiv Preprint arXiv:2202.00512.

Sampath, B., Ayyappa, D., Kavya, G., Rabins, B., and Chandu, K. G. (2025). ADGAN++: A Deep Framework for Controllable and Realistic Face Synthesis. International Journal of Advanced Computer Engineering and Communication Technology (IJACECT), 14(1), 25–31. https://doi.org/10.65521/ijacect.v14i1.168

Song, Y., Durkan, C., Murray, I., and Ermon, S. (2021). Maximum Likelihood Training of Score-Based Diffusion Models. Advances in Neural Information Processing Systems, 34, 1415–1428.

Ulhaq, A., Akhtar, N., and Pogrebna, G. (2022). Efficient Diffusion Models for Vision: A Survey. arXiv Preprint arXiv:2210.09292.

Wang, Y., et al. (2024). SinSR: Diffusion-Based Image Super-Resolution in a Single Step. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) (pp. 25796–25805). IEEE. https://doi.org/10.1109/CVPR52733.2024.02437

Wang, Y., Zhang, W., Zheng, J., and Jin, C. (2023). High-Fidelity Person-Centric Subject-To-Image Synthesis. arXiv Preprint arXiv:2311.10329. DOI: https://doi.org/10.1109/CVPR52733.2024.00733

Wang, Z., Zhao, L., and Xing, W. (2023). StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023) (pp. 7643–7655). IEEE. https://doi.org/10.1109/ICCV51070.2023.00706

Wu, X., Hu, Z., Sheng, L., and Xu, D. (2021). StyleFormer: Real-Time Arbitrary Style Transfer via Parametric Style Composition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021). IEEE. https://doi.org/10.1109/ICCV48922.2021.01435

Yang, B., Luo, Y., Chen, Z., Wang, G., Liang, X., and Lin, L. (2023). LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023) (pp. 22612–22622). IEEE. https://doi.org/10.1109/ICCV51070.2023.02072

Yeh, Y.-Y., et al. (2024). TextureDreamer: Image-Guided Texture Synthesis Through Geometry-Aware Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) (pp. 4304–4314). IEEE. https://doi.org/10.1109/CVPR52733.2024.00412

Yi, X., Han, X., Zhang, H., Tang, L., and Ma, J. (2023). Diff-Retinex: Rethinking Low-Light Image Enhancement with a Generative Diffusion Model. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023) (pp. 12268–12277). IEEE. https://doi.org/10.1109/ICCV51070.2023.01130

Zhang, W., Zhai, G., Wei, Y., Yang, X., and Ma, K. (2023). Blind Image Quality Assessment Via Vision–Language Correspondence: A Multitask Learning Perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023) (pp. 14071–14081). IEEE. https://doi.org/10.1109/CVPR52729.2023.01352

Published

2025-12-10

How to Cite

Tewatia, N., Khanna, L., Raj, J. R., Saini, P., Meher, K., & Patra, B. (2025). GENERATIVE ART PHOTOGRAPHY USING DIFFUSION MODELS. ShodhKosh: Journal of Visual and Performing Arts, 6(1s), 171–182. https://doi.org/10.29121/shodhkosh.v6.i1s.2025.6645