PREDICTIVE MODELING OF BOX OFFICE SUCCESS USING MACHINE LEARNING AND HISTORICAL MOVIE DATA

Authors

  • Ankit Shukla, Assistant Professor, School of Cinema, AAFT University of Media and Arts, Raipur, Chhattisgarh-492001, India
  • Dr. Ganesh Baliram Dongre, Principal, Electronics and Computer Engineering, CSMSS Chh. Shahu College of Engineering, Chhatrapati Sambhajinagar, Maharashtra, India
  • Gouri Moharana, Assistant Professor, School of Fine Arts & Design, Noida International University, Noida, Uttar Pradesh, India
  • Dr. Prashant Suresh Salve, Associate Professor and Head, Department of Commerce and Research Centre, Babuji Avhad Mahavidyalaya, Pathardi, Dist: Ahmednagar, Maharashtra, Pin-414102, India
  • Kanchan Makarand Sangamwar, Assistant Professor, Department of DESH, Vishwakarma Institute of Technology, Pune, Maharashtra-411037, India
  • Dr. Pallavi Pankaj Ahire, Assistant Professor, Department of Computer Science and Engineering, Pimpri Chinchwad University, India

DOI:

https://doi.org/10.29121/shodhkosh.v6.i4s.2025.6940

Keywords:

Box Office Prediction, Machine Learning, Historical Movie Data, Predictive Modeling, Ensemble Methods, Regression Analysis, Data-Driven Decision Making, Movie Industry Analytics

Abstract

The growing availability of historical movie data now makes it possible to use machine learning (ML) methods to predict box office performance more accurately. This study presents an end-to-end predictive modeling framework that applies ML techniques to forecast a film's box office performance before its release. The framework draws on a wide range of factors, including genre, budget, cast popularity, the director's track record, runtime, release date, language, and pre-release social media buzz. A dataset of over 5,000 films produced over the past 20 years was curated and preprocessed to ensure consistency, normalize the features, and remove outliers. Exploratory Data Analysis (EDA) was used to identify the most important features and the key relationships among them. Supervised ML models, namely Linear Regression, Random Forest, Gradient Boosting, and Support Vector Machines, were then built and evaluated using the R² score, Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). Ensemble methods, in particular Gradient Boosting, achieved the highest predictive accuracy, with an R² score above 0.85 on the test set. A feature importance analysis found that production budget, cast popularity, and release timing all have a substantial effect on box office earnings. The results show that robust predictive modeling can help directors, production companies, and investors make informed decisions by estimating how much revenue a film is likely to earn. The study underscores the value of data-driven methods in shifting the film industry from reliance on intuition and past experience toward scientifically grounded forecasting.
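The abstract states that the data were preprocessed for consistency, normalization, and outlier removal, but it does not say which techniques were used. As an illustration only, the following standard-library Python sketch assumes Tukey's IQR fences for outlier removal and min-max scaling to [0, 1] for normalization. The function names (`iqr_bounds`, `remove_outliers`, `min_max_scale`) and the sample budgets are hypothetical and do not represent the authors' pipeline.

```python
"""Illustrative preprocessing sketch: IQR outlier filtering, then min-max scaling.

Assumptions (not stated in the abstract): an outlier is a value outside the
Tukey fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR], and "normalized" means scaling to [0, 1].
"""
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (low, high) Tukey fences for a sequence of numbers."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, key, k=1.5):
    """Keep only rows whose `key` value lies inside the IQR fences."""
    low, high = iqr_bounds([r[key] for r in rows], k)
    return [r for r in rows if low <= r[key] <= high]


def min_max_scale(rows, key):
    """Return new dicts with `key` rescaled to [0, 1]; inputs are not modified."""
    vals = [r[key] for r in rows]
    lo, hi = min(vals), max(vals)
    span = (hi - lo) or 1.0  # guard against division by zero if all values match
    return [{**r, key: (r[key] - lo) / span} for r in rows]


if __name__ == "__main__":
    # Hypothetical films with budgets in millions of USD (made-up values).
    films = [{"title": f"Film {i}", "budget": b}
             for i, b in enumerate([10, 12, 15, 20, 25, 30, 400])]
    clean = remove_outliers(films, "budget")        # the 400 budget is dropped
    scaled = min_max_scale(clean, "budget")
    print([round(f["budget"], 2) for f in scaled])  # [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]
```

In a real train/test workflow, the fences and the min/max values would be computed on the training split only and then applied to the test split. This prevents information from held-out films from leaking into preprocessing.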
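The three evaluation metrics named in the abstract have standard definitions. The sketch below implements them in plain Python, assuming the study used these textbook forms (the abstract gives no formulas). The function names mirror common usage but are written from scratch here, and the sample revenues are invented for illustration.

```python
"""Textbook definitions of the evaluation metrics named in the abstract."""
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, expressed in the units of the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; undefined if any y_true is 0."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


if __name__ == "__main__":
    # Hypothetical revenues (millions of USD) and model predictions.
    actual = [100, 200, 300, 400]
    predicted = [110, 190, 320, 380]
    print(round(r2_score(actual, predicted), 4))  # 0.98
    print(round(rmse(actual, predicted), 4))      # 15.8114
    print(round(mape(actual, predicted), 4))      # 6.6667
```

Because R² = 1 - SS_res/SS_tot, the reported test-set R² above 0.85 implies that the residual sum of squares is less than 15% of the total variation in revenue around its mean. MAPE divides by the actual value, so it is undefined for zero revenue and can be dominated by films with very small grosses. RMSE, in contrast, is in the same units as revenue and weights large errors more heavily.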

Published

2025-12-25

How to Cite

Shukla, A., Dongre, G. B., Moharana, G., Salve, P. S., Sangamwar, K. M., & Ahire, P. P. (2025). PREDICTIVE MODELING OF BOX OFFICE SUCCESS USING MACHINE LEARNING AND HISTORICAL MOVIE DATA. ShodhKosh: Journal of Visual and Performing Arts, 6(4s), 603–614. https://doi.org/10.29121/shodhkosh.v6.i4s.2025.6940