PREDICTIVE MODELING OF BOX OFFICE SUCCESS USING MACHINE LEARNING AND HISTORICAL MOVIE DATA
DOI:
https://doi.org/10.29121/shodhkosh.v6.i4s.2025.6940
Keywords:
Box Office Prediction, Machine Learning, Historical Movie Data, Predictive Modeling, Ensemble Methods, Regression Analysis, Data-Driven Decision Making, Movie Industry Analytics
Abstract [English]
The growing availability of historical movie data makes it possible to use machine learning (ML) methods to predict more accurately which films will succeed at the box office. This study presents an end-to-end predictive modelling framework that uses machine learning techniques to estimate a film's box office performance before its release. The framework draws on a wide range of features, including subject, budget, cast popularity, director track record, runtime, release date, language, and pre-release social media activity. A dataset of over 5,000 films produced in the last 20 years was curated and preprocessed to ensure consistency, normalize the data, and remove outliers. Exploratory Data Analysis (EDA) was used to identify the most important features and the key relationships among them. Supervised machine learning models, namely Linear Regression, Random Forest, Gradient Boosting, and Support Vector Machines, were trained and evaluated using the R² score, Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). Ensemble methods such as Gradient Boosting achieved the highest predictive accuracy, with an R² score above 0.85 on the test set. Feature importance analysis showed that production budget, cast popularity, and release timing have a strong effect on box office earnings. The results indicate that robust predictive modelling can help directors, studios, and investors make informed decisions by estimating a film's likely revenue. The study underlines the value of data-driven methods in moving the film industry away from reliance on intuition and past experience and toward evidence-based forecasting.
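The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. The code below is not the authors' implementation; it uses the standard textbook definitions of R², RMSE, and MAPE. The function names and the sample revenue figures are hypothetical and chosen only for illustration.

```python
import math


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root Mean Square Error, in the same units as the target."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumes no actual value is zero, otherwise the ratio is undefined.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


# Hypothetical box office grosses (in millions); not data from the study.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(f"R2   = {r2_score(actual, predicted):.3f}")
print(f"RMSE = {rmse(actual, predicted):.2f}")
print(f"MAPE = {mape(actual, predicted):.2f}%")
```

R² and RMSE are dominated by large absolute errors, which in revenue data tend to come from high-grossing films, while MAPE weights every film by its relative error. Reporting all three, as the abstract does, gives a more balanced view of model quality than any single score.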
License
Copyright (c) 2025 Ankit Shukla, Dr. Ganesh Baliram Dongre, Gouri Moharana, Dr. Prashant Suresh Salve, Kanchan Makarand Sangamwar, Dr. Pallavi Pankaj Ahire

This work is licensed under a Creative Commons Attribution 4.0 International License.
With the licence CC-BY, authors retain the copyright, allowing anyone to download, reuse, re-print, modify, distribute, and/or copy their contribution. The work must be properly attributed to its author.
It is not necessary to ask for further permission from the author or journal board.
This journal provides immediate open access to its content on the principle that making research freely available to the public supports a greater global exchange of knowledge.