FRAMEWORK FOR WHEAT VARIETAL DATA EXPLORATION: INSIGHTS FOR ENSEMBLE LEARNING RESEARCH

Authors

  • Shivani Rastogi, Research Scholar, TMU, Moradabad
  • Dr. Ranjana Sharma, Associate Professor, TMU, Moradabad

DOI:

https://doi.org/10.29121/shodhkosh.v5.i6.2024.3863

Keywords:

Ensemble Learning, Deep Learning, Agricultural Data Science, Crop Yield Prediction, Data Exploration and Visualization

Abstract [English]

Agricultural management and production rely heavily on advancements in technology for tasks such as crop yield forecasting, disease detection, and soil classification. However, machine learning models often encounter challenges related to the complexity and variability of agricultural datasets. This study addresses these challenges by integrating deep learning, ensemble learning methods, and extensive dataset exploration to enhance forecasting accuracy and model robustness. Despite the promise of these approaches, limited research has examined their combined effects on model performance. Our findings reveal significant improvements across various agricultural applications. By combining ensemble methods like Random Forest and Gradient Boosting Machines (GBM) with deep learning, the study achieved a 15% reduction in mean absolute error for irrigation scheduling and a 12% increase in recall for weed detection. These results underscore the potential of integrating modern techniques to optimize agricultural decision-making and improve predictive performance in diverse scenarios.
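For readers who want a concrete picture of the workflow the abstract describes, the following is a minimal, hypothetical sketch rather than the authors' exact pipeline or data: it uses scikit-learn to run a PCA-based exploration step and then stacks Random Forest and Gradient Boosting base learners under a logistic-regression meta-learner. The synthetic features below stand in for the wheat varietal measurements, which are not reproduced here.

    # Sketch only: synthetic stand-in for wheat varietal measurements,
    # exploration via PCA, then a stacked Random Forest + GBM ensemble.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import (RandomForestClassifier,
                                  GradientBoostingClassifier,
                                  StackingClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import classification_report

    # Hypothetical tabular data: 7 numeric kernel traits, 3 varieties.
    X, y = make_classification(n_samples=600, n_features=7, n_informative=5,
                               n_classes=3, random_state=0)

    # Exploration step: project standardized traits onto two principal
    # components to gauge varietal separability (cf. Jolliffe & Cadima, 2016).
    pca = PCA(n_components=2)
    X_2d = pca.fit_transform(StandardScaler().fit_transform(X))
    print("Variance explained by 2 PCs:", pca.explained_variance_ratio_.sum())

    # Ensemble step: Random Forest and GBM base learners combined by stacking
    # (Wolpert, 1992), with a logistic-regression meta-learner.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    ensemble = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
                    ("gbm", GradientBoostingClassifier(random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000))
    ensemble.fit(X_tr, y_tr)
    print(classification_report(y_te, ensemble.predict(X_te)))

A deep network could be added as a further base learner (for example through a wrapper exposing the scikit-learn estimator interface), which is the kind of deep-learning integration the abstract refers to; the architectures and hyperparameters used in the study itself are not specified here.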

References

Kamilaris, A., & Prenafeta-Boldú, F. X. (2018). Deep learning in agriculture: A survey. Computers and Electronics in Agriculture, 147, 70-90. https://doi.org/10.1016/j.compag.2018.02.016

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324

Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794. https://doi.org/10.1145/2939672.2939785

Russakovsky, O., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211-252. https://doi.org/10.1007/s11263-015-0816-y

Jolliffe, I. T., & Cadima, J. (2016). Principal component analysis: A review and recent developments. Philosophical Transactions of the Royal Society A, 374(2065), 20150202. https://doi.org/10.1098/rsta.2015.0202

Wilkinson, L. (2005). The grammar of graphics (2nd ed.). Springer. https://doi.org/10.1007/0-387-28695-0

Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232. https://doi.org/10.1214/aos/1013203451

Zhou, Z.-H. (2009). Ensemble learning. In S. Wang (Ed.), Knowledge Discovery and Data Mining: Challenges and Realities, 1-34. IGI Global. https://doi.org/10.1007/s10115-009-0159-3

Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241-259. https://doi.org/10.1016/S0893-6080(05)80023-1

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer. https://doi.org/10.1007/978-0-387-84858-7

Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139. https://doi.org/10.1006/jcss.1997.1504

Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140. https://doi.org/10.1007/BF00058655

Zhou, Z.-H., Wu, J., & Tang, W. (2002). Ensembling neural networks: Many could be better than all. Artificial Intelligence, 137(1-2), 239-263. https://doi.org/10.1016/S0004-3702(02)00190-X

Dietterich, T. G. (2000). Ensemble methods in machine learning. International Workshop on Multiple Classifier Systems, 1-15. Springer. https://doi.org/10.1007/3-540-45014-9_1

Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22.

Sagi, O., & Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(4), e1249. https://doi.org/10.1002/widm.1249

Quinlan, J. R. (1996). Bagging, boosting, and C4.5. Proceedings of the 13th National Conference on Artificial Intelligence, 725-730. AAAI Press.

Biau, G. (2012). Analysis of a random forests model. Journal of Machine Learning Research, 13, 1063-1095.

Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3), 21-45. https://doi.org/10.1109/MCAS.2006.1688199

Kotsiantis, S. B. (2007). Supervised machine learning: A review of classification techniques. Informatica, 31, 249-268.

Caruana, R., & Niculescu-Mizil, A. (2006). An empirical comparison of supervised learning algorithms. Proceedings of the 23rd International Conference on Machine Learning, 161-168. ACM. https://doi.org/10.1145/1143844.1143865

Opitz, D., & Maclin, R. (1999). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11, 169-198. https://doi.org/10.1613/jair.614

Ho, T. K. (1998). The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8), 832-844. https://doi.org/10.1109/34.709601

Dorigo, W., et al. (2007). A review on reflective remote sensing and data assimilation techniques for enhanced agroecosystem modeling. International Journal of Applied Earth Observation and Geoinformation, 9(2), 165-193. https://doi.org/10.1016/j.jag.2006.05.003

Pal, M. (2005). Random forest classifier for remote sensing classification. International Journal of Remote Sensing, 26(1), 217-222. https://doi.org/10.1080/01431160412331269698

Abrahamsen, P., & Hansen, S. (2000). Daisy: An open soil-crop-atmosphere system model. Environmental Modelling & Software, 15(3), 313-330. https://doi.org/10.1016/S1364-8152(00)00003-7

Yuan, M., et al. (2018). A study on application of random forests to the classification of landsat 8 satellite data. Remote Sensing, 10(3), 432. https://doi.org/10.3390/rs10030432

Verrelst, J., et al. (2016). Emulation of leaf, canopy and atmosphere radiative transfer models for fast model inversion. Remote Sensing of Environment, 169, 163-174. https://doi.org/10.1016/j.rse.2015.08.025

Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining: Practical machine learning tools and techniques (3rd ed.). Morgan Kaufmann. https://doi.org/10.1016/C2009-0-19715-5

Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. Proceedings of the Thirteenth International Conference on Machine Learning, 148-156. Morgan Kaufmann.

Dietterich, T. G. (1997). Machine-learning research: Four current directions. AI Magazine, 18(4), 97-136.

Breiman, L. (1999). Pasting small votes for classification in large databases and on-line. Machine Learning, 36(1-2), 85-103. https://doi.org/10.1023/A:1007563306331

Schapire, R. E. (1999). A brief introduction to boosting. Proceedings of the 16th International Joint Conference on Artificial Intelligence, 1401-1406. Morgan Kaufmann.

Ho, T. K. (1995). Random decision forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, 278-282. IEEE.

Published

2024-06-30

How to Cite

Rastogi, S., & Sharma, R. (2024). FRAMEWORK FOR WHEAT VARIETAL DATA EXPLORATION: INSIGHTS FOR ENSEMBLE LEARNING RESEARCH. ShodhKosh: Journal of Visual and Performing Arts, 5(6), 226–233. https://doi.org/10.29121/shodhkosh.v5.i6.2024.3863