Appraising relation between weather and Red gram yield estimates with Regression Machine Learning

Red gram is the second most important pulse crop of India after Bengal gram. India accounts for 65 % of global seed production [4]. Its ability to produce high economic yields even under rainfed conditions, together with its place as an indispensable part of Indian meals owing to a high protein content of 22.3 %, adds to the significance of yield estimation. Telangana state ranked third in red gram cultivation, after Maharashtra and Karnataka, with an area of 2.3 lakh ha during 2020-2021 (http://www.pjtsau.edu.in). A steep rise in support price as compared with other pulses further necessitates seed yield estimation to understand fluctuations in production caused by the vagaries of the monsoon, as the crop is grown mainly under rainfed conditions. In this context of meeting local and global supply chain demand, machine learning algorithms come in handy for producing weather-based yield estimates. Regression machine learning has been gaining popularity in agricultural applications due to its success in bioinformatics.

Crop yield prediction with machine learning techniques is a recent subject in the literature and has been considered for various crops such as wheat and rice [1] and groundnut [9]. Machine learning is a subset of artificial intelligence that enables an algorithm to learn from experience without being explicitly programmed.

Machine learning can be divided into three broad categories, namely supervised learning, unsupervised learning and reinforcement learning. In this article, five supervised regression machine learning algorithms, namely Gaussian processes, linear regression, support vector machines, k-nearest neighbors and decision tree, were used to build the most accurate and effective model, since the learning data come with the required outputs and the objective of the study was to determine a general rule mapping inputs to outputs. Moreover, regression machine learning algorithms take a data-driven approach to learning useful models and relationships from input data [10] and provide a good way of improving crop yield predictions. In addition, regression machine learning algorithms have individual benefits; for example, they can model non-linear relationships between multiple data sources [3].

[11] studied the effect of rainfall on crop yield using regression machine learning algorithms and reported that the Gaussian Processes model explained a good degree of the relationship between annual rainfall and wheat yield. [2] proposed a system to improve crop yield using different machine learning algorithms, namely back-propagation, k-means clustering and random forest. The results showed that the random forest algorithm works well with both small and large experimental datasets and with high precision when evaluated against the other algorithms. [6] integrated five field-based soil properties and topographic data to predict maize yield by applying various regression machine learning algorithms, namely random forest, neural network and support vector machine. The results showed that random forest was consistently better than the other fitted models. [5] used a random forest algorithm to predict global and regional yields of crops such as maize and wheat from environmental parameters such as soil, climate and fertilization data. The results demonstrated that random forest is an effective and dynamic algorithm for crop yield prediction with high accuracy and precision [7], [10]. Thus, the main objective of this study was to explore the possibility of suggesting a suitable regression machine learning algorithm for predicting red gram yields in Ranga Reddy district of Telangana.

MATERIALS AND METHODS

The present investigation was undertaken to appraise the relationship between weather parameters and red gram yield with regression machine learning algorithms. The average yield data for red gram over a period of 31 years, i.e. 1988-2019, were collected from the Directorate of Economics and Statistics, Government of Telangana, India. The daily weather data of maximum temperature, minimum temperature, morning relative humidity, evening relative humidity, rainfall, bright sunshine, wind speed and evaporation during the crop season (30th to 47th Meteorological Standard Weeks) were collected from the Agro-climate Research Centre, PJTS Agriculture University, Hyderabad. These daily weather data were compiled into weekly values for the purpose of analysis.

The steps involved in suggesting a suitable machine learning algorithm for predicting red gram yields were:

Experimental Dataset: The dataset was prepared in an Excel sheet and saved with a CSV extension for analysis in a machine learning system (Weka 3.8.5).
Normalized Dataset: The min-max algorithm was used to normalize the dataset, as it is one of the most common ways to normalize data.
Attribute Selection: The attribute evaluator “CfsSubsetEval” and the search method “BestFirst” were used to select the feature variables that contribute most to the prediction variable (Table 1).
Evaluate Algorithms: The five regression machine learning algorithms (Table 2), namely Gaussian Processes, Linear Regression, Support Vector Machines, k-Nearest Neighbors and Decision Tree, were then applied to the experimental dataset. The results of each regression algorithm were noted and compared with each other.
Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, Root Relative Squared Error and Coefficient of Determination values were taken into consideration for each regression algorithm.
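The normalization and evaluation steps listed above can be sketched in a few lines of Python. This is an illustrative sketch only and not the Weka implementation used in the study: the function names and the sample yield values are hypothetical, RAE and RRSE are assumed to be computed relative to a predictor that always outputs the mean of the actual values, and R² is assumed to be 1 - SS_res/SS_tot, which may differ from how the reported values were derived.

```python
import math

def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def regression_metrics(actual, predicted):
    """Return MAE, RMSE, RAE (%), RRSE (%) and R2 for paired observations."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean (assumed baseline).
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE_%": 100.0 * abs_err / abs_dev,
        "RRSE_%": 100.0 * math.sqrt(sq_err / sq_dev),
        "R2": 1.0 - sq_err / sq_dev,
    }

# Hypothetical yields, for illustration only (not data from the study).
actual = [400.0, 500.0, 600.0, 700.0]
predicted = [420.0, 480.0, 630.0, 690.0]

print(min_max_normalize(actual))
print(regression_metrics(actual, predicted))
```

Lower MAE, RMSE, RAE and RRSE and a higher R² indicate a better fit, which is the basis on which the algorithms are compared in the following section.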

RESULTS AND DISCUSSION

Weka 3.8.5 is an open source system that provides a collection of machine learning algorithms for regression analysis. The regression machine learning algorithms can be applied directly to an experimental dataset. Weka has several useful regression machine learning algorithms for making crop yield predictions. All regression machine learning algorithms are usually driven by the number of feature variables, the shape of the regression line and the type of target variables. From the Weka regression algorithms, five machine learning algorithms were evaluated, namely Gaussian Processes (GP), Linear Regression (LR), Support Vector Machines (SMOReg), k-Nearest Neighbors (IBk) and Decision Tree (Random Forest). The performance of each algorithm was assessed in terms of MAE, RMSE, RAE, RRSE and R². Fig. 1 shows the distribution of each selected attribute and reveals that the attributes have different distribution ranges.
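Of the five learners listed above, the lazy k-nearest-neighbors learner is the simplest to illustrate. The sketch below is a minimal version assuming Euclidean distance on normalized features and an unweighted mean of the neighbors' yields; it is not meant to reproduce the settings of Weka's IBk used in the study, and the function name, feature values and yields are hypothetical.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a target as the mean of the k nearest training targets."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training rows")
    # Pair each training row's Euclidean distance to the query with its target.
    dists = sorted((math.dist(row, query), y) for row, y in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(y for _, y in nearest) / k

# Hypothetical normalized weather features (e.g. rainfall, maximum temperature)
# and yields; these are illustration values, not data from the study.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.9, 0.8), (1.0, 1.0)]
train_y = [300.0, 340.0, 620.0, 660.0]

print(knn_predict(train_X, train_y, query=(0.05, 0.1), k=2))
```

Because no model is fitted in advance and all work happens at prediction time, such a learner is termed "lazy", in contrast to the function-based and tree-based algorithms.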

The characteristics of the fitted regression machine learning algorithms in Table 3 indicate that the tree-based algorithm performed better than the function-based algorithms and the lazy algorithm. Among the function-based algorithms, three were examined, namely Gaussian processes, linear regression and SMOReg, of which SMOReg performed best. Overall, the highest R² value and the lowest MAE value suggest that the fitted random forest algorithm was adequate in predicting the relationship between the weather parameters and red gram yield. Therefore, the random forest algorithm is appropriate for predicting red gram yield, being an efficient and adaptable machine learning algorithm for crop yield prediction because of its high accuracy and precision, ease of use and usefulness in data analysis.

The results were compared with multiple linear regressions and evaluated using R² and MAE. The results demonstrated that random forest is an efficient algorithm for crop yield prediction with high accuracy and precision. [6] studied three machine learning algorithms, namely random forest, neural network and support vector machine, for maize yield prediction. Their results also corroborate the finding that the random forest algorithm was consistently better than the other fitted algorithms. [5], [8] reported similar findings.

Fig. 2 shows the prediction accuracy of the different fitted regression machine learning algorithms. Of the five algorithms used in this work, the Random Forest algorithm had the best crop yield predictability with an R² of 95 %, followed by KNN (89 %), while Gaussian Processes exhibited the lowest predictability (74 %).

Fig. 3 depicts the error results of the different regression machine learning algorithms. Random Forest exhibited the lowest Mean Absolute Error (MAE) of 32.7 and Root Mean Squared Error (RMSE) of 40.8, indicating minimal error in the crop yield prediction process. In contrast, Gaussian Processes resulted in the highest errors, with an MAE of 71.6 and an RMSE of 85.5.

The error rates of the yields predicted by the random forest algorithm for the training and testing data sets, shown in Fig. 4 and Fig. 5 respectively, demonstrate that the predicted yields were both over- and underestimated in different years. For the training data set, the predicted yield was underestimated by 11.9 %, 2.2 %, 8.3 %, 7.9 %, 1.3 %, 10.6 %, 3.9 %, 10.3 %, 6.5 %, 1.8 % and 8.5 % for the years 1994, 1995, 1999, 2001, 2003, 2004, 2005, 2006, 2009, 2013 and 2014, respectively, whereas it was overestimated by 17.8 %, 13.0 %, 4.5 %, 5.4 %, 30.4 %, 4.8 %, 11.0 %, 5.6 %, 18.7 %, 47.8 %, 0.4 %, 15.7 % and 7.2 % for the years 1988, 1989, 1990, 1991, 1992, 1996, 1997, 2000, 2002, 2007, 2010, 2011 and 2012, respectively. For the testing data set, the predicted yield was underestimated by 17.8 % and 29.2 % for the years 2015 and 2016, respectively, and overestimated by 27.9 % and 12.4 % for the years 2017 and 2018, respectively. The predicted yield error rates ranged from -11.9 % to 47.8 % for the training data set and from -29.2 % to 27.9 % for the testing data set.
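The sign convention behind these error rates can be illustrated with a short sketch. It assumes that the error rate is the difference between predicted and actual yield expressed as a percentage of the actual yield, so that negative values denote underestimation and positive values overestimation; the paper does not state the denominator, and the function name, season labels and yield values below are hypothetical.

```python
def percent_error(actual, predicted):
    """Signed error as a percentage of the actual yield.

    Positive = overestimate, negative = underestimate (assumed convention).
    """
    return 100.0 * (predicted - actual) / actual

# Hypothetical (season, actual yield, predicted yield) records,
# for illustration only (not data from the study).
records = [(1, 500.0, 450.0), (2, 400.0, 440.0), (3, 800.0, 780.0)]

errors = {season: round(percent_error(a, p), 1) for season, a, p in records}
underestimated = {s: e for s, e in errors.items() if e < 0}
overestimated = {s: e for s, e in errors.items() if e > 0}

print(errors)
print("range:", min(errors.values()), "to", max(errors.values()))
```

Reporting the minimum and maximum of these signed values gives an error range of the kind quoted above for the training and testing data sets.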

The predicted yield based on the training data set is presented in Table 4 and illustrated in Fig. 6. The actual and predicted yields were close to each other. The residuals ranged from -76 to 99 for the training data and from -148 to 111 for the testing data under the same Random Forest model.

CONCLUSION

Appraising the relationship between red gram yields and the weather is an important dimension of seed yield estimation because the crop is grown mainly under rainfed conditions. Five supervised regression machine learning algorithms, namely Gaussian processes, linear regression, support vector machines, k-nearest neighbors and decision tree, were used in the study, as these have been gaining popularity in agricultural applications due to their success in yield estimation. Among these, the Random Forest algorithm was found to be superior to the other fitted regression algorithms, with a crop yield predictability of 95 % (R²), the lowest Mean Absolute Error (MAE) of 32.7 and the lowest Root Mean Squared Error (RMSE) of 40.8.

Future scope of work

The study can be extended and the model made more robust by interfacing it with GIS for better utilization by stakeholders.

Author statement (Disclaimer):

The contents and views expressed in this research paper are the views of the authors and do not necessarily reflect the views of the organizations they belong to.

Conflict of Interest: All the authors declare that there exists no conflict of interest.

Acknowledgement:

The authors are thankful to the Directorate of Economics and Statistics, Government of Telangana, for providing the red gram yield data and to the Telangana State Development and Planning Society, Govt. of Telangana, Hyderabad, for providing the weather data.

References

[1] Baby Akula, Parmar, R.S., Raj, M.P. and Indudhar Reddy, K. (2021). Prediction for rice yield using data mining approach in Ranga Reddy district of Telangana. J. of Agrometeorol. 23(2): 242-248.
[2] Bhanumathi, S., Vineeth, M. and Rohit, N. (2019). Crop Yield Prediction and Efficient use of Fertilizers. 2019 International Conference on Communication and Signal Processing (ICCSP), pp. 0769-0773. doi: 10.1109/ICCSP.2019.8698087.
[3] Chlingaryan, A., Sukkarieh, S. and Whelan, B. (2018). Machine learning approaches for crop yield prediction and nitrogen status estimation in precision agriculture: A review. Computers and Electronics in Agriculture. 151: 61-69.
[4] Food and Agriculture Organization of the United Nations (2019). FAOSTAT Statistical Database. (http://www.fao.org/).
[5] Jeong, J.H., Resop, J.P., Mueller, N.D., Fleisher, D.H., Yun, K., et al. (2016). Random Forests for Global and Regional Crop Yield Predictions. PLOS ONE. 11(6): e015657.
[6] Khanal, U., Wilson, C., Lee, B.L. and Hoang, V.-N. (2018). Climate change adaptation strategies and food productivity in Nepal: a counterfactual analysis. Climatic Change. 148(4): 575-590.
[7] Liakos, K.G., Busato, P., Moshou, D., Pearson, S. and Bochtis, D. (2018). Machine Learning in Agriculture: A Review. Sensors. 18(8): 2674. https://doi.org/10.3390/s1808
[8] Maya Gopal, P.S. and Bhargavi, R. (2019). A novel approach for efficient crop yield prediction. Computers and Electronics in Agriculture. 165: 1-9.
[9] Shah, V. and Shah, P. (2018). Groundnut Crop Yield Prediction Using Machine Learning Techniques. International Journal of Scientific Research in Computer Science, Engineering and Information Technology. 3(5): 1093-1097. ISSN: 2456-3307.
[10] Willcock, S., Martínez-López, J., Hooftman, D.A.P., Bagstad, K.J., Balbi, S., Marzo, A., Prato, C., Sciandrello, S., Signorello, G., Voigt, B., Villa, F., Bullock, J.M. and Athanasiadis, I.N. (2018). Machine learning for ecosystem services. Ecosystem Services. 33: 165-174.
[11] Vagh, Y. (2012). An Investigation into the effect of stochastic annual rainfall on crop yields in South Western Australia. International Journal of Information and Education Technology. 2(3): 227-232.


Agriculture Association of Textile Chemical and Critical Reviews Journal (AATCC Review)
