Article Type: Research Article

Article Citation: Henny Pramoedyo, Sativandi Riza, Afiati Oktaviarina, and Deby Ardianti. (2021). GEOGRAPHICALLY WEIGHTED REGRESSION AND MULTIPLE LINEAR REGRESSION FOR TOPSOIL TEXTURE PREDICTION. International Journal of Research -GRANTHAALAYAH, 9(2), 64-71. https://doi.org/10.29121/granthaalayah.v9.i2.2021.3112

Received Date: 15 January 2021
Accepted Date: 23 February 2021

Keywords: Regression; Geographically Weighted Regression; Soil Texture Modelling; Terrain Analysis; Digital Elevation Model

Land resource management requires extensive land mapping. Conventional soil mapping takes a long time and is expensive; therefore, geographic information system data can be used as predictors in soil texture modeling to shorten the time and reduce costs. Topographic variability derived from digital elevation model data can serve as the independent variables for predicting soil texture. Geographically weighted regression is used to observe the effects of spatial heterogeneity. This study uses a data set of 50 observation points, each of which has soil particle-size fraction attributes and eight local morphological variables. The covariates used in this study are eastness aspects, northness aspects, slope, unsphericity curvature, vertical curvature, horizontal curvature, accumulation curvature, and elevation. Prediction using geographically weighted regression yields higher R2 values than the multiple linear regression models. Spatial location affects the response variable Y, with R2 values of 0.81 for the sand fraction, 0.57 for the silt fraction, and 0.33 for the clay fraction.
1. INTRODUCTION

Soil texture is influenced by topographic variability, which modifies water flow and material distribution to produce a soil pattern in a landscape [1]. Soil texture maps are a main source of information in land resource management [2]. Soil mapping is usually conducted with conventional methods, which require large amounts of time and money, so information on the spatial distribution of soil texture over broad areas remains limited. Many methods have been used in soil texture mapping studies, including modeling [2], [3], [4], which can produce soil texture maps efficiently and accurately. Combining statistical modeling with GIS is an alternative solution that shortens the time and reduces costs. GIS data can be used as predictor variables in modeling [5]; for topographic variability, the relevant GIS data are the digital elevation model (DEM) [6]. Topographic variability derived from DEM data can then serve as the independent predictors of soil texture.

The simplest model for two or more predictor variables is multiple regression analysis. Multiple linear regression models or predicts an object from the relationship between the dependent variable and a group of independent variables [7]. However, regression analysis requires several assumptions to be met. When this regression is applied to data influenced by spatial aspects or geographic conditions, some assumptions become difficult to fulfill because of spatial heterogeneity [8]. Spatial heterogeneity is a condition in which the relationship differs from one location to another [9]. This study therefore also uses geographically weighted regression (GWR) to observe the effects of spatial heterogeneity. GWR is based on a non-parametric technique of locally weighted regression developed in statistics for curve fitting and smoothing [10].
We then compare the results of multiple linear regression with those of GWR modeling. The study aims to produce a soil texture prediction model with high accuracy.

2. MATERIALS AND METHOD

Topsoil samples from a depth of 0-10 cm were taken at 50 randomly selected points in the Kalikonto watershed, Malang, during June-July 2020. Soil texture content was derived from laboratory analysis and used as the primary data in this study. Soil texture is a combination of three particle-size fractions (PSFs): sand, silt, and clay. Each of the three PSFs is modeled separately as a Y variable. The X variables used in this study are eastness aspects (Ae) as X1, northness aspects (An) as X2, slope (S) as X3, unsphericity curvature (M) as X4, vertical curvature (Kv) as X5, horizontal curvature (Kh) as X6, accumulation curvature (Ka) as X7, and elevation (Elv) as X8.

2.1. DATA SETS

The data set consists of 50 observation points, each of which has soil PSF attributes and eight local morphological variables (LMVs), which describe the curvature diversity of the topography [11]. The LMVs were obtained from the formulas shown in Table 1. To obtain these variables, the DEM data were first analyzed to compute the derivatives of elevation, where elevation is the DEM digital number value. The derivatives of elevation are obtained with the following formula [12]:

where z is the elevation and w is the cell size in pixels. A 3x3 window is applied in this calculation.

Table 1: Formula to obtain the LMV [11].
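The 3x3 window calculation described above can be illustrated with a short sketch. The formula used by the authors is not reproduced in this text, so the example below assumes the Evans-Young finite-difference scheme, a common choice in digital terrain analysis; the function names, the window layout (north row first, west column first), and the sample elevations are all illustrative assumptions.

```python
import math


def first_derivatives(window, w):
    """Estimate dz/dx and dz/dy at the centre of a 3x3 elevation window.

    Assumption: Evans-Young finite differences, which may differ from
    the formula used in the study.
    window: three rows of three elevations, north row first, west column first.
    w: cell size, in the same length unit as the elevations.
    Returns (p, q): p is the eastward gradient, q is the northward gradient.
    """
    (z1, z2, z3), (z4, z5, z6), (z7, z8, z9) = window
    p = (z3 + z6 + z9 - z1 - z4 - z7) / (6.0 * w)
    q = (z1 + z2 + z3 - z7 - z8 - z9) / (6.0 * w)
    return p, q


def slope_degrees(p, q):
    """Slope angle in degrees from the two first derivatives."""
    return math.degrees(math.atan(math.hypot(p, q)))


# Hypothetical window: a plane rising one unit per cell toward the east.
example_window = [[0.0, 1.0, 2.0],
                  [0.0, 1.0, 2.0],
                  [0.0, 1.0, 2.0]]
p, q = first_derivatives(example_window, w=1.0)
slope = slope_degrees(p, q)
```

For this hypothetical plane the eastward gradient is 1, the northward gradient is 0, and the slope is about 45 degrees. Curvature variables additionally need second derivatives, which are omitted here.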
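
2.2. MULTILINEAR REGRESSION ANALYSIS

Multilinear regression analysis extends simple regression analysis to explain and describe the relationship between the response variable and more than one predictor variable [13]. With n observations and p predictor variables, the regression model can be written as follows [7]:

$$y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, 2, \ldots, n$$

Where: $y_i$ is the observed value of the response variable for the $i$th observation, $x_{ik}$ is the observed value of the $k$th predictor variable for the $i$th observation, $\beta_0$ is the intercept, $\beta_k$ is the regression coefficient of the $k$th predictor variable, and $\varepsilon_i$ is the error of the $i$th observation.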
Before starting the analysis, we performed several assumption tests as a standard procedure in regression analysis: the normality test, the heterogeneity test, and the non-multicollinearity test.

2.3. GEOGRAPHICALLY WEIGHTED REGRESSION

For the spatial aspect, we tested spatial autocorrelation with Moran's I test statistic, based on the following hypotheses [14]:

Hypotheses:
$H_0: I = 0$ (no spatial correlation).
$H_1: I \neq 0$ (there is a spatial correlation).

If $H_0$ is true, the test statistic is

$$Z(I) = \frac{I - E(I)}{\sqrt{\operatorname{Var}(I)}}$$

and

$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left( \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \right) \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $\bar{x}$ is the mean of $x$, $w_{ij}$ is the element of the weighting matrix, $I$ is Moran's index, $E(I)$ is the expected value of Moran's index, and $n$ is the number of samples.

The Breusch-Pagan test was used to test spatial heterogeneity, based on the following hypotheses [15]:

Hypotheses:
$H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2$
$H_1$: there is at least one $j$ where $\sigma_j^2 \neq \sigma^2$

If $H_0$ is true, the test statistic is computed from the error vector, the weighting matrix, and the matrix containing the standardized predictor variables.

The GWR model considers geographic factors and produces local estimators of the model parameters for each point or location [16]. The GWR model is as follows:

$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$$

where $y_i$ is the observed value of the response variable at location $i$, $x_{ik}$ is the observed value of the $k$th predictor variable at location $i$, $\beta_0(u_i, v_i)$ is the intercept of the regression model, $\beta_k(u_i, v_i)$ is the regression coefficient of the $k$th predictor variable, and $\varepsilon_i$ is the error at location $i$. The weighted least squares method is used to estimate the parameters of the GWR model, with a different weighting at each location. The parameter estimator for the GWR model is [16]:

$$\hat{\beta}(u_i, v_i) = \left( X^{T} W(u_i, v_i) X \right)^{-1} X^{T} W(u_i, v_i)\, y \qquad (5)$$

From equation (5), the parameter coefficients of the GWR model take different values at each location. The
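weighting is defined by a kernel function, which is either a fixed kernel or an adaptive kernel (Fotheringham). The fixed kernel function has the same bandwidth in all locations [17], whereas the adaptive kernel uses a bandwidth that varies by location. In these kernel functions, $h$ is the bandwidth, $h_i$ is the adaptive bandwidth, and $d_{ij}$ is the Euclidean distance, with

$$d_{ij} = \sqrt{(u_i - u_j)^2 + (v_i - v_j)^2}$$

where $(u_i, v_i)$ is the coordinate point of location $i$ and $(u_j, v_j)$ is the coordinate point of location $j$. The optimum bandwidth $h$ is selected with the cross validation (CV) method:

$$CV(h) = \sum_{i=1}^{n} \left( y_i - \hat{y}_{\neq i}(h) \right)^2$$

where $n$ is the number of samples and $\hat{y}_{\neq i}(h)$ is the estimated value of $y_i$ when observation $i$ is left out of the fit.

Partial testing of the GWR model parameters is used to determine which predictor variables influence the response variable at each location, based on the following hypotheses:

Hypotheses:
$H_0: \beta_k(u_i, v_i) = 0$
$H_1: \beta_k(u_i, v_i) \neq 0$

The test statistic can be written as [16]:

$$t = \frac{\hat{\beta}_k(u_i, v_i)}{\hat{\sigma} \sqrt{c_{kk}}}$$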
where $\hat{\sigma}$ is the estimated standard deviation of the errors and $c_{kk}$ is the $k$th diagonal element of the matrix $C_i C_i^{T}$, with $C_i = (X^{T} W(u_i, v_i) X)^{-1} X^{T} W(u_i, v_i)$. Reject $H_0$ if the absolute value of the test statistic exceeds the critical value of the t distribution at the chosen significance level.

3. RESULTS AND DISCUSSION

3.1. MULTILINEAR REGRESSION RESULT

For the sand model, the multiple linear regression equation obtained is as follows:

Based on the model obtained, An, M, Kv, and Kh have a positive relationship with the sand fraction. Meanwhile, Ae, S, Ka, and Elv have a negative relationship with the sand fraction. For example, the lower the Ka value, the higher the sand fraction. This multiple linear regression model produces an R2 value of 0.6285, which means that the study's independent variables jointly explain 62.85% of the variation in the sand fraction, and the remaining 37.15% is attributable to variables outside the research variables.

The equation obtained for the silt model is as follows:

Based on this model, An, M, Kv, and Ka have a negative relationship with the silt fraction. Meanwhile, Ae, S, Kh, and Elv have a positive relationship with the silt fraction. For example, the lower the Ka value, the higher the silt fraction. This multiple linear regression model produces an R2 value of 0.5503, which means that the study's independent variables jointly explain 55.03% of the variation in the silt fraction, and the remaining 44.97% is attributable to variables outside the research variables.

For the clay model, the multiple linear regression equation obtained is as follows:
Based on this model, An, M, and Kh have a negative relationship with the clay fraction. Meanwhile, Ae, S, Kv, Ka, and Elv have a positive relationship with the clay fraction. For example, the lower the Ka value, the lower the clay fraction. The multiple linear regression model produces an R2 value of 0.3034, which means that the independent variables jointly explain 30.34% of the variation in the clay fraction, and the remaining 69.66% is attributable to variables outside the research variables. All three models met the standard assumption tests for multiple regression analysis.

3.2. GWR ANALYSIS RESULT

Based on the results of the spatial dependence test, the p-values for the three soil fractions are smaller than α = 0.05; therefore, spatial dependence exists among the observations. Likewise, the heterogeneity test shows that spatial heterogeneity exists in all three PSFs. Because both spatial dependence and spatial heterogeneity are present, the multiple linear regression method is not appropriate for describing the soil fractions, and it is better to use a model that accommodates the location of each observation. The first step in GWR modeling is to determine the optimal bandwidth and the minimum CV by using fixed Gaussian spatial weighting. The minimum CV and bandwidth results are shown in Table 2.

Table 2: Minimum CV and bandwidth
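The bandwidth selection and local fitting steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the common fixed Gaussian kernel form w_ij = exp(-0.5 (d_ij / h)^2), a simple search over candidate bandwidths, and small synthetic data; all function names and numbers are hypothetical.

```python
import math


def gaussian_weights(coords, i, h):
    """Fixed Gaussian kernel weights of all points relative to location i (assumed form)."""
    ui, vi = coords[i]
    return [math.exp(-0.5 * (math.hypot(ui - u, vi - v) / h) ** 2) for u, v in coords]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def wls(X, y, w):
    """Weighted least squares: beta = (X^T W X)^-1 X^T W y."""
    n, p = len(y), len(X[0])
    xtwx = [[sum(w[j] * X[j][a] * X[j][b] for j in range(n)) for b in range(p)]
            for a in range(p)]
    xtwy = [sum(w[j] * X[j][a] * y[j] for j in range(n)) for a in range(p)]
    return solve(xtwx, xtwy)


def gwr_fit(coords, X, y, h):
    """Local coefficient vector at every observation location."""
    return [wls(X, y, gaussian_weights(coords, i, h)) for i in range(len(y))]


def cv_score(coords, X, y, h):
    """Leave-one-out CV: sum of squared errors when point i is excluded from its own fit."""
    total = 0.0
    for i in range(len(y)):
        w = gaussian_weights(coords, i, h)
        w[i] = 0.0  # leave observation i out
        beta = wls(X, y, w)
        total += (y[i] - sum(b * v for b, v in zip(beta, X[i]))) ** 2
    return total


def best_bandwidth(coords, X, y, candidates):
    """Pick the candidate bandwidth with the smallest CV score."""
    return min(candidates, key=lambda h: cv_score(coords, X, y, h))


# Synthetic 4x4 grid with one covariate and an intercept that drifts eastward.
coords = [(float(i), float(j)) for i in range(4) for j in range(4)]
x = [float((3 * k) % 7) for k in range(16)]
X = [[1.0, v] for v in x]  # first column is the intercept
y = [1.0 + 0.5 * u + 2.0 * v for (u, _), v in zip(coords, x)]
h_opt = best_bandwidth(coords, X, y, [1.0, 2.0, 4.0])
local_betas = gwr_fit(coords, X, y, h_opt)
```

Each entry of `local_betas` holds the intercept and slope estimated at one location, which is the sense in which GWR coefficients differ by location. With eight predictors, as in this study, only the number of columns in `X` changes.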
The GWR results are shown in Table 3.

Table 3: GWR Model
Table 4: MLR and GWR models comparison
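The comparison in Table 4 rests on R2. As a reminder of how this statistic is usually computed from observed and fitted values, a minimal sketch follows. It uses the standard definition R2 = 1 - SSE/SST; whether the GWR R2 in this study was computed in exactly this way is not stated, and the function name and toy numbers are illustrative only, not the study's data.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))  # residual sum of squares
    sst = sum((o - mean_obs) ** 2 for o in observed)           # total sum of squares
    return 1.0 - sse / sst


# Toy values only: the same observations scored against two sets of fitted values.
observed = [1.0, 2.0, 3.0]
fit_a = [1.5, 2.0, 2.5]
fit_b = [1.1, 2.0, 2.9]
r2_a = r_squared(observed, fit_a)
r2_b = r_squared(observed, fit_b)
```

The set of fitted values that lies closer to the observations gives the larger R2, which is the criterion applied when the two models are compared.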
Based on Table 4, the R2 value of the GWR model is greater than the R2 value of the multiple regression model for all three soil fractions, meaning that the GWR model fits the existing data better.

4. CONCLUSION

Prediction using GWR yields higher R2 values than the multiple linear regression models. Spatial location affects the response variable Y, with R2 values of 0.81 for the sand fraction, 0.57 for the silt fraction, and 0.33 for the clay fraction.

SOURCES OF FUNDING

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

CONFLICT OF INTEREST

The authors have declared that no competing interests exist.

ACKNOWLEDGMENT

This research was sponsored
by the professors and doctors grant of the Faculty of Mathematics and Natural Sciences, University of Brawijaya. The authors are grateful to the anonymous referees for carefully checking the details and for helpful comments that improved the overall presentation of this paper.

REFERENCES

[12] I. V. Florinsky, Digital Terrain Analysis in Soil Science and Geology. 2012.
[14] M. Fischer and A. Getis, Handbook of Applied Spatial Analysis. New York: Springer, 2010.

This work is licensed under a Creative Commons Attribution 4.0 International License. © Granthaalayah 2014-2020. All Rights Reserved.