Keywords: Akaike Weight; Bertalanffy-Pütter Differential Equation; Least Squares; Near-Optimal Models; Forecasting; Model-Uncertainty

Abbreviations: AIC: Akaike Information Criterion; BP-Model: Bertalanffy-Pütter Model; SSE: Sum of Squared Errors

Using a classical example of technology diffusion, the mechanization of agriculture in Spain since 1951, we considered the forecasting intervals from the near-optimal Bertalanffy-Pütter (BP) models. We used BP-models, as they considerably reduced the sum of squared errors of the hitherto best fit reported in the literature. And we considered near-optimal models (their sum of squared errors is almost best), as they allowed us to quantify model-uncertainty. This approach supplemented traditional sensitivity analyses (variation of model parameters), as for the present models and data even slight changes in the best-fit parameters resulted in very poorly fitting model curves.
## 1. INTRODUCTION

## 1.1. BACKGROUND
Technology forecasting uses a wide range of methods (Firat
et al., 2008). This paper focuses on a popular phenomenological approach, trend
analysis for the diffusion of technologies by means of sigmoidal (S-shaped)
model curves. Here, an often considered three-parameter model was the Verhulst
model of logistic growth (e.g. Adamuthe & Thampi, 2019; Naseri &
Elliott, 2013; Yamakawa et al., 2013). This paper
proposes a five-parameter model to describe the success (growth, diffusion) of
a technology, the Bertalanffy-Pütter (BP) model. It
generalizes many conventional three-parameter models (e.g. logistic growth) and
therefore fits the data at least as well. In addition, the two additional parameters allow us to identify near-optimal three-parameter models that do not differ significantly from the best-fit BP-model. Comparing the forecasts from these near-optimal models with the forecast from the best-fit model results in new quantitative indicators for assessing model-uncertainty.

## 1.2. PROBLEM
Model uncertainties arise when gaps in knowledge about the true drivers and mechanisms of growth cannot be reduced by an analysis of past observations. For instance, as was observed for the prognosis of cancer, many growth curves with different future shapes may fit the given historic data well (Kühleitner et al., 2019). This paper develops a new approach to assess model-uncertainty and illustrates it for forecasting. In order to measure model-uncertainty quantitatively, Bai & Jin (2016) suggested using the relative error of the prediction. However, we propose a different approach. We identify models that fit the current data well and that therefore, in a technical sense defined below, have a certain “probability to be true”. For this purpose, we use the Akaike information criterion. Based on this notion, we define a forecasting interval by the lower and upper bounds of the predictions drawn from likely models: we consider all BP-models with a certain probability to be true when compared to the best-fitting one, and the forecasts based on these models define the forecasting interval. This concept resembles the confidence interval, but the confidence interval assumes a fixed model that is fitted to random variations of the data, while for the forecasting interval the data remain fixed and the model varies.
## 2. MATERIALS AND METHODS
## 2.1. MATERIALS
The computations used Mathematica 12.0 of Wolfram Research. The results of the optimization were exported to a spreadsheet (Microsoft Excel) and reimported into Mathematica for further analysis of the model-uncertainty.

## 2.2. DATA
We use data about the mechanization of agriculture in Spain by means of
tractor ownership. The primary sources were census data and government reports
over the period 1951-1976 (Mar-Molinero, 1980). While
the data may appear outdated, they cover an interesting phase for agriculture in Spain: during the early 1950s, the Franco regime gave up its disastrous policy of economic autarky, and in the 1960s and 1970s this was followed by a policy of modernization of the agricultural sector (Táboas et al., 2019). Further, the data for 1951-1976 were
used repeatedly to test approaches to forecasting (e.g. Nguimkeu,
2014; Franses, 1994; Meade, 1984; Mar-Molinero, 1980). Gurung et al. (2018) is a recent related
study about the mechanization of agriculture in India. The data were rescaled to start at

1) Source: data combined from Figure 5 of Mar-Molinero (1980) and open data from World Bank (2019).
2) Start with 1951 (year 0); 1976 corresponds to year 25.
3) One unit is a stock of 10,000 tractors.

## 2.3. BERTALANFFY-PÜTTER (BP) MODEL
The growth function $y(t)$ describes the stock of tractors at time $t$ as a solution of the BP differential equation (1):

$$y'(t) = p \cdot y(t)^{a} - q \cdot y(t)^{b}, \qquad y(0) = c > 0. \tag{1}$$

The five model parameters are determined from fitting the model to historical data: Four parameters are displayed in the differential equation, namely the non-negative exponent-pair $a < b$ and the constants $p$ and $q$; the fifth is the initial value $c$. Each exponent-pair $(a, b)$ defines a three-parameter model BP$(a, b)$ with the parameters $c$, $p$, $q$; for instance, the exponent-pair $(1, 2)$ defines the logistic model.

In the literature, there are several other five-parameter growth models, such as the model of Bass (1969) for market diffusion or the model of Monod (1949) for bacterial kinetics. We decided to use the BP-model, as it was very flexible. In comparison to other five-parameter models, this versatility had the disadvantage that for the BP-model the standard optimization tools (e.g. the command NonlinearModelFit in Mathematica) did not always identify the best-fit parameters (numerical instability). However, as explained below, we could overcome this difficulty, so that this model is now feasible for practitioners. As to another limitation, the BP-model is not suitable in situations where both growth and decay occur. Rather, it is intended to improve the fit to the data in situations where initially e.g. logistic growth has been considered as a viable model.

## 2.4. DATA FITTING
The most common method for data-fitting, used also for this paper, is the method of least squares, which fits a (nonlinear) model to past data (Satoh & Matsumura, 2018). Thus, model selection aimed at finding parameters that minimized the sum of squared errors ($SSE$). If $(t_i, y_i)$ are the $N$ data, then $SSE$ is defined by equation (2):

$$SSE = \sum_{i=1}^{N} \left( y_i - y(t_i) \right)^2. \tag{2}$$

As explained above, for model (1) the standard optimization tools did not find best-fit parameters to minimize (2). We overcame this difficulty by considering exponent-pairs $(a, b)$ from a grid. For each exponent-pair of the grid we identified the best fitting model parameters $(c, p, q)$:

$$SSE_{opt}(a, b) = \min_{c,\, p,\, q} SSE. \tag{3}$$

The best fitting model had the overall least sum of squared errors ($SSE_{min}$). It was defined by the optimal exponent-pair $(a_{min}, b_{min})$, determined with an accuracy of 0.01 (as we searched only grid-points), and by its parameters $c_{min}$, $p_{min}$, $q_{min}$ that minimized $SSE$: $SSE_{opt}(a_{min}, b_{min}) = SSE_{min}$.
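The grid search over exponent-pairs can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not taken from the paper: the differential equation is integrated with a fixed-step Runge-Kutta scheme, the inner optimization of (c, p, q) uses a crude compass search as a stand-in for whatever optimizer the authors used, the grid has only four points, and the data are synthetic (generated from a logistic curve), not the tractor data. The names `bp_curve`, `sse`, `compass_search` and `table` are hypothetical.

```python
import math

def bp_curve(a, b, c, p, q, times, substeps=2):
    """Integrate y' = p*y**a - q*y**b, y(0) = c with classical RK4.
    `times` must be ascending and start at 0; returns y at each time."""
    def f(y):
        y = max(y, 1e-12)              # keep the base of the powers positive
        return p * y**a - q * y**b
    y, t, out = c, 0.0, []
    for t_end in times:
        span = t_end - t
        n = max(1, round(span * substeps)) if span > 0 else 0
        h = span / n if n else 0.0
        for _ in range(n):
            k1 = f(y)
            k2 = f(y + h * k1 / 2)
            k3 = f(y + h * k2 / 2)
            k4 = f(y + h * k3)
            y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t = t_end
        out.append(y)
    return out

def sse(log_params, a, b, data):
    """Sum of squared errors for BP(a, b); logs keep c, p, q positive."""
    try:
        c, p, q = (math.exp(v) for v in log_params)
        model = bp_curve(a, b, c, p, q, [t for t, _ in data])
        total = sum((y - m) ** 2 for (_, y), m in zip(data, model))
    except OverflowError:
        return math.inf                # diverging trial parameters are rejected
    return total if math.isfinite(total) else math.inf

def compass_search(fun, x0, step=0.5, tol=1e-4, max_iter=200):
    """Derivative-free coordinate search (assumed stand-in optimizer)."""
    x, fx = list(x0), fun(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                ft = fun(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step /= 2                  # refine the search once no move helps
    return x, fx

# Synthetic, noise-free data from a logistic curve BP(1, 2); NOT the tractor data.
times = list(range(21))
true_c, true_p, true_q = 1.0, 0.4, 0.01
data = list(zip(times, bp_curve(1.0, 2.0, true_c, true_p, true_q, times)))

# One table row per grid point: SSE_opt(a, b) and the optimized (c, p, q).
start = [math.log(data[0][1]), math.log(0.3), math.log(0.02)]
table = {}
for a in (0.9, 1.0):
    for b in (1.9, 2.0):
        params, sse_opt = compass_search(lambda x: sse(x, a, b, data), start)
        table[(a, b)] = (sse_opt, [math.exp(v) for v in params])

(a_min, b_min), (sse_min, best_params) = min(table.items(), key=lambda kv: kv[1][0])
print((a_min, b_min), sse_min)
```

The dictionary `table` plays the role of the table of optimization results. A realistic run would use a much finer grid (spacing 0.01) and a more robust optimizer, since the compass search gives no guarantee of reaching the true minimum for each exponent-pair.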
For each grid-point $(a, b)$ the optimization results were recorded as a row of a table, and $(a_{min}, b_{min})$ was identified from the row where the least value of $SSE$ was attained.

## 2.5. COMPARISON OF MODELS
We use $SSE$ as the measure for the goodness of fit; it is related to the variance of the fit residuals ($SSE$ divided by $N-1$ or $N$, depending on the estimator for the variance). The use of $SSE$ was based on assuming normally distributed errors. Different statistical assumptions have led to weighted sums of squared errors. In addition, we used the Akaike information criterion, $AIC = N \cdot \ln(SSE/N) + 2K$; c.f. Burnham & Anderson (2002) or Motulsky & Christopoulos (2003). Here, $N$ is the number of data-points. For the 1951-1976 data, $N = 26$; for the truncated data, $N$ ranges from 16 to 25; and for the 1951-2009 data, $N = 59$. Further, $K = 4$ is the number of optimized parameters ($c$, $p$, $q$ and $SSE$ as an implicit parameter). As we compared a finite set of three-parameter BP-models BP$(a, b)$ using exponent-pairs chosen from a grid, we did not penalize the best fitting model by a higher number of parameters (the optimal exponent-pair was roughly approximated, but not yet computed). Thus, for these models, the least $AIC$ ($AIC_{min}$) was achieved for the model defined from the exponent-pair $(a_{min}, b_{min})$. If the model with $AIC_{min}$ is compared with another model with $AIC$, then the probability that the other model is true (in an information theoretic meaning) is given by the relative Akaike weight $w = e^{-d/2} / (1 + e^{-d/2})$, using the difference $d = AIC - AIC_{min}$.
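As a small numerical illustration of the relative Akaike weight, the following sketch assumes the standard least-squares form AIC = N·ln(SSE/N) + 2K and the pairwise weight w = exp(-d/2) / (1 + exp(-d/2)). The function names and the two SSE values are hypothetical and serve only to show the arithmetic.

```python
import math

def aic(sse, n, k=4):
    """Least-squares AIC; k counts c, p, q and the implicit variance parameter."""
    return n * math.log(sse / n) + 2 * k

def akaike_weight(aic_other, aic_min):
    """Relative weight of the other model against the model with AIC_min."""
    d = aic_other - aic_min
    return math.exp(-d / 2) / (1 + math.exp(-d / 2))

n = 26                               # number of data points for 1951-1976
sse_min, sse_other = 1.00, 1.05      # hypothetical SSE values
w = akaike_weight(aic(sse_other, n), aic(sse_min, n))
w_equal = akaike_weight(aic(sse_min, n), aic(sse_min, n))
print(round(w, 3), w_equal)
```

Because both models have the same K, the penalty term cancels and d reduces to N·ln(SSE/SSE_min). A model that fits exactly as well as the best one (d = 0) receives the weight 0.5, and every worse-fitting model receives less.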
This probability is at most 50% (comparing the model with the least $AIC$ with an almost as good model).

## 2.6. FORECASTING INTERVALS
The starting point of our new approach are the data and the $u$-near-optimal exponent-pairs $(a, b)$. Thereby, an exponent-pair $(a, b)$ is $u$-near-optimal with model-uncertainty $u$, if $SSE_{opt}(a, b) \le (1+u) \times SSE_{min}$. For each exponent-pair we also consider the best fitting growth curve $y_{a,b}(t)$; it is specified by its parameters $a$, $b$, $c$, $p$, $q$. All models that are displayed in the yellow search region of Figure 1 by their exponent-pairs are meant to realize the best possible fit to the given data, i.e. depending on $a$ and $b$ (which defined the model) the model parameters $p$, $q$ and $c$ were optimized according to formula (3). Therefore, for the data that represent the past, there was barely a difference between the model curves for models whose $SSE$ did not differ much from $SSE_{min}$.

In analogy to confidence intervals we now define: For a point of time $T$, the forecasting interval $I_u(T)$ for the level $u$ of model-uncertainty has as its end-points the least upper and the largest lower bounds of the function values $y_{a,b}(T)$ associated to $u$-near-optimal exponent-pairs. For the computation of a forecasting interval $I_u(T)$, filtering in the table of optimal parameters identified the rows with $SSE \le (1+u) \times SSE_{min}$. The parameters of each of these rows defined a growth function $y_{a,b}$ that was evaluated at $T$. The minimum and maximum of these values defined the boundaries of the interval.

In order to obtain a closer analogy to the confidence intervals, we related model-uncertainty to the probability that a model is true in the information theoretic sense, using the relative Akaike weight (Section 2.5).

## 3. RESULTS AND DISCUSSIONS

## 3.1. PREVIOUS OUTCOMES
The trend for tractors has been studied repeatedly. There is a consensus in the literature that the growth of the stock of tractors follows a logistic model. Mar-Molinero (1980) compared several
models and for the 1951-1976 tractor data he reported Mar-Molinero (1980) also reported an
“unexplained residual sum of squares” of 2.57. However, he referred to
autocorrelation, using a fit of a time series: for the residuals and the unexplained error . This paper aims at
improving the fit by using a better growth curve, but it does not follow up on
the autocorrelation.

## 3.2. DATA FITTING
In order to identify the best-fit BP-model for the
1951-1976 tractor data, and for the truncated data, we searched between 0.9-1.3×10
1) This indicates the data from 1951 to the displayed year
Judging from
## 3.3. FORECASTING
Figure 4 displays the near-optimal exponent-pairs for the truncated data for 1951-1971. This figure was obtained as a by-product of our approach to data-fitting, where for each of almost 127,000 exponent-pairs $(a, b)$ the best-fit parameters were identified. The red area displays the exponent-pairs whose $SSE_{opt}$ did not exceed $SSE_{min}$ (attained at the black dot) by more than $u$ = 10%. The blue area corresponds to the near-optimal exponent-pairs with $u$ = 34%. The best fitting models using these exponent-pairs have a probability of 5% or more to be true, when compared to the best-fit model. We repeated these computations for the truncated tractor data till 1970, 1971, … and 1975. (In view of Figure 2, the data truncated at 1969 or earlier were unsuitable for prognosis and in view of Figure 3 their best-fit exponent-pairs were remote from the exponent-pair for the data till 1976.)

Next, for each near-optimal exponent-pair $(a, b)$ the best fitting growth curve $y_{a,b}(t)$ was identified and its “future” values were also evaluated (“future” referring to the perspective of the fitted data). Their upper and lower bounds defined forecasting intervals. Figure 5 plots the resulting forecasting band corresponding to Figure 4, i.e. the prognosis for the stock of tractors based on the data for 1951-1971. For the present data and the chosen model-uncertainty, all data points from 1972-1976 (plotted in green) were within this forecasting band.
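The filtering step behind a forecasting interval can be sketched as follows. The rows below are invented for illustration (they are not entries of the authors' spreadsheet), and the field names `sse` and `forecast` are assumptions: each row is taken to hold SSE_opt(a, b) for one exponent-pair together with the value of its best fitting curve at the forecast time T.

```python
def forecasting_interval(rows, u):
    """Return (lower, upper) bounds of the forecasts over all u-near-optimal rows."""
    sse_min = min(row["sse"] for row in rows)
    near_optimal = [row for row in rows if row["sse"] <= (1 + u) * sse_min]
    forecasts = [row["forecast"] for row in near_optimal]
    return min(forecasts), max(forecasts)

# Hypothetical optimization results: one row per exponent-pair (a, b).
rows = [
    {"a": 1.00, "b": 2.00, "sse": 2.00, "forecast": 38.0},
    {"a": 0.90, "b": 1.50, "sse": 2.10, "forecast": 40.5},
    {"a": 0.80, "b": 1.20, "sse": 2.50, "forecast": 43.0},
    {"a": 1.20, "b": 2.40, "sse": 3.20, "forecast": 35.0},
]

low, high = forecasting_interval(rows, u=0.34)
print(low, high)
```

Repeating this for a sequence of forecast times T yields a forecasting band. Raising u can only widen the interval, because every row admitted at a smaller u stays admitted at a larger one.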
These computations were repeated for all truncated data. Table 3 summarizes these results in the following way: It identifies the least model-uncertainty that was needed to include the true value for 1976 in the forecasting interval. For instance, among the red exponent-pairs $(a, b)$ we identified those whose best fitting growth curves $y_{a,b}$ attained a higher value than the true value for 1976 and selected the one with the least $SSE$ (Table 3 displays it as “needed SSE”).
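The selection of the “needed SSE” and the associated probability can be illustrated with a short sketch. It rests on assumptions that are not spelled out in the text: the AIC is taken in its standard least-squares form, so that for two models with the same number of parameters the difference is d = N·ln(1+u), and the probability is the pairwise Akaike weight. The rows, the true value and the function names are hypothetical.

```python
import math

def weight_for_uncertainty(u, n):
    """Akaike weight of a model whose SSE equals (1+u)*SSE_min (same K assumed)."""
    d = n * math.log(1 + u)
    return math.exp(-d / 2) / (1 + math.exp(-d / 2))

def needed_model_uncertainty(rows, true_value):
    """Least SSE among rows whose forecast reaches the true value, and its u."""
    sse_min = min(row["sse"] for row in rows)
    above = [row for row in rows if row["forecast"] >= true_value]
    if not above:
        return None                    # no fitted model reaches the true value
    needed = min(above, key=lambda row: row["sse"])
    return needed["sse"], needed["sse"] / sse_min - 1

# Hypothetical optimization results with forecasts for the target year.
rows = [
    {"a": 1.00, "b": 2.00, "sse": 2.00, "forecast": 38.0},
    {"a": 0.90, "b": 1.50, "sse": 2.10, "forecast": 40.5},
    {"a": 0.80, "b": 1.20, "sse": 2.50, "forecast": 43.0},
]

needed_sse, u = needed_model_uncertainty(rows, true_value=40.0)
w = weight_for_uncertainty(u, n=21)    # 21 yearly values, e.g. 1951 to 1971
print(needed_sse, round(u, 3), round(w, 3))
```

In this made-up example the best-fit model forecasts too low a value, a model with a 5% larger SSE reaches the true value, and its weight relative to the best-fit model is roughly 0.37.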
1) For each year, we searched for the exponent-pair $(a, b)$ whose $SSE_{opt}(a, b)$ was minimal subject to the constraint that its best fitting growth curve $y_{a,b}$ was above the 1976 value. The parameters of this $y_{a,b}$ are listed.
2) $SSE_{min}$ for the data (see Table 2) and “model uncertainty” …
3) … (1, 2) of the logistic model.
4) The computations were based on … column of Table 2).
5) This model is represented by the green dot in Figure 4.

Table 3 shows that forecasting of the true value based on the truncated data did not require unlikely models: The probabilities of the used models ranged between 30-46% and the needed $SSE$ exceeded $SSE_{min}$ by 1-9%. Thus, the prognosis for up to six years remained within the range of variability that could be expected from the data. Note that for the forecast based on the years till 1973, 1974, or 1975, the logistic model was outside this minimal range (i.e. $SSE_{logistic}$ exceeded the needed $SSE$).

## 4. CONCLUSIONS AND RECOMMENDATIONS
Forecasting requires the use of models that are capable of representing the
hitherto observed data accurately. Model class (1) has some obvious advantages: it is a very flexible class of growth models and it includes a wide range of common growth models. Therefore, in general the best-fitting model of the BP-class will fit at least as well as any of the above-mentioned named models. Thus, for the tractor data the use of models from the BP-class resulted in a significant improvement over the fit by the logistic growth function that was previously used in forecasting.

This approach requires extensive computations, where about a hundred thousand models from the BP-class need to be optimized (different models are defined from different exponent-pairs). Yet, these optimization results serve an additional important purpose, as they may be used to quantify the model-uncertainty of forecasting. As is displayed by the forecasting intervals (Figure 5), the near-optimal model curves remained close to the data (on which data-fitting was based). Nevertheless, subsequently there were considerable differences.

For the present data it was feasible to consider forecasting intervals based on all models with a probability of 5% or more to be true. Using this approach, we have shown that the considered data were suitable for a prognosis over a time span of five years.

## SOURCES OF FUNDING
None.

## CONFLICT OF INTEREST
None.

## ACKNOWLEDGMENT
The authors declare no competing interests. No ethical issues occurred, as the research was based on published data. All authors contributed equally to the research, to the evaluation and interpretation of the results, and to drafting the manuscript.

## APPENDICES
The method section lists the data and explains their sources. Further, the authors provided the supplementary material, namely the following spreadsheet (MS Excel) with the outcomes of the optimizations.

S-File. Computation of SSEopt(a, b), based on Table 1 for the stock of tractors for the period 1951-1971, for certain grid-points, namely exponents a and b, and for them the best-fit parameters (optimization results) initial number m0, p, q, and SSE.

## REFERENCES
[1] Adamuthe, A.C., Thampi, G.K., 2019. Technology forecasting: A case study of computational technologies. Technological Forecasting & Social Change 143, 181-189.
[2] Akaike, H., 1974. A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control 19, 716-723.
[3] Bai, Y., Jin, W.L., 2016. Marine Structural Design (2nd ed.). Elsevier, Amsterdam, Netherlands.
[4] Bass, F.M., 1969. A new product growth model for consumer durables. Management Science 15, 215-227.
[5] Bertalanffy, L.v., 1949. Problems of organic growth. Nature 163, 156-158.
[6] Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach. Springer, Berlin.
[7] Dhakal, T., 2018. An analytical model on business diffusion. Journal of Industrial Engineering and Management Science 2018, 119-128. DOI 10.13052/jiems2446-1822.2018.007.
[8] Firat, A.K., Madnick, S., Woon, W.L., 2008. Technology forecasting: A review. In: Working Paper CISL# 2008-15. MIT, Cambridge, USA.
[9] Franses, P.H., 1994. A method to select between Gompertz and Logistic trend curves. Technological Forecasting & Social Change 46, 45-49.
[10] Gurung, B., Singh, K.N., Shekhawat, R.S., Yeasin, M., 2018. An insight into technology diffusion of tractor through Weibull growth model. Journal of Applied Statistics 45, 682-696.
[11] Hyndman, R.J., Koehler, A.B., 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22, 679-688.
[12] Kühleitner, M., Brunner, N., Nowak, W.G., Renner-Martin, K., Scheicher, K., 2019. Best fitting tumor growth models of the von Bertalanffy-Pütter Type. BMC Cancer 19, published online: DOI 10.1186/s12885-019-5911-y.
[13] Mar-Molinero, C., 1980. Tractors in Spain: a logistic analysis. Journal of the Operational Research Society 31, 141-152.
[14] Marusic, M., Bajzer, Z., 1993. Generalized two-parameter equations of growth. Journal of Mathematical Analysis and Applications 179, 446-462.
[15] Meade, N., 1984. The use of growth curves in forecasting market development-a review and appraisal. Journal of Forecasting 3, 429-451.
[16] Monod, J., 1949. The growth of bacterial cultures. Annual Reviews of Microbiology 8, 371-374.
[17] Motulsky, H., Christopoulos, A., 2003. Fitting Models to Biological Data Using Linear and Nonlinear Regression: A Practical Guide to Curve Fitting. Oxford University Press, Oxford, U.K.
[18] Murphy, H., Jaafari, H., Dobrovolny, H.M., 2016. Differences in predictions of ODE models of tumor growth: a cautionary example. BMC Cancer 16, 163-172.
[19] Naseri, M.B., Elliott, G., 2013. The diffusion of online shopping in Australia: Comparing the Bass, Logistic and Gompertz growth models. Journal of Marketing Analytics 1, 49-60. DOI 10.1057/jma.2013.2.
[20] Nguimkeu, P., 2014. A simple selection test between the Gompertz and Logistic growth models. Technological Forecasting & Social Change 88, 98-105.
[21] Ohnishi, S., Yamakawa, T., Akamine, T., 2014. On the analytical solution for the Pütter-Bertalanffy growth equation. Journal of Theoretical Biology 343, 174-177.
[22] Pell, B., Kuanga, Y., Viboud, C., Chowell, G., 2018. Using phenomenological models for forecasting the 2015 Ebola challenge. Epidemics 22, 62-70.
[23] Pütter, A., 1920. Studien über physiologische Ähnlichkeit. VI. Wachstumsähnlichkeiten. Pflügers Archiv für die Gesamte Physiologie des Menschen und der Tiere 180, 298-340.
[24] Renner-Martin, K., Brunner, N., Kühleitner, M., Nowak, W.G., Scheicher, K., 2018. Optimal and near-optimal exponent-pairs for the Bertalanffy-Pütter growth model. PeerJ 6, published online: DOI 10.7717/peerj.5973.
[25] Richards, F.J., 1959. A Flexible Growth Function for Empirical Use. Journal of Experimental Botany 10, 290-300.
[26] Satoh, D., Matsumura, R., 2018. Monotonic decrease of upper limit estimated with Gompertz model for data described using logistic model. Japan Journal of Industrial and Applied Mathematics, published online: DOI 10.1007/s13160-018-0333-9.
[27] Solow, R., 1957. Technical Change and the Aggregate Production Function. The Review of Economics and Statistics 39, 312-320.
[28] Táboas, D.L., Fernández–Prieto, L., Geada, A.D., 2019. Agriculture and Agricultural Policies in Spain (1939-1959). In: Rural History Conference (in preparation). Published online: DOI 10.13140/2.1.1521.3762.
[29] Vidal, R.V.V., 1993. Applied simulated annealing. In: Lecture notes in economics and mathematical systems. Berlin: Springer-Verlag.
[30] West, G.B., Brown, J.H., Enquist, B.J., 2001. A general model for ontogenetic growth. Nature 413, 628-631.
[31] World Bank, 2019. World Bank Open Data, Link: data.worldbank.org (last visit: 01.07.2019)
[32] Yamakawa, P., Rees, G.H., Salas, J.M., Alva, N., 2013. The diffusion of mobile telephones: an empirical analysis for Peru. Telecommunications Policy 37, 594-606.
This work is licensed under a Creative Commons Attribution 4.0 International License. © IJETMR 2014-2020. All Rights Reserved.