FORECASTING INNOVATION DIFFUSION WITH NEAR-OPTIMAL BERTALANFFY-PÜTTER MODELS

Abbreviations: AicAkaike Information Criterion Bp-ModelBertalanffy-Pütter Model SseSum of Squared Errors ABSTRACT Using a classical example for technology diffusion, the mechanization of agriculture in Spain since 1951, we considered the forecasting-intervals from the near-optimal Bertalanffy-Pütter (BP) models. We used BPmodels, as they considerably reduced the hitherto best fit (sum of squared errors) reported in literature. And we considered near-optimal models (their sum of squared errors is almost best), as they allowed to quantify model-uncertainty. This approach supplemented traditional sensitivity analyses (variation of model parameters), as for the present models and data even slight changes in the best-fit parameters resulted in very poorly fitting model curves.


INTRODUCTION BACKGROUND
Technology forecasting uses a wide range of methods (Firat et al., 2008). This paper focuses on a popular phenomenological approach, trend analysis for the diffusion of technologies by means of sigmoidal (S-shaped) model curves. Here, an often considered three-parameter model was the Verhulst model of logistic growth (e.g. Adamuthe & Thampi, 2019; Naseri & Elliott, 2013; Yamakawa et al., 2013). This paper proposes a five-parameter model to describe the success (growth, diffusion) of a technology, the Bertalanffy-Pütter (BP) model. It generalizes many conventional three-parameter models (e.g. logistic growth) and therefore improves their fit. In addition, the two additional parameters allow to identify near-optimal three-parameter models that do not differ significantly from the best-fit BP-model. Comparing the forecasts from these near-optimal models with the forecast from the best-fit model results in new quantitative indicators for assessing model-uncertainty ex ante. We illustrate this approach for data from literature, where we explore, which levels of model-uncertainty occurred in the past for forecasting. We expect that such empirical studies will lead to sensible recommendations for practitioners about what modeluncertainty needs to be assumed for different timespans of prognosis.

PROBLEM
Model uncertainties arise, when gaps in knowledge about the true drivers and mechanisms of growth cannot be reduced by an analysis of the past observations. For instance, as was observed for the prognosis of cancer, many growth curves with different shapes in the future may fit well to given historic data (Kühleitner et al., 2019). The paper develops a new approach to assess model-uncertainty and it illustrates it for forecasting. In order to measure model-uncertainty quantitatively, Bai & Jin (2016) suggested to use the relative error of the prediction. However, ex ante we could not compute this future error of prediction.
We propose a different approach. We identify models that fit well to the current data and that therefore in a technical sense defined below have a certain "probability to be true". Thereby, we use the Akaike information criterion AIC (Akaike, 1974) and its associated probability (Akaike weights): For given data, the model with the lowest AIC is most likely to be true and comparing other models with this one, their probabilities to be true are computed from the differences in AIC.
Based on this notion, we define a forecasting interval by the lower and upper bounds of the predictions drawn from likely models. Thereby, we consider all BP-models with a certain probability to be true, when compared to the best-fitting one. The forecasts based on these models define the forecasting interval. This concept resembles the confidence interval, but the confidence interval assumes a fixed model that is fitted to random variations of the data, while for the forecasting interval the data remain fixed.

MATERIALS
The computations used Mathematica 12.0 software of Wolfram Research. The results of optimization were exported to a spreadsheet (Microsoft Excel) and reimported into Mathematica for further analysis of the modeluncertainty.

DATA
We use data about the mechanization of agriculture in Spain by means of tractor ownership. The primary sources were census data and government reports over the period 1951-1976(Mar-Molinero, 1980. While the data may appear to be outdated, they cover an interesting phase for agriculture in Spain: During the early 1950s, the ancient regime of Franco gave up its disastrous policy of economic autarky and in the 1960s and 70s this was followed by a policy of modernization of the agricultural sector (Táboas et al., 2019). Further, the data for 1951-1976 were used repeatedly to test approaches to forecasting (e.g. Nguimkeu, 2014;Franses, 1994;Meade, 1984;Mar-Molinero, 1980). Gurung et al. (2018) is a recent related study about the mechanization of agriculture in India.
The data were rescaled to start at t = 0, meaning the year 1951. (This rescaling was used in literature.) We used equation (1) to model the stock of tractors y(t) over time t in years. Thereby, we first searched for the best fitting BP-model for the data till 1976. In order to assess prognosis, we also identified the best fitting BP-models to ten truncated data (number of tractors till the year 1966, 1967, … 1975) that we used to test the forecast for 1976 in hindsight. We supplemented this example by considering in addition the 1977-2009 data from the World Bank Open Data repository starting in 1961 (World Bank, 2019) and identifying the best-fit BP model for the extended data till 2009. The combined data are listed in Table 1.

BERTALANFFY-PÜTTER (BP) MODEL
The growth function y(t) of the BP-model is a solution of the following differential equation (Pütter, 1920), which can be solved analytically, though in general not by means of elementary functions (Ohnishi et al., 2014).
The five model parameters are determined from fitting the model to historical data: Four parameters are displayed in the equations, namely the non-negative exponent-pair a < b and the constants p and q. An additional parameter is the initial value, i.e. y (0) = c > 0.
Each exponent-pair (a, b) defines a unique BP-model BP (a, b) with three free parameters. Well-known examples are bounded exponential growth BP (0, 1), logistic growth BP (1,2), the von Bertalanffy model BP (1, 2/3), and the West model BP (1, 3/4). The Solow (1957) model of economic growth is a class of BP-models; it coincides with the generalized Bertalanffy model (b = 1 and a< 1). Also, the Richards model is a class of BP-models (a = 1 and b> 1). The Gompertz-model is the limit case BP (1, 1) with a different differential equation, where b converges to a = 1 from above (Marusic & Bajzer, 1993). These models have been used in forecasting, e.g. business trends (Dhakal, 2018), tumor growth (Murphy at al., 2016), or epidemic trajectories (Pell et al., 2018). In biology, these models have been proposed to model the growth of plants (Richards, 1959) or the mass growth of vertebrates (e.g. West et al., 2001;Bertalanffy, 1949).
In literature, there are several other five-parameter growth models, such as the model of Bass (1969) for market diffusion or the model of Monod (1949) for bacterial kinetics. We decided to use the BP-model, as it was very flexible. In comparison to other five-parameter models, this versatility had the disadvantage that for the BP-model the standard optimization tools (e.g. command Non-Linear Model Fit in Mathematica) did not always identify the bestfit parameters (numerical instability). However, as explained below, we could overcome this difficulty, whence this model is now feasible for practitioners. As to another limitation, the BP-model is not suitable in situations, where both growth and decay occur. Rather, it is intended to improve the fit to the data in situations, where initially e.g. logistic growth has been considered as a viable model.

DATA FITTING
The most common method for data-fitting, used also for this paper, is the method of least squares, which fits a (nonlinear) model to past data (Satoh & Matsumura, 2018). Thus, model selection aimed at finding parameters that minimized the sum of squared errors SSE. If y(t) is a solution of equation (1), using certain exponents a < b and parameters p, q, c, and if (ti, yi) are the N data, then SSE is defined by equation (2): Forecasting Innovation Diffusion with Near-Optimal Bertalanffy-Pütter Models As explained above, for model (1) the standard optimization tools did not find best-fit parameters to minimize (2). We did overcome this difficulty by considering exponent-pairs (a, b) on a grid with step size 0.01 in both directions; initially 0 ≤ ≤ 2.5, < ≤ + 3.5. (The grid was adaptively enlarged during optimization; for one dataset we searched about 200,000 grid points.) Figure 1 plots the search-grid (yellow area) and it displays the exponent-pairs of named models by dots (e.g. Verhulst model) or lines (e.g. Richards' model interpreted as a model class). Compared to the search-grid, these models were rather exceptional. For each exponent-pair of the grid we identified the best fitting model parameters (p, q, c) that minimized SSE. This defined the following function of equation (3), assuming for the minimization (right hand side) model (1) with exponents a, b: The best fitting model had the overall least sum of squared errors (SSEmin). We identified its exponent-pair (amin, bmin) with an accuracy of 0.01 (as we searched only grid-points) and its parameters pmin, qmin, cmin that minimized SSEopt(amin, bmin) = SSEmin.
For each grid-point (a, b), the optimization of p, q, and c was done using a custom-made variant of the method of simulated annealing (Vidal, 1993). The details and the code were outlined in another paper (Renner-Martin et al., 2018, Kühleitner et al., 2019). The outcome was exported into a spreadsheet, whose rows listed the best fit parameters a, b, c, p, q, and SSE for each grid-point. The exponent-pair (amin, bmin) was identified from the row, where the least value of SSE was attained.

COMPARISON OF MODELS
We use SSE of equation (2)  is the number of optimized parameters (c, p, q and SSE as an implicit parameter). As we compared a finite set of three-parameter BP-models BP (a, b) using exponentpairs chosen from a grid, we did not penalize the best fitting model by a higher number of parameters (the optimal exponent-pair was roughly approximated, but not yet computed). Thus, for these models, the least AICmin was achieved for the model defined from the exponent-pair (amin, bmin).
If the model with AICmin is compared to another model with larger AIC, then the probability that the other model is true (in an information theoretic meaning) is given by the relative Akaike weight − /2 /(1 + − /2 ), using the difference d = AIC-AICmin. This probability is at most 50% (comparing the model with the least AIC with an almost as good model).

FORECASTING INTERVALS
The starting point of our new approach are the data and SSEmin together with all u-near-optimal exponent-pairs (a, b). Thereby, an exponent-pair (a, b) is u-near-optimal with model-uncertainty u, if SSE opt(a, b) ≤ (1+u)⋅SSEmin. For each exponent-pair we also consider the best fitting growth curve ya,b(t); it is specified by its parameters a, b, c, p, q. All models that are displayed in the yellow search region of Figure 1 by their exponent-pairs are meant to realize the best possible fit to the given data, i.e. depending on a and b (which defined the model) the model parameters p, q and c were optimized according formula (3). Therefore, for the data that represent the past, there was barely a difference between the model curves for models whose SSE did not differ much from SSEmin.
In analogy to confidence intervals we now define: For a point of time T the forecasting interval Iu(T) for the level u of model-uncertainty has as its end-points the least upper and the largest lower bounds of the function values ya,b(T) associated to u-near-optimal exponent-pairs. For the computation of a forecasting interval Iu(t), filtering in the table of optimal parameters identified the rows with SSE ≤ (1+u)⋅SSE min. The parameters of each of these rows defined a growth function ya,b that was evaluated at T. The minimum and maximum of these values defined the boundaries of the interval.
In order to obtain a closer analogy to the confidence intervals, we related model-uncertainty in the following way to the probability that a model is true in the information theoretic sense, using AIC and more specifically, the relative Akaike weight. For, as follows from the above section, if the exponent-pair (a, b) is near-optimal with modeluncertainty u, then the probability that this exponent-pair is true is given by the following formula (4) for the relative Akaike-weight. prob( , ) = 1 1 + �(1 + )

PREVIOUS OUTCOMES
The trend for tractors has been studied repeatedly. There is a consensus in literature that the growth of tractors would follow a logistic model. Mar-Molinero (1980) compared several models and for the 1951-1976 tractor data he reported SSE = 4.9768 as the best fit, obtained by the logistic model. Subsequent authors (e.g. Meade, 1984;Franses, 1994;Nguimkeu, 2014) confirmed this conclusion.
Mar-Molinero (1980) also reported an "unexplained residual sum of squares" of 2.57. However, he referred to autocorrelation, using a fit of a time series: = • −1 + for the residuals = − ( ) and the unexplained error . This paper aims at improving the fit by using a better growth curve, but it does not follow up on the autocorrelation.

DATA FITTING
In order to identify the best-fit BP-model for the 1951-1976 tractor data, and for the truncated data, we searched between 0.9-1.3×10 5 grid-points. Figure 2 plots the best fitting growth curves. Their best-fit parameters are listed in Table 2 and their exponent-pairs are plotted in Figure 3.  Judging from R 2 ( Table 2), for all (truncated) data the fit of all curves to these data was excellent (99.7-99.9% of the variance in the data was explained by the models). Further, the best fitting model curve for the 1951-1976 data with SSE = 3.91475 displayed a significant improvement of 21% over the previous SSE = 4.9768 reported in literature for logistic growth. However, the extrapolations of the growth curves for the truncated data to the remaining data underestimated the future growth ( Figure 2). In particular, the best fitting growth curves to the data till 1966, 1967, … and 1969 did rapidly approach their asymptotic limits and therefore their prognosis for T = 1976 remained considerably below the true value. However, the error became smaller, the more data were used. Further, the best fit exponent-pairs for different years were spread widely along a regression line (Figure 3). For comparison, we also plotted the optimal exponents for the extended data 1951-2009.  (Table 2) and linear trend b = 0.2094+0.988⋅a (dotted, plot using MS Excel). Figure 4 displays the near-optimal exponent-pairs for the truncated data for 1951-1971. This figure was obtained as a by-product of our approach to data-fitting, where for each of almost 127,000 exponent-pairs (a, b) the best-fit exponents to the truncated data were found by an optimization. The red area displays about 2700 nearoptimal exponent-pairs, for which SSEopt did not exceed the overall best SSEmin (attained at the black dot) by more than u = 10%. The blue area corresponds to the near-optimal exponent-pairs with u = 34%. The best fitting models using these exponent-pairs have a probability of 5% or more to be true, when compared to the best-fit model. We repeated these computations for the truncated tractor data till 1970, 1971, … and 1975. (In view of Figure 2, the data truncated at 1969 or earlier were unsuitable for prognosis and in view of Figure 3 their best-fit exponent-pairs were remote from the exponent-pair for the data till 1976.) Figure 4: Part of the searched grid points (yellow) for the fit of model (1) to the tractor data for 1951-1971, near-optimal exponent-pairs assuming a model-uncertainty of u = 34% (blue: this corresponds to a probability of 5%) and of u = 10% (red: probability 28%), optimal exponent-pair for these data (black, see Table 2), exponent-pair to obtain an upper bound for the prognosis for 1976 (green, see Table 3) and selected exponent-pairs of named models (light blue: logistic, Gompertz and von Bertalanffy); plot using Mathematica 12.0.

FORECASTING
Next, for each of near-optimal exponent-pair (a, b) the best fitting growth curve ya,b(t) was identified and its "future" values were also evaluated ("future" referring to the perspective of the fitted data). Their upper and lower bounds defined forecasting intervals. Figure 5 plots the resulting forecasting band corresponding to Figure 4, prognosis for the stock of tractors based on the data for 1951-1971. For the present data and the chosen modeluncertainty all data points from 1972-1976 (plotted in green) were within this forecasting band. These computations were repeated for all truncated data. Table 3 summarizes these results in the following way: It identifies the least model uncertainty that was needed to include the true value for 1976 in the forecasting interval. For instance, for each red exponent-pair (a, b) of Figure 5 (model-uncertainty u = 10%) the value of the growth curve ya,b was evaluated for the year 1976; the values ranged between 34.9 and 40.4. The observed value at 1976 was 40.1 and the best-fitting curve to the 1951-1971 data attained a lower value. For Table 3 (line for 1971) we identified all model curves ya,b that attained a higher value and selected the one with the least SSE (Table 3 displays it as "needed SSE").  (a, b), whose SSEopt(a, b) was minimal subject to the constraint that its best fitting growth curve ya,b was above the 1976 value. The parameters of this ya,b are listed. 2) SSEopt of the displayed exponent-pairs; it was above SSEmin for the data (see Table 2) and "model uncertainty" 3) SSEopt (1, 2) of the logistic model. 4) The computations were based on SSEmin (7 th column of Table 2). 5) This model is represented by the green dot in Figure 4. Table 3 shows that forecasting of the true value based on the truncated data did not require unlikely models: The probabilities of the used models ranged between 30-46% and SSEopt of these models exceeded SSEmin by 1-9%. Thus, the prognosis for up to six years remained within the range of variability that could be expected from the data. Note that for the forecast based on the years till 1973, 1974, or 1975, the logistic model was outside this minimal range (i.e. SSElogistic exceeded the needed SSE).

CONCLUSIONS AND RECOMMENDATIONS
Forecasting requires the use of models that are capable to represent the hitherto observed data accurately. Model class (1) has some obvious advantages: it is a very flexible class of growth models and it includes a wide range of common growth models. Therefore, in general the best-fitting model of the BP-class will fit better than any of the above-mentioned named model. Thus, for the tractor data the use of models from the BP-class resulted in a significant improvement over the fit by the logistic growth function that was previously used in forecasting.
This approach requires extensive computations, where about hundred thousand models from the BP-class need to be optimized (different models are defined from different exponent-pairs). Yet, these optimization results serve an additional important purpose, as they may be used to quantify the model-uncertainty of forecasting. As is displayed by the forecasting intervals ( Figure 5), the near-optimal model curves remained close to the data (on which data-fitting was based). Nevertheless, subsequently there were considerable differences.
For the present data it was feasible to consider forecasting intervals based on all models with a probability of 5% or more to be true. Using this approach, we have shown that the considered data were suitable for a prognosis over a time span of five years.

SOURCES OF FUNDING
None.

CONFLICT OF INTEREST
None.

ACKNOWLEDGMENT
The authors declare no competing interests. There occurred no ethical issues, as the research was based on published data. All authors contributed equally in research, evaluation and interpretation of the results and drafting the manuscript.

APPENDICES
The method section lists the data and explains their sources. Further, the authors provided the supplementary material, namely the following spreadsheet (MS Excel) with the outcomes of the optimizations. S-File. Computation of SSEopt(a, b), based on Table 1 for the stock of tractors for the period 1951-1971, for certain grid-points, namely exponents a and b, and for them the best fit-parameters (optimization results) initial number m0, p, q, and SSE.