Ranking crop yield models: a comment.


by Ramirez, Octavio A.; McDonald, Tanya

Alternative ways to model and simulate yield and price distributions for empirical risk analyses have been proposed in the agricultural economics literature since the early 1970s. A recent article in the American Journal of Agricultural Economics ranks five parametric models and one semiparametric yield distribution model using out-of-sample log-likelihood functions (Norwood, Roberts, and Lusk [NRL] 2004). The five parametric models are based on four well-known statistical distributions: the normal, the gamma, the beta, and the inverse hyperbolic sine (IHS), which is also known as the SU family.

This comment discusses key specification issues that may have substantially affected the performance and, therefore, the ranking of the parametric models compared in NRL. These include different, sometimes ad hoc, formulations for the models' mean and variance functions, and only allowing the distributional shape, that is, the skewness and kurtosis, to shift over time in some of the models.

A procedure to obtain the most flexible parametric model specification possible, given the particular probability distribution function on which the model is based, is presented. These improved specifications address the issues that could have affected the performance and ranking of the models compared in NRL, and might allow parametric models to fare better in relation to nonparametric procedures in future comparisons. Therefore, this comment cautions against generalization of NRL rankings and recommends that these more flexible probability distribution model specifications be adopted in future comparisons and applications.

A Conceptual Framework

Unlike the recently developed IHS system, the basic parameterization of probability distributions such as the gamma, beta, and SU family usually includes two coefficients, which implies that the mean, variance, skewness, and kurtosis are restricted to be functions of two parameters only. As a result, only two of these four central moments are freely determined. Such a restriction makes these basic parameterizations unnecessarily inflexible for use as probability distribution models.

Fortunately, any statistical distribution can be recentered and rescaled to include coefficients (or combinations of variables and coefficients) that determine the mean and the variance of the distribution independently of its skewness and kurtosis. Specifically, consider any two-parameter nonnormal distribution $\mathrm{pdf}(y) = f(y, \theta, \lambda)$ with mean $E[y] = f_1(\theta, \lambda)$, variance $V[y] = E[(y - E[y])^2] = f_2(\theta, \lambda)$, skewness coefficient $S[y] = E[(y - E[y])^3]/f_2(\theta, \lambda)^{3/2} = f_3(\theta, \lambda) \neq 0$, and kurtosis coefficient $K[y] = E[(y - E[y])^4]/f_2(\theta, \lambda)^{2} = f_4(\theta, \lambda) \neq 0$. Then the following reparameterization

(1) $y' = \{y - f_1(\theta, \lambda)\}/f_2(\theta, \lambda)^{1/2}$

yields a pdf, $\mathrm{pdf}'(y')$, with a constant mean ($E[y'] = 0$) and variance ($V[y'] = 1$) without altering its skewness and kurtosis coefficients. Furthermore, the transformation

(2) $y'' = \sigma y' + \mu$

produces a pdf, $\mathrm{pdf}''(y'') = f''(y'', \mu, \sigma, \theta, \lambda)$, with mean and variance solely determined by $\mu$ and $\sigma^2$, respectively (i.e., $E[y''] = \mu$ and $V[y''] = \sigma^2$), while its skewness and kurtosis coefficients depend on the original distributional shape parameters ($\theta$ and $\lambda$) only. Note that, as in the Ramirez (1997) IHS model, the mean and the variance can be specified as parametric functions of explanatory variables to allow them to shift across observations ($i$); that is, $\mu_i = F_1(X_i, \beta)$ and $\sigma_i = F_2(Z_i, \Gamma)$, where $X_i$ and $Z_i$ are explanatory variable matrices, $\beta$ and $\Gamma$ are parameter vectors, and $F_1$ and $F_2$ are linear or nonlinear functions of them. In addition, as in Ramirez, Misra, and Nelson (2003), $\theta$ and/or $\lambda$ can also be specified as parametric functions of explanatory variables so that skewness and/or kurtosis may change across observations. Maximum likelihood estimation of such reparameterized/expanded models is straightforward given knowledge of the original pdf(y).
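As an illustration of transformations (1) and (2), the following is a minimal sketch for the gamma case only. It assumes a shape/rate parameterization (mean shape/rate, variance shape/rate^2), which is a choice made here rather than one taken from NRL; the function names and the numerical check are likewise hypothetical. The density of the transformed variable is obtained from the original pdf by a change of variables, and a crude midpoint integration is used to confirm that the mean and variance come out as mu and sigma^2 while the skewness stays at the gamma value 2/sqrt(shape).

```python
import math


def gamma_logpdf(y, shape, rate):
    """Log-density of a gamma variable with the given shape and rate."""
    if y <= 0.0:
        return -math.inf
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(y) - rate * y)


def reparam_gamma_logpdf(y2, mu, sigma, shape, rate=1.0):
    """Log-density of y'' = sigma * (y - f1) / sqrt(f2) + mu, y ~ gamma.

    f1 and f2 are the gamma mean and variance. After standardization the
    rate cancels out, so only the shape parameter affects the result.
    """
    f1 = shape / rate               # mean of the original gamma
    sd = math.sqrt(shape) / rate    # square root of its variance f2
    y = f1 + sd * (y2 - mu) / sigma  # invert transformations (2) and (1)
    # Change of variables: pdf''(y'') = pdf(y) * |dy/dy''| = pdf(y) * sd/sigma
    return gamma_logpdf(y, shape, rate) + math.log(sd / sigma)


def numeric_moments(logpdf, lo, hi, n=100_000):
    """Midpoint-rule estimates of total mass, mean, variance, skewness."""
    h = (hi - lo) / n
    xs = [lo + (i + 0.5) * h for i in range(n)]
    ws = [math.exp(logpdf(x)) * h for x in xs]
    mass = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / mass
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / mass
    skew = sum(w * (x - mean) ** 3 for w, x in zip(ws, xs)) / mass / var ** 1.5
    return mass, mean, var, skew


if __name__ == "__main__":
    # Hypothetical values: mu = 50, sigma = 10, gamma shape = 4.
    # The support of y'' starts at mu - sigma * f1 / sqrt(f2) = 30.
    mass, mean, var, skew = numeric_moments(
        lambda v: reparam_gamma_logpdf(v, 50.0, 10.0, 4.0), 30.0, 330.0)
    print(mass, mean, var, skew)
```

In an estimation setting, the sum of such log-densities over observations, with mu and sigma replaced by functions of explanatory variables, would serve as the log-likelihood to be maximized.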

Because any distribution can be reparameterized/expanded so that its mean and variance are $\mu$ and $\sigma^2$ while maintaining the skewness-kurtosis (S-K) combinations that it originally allows, the degree to which a particular distribution might be suitable to adequately model the unknown statistical distribution generating the data is solely determined by the span of S-K combinations originally allowed by that distribution.

The span of S-K combinations permitted by the four parametric distributions evaluated by NRL is shown in figure 1.(1) Note that, although the IHS can accommodate a much wider spectrum of S-K combinations, the region of the S-K space spanned by the beta distribution is unattainable with the IHS and vice versa; while the gamma distribution spans a unique line in the positive quadrant of the S-K plane at the boundary of the area covered by the beta. Thus, these three distributions complement each other in terms of the S-K space that they can span. On the positive S-K quadrant, the lower boundary of the IHS is known as the log-normal line, which represents the S-K combinations allowed by the log-normal distribution. The normal distribution is represented by the dot where the S-K axes intersect.

[FIGURE 1 OMITTED]

From the information in figure 1, it is concluded that any one of these distributions could be the most suitable in a specific application, as particular regions of the S-K space cannot be ruled out a priori. Also note that there are regions of the S-K plane that are not covered by any of the distributions. Thus, even after addressing the specification issues and using the more flexible parameterization proposed in this comment, it is not certain that any of these parametric models will rank favorably in comparison with nonparametric methods in a particular application. Therefore, identifying a set of parametric distributions that can accommodate all theoretically feasible S-K combinations should be a priority in this area of research.
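Since figure 1 is not reproduced here, the regions it describes can be approximated from standard moment formulas. The sketch below is an assumption-laden illustration, not the authors' figure: it takes K to be excess kurtosis (so the normal sits at the origin), uses the textbook gamma relation K = 1.5 S^2, the log-normal moment formulas in terms of w = exp(s^2), and the general bound K > S^2 - 2. The function names and the returned labels are hypothetical, and the sign of the skewness is ignored because a reflected variable simply flips it.

```python
import math


def lognormal_line(skew):
    """Excess kurtosis on the log-normal line at a given |skewness|.

    Solves (w + 2) * sqrt(w - 1) = |skew| for w by bisection, then
    returns w**4 + 2*w**3 + 3*w**2 - 6.
    """
    target = abs(skew)
    if target == 0.0:
        return 0.0
    lo, hi = 1.0, 2.0
    while (hi + 2.0) * math.sqrt(hi - 1.0) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (mid + 2.0) * math.sqrt(mid - 1.0) < target:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    return w ** 4 + 2.0 * w ** 3 + 3.0 * w ** 2 - 6.0


def classify(skew, kurt, tol=1e-9):
    """Label which of the four families can attain the point (S, K)."""
    s2 = skew * skew
    if kurt <= s2 - 2.0:
        return "infeasible"      # on or below the bound for any distribution
    if abs(skew) <= tol and abs(kurt) <= tol:
        return "normal"          # the origin of the S-K plane
    gamma_k = 1.5 * s2           # gamma line, upper boundary of the beta
    if abs(kurt - gamma_k) <= tol:
        return "gamma"
    if kurt < gamma_k:
        return "beta"
    if kurt > lognormal_line(skew) + tol:
        return "IHS"             # strictly above the log-normal line
    return "none"                # between the gamma and log-normal lines


if __name__ == "__main__":
    for point in [(1.0, 1.0), (1.0, 1.5), (1.0, 1.7), (1.0, 3.0), (0.0, 0.0)]:
        print(point, classify(*point))
```

Under these assumptions, the band between the gamma line and the log-normal line is an example of an S-K region that none of the four distributions reaches.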

On the NRL 2004 Methods

Several model specification issues cast doubt on whether the results of NRL comparisons of parametric models can be generalized. First, unlike in the original applications of these models (Gallagher 1987 and Nelson and Preckel 1989, respectively), the two distributional parameters in the gamma and beta models are specified as linear functions of time. Because NRL did not reparameterize these distributions as suggested in the previous section, their specification makes the mean, variance, skewness, and kurtosis of the yield-deviation distributions become arbitrary nonlinear functions of time. Using the formulas in Mood, Graybill, and Boes (1974) and adopting the notation in NRL, the means and variances are

(3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the subscripts g and b stand for the gamma and beta distributions, respectively, and $Y_{gt}$ and $Y_{bt}$ represent positive yield deviations from a maximum threshold; that is, $Y_{gt}$ and $Y_{bt} = Y^m - Y_t$, where $Y_t$ stands for the observed yield at time $t$ and $Y^m$ is the threshold. Therefore,

(4) $E[Y_t] = Y^m - (\alpha_1 + \alpha_2 t)/(\beta_1 + \beta_2 t)$, and $E[Y_t] = Y^m - (\theta_1 + \omega_1 t)/\{\theta_1 + \theta_2 + (\omega_1 + \omega_2) t\}$

under the gamma and beta models, respectively. Thus, although the maximum yield threshold is constant, the expected yields implied by the NRL gamma and beta model specifications are fairly complex nonlinear functions of time. It is not clear why this makes empirical sense, particularly when modeling relatively long yield time series as in NRL. In fact, note that in Gallagher's original gamma model, where $Y^m$ is specified as a linear function of time ($Y^m = \gamma_0 + \gamma_1 t$) and the distributional parameters $\alpha$ and $\beta$ are assumed constant, $E[Y_t]$ simply equals $Y^m - \alpha/\beta = (\gamma_0 + \gamma_1 t) - \alpha/\beta$ and, therefore, the expected yield values and their corresponding thresholds shift synchronously over time, separated by the constant $\alpha/\beta$. Although it is difficult to empirically justify the relationship between the expected yield values and the yield thresholds implied by the NRL models, their gamma and beta specifications are quite flexible in that they allow all distributional moments, including skewness and kurtosis, to fluctuate arbitrarily over time.
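The contrast between the two gamma specifications can be made concrete with a small numerical sketch. The parameter values below are purely hypothetical and chosen only for illustration; the function names are likewise assumptions. The first function follows the gamma expression in (4), the second follows Gallagher's specification as described above, and the third uses the standard gamma skewness 2/sqrt(shape), with the sign reversed because yield equals the threshold minus a gamma deviation.

```python
import math


def nrl_gamma_mean(t, y_max, a1, a2, b1, b2):
    """Expected yield when both gamma parameters are linear in time."""
    return y_max - (a1 + a2 * t) / (b1 + b2 * t)


def gallagher_gamma_mean(t, g0, g1, alpha, beta):
    """Expected yield with a linear threshold and constant alpha, beta."""
    return (g0 + g1 * t) - alpha / beta


def nrl_gamma_yield_skewness(t, a1, a2):
    """Skewness of yield when the gamma shape is a1 + a2 * t."""
    return -2.0 / math.sqrt(a1 + a2 * t)


if __name__ == "__main__":
    # Hypothetical parameter values, not estimates from any data set.
    for t in (0, 10, 20):
        nrl = nrl_gamma_mean(t, y_max=200.0, a1=2.0, a2=0.05,
                             b1=0.05, b2=0.002)
        gal = gallagher_gamma_mean(t, g0=150.0, g1=1.5,
                                   alpha=2.0, beta=0.05)
        skew = nrl_gamma_yield_skewness(t, a1=2.0, a2=0.05)
        print(t, round(nrl, 3), round(gal, 3), round(skew, 3))
```

With these illustrative values, the expected yield under the first specification rises by unequal amounts over equal time intervals and the skewness drifts with t, whereas under the second the expected yield stays a fixed distance alpha/beta below the linear threshold.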

This enhanced flexibility, however, was not afforded to the IHS model specifications used in NRL comparisons. Although Ramirez had previously published two articles in which the "shape" parameters of IHS models are either functions of time (Ramirez and Fadiga 2003) or of other explanatory variables (Ramirez, Misra, and Nelson 2003), NRL did not use this more flexible IHS model specification in their comparisons. Therefore, while the gamma and beta models used in NRL allow for the skewness and kurtosis of the estimated crop yield distributions to be different at each and every observation, the IHS specifications do not.

COPYRIGHT 2006 American Agricultural Economics Association Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2006, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.
NOTE: All illustrations and photos have been removed from this article.