Alternatives to model and simulate yield and price distributions
for empirical risk analyses have been proposed in the agricultural
economics literature since the early 1970s. A recent article in the
American Journal of Agricultural Economics ranks five parametric and one
semiparametric yield distribution models using
out-of-sample log-likelihood functions (Norwood, Roberts, and Lusk [NRL]
2004). The five parametric models are based on four well-known
statistical distributions: the normal, the gamma, the beta, and the
inverse hyperbolic sine (IHS), which is also known as the SU family.
This comment discusses key specification issues that may have
substantially affected the performance and, therefore, the ranking of
the parametric models compared in NRL. These include the different,
sometimes ad hoc, formulations of the models' mean and variance
functions, and the fact that the distributional shape (i.e., the
skewness and kurtosis) was allowed to shift over time in only some of
the models.
A procedure to obtain the most flexible parametric model
specification possible, given the particular probability distribution
function on which the model is based, is presented. These improved
specifications address the issues that could have affected the
performance and ranking of the models compared in NRL, and might allow
parametric models to fare better relative to nonparametric procedures
in future comparisons. Therefore, this comment cautions against
generalizing the NRL rankings and recommends that these more flexible
probability distribution model specifications be adopted in future
comparisons and applications.
A Conceptual Framework
Unlike the recently developed IHS system, the basic
parameterizations of probability distributions such as the gamma, the
beta, and the SU family usually include only two coefficients, which
implies that the mean, variance, skewness, and kurtosis are all
restricted to be functions of the same two parameters. As a result, only
two of these four moments can be determined freely. Such a restriction
makes these basic parameterizations unnecessarily inflexible for use as
probability distribution models.
Fortunately, any statistical distribution can be recentered and
rescaled to include coefficients (or combinations of variables and
coefficients) that determine the mean and the variance of the
distribution independently of its skewness and kurtosis. Specifically,
consider any two-parameter nonnormal distribution
$\text{pdf}(y) = f(y, \theta, \lambda)$ with mean
$E[y] = f_1(\theta, \lambda)$, variance
$V[y] = E[(y - E[y])^2] = f_2(\theta, \lambda)$, skewness
$S[y] = E[(y - E[y])^3]/f_2(\theta, \lambda)^{3/2} = f_3(\theta, \lambda)$,
and kurtosis
$K[y] = E[(y - E[y])^4]/f_2(\theta, \lambda)^{2} = f_4(\theta, \lambda)$.
Then the reparameterization

$$y' = \frac{y - f_1(\theta, \lambda)}{f_2(\theta, \lambda)^{1/2}} \tag{1}$$

yields a pdf, $\text{pdf}'(y')$, with zero mean ($E[y'] = 0$) and unit
variance ($V[y'] = 1$) without altering its skewness and kurtosis
coefficients. Furthermore, the transformation

$$y'' = \sigma y' + \mu \tag{2}$$

produces a pdf, $\text{pdf}''(y'') = f''(y'', \mu, \sigma, \theta, \lambda)$,
whose mean and variance are determined solely by $\mu$ and $\sigma^2$,
respectively (i.e., $E[y''] = \mu$ and $V[y''] = \sigma^2$), while its
skewness and kurtosis coefficients depend only on the original
distributional shape parameters ($\theta$ and $\lambda$). Note that, as
in the Ramirez (1997) IHS model, the mean and the variance can be
specified as parametric functions of explanatory variables so that they
shift across observations ($i$); that is, $\mu_i = F_1(X_i, \beta)$ and
$\sigma_i = F_2(Z_i, \Gamma)$, where $X_i$ and $Z_i$ are explanatory
variable matrices, $\beta$ and $\Gamma$ are parameter vectors, and $F_1$
and $F_2$ are linear or nonlinear functions of them. In addition, as in
Ramirez, Misra, and Nelson (2003), $\theta$ and/or $\lambda$ can also be
specified as parametric functions of explanatory variables so that
skewness and/or kurtosis may change across observations. Maximum
likelihood estimation of such reparameterized/expanded models is
straightforward given knowledge of the original pdf(y).
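Inverting transformations (1) and (2) and applying the change-of-variables rule gives the density needed for maximum likelihood directly in terms of the original pdf:

$$\text{pdf}''(y'') = \frac{f_2(\theta,\lambda)^{1/2}}{\sigma}\; f\!\left(f_1(\theta,\lambda) + f_2(\theta,\lambda)^{1/2}\,\frac{y''-\mu}{\sigma};\ \theta,\lambda\right)$$

The Python sketch below applies this to the gamma distribution as one illustration. Because the gamma's skewness and kurtosis depend only on its shape, its scale parameter becomes redundant after the reparameterization, so the sketch fixes the scale at one (then $f_1 = f_2 = $ shape). The function names and the mean and standard-deviation functions ($\mu_i = b_0 + b_1 t_i$, $\sigma_i = \exp(g_0 + g_1 t_i)$) are hypothetical choices for $F_1$ and $F_2$, not specifications taken from the literature discussed here.

```python
import math


def shifted_gamma_logpdf(y2: float, mu: float, sigma: float, shape: float) -> float:
    """Log of pdf''(y'') for a gamma recentered to mean mu and rescaled to sd sigma.

    The base pdf(y) is a unit-scale gamma, so f1 = E[y] = shape and
    f2 = V[y] = shape; skewness 2/sqrt(shape) is unchanged by (1)-(2).
    """
    if sigma <= 0.0 or shape <= 0.0:
        raise ValueError("sigma and shape must be positive")
    root_f2 = math.sqrt(shape)
    # Invert (2) and then (1): y = f1 + sqrt(f2) * (y'' - mu) / sigma.
    y = shape + root_f2 * (y2 - mu) / sigma
    if y <= 0.0:
        return -math.inf  # y'' lies below the lower bound mu - sigma*sqrt(shape)
    log_base = (shape - 1.0) * math.log(y) - y - math.lgamma(shape)
    # Jacobian of the change of variables, dy/dy'' = sqrt(f2) / sigma.
    return log_base + math.log(root_f2 / sigma)


def log_likelihood(params, years, yields):
    """Sum of log pdf'' over observations, using hypothetical mean/sd functions.

    Assumed forms: mu_i = b0 + b1 * t_i and sigma_i = exp(g0 + g1 * t_i).
    """
    b0, b1, g0, g1, shape = params
    return sum(
        shifted_gamma_logpdf(y, b0 + b1 * t, math.exp(g0 + g1 * t), shape)
        for t, y in zip(years, yields)
    )
```

Passing the negative of `log_likelihood` to a numerical optimizer would yield maximum likelihood estimates in which the mean, the variance, and the skewness are governed by separate coefficients. Negatively skewed yields can be handled by reflecting the data, and a time-varying skewness would replace the constant `shape` with a function of time.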
Because any distribution can be reparameterized/expanded so that
its mean and variance are $\mu$ and $\sigma^2$ while it retains the
skewness-kurtosis (S-K) combinations that it originally allows, the
degree to which a particular distribution is suitable for modeling the
unknown statistical distribution that generated the data is determined
solely by the span of S-K combinations originally allowed by that
distribution.
The span of S-K combinations permitted by the four parametric
distributions evaluated by NRL is shown in figure 1.(1) Note that,
although the IHS can accommodate a much wider spectrum of S-K
combinations, the region of the S-K space spanned by the beta
distribution is unattainable with the IHS, and vice versa, while the
gamma distribution spans a unique line in the positive quadrant of the
S-K plane at the boundary of the area covered by the beta. Thus, these
three distributions complement each other in terms of the S-K space
they can span. In the positive S-K quadrant, the lower boundary of the
IHS region is known as the log-normal line, which represents the S-K
combinations allowed by the log-normal distribution. The normal
distribution is represented by the point where the S-K axes intersect.
[FIGURE 1 OMITTED]
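Because the figure is not reproduced, the Python sketch below encodes the same geometry numerically as an illustration. It assumes, as the placement of the normal at the origin suggests, that kurtosis is measured in excess of 3. It uses standard closed-form boundaries: no distribution can have $K < 1 + S^2$; the beta fills the region between that bound and the gamma line $K - 3 = 1.5S^2$; and the log-normal line is traced through $w = \exp(\sigma^2)$, with skewness $(w+2)\sqrt{w-1}$ and excess kurtosis $w^4 + 2w^3 + 3w^2 - 6$. The function names are hypothetical, and negative skewness is treated as the mirror image obtained by reflecting the variable.

```python
import math


def lognormal_excess_kurtosis(skew: float) -> float:
    """Excess kurtosis on the log-normal line at skewness |skew|."""
    s = abs(skew)
    if s == 0.0:
        return 0.0

    def skew_of(w: float) -> float:
        # Log-normal skewness as a function of w = exp(sigma**2) > 1.
        return (w + 2.0) * math.sqrt(w - 1.0)

    lo, hi = 1.0, 2.0
    while skew_of(hi) < s:  # bracket the root
        hi *= 2.0
    for _ in range(200):  # bisection; skew_of is increasing in w
        mid = 0.5 * (lo + hi)
        if skew_of(mid) < s:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    return w**4 + 2.0 * w**3 + 3.0 * w**2 - 6.0


def sk_region(skew: float, excess_kurt: float, tol: float = 1e-9) -> str:
    """Which of the distributions in figure 1 can attain a given (S, K - 3) pair."""
    s2 = skew * skew
    if excess_kurt <= s2 - 2.0:
        return "infeasible"  # on or below K = 1 + S^2: unattainable by these families
    if abs(skew) <= tol and abs(excess_kurt) <= tol:
        return "normal"
    gamma_k = 1.5 * s2  # gamma line
    if excess_kurt < gamma_k - tol:
        return "beta"
    if excess_kurt <= gamma_k + tol:
        return "gamma"
    lognormal_k = lognormal_excess_kurtosis(skew)
    if excess_kurt < lognormal_k - tol:
        return "uncovered"  # strip between the gamma and log-normal lines
    if excess_kurt <= lognormal_k + tol:
        return "log-normal line (IHS boundary)"
    return "IHS"
```

For example, at a skewness of 1 the gamma line sits at an excess kurtosis of 1.5 and the log-normal line at roughly 1.83, so `sk_region(1.0, 1.7)` falls in the strip that none of the four distributions can reach.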
Figure 1 implies that any one of these distributions could be the
most suitable in a specific application, because no region of the S-K
space can be ruled out a priori. Note also that some regions of the S-K
plane are not covered by any of these distributions. Thus, even after
the specification issues are addressed and the more flexible
parameterization proposed in this comment is used, it is not certain
that any of these parametric models will rank favorably in comparison
with nonparametric methods in a particular application. Therefore,
identifying a set of parametric distributions that can accommodate all
theoretically feasible S-K combinations should be a priority in this
area of research.
On the NRL 2004 Methods
Several model specification issues cast doubt on whether the
results of NRL comparisons of parametric models can be generalized.
First, unlike in the original applications of these models (Gallagher
1987 and Nelson and Preckel 1989, respectively), the two distributional
parameters in the gamma and beta models are specified as linear
functions of time. Because NRL did not reparameterize these
distributions as suggested in the previous section, their specification
makes the mean, variance, skewness, and kurtosis of the yield-deviation
distributions arbitrary nonlinear functions of time. Using the
formulas in Mood, Graybill, and Boes (1974) and adopting the notation in
NRL, the means and variances are
(3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
where the subscripts g and b stand for the gamma and beta
distributions, respectively, and $y_{gt}$ and $y_{bt}$ represent
positive yield deviations from a maximum threshold; that is,
$y_{gt} = y_{bt} = Y^m - Y_t$, where $Y_t$ stands for the observed yield
at time $t$ and $Y^m$ is the threshold. Therefore,

$$E[Y_t] = Y^m - \frac{\alpha_1 + \alpha_2 t}{\beta_1 + \beta_2 t}
\quad\text{and}\quad
E[Y_t] = Y^m - \frac{\theta_1 + \omega_1 t}{\theta_1 + \theta_2 + (\omega_1 + \omega_2)\,t} \tag{4}$$

under the gamma and beta models, respectively. Thus, although the
maximum yield threshold is constant, the expected yields implied by the
NRL gamma and beta model specifications are fairly complex nonlinear
functions of time. It is not clear why this makes empirical sense,
particularly when modeling relatively long yield time series as in NRL.
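A minimal Python sketch of equation (4) makes the point concrete. It evaluates the implied gap $Y^m - E[Y_t]$ and the expected-yield path for a constant threshold. All coefficient values are hypothetical and chosen only for illustration, and the function names are likewise hypothetical. Because each gap is a ratio of two linear functions of time, the path is curved; when $\beta_2 \neq 0$, the gamma gap tends to $\alpha_2/\beta_2$ as $t$ grows, so the implied expected yield eventually levels off at $Y^m - \alpha_2/\beta_2$ (and, analogously, at $Y^m - \omega_1/(\omega_1 + \omega_2)$ for the beta).

```python
def gamma_gap(t, a1, a2, b1, b2):
    """Y^m - E[Y_t] under the NRL-type gamma specification in eq. (4)."""
    num, den = a1 + a2 * t, b1 + b2 * t
    if num <= 0 or den <= 0:
        raise ValueError("gamma parameters must stay positive at every t")
    return num / den


def beta_gap(t, th1, th2, w1, w2):
    """Y^m - E[Y_t] under the NRL-type beta specification in eq. (4)."""
    num = th1 + w1 * t
    den = th1 + th2 + (w1 + w2) * t
    # Assumes den - num is the second beta parameter, which must stay positive.
    if num <= 0 or den <= num:
        raise ValueError("implied beta parameters must stay positive at every t")
    return num / den


def expected_yield_path(y_max, gap, years, *coefs):
    """Implied E[Y_t] = Y^m - gap(t) for a constant threshold Y^m."""
    return [y_max - gap(t, *coefs) for t in years]


# Hypothetical coefficients: gaps are 2, 3, and 10/3 at t = 0, 4, 8.
path = expected_yield_path(50.0, gamma_gap, [0, 4, 8], 2.0, 1.0, 1.0, 0.25)
```

Here `path` is approximately `[48.0, 47.0, 46.67]`: the expected yield falls by 1 over the first four periods and by only about 0.33 over the next four, even though the threshold never moves.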
In fact, note that in Gallagher's original gamma model, where $Y^m$
is specified as a linear function of time ($Y^m = \gamma_0 + \gamma_1 t$)
and the distributional parameters $\alpha$ and $\beta$ are assumed
constant, $E[Y_t]$ simply equals
$Y^m - \alpha/\beta = (\gamma_0 + \gamma_1 t) - \alpha/\beta$; therefore,
the expected yields and their corresponding thresholds shift
synchronously over time, separated by the constant $\alpha/\beta$.
Although the relationship between the expected yields and the yield
thresholds implied by the NRL models is difficult to justify
empirically, their gamma and beta specifications are quite flexible in
that they allow all distributional moments, including skewness and
kurtosis, to fluctuate arbitrarily over time.
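The following Python sketch illustrates that flexibility by computing the skewness and excess kurtosis of yields period by period. It rests on several assumptions. The gamma uses the shape-rate form (mean $\alpha/\beta$), so its shape is $\alpha_1 + \alpha_2 t$. The two beta shape parameters are taken to be $\theta_1 + \omega_1 t$ and $\theta_2 + \omega_2 t$, which is consistent with the ratio in equation (4) but is an inference. All coefficient values and function names are hypothetical. Because $Y_t = Y^m - y_t$, reflection flips the sign of the skewness and leaves the kurtosis unchanged. For contrast, Gallagher's constant shape yields an identical shape in every period.

```python
import math


def gamma_shape_moments(shape: float) -> tuple[float, float]:
    """(skewness, excess kurtosis) of a gamma variate; both depend on the shape only."""
    return 2.0 / math.sqrt(shape), 6.0 / shape


def beta_shape_moments(p: float, q: float) -> tuple[float, float]:
    """(skewness, excess kurtosis) of a beta(p, q) variate."""
    s = p + q
    skew = 2.0 * (q - p) * math.sqrt(s + 1.0) / ((s + 2.0) * math.sqrt(p * q))
    ex_kurt = 6.0 * ((p - q) ** 2 * (s + 1.0) - p * q * (s + 2.0)) / (
        p * q * (s + 2.0) * (s + 3.0)
    )
    return skew, ex_kurt


def yield_shape_path(years, deviation_moments):
    """Shape of yields Y_t = Y^m - y_t: reflection flips the sign of skewness only."""
    path = []
    for t in years:
        skew_dev, ex_kurt = deviation_moments(t)
        path.append((-skew_dev, ex_kurt))
    return path


years = [0, 2, 4]
# Hypothetical coefficients, for illustration only.
nrl_gamma = yield_shape_path(years, lambda t: gamma_shape_moments(2.0 + 1.0 * t))
nrl_beta = yield_shape_path(
    years, lambda t: beta_shape_moments(2.0 + 0.5 * t, 3.0 + 0.1 * t)
)
gallagher = yield_shape_path(years, lambda t: gamma_shape_moments(2.0))
```

Under the gamma specification with a time-varying shape, yield skewness moves from about -1.41 to -1.0 between t = 0 and t = 2, and excess kurtosis moves from 3.0 to 1.5. Every entry of `gallagher` is identical.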
This enhanced flexibility, however, was not afforded to the IHS
model specifications used in NRL comparisons. Although Ramirez had
previously published two articles in which the "shape"
parameters of IHS models are either functions of time (Ramirez and
Fadiga 2003) or of other explanatory variables (Ramirez, Misra, and
Nelson 2003), NRL did not use this more flexible IHS model specification
in their comparisons. Therefore, while the gamma and beta models used in
NRL allow the skewness and kurtosis of the estimated crop yield
distributions to differ at every observation, the IHS specifications
do not.
COPYRIGHT 2006 American Agricultural Economics
Association Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.