Measurement error in recall surveys and the
relationship between household size and food demand.
by Gibson, John; Kim, Bonggeun
which is identical to equation (5) because [gamma] = [beta][sigma].
(9) According to equation (6), if [x.sup.0] is the outlay of a
one-person household, an n-person household of the same composition
needs total outlay of [x.sup.0] [n.sup.1-[sigma]] to have the same food
share (and the same welfare level, by assumption).
Theoretical objections to this method have been raised at least
since Nicholson (1976) and it is not the aim to add to those here.
Instead, the aim is to see how correlated measurement error affects
these Engel estimates. Because the scale economy parameter, [sigma], is
just the ratio of the estimate of [gamma] to the estimate of [beta], any
measurement error that biases the estimate of [gamma]
will affect Engel estimates of scale economies. For example, Lanjouw and
Ravallion estimate [sigma] to be 0.4, so if ten individuals in Pakistan
formed a ten-person household, their per capita food spending could go
down by 60% and they would still have the same level of welfare
([10.sup.0.6] = 3.98). These large scale economy estimates imply
improbable reductions in food spending per head for consumers in a poor
country (Deaton 1997). But if the estimates of [sigma] are sensitive to
measurement error, not only will the Engel method be theoretically
unfounded, it will also be shown to be empirically fragile.
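The arithmetic behind the ten-person example can be checked with a short sketch. The function names below are illustrative only and do not come from the article; the formula is the equivalent-outlay expression [x.sup.0] [n.sup.1-[sigma]] quoted above.

```python
def equivalent_outlay_ratio(n, sigma):
    """Total outlay an n-person household needs, as a multiple of the
    one-person outlay x0, to reach the same food share: n ** (1 - sigma)."""
    return n ** (1.0 - sigma)


def per_capita_reduction(n, sigma):
    """Proportional fall in per capita outlay that leaves the food share
    (and, by the Engel assumption, welfare) unchanged."""
    return 1.0 - equivalent_outlay_ratio(n, sigma) / n


if __name__ == "__main__":
    ratio = equivalent_outlay_ratio(10, 0.4)   # 10 ** 0.6
    cut = per_capita_reduction(10, 0.4)        # 1 - 10 ** 0.6 / 10
    print(f"outlay multiple: {ratio:.2f}, per capita reduction: {cut:.0%}")
```

With [sigma] = 0.4 and n = 10 the outlay multiple is about 3.98, so spending per head can fall by roughly 60%, which is the figure in the text.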
Measurement Error and the Testing Procedure
Suppose that survey data on household expenditure are subject to
reporting error of the form:
(7) [[??].sub.i] = [x.sub.i] + [m.sub.i] + [v.sup.x.sub.i]
where [[??].sub.i] is the survey response, [x.sub.i] is the true
value of expenditure of the ith household, [m.sub.i] is a method effect,
due perhaps to the use of a less detailed recall questionnaire rather
than a more detailed one, and [v.sup.x.sub.i] is a pure random error. As
discussed above, the method effect in the measurement error, [m.sub.i],
may be negatively correlated with household size, [n.sub.i]. Thus it is
also assumed to be negatively correlated with household expenditure,
[x.sub.i], since [x.sub.i] is positively correlated with [n.sub.i].
Hence, the method effect can be expressed as:
(8) [m.sub.i] = [pi][x.sub.i] + [v.sup.m.sub.i]
where [v.sup.m.sub.i] is a random deviation for the ith household
from the average method effect. Combining the two equations gives:
(9) [[??].sub.i] = [[lambda].sub.x][x.sub.i] + [v.sub.i]
where [v.sub.i] ([equivalent to] [v.sup.m.sub.i] + [v.sup.x.sub.i])
is a pure random error and [[lambda].sub.x] ([equivalent to] 1 + [pi])
captures any correlation between the true values and the
method effect in the measurement error. Note that [[lambda].sub.x] equals
one plus the slope from a regression of the method effect on the true
value. Classical measurement error is a special case of equation
(9) where [[lambda].sub.x] = 1. But with correlated errors, [pi] < 0
and (as long as measured expenditures are still positively correlated
with true values) the measurement error follows a mean-reverting pattern
(0 < [[lambda].sub.x] < 1). Thus, the expected value of measured
expenditures, E([??]), is the population mean of true expenditures scaled
down by [[lambda].sub.x], and this understatement is consistent with the
literature summarized above (e.g., Jolliffe 2001).
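Equations (7) to (9) can be illustrated with a small simulation. This is a minimal sketch under assumed values, not the authors' procedure: the lognormal distribution of true expenditure, the noise standard deviation, the sample size, and the function name are all hypothetical. It checks that regressing the survey response on the true value recovers a slope near 1 + [pi], and that mean reported expenditure is scaled down by about the same factor.

```python
import random


def simulate_method_effect(pi, n_obs=20000, noise_sd=20.0, seed=1):
    """Simulate equations (7)-(8) and return (slope, mean ratio).

    slope      : OLS slope of the survey response on true expenditure,
                 which should be close to lambda_x = 1 + pi.
    mean ratio : mean survey response divided by mean true expenditure.
    """
    rng = random.Random(seed)
    true_x, observed_x = [], []
    for _ in range(n_obs):
        x = rng.lognormvariate(6.0, 0.5)           # assumed true expenditure
        m = pi * x + rng.gauss(0.0, noise_sd)      # method effect, equation (8)
        x_obs = x + m + rng.gauss(0.0, noise_sd)   # survey response, equation (7)
        true_x.append(x)
        observed_x.append(x_obs)

    mean_true = sum(true_x) / n_obs
    mean_obs = sum(observed_x) / n_obs
    cov = sum((a - mean_true) * (b - mean_obs)
              for a, b in zip(true_x, observed_x))
    var = sum((a - mean_true) ** 2 for a in true_x)
    return cov / var, mean_obs / mean_true


if __name__ == "__main__":
    for pi in (0.0, -0.1, -0.2):
        slope, ratio = simulate_method_effect(pi)
        print(f"pi={pi:+.1f}  slope={slope:.3f}  mean ratio={ratio:.3f}")
```

Setting pi to zero gives the classical case, where the slope is close to one and the mean is not scaled down.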
To see the implications of nonclassical (i.e., [[lambda].sub.x]
[not equal to] 1) measurement errors for regression parameters, consider
the following simplified version of the linear regression model used by
Deaton and Paxson (the demographic composition and control variables are
ignored):
(10) [w.sub.f,i] = [alpha] + [beta] ln [(x/n).sub.i] + [gamma] ln
[n.sub.i] + [u.sub.i].
The survey data on ln [(x/n).sub.i] are subject to reporting error
due to the reporting error on [x.sub.i]:
(11) ln [([??]/n).sub.i] = ln [(x/n).sub.i] + [v.sub.i]
where measurement error [v.sub.i] can be correlated with ln
[(x/n).sub.i]. In addition, the food share is error-ridden (unless there
is the same proportionate error in food and nonfood expenditures):
(12) [[??].sub.f,i] = [w.sub.f,i] + [v.sup.w.sub.i]
and the measurement error [v.sup.w.sub.i] can be correlated with [w.sub.f,i].
With the error-ridden variables, the regression model becomes:
(13) [[??].sub.f,i] = [alpha] + [beta] ln [([??]/n).sub.i] + [gamma] ln
[n.sub.i] + ([u.sub.i] + [v.sup.w.sub.i] - [beta][v.sub.i]).
The slope coefficients in the population regression are
(14) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
and
(15) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
where [[sigma].sup.2.sub.ln([??]/n)] is the population variance of
ln([??]/n), [[sigma].sup.2.sub.ln n] is the population variance of ln n,
[[sigma].sub.ln([??]/n), ln n] is the population covariance of
ln([??]/n) and ln n, which is expected to be negative, [rho] is the
population correlation of ln([??]/n) and ln n, which satisfies -1 < [rho]
< 0, and [beta] is expected to be negative according to Engel's
(first) law. When the underreporting of nonfood expenditures is less
than that of food expenditures, [v.sup.w.sub.i] will be negatively
correlated with [w.sub.f,i]. It is also expected that [v.sub.i] will be
positively correlated with ln [(x/n).sub.i], since measurement error in
total expenditure, [x.sub.i], is assumed to be negatively correlated with
log household size, ln [n.sub.i], which is negatively correlated with log
PCE, ln [(x/n).sub.i]. Under these assumptions, the first two terms of
the bias in the estimate of [gamma] will be negative and the other two
terms will be positive. Thus, the estimate of [gamma] could be biased
downward, and even negative, depending on the magnitude of these terms.
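Because equations (14) and (15) did not survive in this copy, the following is a reconstruction from standard two-regressor least squares algebra, not the authors' typeset equations. It assumes that [u.sub.i] is uncorrelated with both regressors, writes the survey response with a tilde (the original accent is not recoverable here), and uses the shorthand X for ln([??]/n) and N for ln n, neither of which appears in the article.

```latex
% Shorthand (assumed): X = \ln(\tilde{x}/n), N = \ln n,
% composite error e = u + v^{w} - \beta v, with u uncorrelated with X and N.
\begin{align*}
D &= \sigma^{2}_{X}\,\sigma^{2}_{N}\,(1-\rho^{2}), \\
\operatorname{plim}\hat{\beta} &= \beta
  + \frac{\sigma^{2}_{N}\,\bigl(\sigma_{X,v^{w}} - \beta\,\sigma_{X,v}\bigr)
        - \sigma_{X,N}\,\bigl(\sigma_{N,v^{w}} - \beta\,\sigma_{N,v}\bigr)}{D}, \\
\operatorname{plim}\hat{\gamma} &= \gamma
  + \frac{\sigma_{N,v^{w}}\,\sigma^{2}_{X}
        - \beta\,\sigma_{N,v}\,\sigma^{2}_{X}
        - \sigma_{X,v^{w}}\,\sigma_{X,N}
        + \beta\,\sigma_{X,v}\,\sigma_{X,N}}{D}.
\end{align*}
```

Written this way, the four numerator terms in the expression for the household size coefficient appear in the order the text discusses them: the first two carry the covariances of the errors with ln n, and the last two carry the covariances of the errors with ln([??]/n), each multiplied by the negative covariance between the two regressors.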
The two negative terms ([MATHEMATICAL
EXPRESSION NOT REPRODUCIBLE IN ASCII] and - [beta][[sigma].sub.ln n,
v][[sigma].sup.2.sub.ln([??]/n)]) are larger in magnitude than the two
corresponding positive terms ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
[[sigma].sub.ln([??]/n), ln n] and [beta][[sigma].sub.ln([??]/n), v]
[[sigma].sub.ln([??]/n), ln n]) under reasonable assumptions (compare
the ratio of the first to the third term and the second to the fourth
term). Thus, the estimate of [gamma] is more likely to be negative when
the degree of underreporting of total expenditures increases or when the
underreporting of food expenditures becomes larger relative to that of
nonfood expenditures.
The implications change when the stronger assumption of classical
measurement error is used. When measurement errors are pure random
errors (and hence uncorrelated with true values), the household size
coefficient, [gamma], in the population regression is:
(16) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].
Thus, the estimate of [gamma] will be upwardly biased and will be positive when
[gamma] > 0. The important implication of the result in equation (16)
is that classical measurement error could not account for the Deaton and
Paxson puzzle. Only some form of correlated error could cause
the estimate of [gamma] to be biased downwards, so if measurement error is a cause of the
puzzle, it takes a form that differs from the standard white noise
assumptions that are typically used in the literature.
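The direction of the classical-error bias can be derived under stated assumptions; this is a sketch consistent with the surrounding text, not a reproduction of equation (16) itself. It assumes that [v.sub.i] and [v.sup.w.sub.i] are uncorrelated with the true values (including ln n) and with [u.sub.i], writes the survey response with a tilde, and uses the shorthand X for ln([??]/n) and N for ln n, which are not the article's symbols.

```latex
% Classical case (assumed): \sigma_{N,v} = \sigma_{N,v^{w}} = 0,
% \sigma_{X,v} = \sigma^{2}_{v}, and \sigma_{X,v^{w}} = \sigma_{v,v^{w}}.
\begin{align*}
\operatorname{plim}\hat{\gamma}
  &= \gamma - \frac{\sigma_{X,N}\,\bigl(\sigma_{v,v^{w}} - \beta\,\sigma^{2}_{v}\bigr)}
                   {\sigma^{2}_{X}\,\sigma^{2}_{N}\,(1-\rho^{2})}, \\
\text{and if } \sigma_{v,v^{w}} = 0:\quad
\operatorname{plim}\hat{\gamma}
  &= \gamma + \frac{\beta\,\sigma^{2}_{v}\,\sigma_{X,N}}
                   {\sigma^{2}_{X}\,\sigma^{2}_{N}\,(1-\rho^{2})}.
\end{align*}
```

With [beta] < 0 and a negative covariance between ln([??]/n) and ln n, the added term is positive whenever the covariance between the two errors is zero or positive, which matches the upward bias described in the text.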
To supplement the analytical results, Monte Carlo experiments were
carried out on equation (10). The experiments are based on a value of
-0.007 for the household size coefficient, [gamma], which is similar to
the value found by Deaton and Paxson
in surveys from the United States and France (this value also ensures
that the Engel scale elasticity is positive, but small). The aim of the
experiments was to see whether plausible values of measurement error
could bias the estimate of [gamma] downwards toward the values found in
surveys from poor countries, which lie between -0.09 and -0.05.
To implement the experiments, total expenditure, x, was
partitioned into food expenditures, [x.sub.f] = x × [w.sub.f], and
nonfood expenditures, [x.sub.nf] = x - [x.sub.f]. In the first set of
experiments, a proportionate error was added to true food expenditures,
so that the observed variable was ln [[??].sub.f] = ln [x.sub.f] + v. In
the first case, the measurement error was independent of all of the
variables in the model: v ~ N(0, [[sigma].sup.2.sub.v]), with three
values of [[sigma].sub.v] used: 0.1, 0.2, and 0.3. In the second case,
errors were correlated with true values, v = [phi] ln [x.sub.f] +
[epsilon], where [epsilon] ~ N(0, [[sigma].sup.2.sub.[epsilon]]) and
E([epsilon], [x.sub.f]) = 0. In the third case, errors were correlated
with household size, v = [lambda] ln n + [epsilon], where [epsilon] ~
N(0, [[sigma].sup.2.sub.[epsilon]]) and E([epsilon], n) = 0. The values
used for [phi] and [lambda] were -0.3, -0.2, and -0.1. The error-ridden
total expenditure variable was reconstructed as the sum of observed food
expenditure and true nonfood expenditure, [??] = [[??].sub.f] +
[x.sub.nf], and the error-ridden food share as the ratio of observed
food expenditure to this total, [[??].sub.f]/[??]. In other
experiments, the errors in measuring food expenditures were mirrored by
a similar set of errors in nonfood, recognizing that it may
also be difficult to accurately report expenditures on things like
transportation and entertainment in larger households.
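The first set of experiments can be sketched as follows. Only the value [gamma] = -0.007, the three error forms, and the values used for [[sigma].sub.v], [phi], and [lambda] come from the text. Everything else is an assumption made for illustration: the intercept and slope (ALPHA, BETA), the distributions of household size and log PCE, the standard deviation of [epsilon], the sample size, and all function names. The numbers it prints should therefore not be read as the article's results.

```python
import math
import random

# Only GAMMA is taken from the text; ALPHA and BETA are assumed values.
ALPHA, BETA, GAMMA = 1.1, -0.10, -0.007


def ols_two_regressors(y, x1, x2):
    """Slopes (b1, b2) from OLS of y on a constant, x1 and x2."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate_gamma_hat(case, scale, n_obs=20000, sigma_eps=0.1, seed=0):
    """Estimate gamma from error-ridden data for one experiment.

    case  : "classical" (v ~ N(0, scale^2)),
            "food"      (v = scale * ln x_f + eps), or
            "size"      (v = scale * ln n + eps).
    """
    rng = random.Random(seed)
    share_obs, ln_pce_obs, ln_size = [], [], []
    for _ in range(n_obs):
        n = rng.randint(1, 10)                            # assumed household sizes
        ln_n = math.log(n)
        ln_pce = 6.0 - 0.5 * ln_n + rng.gauss(0.0, 0.5)   # assumed log PCE
        w = ALPHA + BETA * ln_pce + GAMMA * ln_n + rng.gauss(0.0, 0.02)
        w = min(max(w, 0.05), 0.95)                       # keep the share inside (0, 1)
        x = n * math.exp(ln_pce)
        x_f, x_nf = x * w, x * (1.0 - w)                  # food and nonfood spending

        if case == "classical":
            v = rng.gauss(0.0, scale)
        elif case == "food":
            v = scale * math.log(x_f) + rng.gauss(0.0, sigma_eps)
        elif case == "size":
            v = scale * ln_n + rng.gauss(0.0, sigma_eps)
        else:
            raise ValueError(f"unknown case: {case}")

        xf_obs = x_f * math.exp(v)                        # ln xf_obs = ln x_f + v
        x_obs = xf_obs + x_nf                             # observed total expenditure
        share_obs.append(xf_obs / x_obs)                  # observed food share
        ln_pce_obs.append(math.log(x_obs / n))
        ln_size.append(ln_n)

    _, gamma_hat = ols_two_regressors(share_obs, ln_pce_obs, ln_size)
    return gamma_hat


if __name__ == "__main__":
    for sd in (0.1, 0.2, 0.3):
        print("classical", sd, round(simulate_gamma_hat("classical", sd), 4))
    for coef in (-0.3, -0.2, -0.1):
        print("food", coef, round(simulate_gamma_hat("food", coef), 4))
        print("size", coef, round(simulate_gamma_hat("size", coef), 4))
```

Under these assumed settings, purely random error in food spending pushes the estimated household size coefficient above its true value, while error that grows more negative with household size pulls it below, which is the qualitative contrast the analytical results describe.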
COPYRIGHT 2007 American Agricultural Economics Association. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.