Measurement error in recall surveys and the
relationship between household size and food demand.
by John Gibson and Bonggeun Kim
The results of the Monte Carlo experiments are reported in table 1,
in the form of the mean values of three parameters: [??], [??], and the
scale elasticity [??] = [??]/[??] under the assumptions of the Engel
method. (10) The results confirm the finding from equation (15), that
errors in measuring food expenditures that are negatively correlated
with either household size (row 3b) or the true value of food
expenditures (row 2b) could cause negative bias in estimates of $\gamma$.
Also, if measurement errors in food expenditures are correlated with the
true value of expenditures, the coefficient on $\ln(x/n)$, [??], will suffer
attenuation bias (i.e., toward zero), but if errors are correlated with
household size, there will be no effect on this coefficient (see rows 2a
and 3a). It is also apparent that errors in measuring food expenditures
that are negatively correlated with either true values (row 2c) or
household size (row 3c) can cause the scale elasticity, [??], to be biased
upwards. The results when nonfood is also measured with error can be
summarized in two points. First, if the errors in nonfood expenditures
are independent, i.e., measured log nonfood expenditure equals
$\ln x_{nf} + g$ where $g \sim N(0, 0.4)$, the effect of food expenditure
errors is amplified slightly (row 4b). Second, if the errors in nonfood
expenditures vary negatively with household size, $g = -0.2 \ln n + \xi$
where $\xi \sim N(0, 0.4)$, and the food expenditure errors are at least
as strongly correlated with household size ($\lambda \le -0.2$), the
estimate of $\gamma$ is negatively biased and moves into the range
$-0.06 \le \gamma \le -0.03$ (row 5b).
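The mechanism described above can be illustrated with a small simulation. This is a minimal sketch and not the authors' Monte Carlo design: the data-generating process, all parameter values (`alpha`, `beta`, `gamma`, the error variances), and the function name `mean_engel_estimates` are assumptions made for illustration. It assumes a true food share that is linear in ln(x/n) and ln n with a true coefficient on ln n of zero, nonfood measured without error, and a food reporting error whose log mean is `lam` times ln n.

```python
import numpy as np

def mean_engel_estimates(lam, n_households=2000, n_reps=50, seed=0):
    """Mean OLS estimates (intercept, coef on ln(x/n), coef on ln n).

    All parameter values are illustrative assumptions, not the paper's.
    lam is the assumed elasticity of the food reporting error with
    respect to household size (lam < 0: larger households under-report).
    """
    rng = np.random.default_rng(seed)
    alpha, beta, gamma = 1.2, -0.10, 0.0  # assumed true Engel curve
    estimates = []
    for _ in range(n_reps):
        n = rng.integers(1, 11, size=n_households)        # household size 1..10
        ln_n = np.log(n)
        ln_xpc = rng.normal(6.0, 0.5, size=n_households)  # true ln(x/n)
        x = np.exp(ln_xpc) * n                            # true total expenditure
        share = (alpha + beta * ln_xpc + gamma * ln_n
                 + rng.normal(0.0, 0.05, size=n_households))
        share = np.clip(share, 0.05, 0.95)                # keep shares in (0, 1)
        food = share * x
        nonfood = x - food                                # measured without error
        noise = rng.normal(0.0, 0.2, size=n_households)
        food_obs = food * np.exp(lam * ln_n + noise)      # mismeasured food
        x_obs = food_obs + nonfood                        # measured total
        share_obs = food_obs / x_obs                      # measured food share
        X = np.column_stack([np.ones(n_households), np.log(x_obs / n), ln_n])
        coef, *_ = np.linalg.lstsq(X, share_obs, rcond=None)
        estimates.append(coef)
    return np.mean(estimates, axis=0)

no_size_error = mean_engel_estimates(lam=0.0)
size_error = mean_engel_estimates(lam=-0.2)
print("mean estimate on ln n, lam = 0.0 :", round(float(no_size_error[2]), 4))
print("mean estimate on ln n, lam = -0.2:", round(float(size_error[2]), 4))
```

Under these assumptions the mean estimate on ln n stays close to its true value of zero when the reporting error is unrelated to household size, and turns negative when the error falls with household size, which is the direction of bias discussed in the text.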
The results of the Monte Carlo experiments suggest that one way to
empirically observe the effect of correlated measurement errors is to
estimate a food Engel curve with an interaction term between household
size and a dummy variable, D, for differences in household survey
methods. For example, if it is assumed that reporting errors are less
likely when households have their expenditures measured with a long,
detailed recall questionnaire rather than with a shorter recall, the
effect of errors correlated with household size may be observed from:
(17) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
where $D_i = 1$ if the household's expenditures are measured with
the long recall and $D_i = 0$ if the short recall is used. If
[[??].sub.1] > 0 it would imply that reporting errors in shorter,
less detailed surveys are correlated with household size, where such a
correlation could occur because of the greater number of food purchases
to recall in larger households (Gibson 2002).
In contrast, if errors are negatively correlated with the true
value of food expenditures, the bias will affect not only [??] but also
[??] (see row 2a, table 1). Consequently, other variables may also need
to be interacted with D, giving the more general model:
(18) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].
In equation (18), $\kappa_2 > 0$ and $\kappa_1 > 0$
would be consistent with reporting errors that are negatively correlated
with the true value of food expenditures. On the other hand, errors that
are correlated with household size would imply $\kappa_2 > 0$
and $\kappa_1 = 0$.
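The interaction-term approach can be sketched on simulated data. Because equations (17) and (18) are not reproduced here, the specification below is an assumption: the food share is regressed on ln(x/n), ln n, the dummy D, and D times ln n. The data-generating process, parameter values, and the names `interaction_engel_fit` and `D_x_ln_n` are hypothetical. Households are randomly assigned to a long recall (D = 1, no size-related error) or a short recall (D = 0, food under-reported more in larger households).

```python
import numpy as np

def interaction_engel_fit(lam_short=-0.2, n_households=20000, seed=1):
    """OLS fit of an assumed Engel curve with a D x ln(n) interaction.

    Illustrative only: the specification and all parameter values are
    assumptions, not the authors' equations (17) or (18).
    """
    rng = np.random.default_rng(seed)
    alpha, beta, gamma = 1.2, -0.10, 0.0               # assumed true Engel curve
    n = rng.integers(1, 11, size=n_households)         # household size 1..10
    ln_n = np.log(n)
    ln_xpc = rng.normal(6.0, 0.5, size=n_households)   # true ln(x/n)
    x = np.exp(ln_xpc) * n
    share = (alpha + beta * ln_xpc + gamma * ln_n
             + rng.normal(0.0, 0.05, size=n_households))
    share = np.clip(share, 0.05, 0.95)
    food = share * x
    nonfood = x - food
    D = rng.integers(0, 2, size=n_households)          # 1 = long recall, random
    lam = np.where(D == 1, 0.0, lam_short)             # size-related error only if D = 0
    noise = rng.normal(0.0, 0.2, size=n_households)
    food_obs = food * np.exp(lam * ln_n + noise)
    x_obs = food_obs + nonfood
    share_obs = food_obs / x_obs
    X = np.column_stack([np.ones(n_households), np.log(x_obs / n),
                         ln_n, D, D * ln_n])
    coef, *_ = np.linalg.lstsq(X, share_obs, rcond=None)
    names = ["const", "ln_xpc", "ln_n", "D", "D_x_ln_n"]
    return dict(zip(names, coef))

est = interaction_engel_fit()
for name, value in est.items():
    print(f"{name:>9s}: {value: .4f}")
```

In this sketch a positive coefficient on `D_x_ln_n` reflects the size-related under-reporting built into the short-recall group, mirroring the interpretation given in the text for a positive interaction coefficient.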
Data
To estimate equations (17) and (18), data from household surveys
carried out in 1999 in two developing countries, Cambodia and Indonesia,
are used. Both surveys feature random variation in the methods and
practices used within each country. By relying on within- rather than
between-country variation, most of the other factors listed as possible
explanations by Deaton and Paxson should be held constant. If estimated
food demand parameters then differ between two randomly selected groups
of households in the same setting whose expenditures were measured in
different ways, measurement error is one plausible explanation.
Indonesia
The annual SUSENAS (National Socio-Economic Household Survey) Core
questionnaire asks respondents about their household's consumption
of fifteen food groups over the previous week and eight nonfood groups
over both the previous month and previous year. Once every three years,
a randomly selected subsample receives a detailed consumption
questionnaire (the Module). The Module has 218 categories of food and
102 of nonfood, and uses the same reference periods as the Core. The
Module questionnaire is nested in the Core, which covers the same items
but more broadly. Households given the Module are not also asked the
Core questions; instead, interviewers add up consumption within each
subgroup of the Module and copy these totals into the Core. Thus, we cannot
compare Core and Module results for the same household, but we can make
comparisons between the group given the Module and those given just the
Core.
Pradhan (2001) analyzed this large, repeated experiment in
survey design and found that the shorter Core questionnaire gives
average consumption that is 12% to 20% below the more detailed Module.
The trade-off for survey authorities is that use of the
Module almost doubles the average interview time, raising it from fifty
to eighty minutes per household. The underestimation varies from year to
year (highest in 1996), is worst for nonfood, and appears to vary
systematically with the true level of consumption (i.e., a correlated
measurement error). Pradhan does not test whether the underestimation
varies by household size, which would also show up as a correlation with
total consumption.
We use data from the 1999 survey, mainly for urban areas on Java,
because household wage income is used as an instrument and wage earning
is much more prevalent in urban areas. Almost 13,000 of these households
were given the detailed consumption Module and 19,000 were given the
Core. In this sample, the households given the longer Module
questionnaire have measured per capita consumption expenditures almost
one quarter higher than the average for those given the shorter Core
questionnaire (table 2). (11) The food budget share is also lower,
suggesting that nonfood expenditures are raised most by using the more
detailed questionnaire, corroborating results reported by Pradhan
(2001). Except for these questionnaire effects, there is no evidence
that the two samples of households differ in any significant way. (12)
It is likely that the questionnaire effects also vary with
household size. For example, a simple nonparametric description of the
Core and Module data shows that measured food expenditures rise much
more strongly with increases in household size when the longer Module
questionnaire is used (figure 1). Approximating these nonparametric
curves by linear functions, an additional person is associated with a
15.2% increase in food expenditure with the Core but an 18.3% increase
with the Module (the difference is statistically significant
at p < 0.001).
[FIGURE 1 OMITTED]
Cambodia
The 1999 Cambodia Socio-Economic Survey (CSES) used a consumption
recall with twenty-three foods and thirteen nonfoods specified. It did
not aim to apply different procedures to different groups in the
population but variation in interviewer practice appears to have
produced the same effects. This variation is apparent because the sample
was randomly split, with half of the households interviewed between
January and March (Round 1), and the remainder interviewed between June
and September (Round 2). (13) Between the two rounds, interviewers were
retrained, and the retraining emphasized that estimates of household
consumption should be "reasonable" given the estimate of
household income. To facilitate these income-expenditure comparisons, the
questionnaires included a Household Income and Expenditure Balance
Sheet. (14) Consistent with a greater effort to reconcile household
total income, y, and total expenditure, x, there is a much closer
relationship between the two variables in Round 2 of the survey than
there was in Round 1:
Round 1: $\ln x = 3.25 + 0.777 \ln y$, $R^2 = 0.60$
Round 2: $\ln x = 2.01 + 0.862 \ln y$, $R^2 = 0.80$
The rise in the estimated income elasticity of expenditure between
the two survey rounds is statistically significant (p < 0.02). As a
result of the extra effort to match expenditure and income, there is a
20% rise in measured expenditures between the two survey rounds (table
3). (15)
One possible cause for different expenditure estimates between the
survey rounds is that the sample splitting was not random. However,
comparisons between the two groups of households in terms of dwelling
characteristics (as proxies for wealth) and literacy (as a proxy for
income) reveal no evidence that the subsamples differ in any systematic
way (table 3). Also, if one subsample were significantly better off, this
would be expected to alter the food budget share (according to
Engel's Law), but the average food share is almost the same across
survey rounds and, if anything, indicates that the households in Round 2
are worse off.
COPYRIGHT 2007 American Agricultural Economics Association. Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.