Empirical research in agricultural and development economics
increasingly uses data from household surveys. (1) There is a growing
realization that "measurement error is an ever-present, generally
significant, but usually neglected, feature of survey based income and
expenditure data" (Chesher and Schluter 2002, p. 377). It is
difficult to directly study these measurement errors because the true
value of expenditures is rarely known. Comparisons with other estimates,
such as household consumption in the National Accounts, are also fraught
with difficulty. In the absence of contrary evidence, assumptions about
errors being uncorrelated white noise continue to be made for reasons of
convenience (Bound, Brown, and Mathiowetz 2001).
In this article, we provide suggestive evidence of measurement
errors in food expenditures and budget shares being correlated with
household size in recall surveys. These surveys, where one respondent
gives a verbal report on the entire household's expenditure on a
number of items over some previous period, are used especially in
developing countries. It appears that, in the absence of prompting from a
more detailed recall list, a respondent in a recall survey is likely to
forget food expenditures, especially in larger households where there
are more purchases to remember. While respondents may also forget nonfood
purchases, the understatement for food may be greater because food is
purchased more frequently.
These measurement errors affect the estimated relationship between
household size and food demand, which is important for understanding
economies of scale within households. One common, although frequently
criticized, method of measuring scale economies is based on what is
sometimes called Engel's second law, the assertion that the food
share is an inverse indicator of welfare across households of different
sizes and compositions (Lanjouw and Ravallion 1995). This method may
mistake correlated errors in food expenditure data for genuine scale
economies.
Further motivation for studying these errors comes from Deaton and
Paxson (1998), who report the puzzling result that at constant per
capita expenditure (PCE), the budget share for food falls as household
size rises, especially in poorer countries. This pattern has been
confirmed by Gardes and Starzec (2000), Perali (2001), Abdulai (2003),
and Gan and Vernon (2003). Theory predicts the opposite pattern. Larger
households should have higher food demand because, at constant PCE,
resources released by the sharing of public goods can be spent on both
public and private goods, giving a positive income effect. Substitution
effects favor public goods, which are effectively cheaper in larger
households, but the income effect should be bigger for food whose
(absolute) own-price elasticity is likely to be lower than the income
elasticity, especially in poorer countries. Deaton and Paxson list
several possible explanations for their puzzle, including measurement
error, but none are considered convincing.
Measurement error may warrant more attention because the design of
the surveys used by Deaton and Paxson varies systematically across the
income distribution. The countries with the least puzzling results
(France and Britain) use diary surveys where each adult in the household
keeps a daily record of expenditure for two weeks. Surveys in the three
poorest countries, with the most puzzling results, ask respondents to
remember household food expenditure over the previous week (Thailand and
South Africa), month (South Africa) or year (Pakistan). These surveys
use broad commodity detail (i.e., short questionnaires) with only
twenty-six to thirty-eight food items specified (fifty-seven to
seventy-four items in total). In Taiwan and the United States, where the
results are not as puzzling, a mixture of diary and recall methods is
used. (2) This cross-country variation may contribute to the puzzling
effect of household size on food demand in poorer countries. But it is
hard to isolate the role of measurement error because factors associated
with other explanations also differ across countries. To overcome this
problem we focus on variation in household survey design and
implementation within countries, to hold other factors constant. Two
recent surveys from Cambodia and Indonesia provide this variation.
Other recent studies also follow this approach. For example,
Attanasio, Battistin, and Ichimura (2004) show quite different
inequality trends between the diary and recall samples of the U.S.
Consumer Expenditure Survey. Ahmed, Brzozowski, and Crossley (2005)
compare diaries and recall applied to the same households in the
Canadian Food Expenditure Survey. By assuming that the diaries measure
"true" food consumption they find measurement errors in the
recalled expenditures that are correlated with true values. There is
less correlation with household size, perhaps because their survey asks
a single question about food spending over the past month. Respondents
asked this question may not actually try to add up all of their
spending, which is referred to as episodic enumeration below, and
instead may use an estimation strategy. While episodic enumeration
should be harder for a respondent from a larger household because of the
greater number of transactions to remember, forming some estimate based
on assumptions about average spending may not be. Thus the results
reported here may not apply to single-question food recalls used in some
surveys in developed countries (Browning, Crossley, and Weber 2003).
The next section of the article reviews literature on household
survey design. Two examples where errors in food expenditure data may
affect results are then described. Analytical and Monte Carlo results
relating to measurement error in food share equations are then developed
and an econometric testing procedure is outlined. Finally, evidence from
the household surveys is described and compared with the results from
the Monte Carlo experiments. This comparison suggests that food
expenditure estimates from less detailed recall surveys have measurement
errors that are correlated with household size.
Previous Literature
Existing evidence suggests that the measurement of both food and
total expenditures is sensitive to survey design. Three design
variations are considered in the literature: recording in diaries versus
respondent recall in an interview, longer (more detailed) versus shorter
(less detailed) recall questionnaires, and different periods over which
expenditures are meant to be recalled.
In an experiment in Latvia, one half of the households were given a
diary for recording expenditures and in a subsequent period they were
given a recall survey, while the other half had the recall first and
then the diary. (3) Reported food expenditures were about 46% higher
with the diary, regardless of whether the diary was used first or second
(Scott and Okrasa 1998). Another split-sample experiment in urban Papua
New Guinea found (geometric) mean food expenditures to be 26% higher and
the food budget share six percentage points higher with the diary
(Gibson 2002). Moreover, the difference in food shares between the two
questionnaires appeared to be correlated with household size.
A recall experiment in El Salvador gave a long questionnaire
(seventy-five food items, twenty-five nonfoods) to one quarter of the
sample, with others given a short questionnaire (eighteen foods, six
nonfoods) covering the same items more broadly. Average per capita
consumption was 31% higher with the long questionnaire (Jolliffe 2001).
A similar experiment, which is repeated every three years in Indonesia,
gives one sample a questionnaire with twenty-three broad categories and
another one with 320 detailed categories that nest within the broad
ones. Average consumption is between 12% and 20% lower with the short
questionnaire and the difference between questionnaires appears to be
correlated with the level of expenditures (Pradhan 2001).
An experiment in Ghana varied recall periods, with reported
spending on a group of frequently purchased items falling by 2.9% for
every day added to the recall period, with the recall error leveling off
at about 20% after two weeks (Scott and Amenuvegbe 1991). The Indian
National Sample Survey (NSS) experimented with using a "last
week" versus a "last month" recall and found that for the
all-food aggregate the estimates based on weekly recall were 21% higher
(NSSO 2003).
These examples of widely different estimates of expenditure when
two survey designs are used in the same setting indicate measurement
error because it cannot be true that estimates from both surveys are
right. It is tempting to go further than this and suggest that some
designs are more accurate than others but such beliefs remain unproven
because it is hard to obtain actual expenditures, which are needed if
survey estimates are to be validated. For example, the NSS experiments
attempted to form a gold standard by having enumerators visit households
every day and giving respondents volumetric containers for measuring
food consumption. The monthly recall for the all-food aggregate was only
83% of this standard compared with 93% for the weekly recall. But the
gold standard may not have been completely accurate because for some
foods less than two-thirds of respondents used the measuring containers
and many respondents did not use the daily diary supplied to them (NSSO
2003).
National Accounts (NA) estimates of household food consumption are
also not a plausible source of validation data, at least in developing
countries. Comparisons between survey and NA estimates of food
consumption have been hotly debated in India where both the survey and
national account statisticians have concluded that discrepancies more
likely reflect errors in the national accounts (Minhas 1988;
Kulshreshtha and Kar 2005). For example, some foods that are also
ingredients in restaurant meals get counted twice in NA estimates
because their use by the food-away-from-home (FAFH) sector is not
deducted when household consumption is derived from aggregate production
and net exports. The rising importance of FAFH with economic growth
induces a trend error in the national accounts (Deaton and Kozel 2005).
Moreover, expenditure in restaurants is classified as nonfood consumer
services in the NA estimates but as part of the food group in the
household surveys (Minhas 1988).
While validation data for directly studying measurement errors in
household expenditure surveys are hard to find, the literature on
cognitive processes gives plausible reasons for why variations in survey
design may create correlated measurement errors. First, information
appears to be encoded, and eventually retrieved, differently when
reporting for oneself rather than others (Eisenhower, Mathiowetz, and
Morganstein 1991). This may help explain why results for diary surveys
(with self-reporting) differ from recall surveys (with proxy reporting).
A special case of this proxy reporting arises in "composite
households" (those other than a single person, a couple, or a couple
and their children), which have much greater item nonresponse for
consumption questions (Browning, Crossley, and Weber 2003). Larger
households are more likely to be composite, (4)
so one reason why reported per capita expenditures may fall in larger
households is that item nonresponse wrongly gets treated as zero
spending.
Second, the cognitive strategies used by respondents depend on the
length of the recall period and the number of events in that period.
Respondents tend to give an actual count for infrequent events
("episodic enumeration") but for higher frequency events they
switch to an estimation strategy (Blair and Burton 1987). This matters
because enumeration and estimation are not equally reliable. According
to Eisenhower, Mathiowetz, and Morganstein (1991, p. 140) "when the
number of events are large or closely spaced [...] the direction of
response error would be predicted to be an [...] underestimation of
events." Hence, if a questionnaire uses a shorter, less detailed,
recall list, there will be more purchases in each category in a given
time period, especially for larger households. Thus, a respondent from a
larger household, when given a less detailed recall questionnaire, might
tend to use an estimation strategy that is likely to understate
frequent, closely spaced purchases. Food is typically purchased more
frequently than nonfood, and purchase frequency is more in proportion to
household size, so this understatement for larger households should
especially affect measured food expenditure. (5)
Third, the greater the length of a recall period over which a
respondent is required to remember information, the greater the expected
bias (Eisenhower, Mathiowetz, and Morganstein 1991). The errors related
to recall period are due either to telescoping, which is a mis-dating of
events, or recall decay, which is a forgetting of events. Telescoping is
most relevant to nonroutine events, and can bias survey reports either
upwards or downwards. But for routine events, like buying food, recall
decay is the most likely source of error. This decay could explain why
recall surveys often have lower expenditures than diary surveys because
most diary-keepers record on the day of their purchase so there should
be less memory loss. (6)
Two Motivating Examples for Studying Correlated Measurement Errors
Two motivations for studying correlated errors in food expenditure
data are that (1) they may cause empirical fragility in Engel method
estimates of household scale economies, adding to the other problem
besetting this method, which is its atheoretical nature, (7) and (2)
they may also at least partially cause the puzzle about food demand
reported by Deaton and Paxson. These two examples are in fact in
conflict with each other because the puzzle that Deaton and Paxson
report was identified during an attempt to develop an alternative to the
Engel method of measuring scale economies. An alternative was needed
because even though the Engel method is atheoretical it continues to be
used. The aim here is not to resolve that conflict, but rather, to show
how correlated measurement errors might affect the empirical results
reported in each area.
The Deaton and Paxson Puzzle
Deaton and Paxson use a version of the model first developed in
Barten (1964). An egalitarian household with n members allocates
consumption between food, [q.sub.f] and a nonfood good, such as housing,
[q.sub.h], in order to maximize utility, u:
(1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
where x is total household expenditures, [p.sub.f] and [p.sub.h]
are the prices of food and nonfood, and [[phi].sub.k](n) (where k = f, h)
is the scaling function that transforms the number of members, n, into
"effective" size. (8) The commodity-specific degree of
economies of household scale is:
(2) $\sigma_k = 1 - \dfrac{\partial \ln \phi_k(n)}{\partial \ln n}$.
The per capita food demand function is:
(3) $\dfrac{q_f}{n} = \dfrac{\phi_f(n)}{n}\, g_f\!\left(\dfrac{x}{n},\ \dfrac{p_f \phi_f(n)}{n},\ \dfrac{p_h \phi_h(n)}{n}\right)$
where [g.sub.f](x, [p.sub.f], [p.sub.h]) is the food demand
function for a single person household. Differentiating the logarithm of
equation (3) with respect to ln n yields the conditions needed if per
capita food consumption is to increase with household size, holding x/n
constant:
(4) $\dfrac{\partial \ln(q_f/n)}{\partial \ln n} > 0 \iff \sigma_h(\varepsilon_{fx} + \varepsilon_{ff}) - \sigma_f(1 + \varepsilon_{ff}) > 0$
where [[epsilon].sub.ff] and [[epsilon].sub.fx] are the own-price
and income elasticities of demand for food. If nonfood contains some
public goods, so that [[sigma].sub.h] [not equal to] 0, while food is a
pure private good ([[sigma].sub.f] = 0), and if the (absolute) own-price
elasticity is less than the income elasticity of food demand, per capita
food consumption will increase with household size. This condition is
most likely to hold for poor consumers, so the positive effect of
household size on per capita food consumption (and hence food budget
shares) is predicted to be greatest in poor countries.
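The step from equation (3) to condition (4) can be sketched as follows. The sketch assumes that the single-person demand function is homogeneous of degree zero in total expenditure and prices, so that the income, own-price, and cross-price elasticities sum to zero; the cross-price elasticity $\varepsilon_{fh}$ is notation introduced here for illustration, and $\sigma_k = 1 - \partial \ln \phi_k(n) / \partial \ln n$ as in equation (2).

```latex
\begin{align*}
\ln\frac{q_f}{n}
  &= \ln\phi_f(n) - \ln n
     + \ln g_f\!\left(\frac{x}{n},\ \frac{p_f\phi_f(n)}{n},\ \frac{p_h\phi_h(n)}{n}\right) \\[4pt]
\left.\frac{\partial\ln(q_f/n)}{\partial\ln n}\right|_{x/n}
  &= (1-\sigma_f) - 1
     + \varepsilon_{ff}\,\frac{\partial\ln(\phi_f(n)/n)}{\partial\ln n}
     + \varepsilon_{fh}\,\frac{\partial\ln(\phi_h(n)/n)}{\partial\ln n} \\[4pt]
  &= -\sigma_f - \sigma_f\,\varepsilon_{ff} - \sigma_h\,\varepsilon_{fh}
     \qquad\text{since } \frac{\partial\ln(\phi_k(n)/n)}{\partial\ln n} = -\sigma_k \\[4pt]
  &= \sigma_h(\varepsilon_{fx}+\varepsilon_{ff}) - \sigma_f(1+\varepsilon_{ff})
     \qquad\text{using } \varepsilon_{fh} = -(\varepsilon_{fx}+\varepsilon_{ff})
\end{align*}
```

With $\sigma_f = 0$ and $\sigma_h > 0$, the sign of the last line is the sign of $\varepsilon_{fx} + \varepsilon_{ff}$, which is positive whenever the income elasticity exceeds the absolute own-price elasticity.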
To test whether the empirical evidence is consistent with this
prediction, Deaton and Paxson estimate the following food share model on
household survey data from seven countries:
(5) $w_{f,i} = \alpha + \beta \ln\left(\dfrac{x_i}{n_i}\right) + \gamma \ln n_i + \sum_{j=1}^{J-1} \eta_j r_{j,i} + \delta \cdot z + u_i$
where [r.sub.ji] = [n.sub.ji]/[n.sub.i] is the proportion of
persons in household i in demographic group j, z is a vector of other
household characteristics, [u.sub.i] is a disturbance term, and [alpha],
[beta], [gamma], [eta], and [delta] are parameters to be estimated.
While [gamma] was expected to be positive, especially for poor countries,
the empirical results showed the opposite pattern. Deaton and Paxson
estimated [gamma] to be negative in surveys from six out of seven countries
(positive only in Britain), and while it was close to zero for the rich
countries (-0.008 for the United States and France) it was quite large
for the poor countries (approximately -0.06 to -0.10 for Thailand,
Pakistan, and Africans in South Africa).
Several unsuccessful attempts have been made to explain this
puzzle. Horowitz (2002) suggests that the two-good model used to derive
the predictions is too restrictive. In a three-good model, food demand
rises with household size only if food and the public good are gross
complements. However, a multi-good equivalent to equation (4) derived by
Deaton and Paxson (2003) provides no resolution to the puzzle. Gan and
Vernon (2003) suggest that there are economies of scale in food
preparation but this only deepens the puzzle because a reduction in per
capita preparation costs should allow an increase in food expenditures
per head. Abdulai (2003) suggests that bulk discounts allow larger
households to spend less on food even as they consume more. But he
provides no evidence of these bulk discounts, other than a negative
effect of household size on the average unit value for all food--which
could just as easily reflect a tendency for larger households to buy
lower quality foods (Deaton 1997). It is therefore worth seeing whether
correlated errors bias the estimate of [gamma] downwards, especially because of the
variation in household survey methods among the countries considered by
Deaton and Paxson.
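The mechanism at issue can be illustrated with a small simulation. The sketch below is not the Monte Carlo design developed later in the article. All parameter values, the variable names, and the under-reporting rule (reported food equals true food times n raised to the power -theta) are assumptions chosen only for illustration. True food shares are generated with no household size effect, food is then under-reported by more in larger households, and the food share regression is re-estimated on the reported data.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5_000

# Assumed "true" parameters: the size coefficient gamma is set to zero.
alpha, beta, gamma = 0.5, -0.1, 0.0
theta = 0.1  # assumed elasticity of food under-reporting with respect to size

n = rng.integers(1, 11, size=N).astype(float)   # household size, 1 to 10
ln_pce = rng.normal(0.0, 0.5, size=N)           # log per capita expenditure
x = np.exp(ln_pce) * n                          # true total expenditure

# True food share, food spending, and nonfood spending.
w = alpha + beta * ln_pce + gamma * np.log(n) + rng.normal(0.0, 0.02, size=N)
food = w * x
nonfood = x - food

# Reported data: food is understated by more in larger households,
# nonfood is (by assumption) reported without error.
food_obs = food * n ** (-theta)
x_obs = food_obs + nonfood
w_obs = food_obs / x_obs


def size_coefficient(share, total, size):
    """OLS coefficient on ln(n) in a regression of share on ln(x/n) and ln(n)."""
    X = np.column_stack([np.ones_like(size), np.log(total / size), np.log(size)])
    coef, *_ = np.linalg.lstsq(X, share, rcond=None)
    return coef[2]


gamma_true_fit = size_coefficient(w, x, n)
gamma_obs_fit = size_coefficient(w_obs, x_obs, n)
print(f"size coefficient, true data:     {gamma_true_fit:.4f}")
print(f"size coefficient, reported data: {gamma_obs_fit:.4f}")
```

Under these assumptions the coefficient on ln n is close to zero in the true data but negative in the reported data, even though household size has no effect on the true food share.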
Engel Estimates of Household Scale Economies
A reparameterized version of equation (5) can provide Engel
estimates of size economies, albeit with assumptions substantially
different to those used by Deaton and Paxson. In the case of the Engel
method, no distinction is made between private and public goods (hence,
the economies of scale are not commodity-specific). Scale economies are
calculated by comparing the total outlays of different-sized households
with the same food shares. For example, Lanjouw and Ravallion (1995) use
data from Pakistan to estimate:
(6) $w_{f,i} = \alpha + \beta \ln\left(\dfrac{x_i}{n_i^{1-\sigma}}\right) + \sum_{j=1}^{J-1} \eta_j r_{j,i} + \delta \cdot z + u_i$,
which is identical to equation (5) because [gamma] = [beta][sigma].
(9) According to equation (6), if [x.sup.0] is the outlay of a
one-person household, an n-person household of the same composition
needs total outlay of [x.sup.0] [n.sup.1-[sigma]] to have the same food
share (and the same welfare level, by assumption).
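The link between the Engel parameterization and the coefficient on ln n in the food share model can be made explicit by expanding the logarithm in equation (6). This is a short worked step using only the symbols already defined above.

```latex
\begin{align*}
\beta \ln\!\left(\frac{x_i}{n_i^{\,1-\sigma}}\right)
  &= \beta \ln x_i - \beta(1-\sigma)\ln n_i \\
  &= \beta \ln\!\left(\frac{x_i}{n_i}\right) + \beta\sigma \ln n_i
  \quad\Longrightarrow\quad
  \gamma = \beta\sigma, \qquad \sigma = \frac{\gamma}{\beta}.
\end{align*}
```

Equal food shares for a one-person household with outlay $x^0$ and an n-person household of the same composition with outlay $x$ then require $\ln x^0 = \ln\left(x / n^{1-\sigma}\right)$, that is, $x = x^0 n^{1-\sigma}$.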
Theoretical objections to this method have been raised at least
since Nicholson (1976) and it is not the aim to add to those here.
Instead, the aim is to see how correlated measurement error affects
these Engel estimates. Because the scale economy parameter, [sigma], is
just the ratio of the estimates of [gamma] and [beta], any measurement error
that biases the estimate of [gamma] will affect Engel estimates of scale
economies. For example, Lanjouw and
Ravallion estimate [sigma] to be 0.4, so if ten individuals in Pakistan
formed a ten-person household, their per capita food spending could go
down by 60% and they would still have the same level of welfare
([10.sup.0.6] = 3.98). These large scale economy estimates imply
improbable reductions in food spending per head for consumers in a poor
country (Deaton 1997). But if the estimates of [sigma] are sensitive to
measurement error, not only will the Engel method be theoretically
unfounded, it will also be shown to be empirically fragile.
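The arithmetic behind the ten-person example, and the sensitivity of [sigma] to the size coefficient, can be checked with a few lines of code. The function names are hypothetical, and the coefficient values passed to `sigma_from_coefficients` are made-up numbers used only to show the direction of the effect; they are not estimates from any survey.

```python
def engel_equivalent_outlay(x0: float, n: int, sigma: float) -> float:
    """Total outlay an n-person household needs to have the same food share
    as a one-person household with outlay x0, following equation (6)."""
    return x0 * n ** (1.0 - sigma)


def per_capita_reduction(n: int, sigma: float) -> float:
    """Proportional fall in per capita outlay that leaves the food share unchanged."""
    return 1.0 - engel_equivalent_outlay(1.0, n, sigma) / n


def sigma_from_coefficients(gamma: float, beta: float) -> float:
    """Engel scale economy parameter implied by the food share coefficients."""
    return gamma / beta


ratio = engel_equivalent_outlay(1.0, 10, 0.4)  # 10 ** 0.6
cut = per_capita_reduction(10, 0.4)
print(round(ratio, 2), round(cut, 2))          # 3.98 0.6

# Hypothetical coefficients: a more negative size coefficient raises sigma.
print(round(sigma_from_coefficients(-0.04, -0.10), 2))  # 0.4
print(round(sigma_from_coefficients(-0.06, -0.10), 2))  # 0.6
```

In this illustration, a downward bias of 0.02 in the size coefficient is enough to move the implied scale economy parameter from 0.4 to 0.6.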
Measurement Error and the Testing Procedure
Suppose that survey data on household expenditure is subject to
reporting error of the form:
(7) $\tilde{x}_i = x_i + m_i + v_i^{x}$
where $\tilde{x}_i$ is the survey response, [x.sub.i] is the true
value of expenditure of the ith household, [m.sub.i] is a method effect,
due perhaps to the use of a less detailed recall questionnaire rather
than a more detailed one, and [v.sup.x.sub.i] is a pure random error. As
discussed above, the method effect in the measurement error, [m.sub.i]
may be negatively correlated with household size, [n.sub.i]. Thus it is
also assumed to be negatively correlated with household expenditure,
[x.sub.i] since [x.sub.i] is positively correlated with [n.sub.i].
Hence, the method effect can be expressed as:
(8) $m_i = \pi x_i + v_i^{m}$
where [v.sup.m.sub.i] is a random deviation for the ith household
from the average method effect. Combining the two equations gives:
(9) $\tilde{x}_i = \lambda_x x_i + v_i$
where [v.sub.i] ([equivalent to] [v.sup.m.sub.i] + [v.sup.x.sub.i])
is a pure random error and [[lambda].sub.x] ([equivalent to] 1 + [pi])
represents a potential correlation between the true values and the
method effect in the measurement error. Note that [[lambda].sub.x] is
the estimated slope in the regression of the method effect on the true
value plus 1. Classical measurement error is a special case of equation
(9) where [[lambda].sub.x] = 1. But with correlated errors, [pi] < 0
and (as long as measured expenditures are still positively correlated
with true values) the measurement error follows a mean-reverting pattern
(0 < [[lambda].sub.x] < 1). Thus, the expected value of measured
expenditures, E($\tilde{x}$), is the population mean of true expenditures scaled
down by [[lambda].sub.x] and this understatement is consistent with the
literature summarized above (e.g., Jolliffe 2001).
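The error structure in equations (7) to (9) can be simulated directly. The following is a minimal sketch under assumed values: the distribution of true expenditure, the error variances, and the value of [pi] (here -0.2) are illustrative choices, not estimates from the article. It shows that regressing the reported values on the true values recovers a slope of about 1 + [pi], and that the mean of reported expenditure is about [[lambda].sub.x] times the true mean.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20_000
pi = -0.2                 # assumed slope of the method effect on true expenditure
lambda_x = 1.0 + pi       # implied mean-reversion parameter in equation (9)

x = rng.lognormal(mean=5.0, sigma=0.5, size=N)  # true expenditure (assumed)
v_m = rng.normal(0.0, 5.0, size=N)              # random part of the method effect
v_x = rng.normal(0.0, 5.0, size=N)              # pure random reporting error

m = pi * x + v_m          # equation (8)
x_tilde = x + m + v_x     # equation (7)

# Slope from regressing reported on true expenditure, as in equation (9).
slope, intercept = np.polyfit(x, x_tilde, 1)
mean_ratio = x_tilde.mean() / x.mean()

print(f"lambda_x = {lambda_x:.2f}, fitted slope = {slope:.3f}")
print(f"mean reported / mean true = {mean_ratio:.3f}")
```

Setting `pi = 0.0` reproduces the classical case, where the fitted slope and the ratio of means are both close to one.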
To see the implications of nonclassical (i.e., [[lambda].sub.x]
[not equal to] 1) measurement errors for regression parameters, consider
the following simplified version of the linear regression model used by
Deaton and Paxson (the demographic composition and control variables are
ignored):
(10) [w.sub.f,i] = [alpha] + [beta] ln [(x/n).sub.i] + [gamma] ln
[n.sub.i] + [u.sub.i].
The survey data on 1n