
Measurement error in recall surveys and the relationship between household size and food demand.


by Gibson, John; Kim, Bonggeun

Empirical research in agricultural and development economics increasingly uses data from household surveys. (1) There is a growing realization that "measurement error is an ever-present, generally significant, but usually neglected, feature of survey based income and expenditure data" (Chesher and Schluter 2002, p. 377). It is difficult to directly study these measurement errors because the true value of expenditures is rarely known. Comparisons with other estimates, such as household consumption in the National Accounts, are also fraught with difficulty. In the absence of contrary evidence, assumptions about errors being uncorrelated white noise continue to be made for reasons of convenience (Bound, Brown, and Mathiowetz 2001).

In this article, we provide suggestive evidence of measurement errors in food expenditures and budget shares being correlated with household size in recall surveys. These surveys, where one respondent gives a verbal report on the entire household's expenditure on a number of items over some previous period, are used especially in developing countries. It appears that in the absence of prompting from a more detailed recall list a respondent in a recall survey is likely to forget food expenditures, especially in larger households where there are more purchases to remember. While they also may forget nonfoods, the understatement for food may be greater due to its purchase frequency.

These measurement errors affect the estimated relationship between household size and food demand, which is important for understanding economies of scale within households. One common, although frequently criticized, method of measuring scale economies is based on what is sometimes called Engel's second law, the assertion that the food share is an inverse indicator of welfare across households of different sizes and compositions (Lanjouw and Ravallion 1995). This method may mistake correlated errors in food expenditure data for genuine scale economies.

Further motivation for studying these errors comes from Deaton and Paxson (1998), who report the puzzling result that at constant per capita expenditure (PCE), the budget share for food falls as household size rises, especially in poorer countries. This pattern has been confirmed by Gardes and Starzec (2000), Perali (2001), Abdulai (2003), and Gan and Vernon (2003). Theory predicts the opposite pattern. Larger households should have higher food demand because, at constant PCE, resources released by the sharing of public goods can be spent on both public and private goods, giving a positive income effect. Substitution effects favor public goods, which are effectively cheaper in larger households, but the income effect should be bigger for food whose (absolute) own-price elasticity is likely to be lower than the income elasticity, especially in poorer countries. Deaton and Paxson list several possible explanations for their puzzle, including measurement error, but none are considered convincing.

Measurement error may warrant more attention because the design of the surveys used by Deaton and Paxson varies systematically across the income distribution. The countries with the least puzzling results (France and Britain) use diary surveys where each adult in the household keeps a daily record of expenditure for two weeks. Surveys in the three poorest countries, with the most puzzling results, ask respondents to remember household food expenditure over the previous week (Thailand and South Africa), month (South Africa) or year (Pakistan). These surveys use broad commodity detail (i.e., short questionnaires) with only twenty-six to thirty-eight food items specified (fifty-seven to seventy-four items in total). In Taiwan and the United States, where the results are not as puzzling, a mixture of diary and recall methods is used. (2) This cross-country variation may contribute to the puzzling effect of household size on food demand in poorer countries. But it is hard to isolate the role of measurement error because factors associated with other explanations also differ across countries. To overcome this problem we focus on variation in household survey design and implementation within countries, to hold other factors constant. Two recent surveys from Cambodia and Indonesia provide this variation.

Other recent studies also follow this approach. For example, Attanasio, Battistin, and Ichimura (2004) show quite different inequality trends between the diary and recall samples of the U.S. Consumer Expenditure Survey. Ahmed, Brzozowski, and Crossley (2005) compare diaries and recall applied to the same households in the Canadian Food Expenditure Survey. By assuming that the diaries measure "true" food consumption they find measurement errors in the recalled expenditures that are correlated with true values. There is less correlation with household size, perhaps because their survey asks a single question about food spending over the past month. Respondents asked this question may not actually try to add up all of their spending, which is referred to as episodic enumeration below, and instead may use an estimation strategy. While episodic enumeration should be harder for a respondent from a larger household because of the greater number of transactions to remember, forming some estimate based on assumptions about average spending may not be. Thus the results reported here may not apply to single-question food recalls used in some surveys in developed countries (Browning, Crossley, and Weber 2003).

The next section of the article reviews literature on household survey design. Two examples where errors in food expenditure data may affect results are then described. Analytical and Monte Carlo results relating to measurement error in food share equations are then developed and an econometric testing procedure is outlined. Finally, evidence from the household surveys is described and compared with the results from the Monte Carlo experiments. This comparison suggests that food expenditure estimates from less detailed recall surveys have measurement errors that are correlated with household size.

Previous Literature

Existing evidence suggests that the measurement of both food and total expenditures is sensitive to survey design. Three design variations are considered in the literature: recording in diaries versus respondent recall in an interview, longer (more detailed) versus shorter (less detailed) recall questionnaires, and different periods over which expenditures are meant to be recalled.

In an experiment in Latvia, one half of the households were given a diary for recording expenditures and in a subsequent period they were given a recall survey, while the other half had the recall first and then the diary. (3) Reported food expenditures were about 46% higher with the diary, regardless of whether the diary was used first or second (Scott and Okrasa 1998). Another split-sample experiment in urban Papua New Guinea found (geometric) mean food expenditures to be 26% higher and the food budget share six percentage points higher with the diary (Gibson 2002). Moreover, the difference in food shares between the two questionnaires appeared to be correlated with household size.

A recall experiment in El Salvador gave a long questionnaire (seventy-five food items, twenty-five nonfoods) to one quarter of the sample, with others given a short questionnaire (eighteen foods, six nonfoods) covering the same items more broadly. Average per capita consumption was 31% higher with the long questionnaire (Jolliffe 2001). A similar experiment, which is repeated every three years in Indonesia, gives one sample a questionnaire with twenty-three broad categories and another one with 320 detailed categories that nest within the broad ones. Average consumption is between 12% and 20% lower with the short questionnaire and the difference between questionnaires appears to be correlated with the level of expenditures (Pradhan 2001).

An experiment in Ghana varied recall periods, with reported spending on a group of frequently purchased items falling by 2.9 % for every day added to the recall period, with the recall error leveling off at about 20% after two weeks (Scott and Amenuvegbe 1991). The Indian National Sample Survey (NSS) experimented with using a "last week" versus a "last month" recall and found that for the all-food aggregate the estimates based on weekly recall were 21% higher (NSSO 2003).

These examples of widely different estimates of expenditure when two survey designs are used in the same setting indicate measurement error because it cannot be true that estimates from both surveys are right. It is tempting to go further than this and suggest that some designs are more accurate than others but such beliefs remain unproven because it is hard to obtain actual expenditures, which are needed if survey estimates are to be validated. For example, the NSS experiments attempted to form a gold standard by having enumerators visit households every day and giving respondents volumetric containers for measuring food consumption. The monthly recall for the all-food aggregate was only 83% of this standard compared with 93% for the weekly recall. But the gold standard may not have been completely accurate because for some foods less than two-thirds of respondents used the measuring containers and many respondents did not use the daily diary supplied to them (NSSO 2003).

National Accounts (NA) estimates of household food consumption are also not a plausible source of validation data, at least in developing countries. Comparisons between survey and NA estimates of food consumption have been hotly debated in India where both the survey and national account statisticians have concluded that discrepancies more likely reflect errors in the national accounts (Minhas 1988; Kulshreshtha and Kar 2005). For example, some foods that are also ingredients in restaurant meals get counted twice in NA estimates because their use by the food-away-from-home (FAFH) sector is not deducted when household consumption is derived from aggregate production and net exports. The rising importance of FAFH with economic growth induces a trend error in the national accounts (Deaton and Kozel 2005). Moreover, expenditure in restaurants is classified as nonfood consumer services in the NA estimates but as part of the food group in the household surveys (Minhas 1988).

While validation data for directly studying measurement errors in household expenditure surveys are hard to find, the literature on cognitive processes gives plausible reasons why variations in survey design may create correlated measurement errors. First, information appears to be encoded, and eventually retrieved, differently when reporting for oneself rather than for others (Eisenhower, Mathiowetz, and Morganstein 1991). This may help explain why results for diary surveys (with self-reporting) differ from recall surveys (with proxy reporting). A special case of this proxy reporting is "composite households" (those comprising individuals other than a single person, a couple, or a couple and their children), which have much greater item nonresponse for consumption questions (Browning, Crossley, and Weber 2003). Larger households are more likely to be composite, (4) so one reason why reported per capita expenditures may fall in larger households is that item nonresponse wrongly gets treated as zero spending.

Second, the cognitive strategies used by respondents depend on the length of the recall period and the number of events in that period. Respondents tend to give an actual count for infrequent events ("episodic enumeration") but for higher frequency events they switch to an estimation strategy (Blair and Burton 1987). This matters because enumeration and estimation are not equally reliable. According to Eisenhower, Mathiowetz, and Morganstein (1991, p. 140) "when the number of events are large or closely spaced [...] the direction of response error would be predicted to be an [...] underestimation of events." Hence, if a questionnaire uses a shorter, less detailed, recall list, there will be more purchases in each category in a given time period, especially for larger households. Thus, a respondent from a larger household, when given a less detailed recall questionnaire, might tend to use an estimation strategy that is likely to understate frequent, closely spaced purchases. Food is typically purchased more frequently than nonfood, and purchase frequency is more in proportion to household size, so this understatement for larger households should especially affect measured food expenditure. (5)

Third, the greater the length of a recall period over which a respondent is required to remember information, the greater the expected bias (Eisenhower, Mathiowetz, and Morganstein 1991). The errors related to recall period are due either to telescoping, which is a mis-dating of events, or recall decay, which is a forgetting of events. Telescoping is most relevant to nonroutine events, and can bias survey reports either upwards or downwards. But for routine events, like buying food, recall decay is the most likely source of error. This decay could explain why recall surveys often have lower expenditures than diary surveys because most diary-keepers record on the day of their purchase so there should be less memory loss. (6)

Two Motivating Examples for Studying Correlated Measurement Errors

Two motivations for studying correlated errors in food expenditure data are that (1) they may cause empirical fragility in Engel method estimates of household scale economies, adding to the other problem besetting this method, which is its atheoretical nature, (7) and (2) they may also at least partially cause the puzzle about food demand reported by Deaton and Paxson. These two examples are in fact in conflict with each other because the puzzle that Deaton and Paxson report was identified during an attempt to develop an alternative to the Engel method of measuring scale economies. An alternative was needed because even though the Engel method is atheoretical it continues to be used. The aim here is not to resolve that conflict, but rather, to show how correlated measurement errors might affect the empirical results reported in each area.

The Deaton and Paxson Puzzle

Deaton and Paxson use a version of the model first developed in Barten (1964). An egalitarian household with $n$ members allocates consumption between food, $q_f$, and a nonfood good, such as housing, $q_h$, in order to maximize utility, $u$:

(1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where $x$ is total household expenditure, $p_f$ and $p_h$ are the prices of food and nonfood, and $\phi_k(n)$ (where $k = f, h$) is the scaling function that transforms the number of members, $n$, into "effective" size. (8) The commodity-specific degree of economies of household scale is:

(2) $\sigma_k = 1 - \partial \ln \phi_k(n) / \partial \ln n.$

The per capita food demand function is:

(3) $\dfrac{q_f}{n} = \dfrac{\phi_f(n)}{n}\, g_f\!\left(\dfrac{x}{n},\ \dfrac{p_f \phi_f(n)}{n},\ \dfrac{p_h \phi_h(n)}{n}\right)$

where $g_f(x, p_f, p_h)$ is the food demand function for a single-person household. Differentiating the logarithm of equation (3) with respect to $\ln n$ yields the condition needed if per capita food consumption is to increase with household size, holding $x/n$ constant:

(4) $\dfrac{\partial \ln(q_f/n)}{\partial \ln n} > 0 \iff \sigma_h(\epsilon_{fx} + \epsilon_{ff}) - \sigma_f(1 + \epsilon_{ff}) > 0$

where $\epsilon_{ff}$ and $\epsilon_{fx}$ are the own-price and income elasticities of demand for food. If nonfood contains some public goods, so that $\sigma_h \neq 0$, while food is a pure private good ($\sigma_f = 0$), and if the (absolute) own-price elasticity is less than the income elasticity of food demand, per capita food consumption will increase with household size. This condition is most likely to hold for poor consumers, so the positive effect of household size on per capita food consumption (and hence food budget shares) is predicted to be greatest in poor countries.
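The step from equation (3) to condition (4) can be written out as follows. This is only a sketch: it assumes that the single-person demand function $g_f$ is homogeneous of degree zero in total expenditure and prices, so that the income, own-price, and cross-price elasticities of food demand sum to zero. The cross-price elasticity $\epsilon_{fh}$ is a symbol introduced here for illustration and is not used in the text above.

```latex
\begin{align*}
\ln\frac{q_f}{n}
  &= \ln\frac{\phi_f(n)}{n}
   + \ln g_f\!\left(\frac{x}{n},\ \frac{p_f\,\phi_f(n)}{n},\ \frac{p_h\,\phi_h(n)}{n}\right) \\[4pt]
\left.\frac{\partial \ln(q_f/n)}{\partial \ln n}\right|_{x/n}
  &= -\sigma_f \;-\; \epsilon_{ff}\,\sigma_f \;-\; \epsilon_{fh}\,\sigma_h
  && \text{since } \frac{\partial \ln\bigl(\phi_k(n)/n\bigr)}{\partial \ln n} = -\sigma_k \text{ by (2)} \\[4pt]
  &= \sigma_h(\epsilon_{fx} + \epsilon_{ff}) \;-\; \sigma_f(1 + \epsilon_{ff})
  && \text{using } \epsilon_{fx} + \epsilon_{ff} + \epsilon_{fh} = 0
\end{align*}
```

Under these assumptions, setting $\sigma_f = 0$ leaves $\sigma_h(\epsilon_{fx} + \epsilon_{ff})$, which is positive when $\sigma_h > 0$ and the income elasticity exceeds the absolute value of the (negative) own-price elasticity.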

To test whether the empirical evidence is consistent with this prediction, Deaton and Paxson estimate the following food share model on household survey data from seven countries:

(5) $w_{f,i} = \alpha + \beta \ln(x_i/n_i) + \gamma \ln n_i + \sum_{j=1}^{J-1} \eta_j r_{j,i} + \delta \cdot z + u_i$

where $r_{j,i} = n_{j,i}/n_i$ is the proportion of persons in household $i$ in demographic group $j$, $z$ is a vector of other household characteristics, $u_i$ is a disturbance term, and $\alpha$, $\beta$, $\gamma$, $\eta$, and $\delta$ are parameters to be estimated. While $\hat{\gamma}$ was expected to be positive, especially for poor countries, the empirical results showed the opposite pattern. Deaton and Paxson estimated $\hat{\gamma}$ to be negative in surveys from six out of seven countries (positive only in Britain), and while it was close to zero for the rich countries (-0.008 for the United States and France) it was quite large for the poor countries (approximately -0.06 to -0.10 for Thailand, Pakistan, and Africans in South Africa).

Several unsuccessful attempts have been made to explain this puzzle. Horowitz (2002) suggests that the two-good model used to derive the predictions is too restrictive. In a three-good model, food demand rises with household size only if food and the public good are gross complements. However, a multi-good equivalent to equation (4) derived by Deaton and Paxson (2003) provides no resolution to the puzzle. Gan and Vernon (2003) suggest that there are economies of scale in food preparation, but this only deepens the puzzle because a reduction in per capita preparation costs should allow an increase in food expenditures per head. Abdulai (2003) suggests that bulk discounts allow larger households to spend less on food even as they consume more. But he provides no evidence of these bulk discounts, other than a negative effect of household size on the average unit value for all food, which could just as easily reflect a tendency for larger households to buy lower quality foods (Deaton 1997). It is therefore worth seeing whether correlated errors bias $\hat{\gamma}$ downwards, especially because of the variation in household survey methods among the countries considered by Deaton and Paxson.
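How understatement of food spending that grows with household size could pull the estimated coefficient on ln n below zero can be illustrated with a small simulation. The sketch below is not the authors' procedure. It assumes a true food share equation with no household size effect, assumes nonfood is reported accurately, and assumes reported food equals true food multiplied by n^(-theta); the functional form, the parameter values, and the function names are all hypothetical.

```python
import numpy as np


def fit_food_share(w, ln_pce, ln_n):
    """OLS of food share on a constant, ln(x/n) and ln(n).

    Returns the coefficient vector (alpha, beta, gamma).
    """
    X = np.column_stack([np.ones_like(ln_pce), ln_pce, ln_n])
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    return coef


def simulate(theta, size=20_000, seed=0):
    """Estimate (alpha, beta, gamma) from data where reported food is
    understated by the assumed factor n**(-theta). True gamma is zero."""
    rng = np.random.default_rng(seed)
    n = rng.integers(1, 11, size)               # household size, 1 to 10
    ln_pce = rng.normal(0.0, 0.5, size)         # true ln(x/n)
    x = n * np.exp(ln_pce)                      # true total expenditure
    # True food share: falls with PCE, does not depend on household size.
    w = 0.5 - 0.1 * ln_pce + rng.normal(0.0, 0.03, size)
    w = np.clip(w, 0.05, 0.95)
    food, nonfood = w * x, (1.0 - w) * x
    food_rep = food * n ** (-theta)             # assumed method effect
    x_rep = food_rep + nonfood                  # reported total expenditure
    return fit_food_share(food_rep / x_rep, np.log(x_rep / n), np.log(n))


if __name__ == "__main__":
    for theta in (0.0, 0.05, 0.10):
        alpha, beta, gamma = simulate(theta)
        print(f"theta={theta:.2f}  beta={beta:.3f}  gamma={gamma:.3f}")
```

With theta set to zero the estimated coefficient on ln n stays close to its true value of zero, while positive values of theta produce negative estimates, which is the direction of the puzzle described above.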

Engel Estimates of Household Scale Economies

A reparameterized version of equation (5) can provide Engel estimates of size economies, albeit with assumptions substantially different to those used by Deaton and Paxson. In the case of the Engel method, no distinction is made between private and public goods (hence, the economies of scale are not commodity-specific). Scale economies are calculated by comparing the total outlays of different-sized households with the same food shares. For example, Lanjouw and Ravallion (1995) use data from Pakistan to estimate:

(6) $w_{f,i} = \alpha + \beta \ln\!\left(x_i / n_i^{1-\sigma}\right) + \sum_{j=1}^{J-1} \eta_j r_{j,i} + \delta \cdot z + u_i,$

which is identical to equation (5) because $\beta \ln(x_i/n_i^{1-\sigma}) = \beta \ln(x_i/n_i) + \beta\sigma \ln n_i$, so that $\gamma = \beta\sigma$. (9) According to equation (6), if $x^0$ is the outlay of a one-person household, an $n$-person household of the same composition needs a total outlay of $x^0 n^{1-\sigma}$ to have the same food share (and the same welfare level, by assumption).

Theoretical objections to this method have been raised at least since Nicholson (1976) and it is not the aim to add to those here. Instead, the aim is to see how correlated measurement error affects these Engel estimates. Because the scale economy parameter, $\sigma$, is just the ratio of $\hat{\gamma}$ to $\hat{\beta}$, any measurement error that biases $\hat{\gamma}$ will affect Engel estimates of scale economies. For example, Lanjouw and Ravallion estimate $\sigma$ to be 0.4, so if ten individuals in Pakistan formed a ten-person household, their per capita food spending could go down by 60% and they would still have the same level of welfare ($10^{0.6} = 3.98$). These large scale economy estimates imply improbable reductions in food spending per head for consumers in a poor country (Deaton 1997). But if the estimates of $\sigma$ are sensitive to measurement error, not only will the Engel method be theoretically unfounded, it will also be shown to be empirically fragile.
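The arithmetic behind the 60% figure can be reproduced directly from equation (6). The following is a minimal sketch; the function names are illustrative, and the alternative values of the scale parameter in the loop are hypothetical inputs chosen only to show how sensitive the implied reduction is to the estimate.

```python
def engel_equivalent_outlay(x0: float, n: int, sigma: float) -> float:
    """Total outlay an n-person household needs in order to have the same
    food share as a one-person household with outlay x0, per equation (6)."""
    return x0 * n ** (1.0 - sigma)


def per_capita_reduction(n: int, sigma: float) -> float:
    """Proportional fall in per capita outlay, relative to living alone,
    that leaves the food share (and, by assumption, welfare) unchanged."""
    return 1.0 - n ** (1.0 - sigma) / n


if __name__ == "__main__":
    n = 10
    sigma = 0.4  # the Lanjouw and Ravallion estimate quoted in the text
    print(f"outlay multiple: {engel_equivalent_outlay(1.0, n, sigma):.2f}")
    print(f"per capita reduction: {per_capita_reduction(n, sigma):.1%}")

    # Hypothetical alternative values of sigma, for comparison only.
    for s in (0.2, 0.3, 0.5):
        print(f"sigma={s:.1f}: reduction {per_capita_reduction(n, s):.1%}")
```

Because the same food share is held fixed, the proportional fall in per capita outlay is also the proportional fall in per capita food spending.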

Measurement Error and the Testing Procedure

Suppose that survey data on household expenditure is subject to reporting error of the form:

(7) $\tilde{x}_i = x_i + m_i + v^x_i$

where $\tilde{x}_i$ is the survey response, $x_i$ is the true value of expenditure of the $i$th household, $m_i$ is a method effect, due perhaps to the use of a less detailed recall questionnaire rather than a more detailed one, and $v^x_i$ is a pure random error. As discussed above, the method effect in the measurement error, $m_i$, may be negatively correlated with household size, $n_i$. Thus it is also assumed to be negatively correlated with household expenditure, $x_i$, since $x_i$ is positively correlated with $n_i$. Hence, the method effect can be expressed as:

(8) $m_i = \pi x_i + v^m_i$

where $v^m_i$ is a random deviation for the $i$th household from the average method effect. Combining the two equations gives:

(9) $\tilde{x}_i = \lambda_x x_i + v_i$

where $v_i \equiv v^m_i + v^x_i$ is a pure random error and $\lambda_x \equiv 1 + \pi$ represents a potential correlation between the true values and the method effect in the measurement error. Note that $\lambda_x$ equals one plus the slope from a regression of the method effect on the true value. Classical measurement error is the special case of equation (9) where $\lambda_x = 1$. But with correlated errors, $\pi < 0$ and (as long as measured expenditures are still positively correlated with true values) the measurement error follows a mean-reverting pattern ($0 < \lambda_x < 1$). Thus, the expected value of measured expenditures, $E(\tilde{x})$, is the population mean of true expenditures scaled down by $\lambda_x$, and this understatement is consistent with the literature summarized above (e.g., Jolliffe 2001).
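The mean-reverting pattern in equations (7) to (9) can be checked numerically. The sketch below generates hypothetical true expenditures, applies an assumed method effect with slope -0.2, and recovers the slope of reported on true values; the lognormal distribution, the error variances, and the function names are assumptions made for illustration rather than values from the surveys.

```python
import numpy as np


def simulate_reported(x_true, pi, sd_method, sd_noise, rng):
    """Reported expenditure built from equations (7) and (8)."""
    m = pi * x_true + rng.normal(0.0, sd_method, x_true.size)    # eq. (8)
    return x_true + m + rng.normal(0.0, sd_noise, x_true.size)   # eq. (7)


def ols_slope(y, x):
    """Slope from a bivariate OLS regression of y on x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


def run_demo(pi=-0.2, size=50_000, seed=0):
    """Return (slope of reported on true, ratio of reported to true means)."""
    rng = np.random.default_rng(seed)
    x_true = rng.lognormal(mean=5.0, sigma=0.5, size=size)  # hypothetical
    x_rep = simulate_reported(x_true, pi, 10.0, 10.0, rng)
    return ols_slope(x_rep, x_true), x_rep.mean() / x_true.mean()


if __name__ == "__main__":
    slope, mean_ratio = run_demo()
    # Both should be close to lambda_x = 1 + pi, the coefficient in eq. (9).
    print(f"slope = {slope:.3f}, mean ratio = {mean_ratio:.3f}")
```

Setting `pi=0.0` in `run_demo` gives the classical case, in which the slope and the ratio of means are both close to one.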

To see the implications of nonclassical (i.e., $\lambda_x \neq 1$) measurement errors for regression parameters, consider the following simplified version of the linear regression model used by Deaton and Paxson (the demographic composition and control variables are ignored):

(10) $w_{f,i} = \alpha + \beta \ln(x/n)_i + \gamma \ln n_i + u_i.$

The survey data on ln