Growing concentration in the retail grocery sector raises new
economic questions that are difficult to answer with existing data
sources. The data problems are due in large part to concentration in the
retail data industry, where data are collected for commercial rather
than academic research. Currently available grocery-level datasets are
extremely expensive, are not properly randomized, and lack critical
information.
To focus our discussion, we address data needs for industrial
organization and marketing, nutrition and food safety, and government
policy studies. The growing concentration at the grocery retail level
raises a variety of industrial organization and marketing questions,
such as: Has this greater concentration increased market power or
changed the vertical relationship between manufacturers and other
suppliers with retailers? Has the entry of low-price superstores
fundamentally changed the services provided, the degree of product
differentiation, the provision of private label products, and other
actions by traditional supermarkets? What caused the mergers to occur?
Similarly, we want to know if greater concentration has affected
the nation's nutrition and food safety, such as by making
catastrophic food safety disasters more likely. Have increased product
differentiation and lower prices from changes in retailing contributed
substantially to alarming increases in rates of obesity?
Finally, we want to know how government rules and regulations have
affected these markets and consumers. To protect consumers' health,
the government has imposed restrictions on selling certain goods when
food safety issues arise (e.g., mad cow disease and E. coli in lettuce
and spinach). The government also provides nutritional and other label
information (e.g., concerning health foods and organic foods) to help
consumers make more informed food choices. What have been the effects of
these laws and regulations on markets and on the health of various
groups of consumers? We discuss the increase in concentration at the
retail level, commercial databases, data needs for a number of important
research areas, and possible solutions.
Concentration in Retail Markets
Grocery retailing markets are much more concentrated today than
they were two decades ago. This increased concentration has altered the
relationship between manufacturers and retailers. Although most existing
empirical studies based on grocery scanner data implicitly presume that
manufacturers set prices and retailers passively add on a competitive
markup, there is substantial evidence (e.g., Villas-Boas) that such a
description of the market is no longer true, if it ever was.
Mergers and acquisitions by large grocery retailers, including
Kroger Co., Albertson's, Ahold USA, and Safeway, have significantly
increased concentration ratios. Between 1997 and 2000, more than 4,100
U.S. supermarkets were acquired, representing $69 billion in sales. The
four-firm concentration ratio (C4) increased from 16.6 percent in 1992
to 35.5 percent in 2005 (see figure 1). This trend toward increased
concentration has continued with Supervalu's acquisition of
third-ranked Albertson's in 2006 and the growth of Wal-Mart
(Kaufman 2007).
[FIGURE 1 OMITTED]
Companies that were not involved in the food business two decades
ago, such as Wal-Mart and Target, now account for a significant share of
consumers' food-at-home expenditures. Since 1994, nontraditional
food retailers (supercenters, warehouse clubs, mass merchandisers,
drugstores, and dollar stores) have steadily increased their market
share by about 28 percentage points to 31.6 percent in 2005. Led by
Wal-Mart, most of this growth is attributed to supercenters that
commanded 17.1 percent of the food-at-home retail markets in 2005
(Kaufman 2007).
It took Wal-Mart just four years of aggressive supercenter growth
to become the largest U.S. grocery chain by 2002. Wal-Mart's large
share is due to its relatively low prices, which are driven by scale
economies and efficient operations based on buying directly from
suppliers. Wal-Mart's approach has started a domino effect,
significantly changing the retail food market's landscape.
Warehouse club and mass-merchandisers have adopted this strategy,
further intensifying price competition as more consumers have switched
from shopping at supermarkets to low-price, large-scale operations.
Many supermarkets and other traditional grocery retailers have
reacted by expanding their operations through merger and acquisition
strategies, introducing a wider variety of new products (e.g., organic
and natural foods, upgrade store brands, and convenience foods),
promoting new store formats, introducing self-checkout stations,
expanding frequent shopper card programs, and offering online home
shopping services. Some researchers contend mergers and acquisitions are
driven by a search for efficiencies associated with consolidation as
supermarkets are increasingly pressured to meet price competition from
non-traditional food retailers like Wal-Mart. Others contend that
mergers increase the market power of supermarkets and increase prices
for consumers.
Growing retail concentration has not only changed the nature of
competition at the retail level, it has greatly affected the vertical
relations along the marketing chain. As a result of the competitive
pressures from Wal-Mart and other nontraditional formats, many firms in
the grocery industry have resorted to what the industry refers to as
efficient consumer response. These methods are designed to enhance
timely, accurate, continuous, consistent flow of products that are
matched to consumer demands. The initiative focuses on reengineering
activities in the selection of product assortments, product
replenishment, product promotions, and new product introductions.
Information on the type and extent of these business practices are not
readily available, thus impeding efforts to examine their impact on
prices and consumer welfare. Further, many researchers believe that the
now larger retail vendors are exercising their increased oligopsony
power to lower prices paid to suppliers and increasingly charging
manufacturers slotting fees, which are lump-sum fees for carrying a new
product or continuing to carry an existing one.
Commercial DataBases
Agricultural economists have studied a variety of demand, health,
marketing, and industrial organization questions using data from grocery
chains or proprietary retail grocery scanner data. Stores' loyalty
card datasets do not include detailed information on household
demographics and are potentially subject to more measurement errors due
to infrequent use of loyalty cards or use of someone else's card
for convenience. Moreover, grocery chains rarely make their databases
available to researchers.
Today, the only two major firms providing such scanner data are
Information Resources, Inc. (IRI) and Nielsen (formerly known as
AC-Nielsen). Their datasets are constructed primarily for marketing
purposes and are used by retailers, manufacturers, and farm commodity
groups. Usually, these firms charge researchers prices comparable to
those they charge their commercial customers, so that a dataset covering
only a few commodities for the most recent year may cost hundreds of
thousands of dollars.
The current major point-of-sale or store scanner data sources are
IRI's InfoScan and Nielsen's ScanTrack. Store scanner data are
collected at cash registers, while household scanner data are obtained
from a sample of households that scan their purchases after each
shopping trip. Over the past ten years, IRI and Nielsen also have begun
to track grocery purchases by specific households. Nielsen's
household scanner dataset is Homescan and IRI's is Consumer
Network. (Knowledge Networks is also developing a household-based
scanner data panel.)
These datasets provide richer household demographic information
than are available in store scanner data (Muth, Siegel, and Zhen 2007).
Because IRI and Nielsen instruct the household scanner data panelists to
scan all purchases from all outlets, the datasets from household-based
scanner data are more complete than grocery datasets of purchases of
individual households collected through loyalty card users.
In addition to being expensive, commercial datasets come with
significant restrictions on how they may be used (e.g., brand market
shares may not be reported) and do not provide all critical information
needed for many important research topics. For example, although
feasible, they do not have information on whether a specific low-income
household is a Women, Infants, and Children (WIC) program participant,
they do not provide any details on retailers' cost of operation
(e.g., wholesale prices), and the household scanner databases tack
prices of nonpurchased items for demand studies.
Because scanner data are proprietary and are not primarily designed
for academic research, detailed documentation on sampling and data
collection procedures and statistical properties of the data are not
readily available. Although few academic papers that use IRI and Nielsen
data discuss the quality of these datasets, there is good reason to
question whether these firms use proper random sampling techniques. In
the store-based scanner data, large, traditional supermarket chains are
over-represented (because they supply data and hence are included with
certainty, as opposed to smaller stores that are sampled). In addition,
store-based scanner data may not adequately include new sources of food
sales (Wal-Mart supercenters and other big box stores, and WIC-only
stores).
Muth, Siegel, and Zhen (2007) document the data collection process
for Nielsen's Homescan data and identify potential sources of bias:
sample design, self-selection, self-reporting, nonresponse, and
attrition. However, no formal statistical studies have been conducted to
measure the magnitude of the actual presence or the size of any
potential bias. The households included in the sample are not
probability based and randomly drawn from the community, and hence
Homescan is a convenience sample.
We compared the U.S. Census demographic information with sample
averages from IRI InfoScan by zip code area for all the zip coeds in the
1999 IRI dataset. Table 1 shows the averages across the zip code areas.
IRI values could differ from Census data because only a subset of
grocery stores is sampled within any given zip code or because the
sampled households who shop at those grocery stores are not
representative. In our sample, the IRI sample values have relatively
large standard errors, so that we cannot conclude that the means of
demographic variables in the Census and IRI datasets differ
statistically significantly. However, in most zip codes areas, IRI
households are younger, more likely to be white, larger, and more likely
to be neither poor nor rich than are Census households (that is,
typically large, white middle-class families).
Data Problems for Research
Purveyors of proprietary scanner data focus on the most recent
marketing information for the industry and not on creating datasets that
are ideal for research. In the proprietary datasets, short time series
and lack of information from other levels of the production chain and
other missing variables limit the type of academic studies that are
possible.
Industrial Organization and Marketing Studies
These datasets lack information that would facilitate studies of
market power and vertical relations between manufacturers and retailers
or suppliers.
To study markups over the food chain, other vertical relations, and
food safety questions, we need to trace goods from the farm to the
consumer. Most industrial organization studies and many nutritional and
other studies require one to estimate a system of demand equations,
which is often difficult with existing databases for three reasons.
First, the relevant prices are usually unavailable. Household
datasets include prices for only purchased goods. In a few cases,
researchers have matched store-level data with household data (or
purchases by other households) to obtain the missing prices.
Disturbingly, the price data from the grocery dataset do not always
match that from the household dataset, and we lack any means of
reconciling these differences.
Second, the actual transaction price is not obvious from the
reported information. It is not possible to determine if the price
reflects all discounts, coupons, and taxes. The commercial databases do
not record whether the purchases were made using food stamps or WIC
vouchers, which preclude studies of such programs and may bias standard
demand equation estimates.
Third, the databases do not report shelf space allocations, local
restrictions, or store warnings, all relevant advertising, information
provided on the products (e.g., fat, health, safety, price per unit, and
whether the product is organic), wholesale prices, slotting allowances,
other transfers and restrictions between manufacturers and retailers,
and government program information (e.g., WIC and food stamps).
Because the databases cover only a nonrandom subset of stores,
conducting industrial organization studies of horizontal competition
between stores is difficult. We do not have a complete enough set of
stores to conduct spatial studies of pricing and other subjects.
Research findings on the economics of consumer behavior provide insights
into the effects of neighborhood characteristics on consumers'
choices in differentiated product markets (cf. Waldfogel 2003).
Nutritional and Food Safety Studies
The high societal costs associated with obesity have intensified
the need to identify and understand the factors that influence food
choices and the effects of these choices on an individual's health.
Extensive studies on consumer food demand show that food choices may
depend on food prices, income levels, time available to shop and prepare
meals, human capital resources, such as education and type of
employment, and consumers' attitude, perception, diet, and
nutrition knowledge, as well as psychological factors. Economic studies
of these issues are greatly hampered by a lack of consistent and
integrated data and information.
No single reliable source currently provides or could provide all
of the information required for a myriad of studies that could be
undertaken. A number of data sources do provide some of the information,
but each is weak in critical areas. A 2005 report by the National
Research Council of the National Academies (NRC) recommended enhancing
usability of various key data systems to support research on critical
U.S. food and nutrition policies. Adopting the NRC's recommendation
to create integrated and consistent data would help researchers to
better understand how consumers' food choices, diets, and health
are affected by changes in food prices, neighborhood characteristics,
access to food stores and restaurants, behavioral factors, and by
participation in food assistance programs.
The National Health and Nutrition Examination Survey (NHANES),
conducted by the National Center for Health Statistics of the Centers
for Disease Control, measures food intakes and an array of health
outcomes for a representative population, but no information on prices
of foods eaten by survey respondents is collected. Adding price
information from other existing sources would enable research on drivers
of consumer food choice and their connections to health outcomes for
various population subgroups and regions overtime. Measuring consumer
price responsiveness is a critical component of a sound policy strategy.
Beyond characterizing consumer preferences, information on price
responsiveness enables researchers to evaluate the effects of taxes and
subsidies on consumption of various foods and the nutrients they
contain. Further, without controlling for price variations, researchers
cannot consistently estimate the role of other factors. Adding data and
information on consumer attitude, perception, diet, and nutrition
knowledge, and psychological factors to the NHANES intake data would
facilitate studies of the drivers of the obesity epidemic.
Currently, no dataset provides the capability to trace foods back
to their sources. Plans of Wal-Mart and others to use radio signals
(RFID tags) to track goods from the manufacturer to the retailer or
final consumer raise privacy concerns but also may provide a means to
examine important questions concerning food safety, food quality, and
various vertical integration issues. However, we know of no plans to
make such information available to researchers. Indeed, manufacturing
and retailing firms may not want such information disseminated.
Nutritional studies are hampered by a lack of datasets that cover
both food at home and food in restaurants. As Americans have
increasingly switched from home-cooked meals to processed foods or
restaurant meals, the substitution patterns between these types of meals
has substantial public policy importance.
Government Programs
Many studies of government programs require time series data.
Bizarrely, both IRI and Nielsen usually discard data that are more than
three years old, making many time series or historical studies of
government laws and regulations difficult or impossible to conduct. For
example, data from these sources before and after the recent change in
the U.S. rules on organic foods are generally not available either
because datasets.
Food assistance programs are designed to provide a nutritional
safety net, guaranteeing a minimum level of access to essential
nutrients for participants. Empirical evidence on the extent to which
the programs affect consumption, nutrient intake, and obesity provides
critical information about the current effectiveness of the programs.
Combining the existing measures of consumption patterns and the health
status of program participants with this information on benefit levels
and duration of participation will help to reveal the critical link
between food assistance programs and the diet, nutrition, and health
outcomes of program participants. For example, accounting for how long
participants have been in the sample can help researchers determine if
the sizes of the program's effects differ depending on the duration
of participation.
The NHANES queries respondents about their program participation
and benefits. However, studies show that self-reported information is
systematically underreported in many surveys, including NHANES. For
example, in 2004, the Current Population Survey captured 60% of average
monthly caseloads and 58% of annual benefits (Bollinger and David 2005).
Administrative records can be used to correct this underreporting and
avoid analytical results that would otherwise be biased.
Supplementing the NHANES dataset with administrative records would
allow researchers to study the connection between food choices and
neighborhood characteristics, particularly for low-income households in
urban and rural areas. To the extent that NHANES includes such
households, researchers could correlate health and nutrition outcomes
with household and location characteristics. A link between NHANES data
and information on the location of food stores and eating establishments
would also enhance efforts to understand the effects of access on food
choices and health outcomes. Information on locations and
characteristics of food stores and foodservice establishments can be
collected using proprietary sources, such as Spectra[R] and NPD. Linking
NHANES to household and local community descriptors in the Census's
American Community Survey will help researchers understand how
neighborhood characteristics influence food choices and health outcomes.
Improving Datasets
We have a simple and obvious message. With more data, economists
could analyze additional, important issues of economic theory and
government policies.
Because data lack rivalry (everyone can consume the data), society
under-provides data. Relying on commercial vendors is unattractive
because these firms charge very high prices, do not fully disclose the
nature of their data, provide data for only very short periods, and
report only variables that are important for commercial customers and
not all variables that are important for researchers.
One approach to ameliorating data shortages for research would be
to have government agencies or nonprofit organizations collect the ideal
datasets or provide incentives to commercial providers. Fundamentally,
researchers need access to unrestricted data based on proper random
samples and that include all the relevant variables.
First, to enable unfettered assess, to improve content, and to
obtain better prices, it may make sense for university and government
researchers and organizations (the AAEA, government agencies, business
school organizations, the American Economic Association, and others) to
try to negotiate with private purveyors collectively. They might also
negotiate to house, at little or no cost, historical IRI and Nielsen
data that are now discarded so that longer time series and additional
variables can be created. However, such collective action might raise
antitrust issues.
Second, these research groups could try to make arrangements with
individual firms to supply data. We know of at least two supermarket
chains that have been willing to make such agreements in the past. The
AAEA could lead efforts to select representative samples of suppliers to
collect details of proprietary transaction data and provide them to
researchers so that privacy and confidentiality of the data are
maintained.
Third, these research organizations could collaborate to collect
data on their own. Even discussing this possibility may facilitate
negotiations with commercial data purveyors.
On a less grand scale, we have a laundry list of new datasets that
would be particularly useful. First, industrial organization and food
safety studies require information at both the retail and upstream
levels, including information about wholesale prices, food sources,
various slotting and tying relations, and government programs.
Second, nutritional studies need datasets that combine information
on food-at-home and away-from-home, nutritional content of these various
foods, and prices. Because consumer studies find substantial variation
in nutritional consumption across demographic groups and neighborhoods,
datasets are needed that cover a broad cross-section.
Third, health and nutrition studies would benefit substantially if
we could link the intake and health data with administrative food
assistance records to add levels and duration of program assistance.
Such a link would have to address two challenging issues: (1) privacy
and confidentiality conditions under which states collect the
administrative data must be met to access the data for linking purposes
and (2) variation of data formats across states makes linking these sets
to survey data difficult. In addition, given the relatively small
effects of price and income on food choices, addressing the obesity
epidemic may require collection of new data on consumers' health
and nutritional knowledge, attitudes, and available time to shop and
prepare meals to undertake economic studies to understand consumer
dietary behavior.
References
Bollinger, C., and M. David. 2005. "I Didn't Tell, and I
Won't Tell: Dynamic Response Error in the SIPP." Journal of
Applied Econometrics 20:563-69.
Kaufman, E 2007. "Food Market Structures: Food
Retailing." U.S. Department of Agriculture, Economic Research
Service. www. ers.usda.gov/Briefing/FoodMarketStructures/
foodretailing.htm (2007).
Muth, M.K., EH. Siegel, and C. Zhen. 2007. "ERS Data Quality
Study Design." Final Report, Research Triangle Institute, Project
Number 210153.001.
National Research Council of National Academies. 2005.
"Improving Data to Analyze Food and Nutrition Policies."
Committee on National Statistics, Panel on Enhancing the Data
Infrastructure in Support of Food and Nutrition Programs, Research, and
Decision Making, National Academies Press, Washington, DC.
Villas-Boas, S.B. 2007. "Vertical Relationships Between
Manufacturers and Retailers: Inference With Limited Data." Review
of Economic Studies 74:625-52.
Waldfogel, J. 2003. "Preference Externalities: An Empirical
Study of Who Benefits Whom in Differentiated Product Markets." Rand
Journal of Economics 34:557-68.
Jeffrey Perloff is Professor, Department of Agricultural and
Resource Economics, and member of the Giannini Foundation, University of
California at Berkeley. Mark Denbaly is Deputy Director for Data and Web
Communication, Economic Research Service, U.S. Department of
Agriculture.
The opinions expressed in this paper are those of the authors and
not necessarily the United States Department of Agriculture or any of
its members. We thank Rui Huang for statistical help.
This article was presented in a principal paper session at the AAEA
annual meeting (Portland, OR, July 2007). The articles in these sessions
are not subjected to the journal's standard refereeing process.
Table 1. Comparison of U.S. Census and IRI Demographic Data
Households IRI Census
With individuals <18 years old 35.0% 33.9%
With income <$10,000 5.6% 7.4%
With income >$100,000 1.9% 14.3%
White 86.4% 71.9%
Black 5.3% 10.8%
Asian 1.3% 5.9%
Hispanic 5.7% 17.0%
Size 2.8 2.6
Notes: Average across all the zip code regions in the IRI
data set. IRI data are for 1999 and Census data are from 2000.
COPYRIGHT 2007 American Agricultural Economics
Association Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
Copyright 2007, Gale Group. All rights
reserved. Gale Group is a Thomson Corporation Company.
NOTE: All illustrations and photos have been removed from this article.