The most widely used elicitation formats in conjoint analysis (CA)
applied to environmental valuation have been rating, ranking, and
choice. As economists tend to prefer ordinal measures of preferences
rather than cardinal measures, especially due to the more obvious
interpretation in terms of random utility (Roe, Boyle, and Teisl 1996;
Holmes and Boyle 2001), we focus on ranking and choice experiments. The
differences found between these formats in previous studies are not
particularly surprising because different statistical techniques are
used, and violations of transitivity after the first rank have been
observed both in experiments and in field applications (see Foster and
Mourato (2002) and Bateman et al. (2007)). However, even if data
obtained through a ranking exercise are recoded as a choice experiment
by assuming that the option ranked first would be the option chosen, and
analyzed using statistical techniques employed in choice experiments,
differences in response have been found to persist between the two
formats (see Boyle et al. (2001) and the remaining literature summarized
in table 1). That is, according to previous research, saying that an
option is your preferred option is not the same as saying that you would
choose the option. The explanation provided in Boyle et al. (2001) is
that the cognitive process differs when we ask a person to state her
(or his) preferred option rather than to state the option
that she/he would choose. Holmes and Boyle (2001) state that "This
surprising result suggests that different cognitive processes were used
in seemingly identical tasks (i.e., choose the most preferred profile
from a set)." Turning the argument upside down, one can argue
that a method that yields different results with such similar tasks
(choose one versus state your preferred option) has a fundamental
problem.
As we feel that, if true, this would have profound implications not
only for economic valuation through CA but also for economic theory, we
decided to investigate whether this difference remains when some of the
shortcomings of existing comparisons are removed. Although the details
are discussed below, in our experiment we essentially ensured that the
choice and the ranking surveys were identical in all relevant features:
same experimental design, same number of alternatives, and same
questionnaire.
The implications of this comparison are also relevant for CA
practitioners. If the differences persist, we should probably recommend
choice experiments because they are closer to real-market decisions
(Adamowicz, Louviere, and Williams 1994). If the differences disappear,
the use of ranking could be recommended because we can obtain the same
results as when using choice, and we may be able to use the information
provided by the subsequent ranks to develop an additional measurement.
With this in mind, we decided to use a pairwise comparison,
plus status quo, to test differences between choice and recoded ranking
experiments. We chose this design because it has not been used for
comparing the two formats in previous studies in spite of being the most
common format used for environmental valuation applications (Adamowicz,
Louviere, and Williams 1994; Blamey et al. 2002; among others). In
addition, this format can be seen as a benchmark case to test the
theoretical concerns discussed above (convergent-validity of a choice
and a ranking recoded as a choice when both are analyzed as a choice),
because it has the minimum number of alternatives necessary to provide a
meaningful ranking.
Using a split-sample design we presented a choice experiment to one
half of the sample and a ranking to the other half. Although a ranking
can be performed using a simpler design (Louviere 1988, p. 100), we
chose the experimental design to be identical in both cases. In other
words, we used for both subsamples the experimental design that we would
need to use in a choice, because this design can also be used in a
ranking.
Our results, contrary to those obtained in Boyle et al. (2001)
and the remaining literature summarized in table 1, show that the choice
experiment and the ranking recoded as a choice provide statistically
similar parameter vectors (same structural models). The same cognitive
process was apparently used for seemingly identical tasks (i.e., choose
the most preferred profile from a set, or state the profile you would
choose from a set). Comparison tests of aggregate and per-attribute
welfare measures also show statistically identical results in almost all
cases. This holds for parametric tests as well as for bootstrapping
tests. We completed our analysis by trying to detect
"learning" and "fatigue" effects with a subsample
analysis. We also used follow-up questions to study the effects
associated with the information provided, the difficulty of the
valuation task, the number of sets of alternatives, and the response
effort. Only the respondent's reported difficulty with the
valuation task turned out to be relevant.
Literature Review
Mackenzie (1993) made the first comparison of CA formats applied to
environmental valuation. However, he performed a rating experiment and
then simulated choices and rankings from the ratings (see also Anderson
and Bettencourt 1993). In Roe, Boyle, and Teisl (1996) and Stevens,
Barrett, and Willis (1997), ratings were also recoded to rankings and
choices. Thus, the results of these studies are not relevant for our
comparison purposes, because a choice experiment and a ranking recoded
as a choice were not compared. The same holds for studies that analyze
full ranks, because the differences reported can be explained by the
different statistical techniques employed, and by inconsistency in the
second and subsequent ranks (Chapman and Staelin 1982; Hausman and Ruud
1987; Ben-Akiva, Morikawa, and Shiroishi 1991; Foster and Mourato 2002;
Siikamaki and Layton 2007). However, this should not pertain when the
first rank only is analyzed as a choice.
Table 1 summarizes the main features of previous environmental
valuation comparisons and cross-validity tests between independent
samples of choice and rankings recoded to choice formats (i.e., we
analyze only studies that focus on the first rank). Although results
demonstrate differences between these formats, all the studies featured
at least one of the shortcomings discussed below.
The experimental designs of choice and recoded ranking experiments
differ in some of the studies in table 1, because rankings can employ
simpler experimental designs (e.g., Mogas and Riera (2001); see table
1). The number of alternatives offered for ranking is in some cases more
than the alternatives provided to choose from (e.g., Morrison and Boyle
(2001); see table 1). In comparison studies where this occurs, it is
hard to discern whether the differences are caused by the process of
stating preferences or by the different experimental designs inducing
different results. Furthermore, when respondents face a high number of
alternatives to rank, they may reduce the precision of their valuation
process, or they may simply assign ranks randomly. This also affects the
first rank. All the studies in table 1 provided four alternatives to
rank.
The inclusion of a status quo alternative in all sets of
alternatives is also relevant. As in the contingent valuation method, a
reference level must exist to obtain adequate welfare measures (Roe,
Boyle, and Teisl 1996). Most of the studies in table 1 did not
include the status quo in all sets of alternatives. Experimental designs
in Boyle et al. (2001) and Holmes and Boyle (2001) were random in
attributes, implying that the whole status quo appeared only in some of
the sets of alternatives. In Mogas and Riera (2001), the choice
experiment presented two alternatives plus status quo and the ranking
presented four alternatives without status quo.
As table 1 shows, convergent validity was generally not obtained in
parameters and only Morrison and Boyle (2001) found convergence when
exclusively including respondents who stated that the valuation task was
easy. As to welfare measures, results generally indicated that they
are statistically different (table 1).
Methodology
The CA exercise presented in this article was applied to the
valuation, by public visitors, of a reforestation program with cork oak
trees in Alcornocales Natural Park (ANP). ANP is a protected
Mediterranean forest of 1,677 [km.sup.2] located in the south of Spain
and it is covered by extensive woodlands where the main species is cork
oak. Public visitors value its recreational environmental services
highly (Campos, Caparros, and Oviedo 2007). The ANP forests currently
face aging cork oak trees, due to natural mortality accentuated by
diseases, and lack of natural regeneration due to overgrazing.
Unchecked, this process will eventually result in the gradual
replacement of the cork oak forest with shrublands. The failure of
natural regeneration and private reforestation programs has led the
regional administration to implement a policy providing subsidies to
landowners that reforest their lands. This policy is currently being
applied within the framework of the European Union Common Agricultural
Policy. We decided to investigate whether social preferences, expressed
through willingness to pay (WTP), are in alignment with conserving and
increasing cork oak forest extent in ANP. An analysis of policy
implications of the results of these experiments can be found in
Caparros et al. (2007).
Survey Logistics and Experimental Design
The survey provided was a CA exercise where respondents had to
complete either eight choices or eight rankings per questionnaire. In
each case 450 individuals answered the survey. The interviews, conducted from
June 2002 to May 2003, were face-to-face with ANP public visitors, who
were given an informative booklet with basic information about ANP and
the implications of the different reforestation options.
Previously, two focus groups were used to identify the main
attributes of a reforestation program for the general public, and to
evaluate the extent to which the information presented in the survey was
understood. A preliminary design for the choice/ranking sets was tested
as well. We used the focus group information to create a pretest (1)
whose main objective was to obtain the vector of monetary values to be
offered in the main survey. An open-ended WTP question was used to
obtain a value for a whole reforestation program, followed by six
open-ended WTP questions corresponding to the six different attributes
selected using the focus group (the five used in the final version plus
one attribute not included in the final version, "number of birds
protected"). The pretest was presented to 115 ANP visitors.
Given the information obtained in the focus group and the pretest,
the attributes presented in table 2 were chosen for the analysis. Figure
1 shows an example of a choice and a ranking set.
Given these attributes and their levels, we chose sixteen
treatments from the universe of 1,024 possible combinations ([4.sup.4] x
[2.sup.2]) of attributes, forming a main effects design for attributes.
Then, we placed the sixteen treatments in pairwise combinations in order
to obtain a full set of pairwise comparisons among treatments, yielding
120 choice sets ([sub.16][C.sub.2]). This full set enables us to take
into account all interactions between treatments and is more appropriate
for comparison purposes. Thus, our design considers main effects for
attributes and all effects between treatments.
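The counts in this design can be verified with a short sketch. The level counts follow the ([4.sup.4] x [2.sup.2]) factorial stated above; the labels T1 to T16 are placeholders introduced here for illustration, not the treatments actually selected for the survey.

```python
from itertools import combinations
from math import comb, prod

# Levels per attribute: four attributes with four levels and two with two
# levels, as implied by the 4^4 x 2^2 factorial in the text.
levels_per_attribute = [4, 4, 4, 4, 2, 2]
full_factorial_size = prod(levels_per_attribute)

# Placeholder labels for the sixteen main-effects treatments (hypothetical).
treatments = [f"T{i}" for i in range(1, 17)]

# Every unordered pair of treatments forms one choice/ranking set;
# the status quo is then added to each set.
choice_sets = list(combinations(treatments, 2))

# Count how often each treatment appears across the full set of pairs.
appearances = {t: sum(t in pair for pair in choice_sets) for t in treatments}

print(full_factorial_size, len(choice_sets), comb(16, 2))
```

Because every treatment meets each of the other fifteen exactly once, the frequency with which a treatment is chosen or ranked first can be compared across formats without conditioning on its competitors.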
Statistical Models
We analyzed two data sets: the information provided by the choice
experiment (model C) and the information provided by the contingent
ranking recoded using only the first rank (model RC). (2) In this
manner, we focus on whether people respond differently
to the "one you would choose" question and to the "your most preferred"
question. For the regression analysis we use the nested logit (NL) model
(3) in the main text (reported here), while a supplemental appendix
(Caparros, Oviedo, and Campos 2008) presents the random parameter logit
model (RPL) and additional NL models.
In the NL we set a reforestation (REF) branch for the two
reforestation alternatives and a no reforestation (NREF) branch for the
status quo. The latter is known as a degenerate branch (Louviere,
Hensher, and Swait 2000, pp. 153-54). A detailed explanation of the NL
model can be found in McFadden (1981) and the particular case of the
model with one-degenerate branch is discussed in Hunt (2000).
We assume a linear-in-parameters utility function that originates
from an additively separable linear utility model with a systematic
([V.sub.ij]) and a random component ([[epsilon].sub.ij]) : [U.sub.ij] =
[[summation].sup.K.sub.k=1][[beta]'.sub.k][X.sub.kj] +
[[epsilon].sub.ij] = [V.sub.ij]([X.sub.kj]) + [[epsilon].sub.ij] where
[[beta].sub.k] represents the regression coefficient for the attribute
k; [X.sub.kj] the value of the attribute k for each possible alternative
j in the choice set; and [[epsilon].sub.ij] the random errors. The
notations of the attributes included in the regression (table 2) are
respectively BIO, TEC, REC, EMP, SUR, and BID. We do not include an
alternative specific constant (ASC) for reforestation alternatives in
the NL models in the main text because the ASCs are not significant. The
supplemental appendix (Caparros, Oviedo, and Campos 2008) reports the
results of the models including ASCs. Thus, the vectors [[beta].sub.k]
and [X.sub.kj] in the NL models reported in the article are
[[beta]'.sub.k] = ([[beta].sub.BIO], [[beta].sub.TEC],
[[beta].sub.REC], [[beta].sub.EMP], [[beta].sub.SUR], [[beta].sub.BID])
[X'.sub.kj] = ([x.sub.BIOj], [x.sub.TECj], [x.sub.RECj],
[x.sub.EMPj], [x.sub.SURj], [x.sub.BIDj]).
The probability of choosing alternative j in a category r (REF or
NREF) is represented as (Blamey et al. 2002)
(1) $P_{jr} = P(j \mid r)\,P(r) = \dfrac{\exp[V_{ijr}/\alpha_r]}{\exp[I_r]} \cdot \dfrac{\exp[\alpha_r I_r]}{\sum_{k=1}^{R} \exp[\alpha_k I_k]}$
where
$I_r = \log\left[\sum_{i=1}^{J_r} \exp(V_{ir}/\alpha_r)\right].$
[I.sub.r] represents the inclusive value, which is a measure of the
expected maximum utility from the alternatives associated with the rth
class of alternatives; and [[alpha].sub.r] is the parameter of the
inclusive value [I.sub.r] (Blamey et al. 2002). For the degenerate
branch, the inclusive value parameter is fixed to 1 (Louviere, Hensher,
and Swait 2000, p. 154).
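As an illustration of equation (1) with one degenerate branch, the following sketch computes the probabilities for two reforestation alternatives and the status quo. The function name, argument names, and utility values are assumptions made for this example; they are not estimates from the article.

```python
import math

def nested_logit_probs(v_ref, v_sq, alpha_ref):
    """Return probabilities [ref_1, ..., ref_n, status_quo] following equation (1).

    v_ref     : systematic utilities of the alternatives in the REF branch
    v_sq      : systematic utility of the status quo (degenerate NREF branch)
    alpha_ref : inclusive value parameter of the REF branch
    """
    # Inclusive value of the REF branch: log-sum of the scaled utilities.
    i_ref = math.log(sum(math.exp(v / alpha_ref) for v in v_ref))
    # Degenerate branch: a single alternative and an inclusive value parameter
    # fixed at 1, so the inclusive value equals the status quo utility.
    i_nref = v_sq
    denom = math.exp(alpha_ref * i_ref) + math.exp(1.0 * i_nref)
    p_ref_branch = math.exp(alpha_ref * i_ref) / denom   # P(r = REF)
    p_nref_branch = math.exp(i_nref) / denom             # P(r = NREF)
    # Conditional probabilities P(j | REF).
    within = [math.exp(v / alpha_ref) / math.exp(i_ref) for v in v_ref]
    return [p_ref_branch * w for w in within] + [p_nref_branch]

# Illustrative utilities (hypothetical values).
probs = nested_logit_probs(v_ref=[0.4, 0.1], v_sq=0.0, alpha_ref=0.7)
print(probs)
```

When alpha_ref is set to 1 the expression collapses to the multinomial logit, which offers a quick check on any implementation.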
The quantitative attributes (BIO, EMP, SUR, and BID) were coded
introducing their own values and not as categorical variables (BIO was
not coded as categorical because the planted species are not specified,
except for the always present cork oak). The attribute REC was
dummy-coded. The attribute TEC was effect-coded (1 for natural
regeneration, -1 for artificial plantation, and 0 for the status quo) to
differentiate the effect of choosing any of the two possible techniques
from the status quo.
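The effect coding of TEC can be sketched as follows. Only the three codes stated above are taken from the text; the coefficient value is hypothetical, and the dummy coding of REC is not shown because its levels are defined in table 2.

```python
# Effect codes for TEC as described in the text.
TEC_CODES = {
    "natural regeneration": 1,
    "artificial plantation": -1,
    "status quo": 0,
}

def tec_utility(technique, beta_tec):
    """Contribution of TEC to systematic utility under effect coding."""
    return beta_tec * TEC_CODES[technique]

beta_tec = 0.25  # illustrative value, not an estimate from the article

# Utility gap between the two techniques.
gap = tec_utility("natural regeneration", beta_tec) - tec_utility(
    "artificial plantation", beta_tec
)
print(gap)
```

Under this coding the status quo contributes nothing through TEC, and the utility gap between the two techniques equals twice the TEC coefficient.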
For welfare measures, we calculated a point estimate of the mean
WTP for a marginal increase in the level of an attribute (mWTP) dividing
the [beta] associated to the attribute ([[beta].sub.k]) by the [beta]
associated to the payment-vehicle ([[beta].sub.BID]), with negative
sign. We also generated an empirical distribution of this mWTP for each
attribute through the Krinsky and Robb (1986) bootstrapping technique
with 1,000 replications. In this case, the mean of the empirical
distribution is the mean of the mWTP for increasing the level of the
attribute. Both techniques were also applied for two cases of Hicksian
surplus (HS) (Choi and Moon 1997). We selected the cases of maximum and
minimum possible HS (HSMAX and HSMIN), given the highest and the lowest
levels of the attributes for the reforestation alternatives. HSMAX
considers four species, natural regeneration, two recreational areas,
eighty employees, and 140% of present extent of forest surface. HSMIN
considers one species, artificial plantation, no recreational areas,
twenty employees, and 90% of present extent of forest surface.
[FIGURE 1 OMITTED]
In the case of the point estimate, we used the Wald procedure
(Greene 2007, p. E38-2) for calculating the variance for the mWTP and
for the HS. Invoking Cramer's theorem we constructed the 95%
confidence interval. In the case of the bootstrapping, we obtained the
standard deviation from the empirical distribution and the 95%
confidence interval through the percentile approach (Efron and
Tibshirani 1993).
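The welfare calculations described above can be sketched under some simplifying assumptions. The point estimate is the ratio of coefficients with negative sign; the standard error uses a first-order (delta-method) approximation, which is what we assume the Wald procedure amounts to here; and the simulation draws coefficient pairs from a bivariate normal in the spirit of Krinsky and Robb (1986). All function names and numerical values are hypothetical.

```python
import math
import random
import statistics

def point_mwtp(beta_k, beta_bid):
    """Point estimate: attribute coefficient over payment coefficient, sign reversed."""
    return -beta_k / beta_bid

def delta_method_se(beta_k, beta_bid, var_k, var_bid, cov):
    """First-order standard error of -beta_k / beta_bid."""
    d_k = -1.0 / beta_bid             # derivative with respect to beta_k
    d_bid = beta_k / beta_bid ** 2    # derivative with respect to beta_bid
    var = d_k ** 2 * var_k + d_bid ** 2 * var_bid + 2 * d_k * d_bid * cov
    return math.sqrt(var)

def krinsky_robb(beta_k, beta_bid, var_k, var_bid, cov, draws=1000, seed=0):
    """Simulated mean and 95% percentile interval of the mWTP."""
    rng = random.Random(seed)
    # Cholesky factor of the covariance matrix [[var_k, cov], [cov, var_bid]].
    l11 = math.sqrt(var_k)
    l21 = cov / l11
    l22 = math.sqrt(var_bid - l21 ** 2)
    sims = []
    for _ in range(draws):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        b_k = beta_k + l11 * z1
        b_bid = beta_bid + l21 * z1 + l22 * z2
        sims.append(-b_k / b_bid)
    sims.sort()
    # Simple order-statistic percentiles of the empirical distribution.
    lower = sims[int(0.025 * draws)]
    upper = sims[int(0.975 * draws) - 1]
    return statistics.mean(sims), (lower, upper)

# Hypothetical estimates for one attribute and the payment vehicle.
est = dict(beta_k=0.30, beta_bid=-0.02, var_k=0.0025, var_bid=0.000004, cov=0.0)
mwtp_hat = point_mwtp(est["beta_k"], est["beta_bid"])
se_hat = delta_method_se(**est)
sim_mean, (ci_low, ci_high) = krinsky_robb(**est)
print(mwtp_hat, se_hat, sim_mean, ci_low, ci_high)
```

The simulated ratio becomes unstable when draws of the payment coefficient approach zero, so a sketch of this kind is only informative when that coefficient is precisely estimated.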
In the supplemental appendix (Caparros, Oviedo, and Campos 2008),
NL and RPL models including socioeconomic variables are presented
(including an ASC where appropriate). The findings reported in this
article remain essentially unchanged.
Tests
A Likelihood Ratio test was used to establish whether the parameter
vectors are statistically similar, that is, whether the valuation tasks
derive from the same cognitive process. We followed Swait and
Louviere's (1993) proposal, also applied in Blamey et al. (2002)
and Holmes and Boyle (2001). With this test we are able to distinguish
whether differences between parameter vectors are due to differences in
taste parameters ([[beta].sub.k]) or due to differences in scale
parameters ([lambda]). The scale parameter is generally unknown and set
equal to 1 ([lambda] = 1), but between two separate data sets it is
possible to compute the relative scale parameter.
Swait and Louviere (1993) propose a double stage test to check the
hypothesis [H.sub.1]:([[lambda].sup.C][[beta].sup.C]) =
([[lambda].sup.RC] [[beta].sup.RC]). First, we test
[H.sub.A]:([[beta].sup.C]) = ([[beta].sup.RC]) setting the relative
scale parameter as [[lambda].sup.RC] / [[lambda].sup.C]. If [H.sub.A] is
rejected then [H.sub.1] is also rejected. If [H.sub.A] is not rejected, then we
test [H.sub.B]:([[lambda].sup.C]) = ([[lambda].sup.RC]). If [H.sub.B] is
not rejected, then we cannot reject [H.sub.1].
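The two stages can be sketched from the log-likelihoods of the separate and pooled models. The degrees of freedom used below (number of taste parameters plus one for the first stage, one for the second) follow our reading of Swait and Louviere (1993); the function names and log-likelihood values are hypothetical, and the chi-square tail probability is computed with a closed-form recurrence to keep the example free of external libraries.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    y = max(x, 0.0) / 2.0
    if df % 2 == 0:
        q, a = math.exp(-y), 1.0               # Q(1, y)
    else:
        q, a = math.erfc(math.sqrt(y)), 0.5    # Q(1/2, y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1.0
    return q

def swait_louviere(ll_c, ll_rc, ll_pooled_scaled, ll_pooled_equal, n_params):
    """Two-stage likelihood ratio test for equal taste and scale parameters."""
    # Stage 1 (H_A): equal taste parameters, relative scale left free.
    lr_a = -2.0 * (ll_pooled_scaled - (ll_c + ll_rc))
    # Stage 2 (H_B): equal scale parameters, given equal taste parameters.
    lr_b = -2.0 * (ll_pooled_equal - ll_pooled_scaled)
    return {
        "LR_A": lr_a, "p_A": chi2_sf(lr_a, n_params + 1),
        "LR_B": lr_b, "p_B": chi2_sf(lr_b, 1),
    }

# Hypothetical log-likelihoods for the C, RC, and pooled models.
result = swait_louviere(
    ll_c=-3000.0, ll_rc=-3010.0,
    ll_pooled_scaled=-6013.0, ll_pooled_equal=-6013.5,
    n_params=6,
)
print(result)
```

In practice the pooled model with a free relative scale is usually estimated by a grid search over the scale ratio; that step is omitted here.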
To complete the comparison of the parameters, we used a simulation
to check if parameters of the RC model can recover the information of
the C model and vice versa. This was performed using the parameters of
the attributes ([[beta].sub.k]) plus an error component, assigned to
each individual i, randomly drawn from the estimated variance of the
error distribution of each attribute k ([[epsilon].sub.ik]). This gives
one parameter for the attribute k for each individual i ([[beta].sub.ik]
= [[beta].sub.k] + [[epsilon].sub.ik]). We then calculated the
percentage of correct predictions of the choice of an alternative
obtained with the C model and compared it with the percentage of correct
predictions obtained with the C model that uses its actual [X.sub.kj]
and the [[beta].sub.ik] and [[epsilon].sub.ik] from the RC model (the
same was carried out for the RC model using the [[beta].sub.ik] and
[[epsilon].sub.ik] from the C model).
We also tested for the equality of mWTP obtained for each attribute
[H.sub.2]: [(mWT[P.sub.K]).sup.C] = [(mWT[P.sub.K]).sup.RC] and for the
two cases of HS previously mentioned [H.sub.3]: [(HS).sup.C] =
[(HS).sup.RC]. Three tests were carried out: the nonoverlapping
confidence interval test, (4) the t-test and the complete combinatorial
test (Poe, Giraud, and Loomis 2005).
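The idea behind the complete combinatorial test can be sketched as follows: every simulated welfare value from one model is compared with every simulated value from the other, and the share of nonpositive differences gives a one-sided significance level. The function names and the small input vectors are hypothetical, and doubling the smaller tail for a two-sided value is a convention we assume here.

```python
def complete_combinatorial_p(x, y):
    """Share of all pairwise differences x_i - y_j that are <= 0 (one-sided)."""
    count = sum(1 for xi in x for yj in y if xi - yj <= 0)
    return count / (len(x) * len(y))

def two_sided(p_one_sided):
    """Two-sided value obtained by doubling the smaller tail (assumed convention)."""
    return 2 * min(p_one_sided, 1 - p_one_sided)

# Tiny illustrative vectors standing in for simulated mWTP distributions.
mwtp_c = [10.0, 12.0, 14.0]
mwtp_rc = [11.0, 13.0, 15.0]
p_one = complete_combinatorial_p(mwtp_c, mwtp_rc)
print(p_one, two_sided(p_one))
```

With 1,000 simulated values per model the comparison involves one million differences, which is still cheap to enumerate directly.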
Results
Refusals to answer the face-to-face survey were low and very
similar, representing 6% of total attempts in both exercises. As we
obtained eight observations per respondent and each survey format was
completed by 450 respondents, the number of observations obtained with
each survey was 3,600. After removing invalid responses, we have 3,600
useable observations for the choice experiment (there were no invalid
responses in this experiment) and 3,594 for the contingent ranking.
Comparison
In table 3 we compare the most important socioeconomic
characteristics of the choice and ranking subsamples. We also include
two characteristics related to the attitude of respondents when
answering the questionnaire (attitude and understanding). In all cases
we cannot reject the null hypothesis that the characteristics of choice
and ranking respondents are the same.
As we have used an experimental design where each possible
treatment is compared with the remaining treatments the same number of
times, we can compare the number of times one treatment was chosen or
ranked first without worrying about the treatments it was compared with.
Through a [chi square]-test we examine whether the proportion of
respondents who chose or ranked first a concrete treatment is
statistically different between choice and ranking surveys. The results
show that in fifteen out of the seventeen possible cases we cannot
reject the null hypothesis that the percentage of times a treatment is
chosen/ranked first is statistically similar (at the 5% level). The
supplemental appendix (Caparros, Oviedo, and Campos 2008) reports the
detailed results of these tests. That is, respondents select the same
alternative when they have to choose, and when they have to do the first
ranking.
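A comparison of this kind can be sketched as a chi-square test on a 2 x 2 table (selected first versus not, by survey format). The counts below are invented for illustration, and the absence of a continuity correction is an assumption; the detailed tests are those reported in the supplemental appendix.

```python
import math

def chi2_two_proportions(first_a, total_a, first_b, total_b):
    """Pearson chi-square statistic and p-value (1 df) for a 2 x 2 table."""
    a, b = first_a, total_a - first_a    # format A: selected first / not
    c, d = first_b, total_b - first_b    # format B: selected first / not
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: times a treatment was selected first out of the
# times it was presented, in the choice and in the ranking survey.
stat, p_value = chi2_two_proportions(200, 450, 190, 450)
print(stat, p_value)
```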
Table 4 shows the regression results of the C and RC models; as can
be seen, there are no significant differences between them. All
parameters have the expected sign and are significant at the 1% level.
In both cases, BIO has the largest value of the part-worth utility (beta
parameter), followed by TEC in the C model and by REC in the RC model.
The Likelihood Ratio test (table 4) is consistent with the
hypothesis that C and RC models derive from the same cognitive process.
Neither [H.sub.A] nor [H.sub.B] is rejected and, consequently,
[H.sub.1] is not rejected. The structural models are the same for the
"one you would choose" and for the "your most preferred" question. This is the
most important result of our analysis, because no previous comparison
has found this seemingly obvious result.
The percentages of correct predictions of the C and RC models are high
in both cases (67% and 66%, respectively). When using parameters of the
RC model to predict the choices in the C model, we obtain a similar
percentage of correct predictions (67%). Similarly, we obtain the same
percentage of correct predictions when we simulate the first rank of the
RC model using the parameters of the C model (66%). Thus, the predictive
power of one model is recovered with the parameters of the alternative
one.
Table 5 shows the mean and confidence intervals (95%) of parametric
and bootstrapped estimates of mWTP and HS from C and RC models and the
p-values of the equality tests. For parametric and bootstrapping
estimations, confidence intervals overlap in all cases (at the 5% level)
and only EMP and HSMIN diverge in the t-test. The complete combinatorial
test shows that EMP and HSMIN are statistically different again and also
BIO and HSMAX, but the latter ones only at the 10% level. (5)
If we look at the efficiency of estimations, the C model offers the
smallest relative errors in most cases, providing more efficient
estimations of welfare measures. This, together with the fact that the C
model yields lower estimations, (6) can be seen as an argument in favor
of using choice experiments instead of ranking. Nevertheless, this
argument is rather weak because the differences are not significant.
If we compare the efficiency of welfare measure estimations within
each model rather than between models, the mWTP associated with the
attribute REC offers the largest relative errors in all cases. This
could be due to disagreement among respondents about the appropriateness
of increasing recreational areas in ANP. On the other hand, both the BIO
attribute and HSMAX have the smallest relative errors in both models. In
this sense, the C and RC models also converge in the efficiency of
welfare measures.
Testing Effects
To detect the existence of effects influencing results that could
lead to differences between a choice and a recoded ranking, we compare
subsamples that isolate respondents possibly affected by those effects.
These results are only discussed qualitatively; statistical details can
be found in the supplemental appendix (Caparros, Oviedo, and Campos
2008).
The first analysis uses subsamples formed with the first four sets
of alternatives answered by each respondent (out of the eight presented
in each questionnaire) checking for a "learning" effect. The
opposite, a "fatigue" effect, is tested using the last four
alternatives answered by each respondent. The models of these
sub-samples do not add anything new to the findings of the base models,
suggesting that these effects are not present. The Likelihood Ratio
tests show that there are no significant differences, as is the case
with the comparison between welfare measures. Hanley, Wright, and Koop
(2002) also find no evidence of "learning" and
"fatigue" in choice experiments.
On the other hand, the surveys included four follow-up statements,
made after the valuation exercise, which enabled us to check for the
presence of four potential effects. The respondents were asked to rate
the following statements from 1 (totally disagree) to 5 (totally agree):
(a) "I correctly understood the information provided in the
previous choices/rankings;" (b) "I had difficulties in stating
my answers in the previous choices/rankings;" (c) "The number
of choices/rankings that I faced has been excessive;" and (d)
"I thought more about my answers of the first four choices/rankings
than about the last four choices/rankings." The effects tested will
be called, respectively, "information,"
"difficulty," "sets of alternatives," and
"response effort" effects. The comparison of the scores to the
follow-ups shows that we cannot reject the hypothesis of statistically
similar scores (Caparros, Oviedo, and Campos 2008).
Using the scores, we created subsamples for the choice and the
recoded ranking data corresponding to each follow-up. Then, we compared
the subsamples made from each follow-up to test if the results of the
comparison were different from the results of the comparison made with
the full samples.
The regressions made with the subsamples corresponding to each
follow-up show that all attributes are significant at the 1% level, as
in the base models. The sole exception is that in the recoded ranking
model that tries to capture the "information" effect, the
attribute REC is significant at the 10% and not at the 1% level
(Caparros, Oviedo, and Campos 2008).
On the other hand, the Likelihood Ratio test for the models
hypothetically affected by the "difficulty" effect states that
the scale parameter is statistically different, because [H.sub.B] is
rejected (Caparros, Oviedo, and Campos 2008). The value of the relative
scale parameter between these models
([[lambda].sup.RC]/[[lambda].sup.C]) is 0.835, implying that the recoded
ranking subsample has a lower scale parameter and consequently a higher
error variance. Thus, the first ranking implies a more difficult
cognitive process than the choice for those who found the task
difficult. The comparison between welfare measures in these models shows
that there is no evidence of significant differences because the scale
parameter (the factor causing significant differences in the Likelihood
Ratio test) is cancelled in the calculations for welfare measures
(Blamey et al. 2002, pp. 174-75). For the remaining models made from the
follow-ups, the results of the comparison between parameters and between
welfare measures are similar to those obtained in the comparison of the
full samples.
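The two points made in this paragraph can be checked numerically. The sketch uses the standard type I extreme value relation between scale and error variance as an illustration (the NL model has generalized extreme value errors, so this is an assumption about the form of the relation), together with the relative scale of 0.835 reported above. The coefficient values are hypothetical.

```python
import math

relative_scale = 0.835  # lambda_RC / lambda_C reported in the text

def gumbel_error_variance(scale):
    """Variance of a type I extreme value error with the given scale parameter."""
    return math.pi ** 2 / (6 * scale ** 2)

# Error variance of the recoded ranking relative to the choice subsample.
variance_ratio = gumbel_error_variance(relative_scale) / gumbel_error_variance(1.0)

def mwtp(beta_k, beta_bid, scale=1.0):
    """Marginal WTP from scaled coefficients; the scale cancels in the ratio."""
    return -(scale * beta_k) / (scale * beta_bid)

print(variance_ratio)
print(mwtp(0.30, -0.02, scale=1.0), mwtp(0.30, -0.02, scale=relative_scale))
```

A relative scale below one therefore implies a larger error variance in the recoded ranking subsample, while the welfare ratio is unaffected because the scale multiplies numerator and denominator alike.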
Conclusions
Although previous literature has shown that a ranking exercise
recoded and analyzed as a choice using only the preferred option is
different from a choice task, our results provide the first evidence
suggesting that, once differences in the design of the experiment are
eliminated, there is no difference in the cognitive process. We also
found that response rates and follow-up analysis did not show
significant differences between formats, except in the case of the
subsample of respondents that found the task difficult. For these
subsamples the difference does not reside in the taste parameters but in
the scale parameters, and the relative scale parameter shows that
ranking has a higher error variance. None of the other effects studied
seem to have any significant impact on the estimations.
Concerning welfare measures, results also show that they are not
statistically distinguishable (per attribute as well as for aggregated
welfare) in most cases. This conclusion holds both for parametric and
bootstrapping tests. However, most of the estimations are more efficient
and lower in the choice experiment. This could be used as an argument in
favor of this format. Nevertheless, this argument is rather weak because
the differences from a recoded ranking are not statistically
significant.
Overall, our results suggest that doing a ranking experiment, but
designing the survey as if it were a choice, may be a safe practice
even if the researcher wants to focus only on the first rank/choice and
analyze it using choice-based methods. The question of whether it is
convenient to use the subsequent ranks in the analysis has been studied
extensively elsewhere (Foster and Mourato 2002; Bateman et al. 2007) and
goes beyond the scope of this article. Nevertheless, the most important
take-home message is that people appear rational enough, and appear to
take the task seriously enough, to ensure that they choose their
preferred option.
[Received October 2006; accepted December 2007.]
References
Adamowicz, W., J. Louviere, and M. Williams. 1994. "Combining
Revealed and Stated Preference Methods for Valuing Environmental
Amenities." Journal of Environmental Economics and Management
26(3):271-92.
Anderson, J.L., and S.U. Bettencourt. 1993. "A Conjoint
Approach to Model Product Preferences: The New England Market for Fresh
and Frozen Salmon." Marine Resource Economics 8(1):31-49.
Bateman, I., B. Day, G. Loomes, and R. Sugden. 2007. "Can
Ranking Techniques Elicit Robust Values?" Journal of Risk and
Uncertainty 34(1):49-66.
Beggs, S., S. Cardell, and J. Hausman. 1981. "Assessing the
Potential Demand for Electric Cars." Journal of Econometrics
17(1):1-19.
Ben-Akiva, M., T. Morikawa, and F. Shiroishi. 1991. "Analysis
of the Reliability of Preference Rank Data." Journal of Business
Research 23(3):253-68.
Blamey, R.K., J.W. Bennett, J.J. Louviere, M.D. Morrison, and J.C.
Rolfe. 2002. "Attribute Causality in Environmental Choice
Modelling." Environmental and Resource Economics 23(2):167-86.
Boyle, K.J., T.P. Holmes, M.F. Teisl, and B. Roe. 2001. "A
Comparison of Conjoint Analysis Response Formats." American Journal
of Agricultural Economics 83(2):441-54.
Campos, P., A. Caparros, and J.L. Oviedo. 2007. "Comparing
Payment-Vehicle Effects in Contingent Valuation Studies for Recreational
Use in Two Spanish Protected Forests." Journal of Leisure Research
39(1):60-85.
Caparros, A., J.L. Oviedo, and P. Campos. 2008. "AJAE
Appendix: Would You Choose Your Preferred Option? Comparing Choice and
Recoded Ranking Experiments." Unpublished manuscript. Available at
http://agecon.lib.umn.edu/.
Caparros, A., E. Cerda, P. Ovando, and P. Campos. 2007.
"Carbon Sequestration with Reforestations and Biodiversity-Scenic
Values." FEEM Working Paper 28.2007, Milan.
Chapman, R.G., and R. Staelin. 1982. "Exploring Rank Ordered
Choice Set Data Within the Stochastic Utility Model." Journal of
Marketing Research 19(3):288-301.
Choi, K., and C. Moon. 1997. "Generalized Extreme Value Model
and Additively Separable Generator Function." Journal of
Econometrics 76(1-2):129-40.
Efron, B., and R.J. Tibshirani. 1993. An Introduction to the
Bootstrap. New York: Chapman & Hall.
Foster, V., and S. Mourato. 2002. "Testing for Consistency in
Contingent Ranking Experiments." Journal of Environmental Economics
and Management 44(2):309-28.
Greene, W. 2007. Limdep Version 9.0. Econometric Modeling Guide
Volume 2. New York: Econometric Software.
Hanley, N., R. Wright, and G. Koop. 2002. "Modelling
Recreation Demand Using Choice Experiments: Climbing in Scotland."
Environmental and Resource Economics 22(3):449-66.
Hausman, J.A., and P.A. Ruud. 1987. "Specifying and Testing
Econometric Models for Rank-Ordered Data." Journal of Econometrics
34(1-2):83-104.
Herriges, J., and C. Kling. 1996. "Testing the Consistency of
Nested Logit Models with Utility Maximization." Economics Letters
50(1):33-39.
Holmes, T.P., and K.J. Boyle. 2001. "Cross Validation of
Conjoint Ranking and Choice Data: An Application to Timber Harvesting
Preferences." Paper presented at EAERE 11th Annual Conference,
Southampton UK, 28-30 June.
Hunt, G.L. 2000. "Alternative Nested Logit Model Structures
and the Special Case of Partial Degeneracy." Journal of Regional
Science 40(1):89-113.
Krinsky, I., and A.L. Robb. 1986. "On Approximating the
Statistical Properties of Elasticities." Review of Economics and
Statistics 68(4):715-19.
Louviere, J.J. 1988. "Conjoint Analysis Modeling of Stated
Preferences. A Review of Theory, Methods, Recent Developments and
External Validity." Journal of Transport Economics and Policy
22:93-119.
Louviere, J.J., D.A. Hensher, and J.D. Swait. 2000. Stated Choice
Methods. Analysis and Application. Cambridge: Cambridge University
Press.
Mackenzie, J. 1993. "A Comparison of Contingent Preference
Models." American Journal of Agricultural Economics 75(3):593-603.
McFadden, D. 1981. "Econometric Models of Probabilistic
Choice." In C. Manski and D. McFadden, eds. Structural Analysis of
Discrete Data with Econometric Applications. Cambridge, MA.: MIT Press,
pp. 198-272.
Mogas, J., and P. Riera. 2001. "Comparacion de la Ordenacion
Contingente y del Experimento de Eleccion en la Valoracion de las
Funciones no Privadas de los Bosques." Economia Agraria y Recursos
Naturales 1(2):125-47.
Morrison, M.D., and K.J. Boyle. 2001. "Comparative Reliability
of Rank and Choice Data in Stated Preference Models." Paper
presented at EAERE 11th Annual Conference, Southampton UK, 28-30 June.
Poe, G.L., K.L. Giraud, and J.B. Loomis. 2005. "Computational
Methods for Measuring the Difference of Empirical Distributions."
American Journal of Agricultural Economics 87(2):353-65.
Roe, B., K.J. Boyle, and M.F. Teisl. 1996. "Using Conjoint
Analysis to Derive Estimates of Compensating Variation." Journal of
Environmental Economics and Management 31(2):145-59.
Siikamaki, J., and D.F. Layton. 2007. "Discrete Choice Survey
Experiments: A Comparison Using Flexible Methods." Journal of
Environmental Economics and Management 53(1):122-39.
Stevens, T.H., C. Barrett, and C.E. Willis. 1997. "Conjoint
Analysis of Groundwater Protection Programs." Agricultural and
Resource Economics Review 26(2):229-36.
Swait, J., and J.J. Louviere. 1993. "The Role of the Scale
Parameter in the Estimation and Comparison of Multinomial Logit
Models." Journal of Marketing Research 30(3):305-14.
(1) In addition to the focus group and the pretest, interviews with
experts from the INIA (National Institute of Alimentary and Agrarian
Technology Research) and with the ANP Director were held.
(2) From now on, we use C and RC to refer to the variables and
measures corresponding to the choice and to the recoded ranking model,
respectively.
(3) The Independence of Irrelevant Alternatives (IIA) assumption was
violated when using the conditional logit.
(4) We report as the p-value the lowest significance level [alpha] at
which the (1 - [alpha])% confidence intervals of the two measures do
not overlap.
(5) We also estimated a rank-ordered logit using the
respondents' full ranking following the method proposed in Beggs,
Cardell, and Hausman (1981). The welfare measures of this model are
statistically different from the ones of the C model.
(6) Lower WTP values tend to be preferred in applications because they
yield more conservative estimates.
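The nonoverlapping-interval p-value described in footnote 4 can be sketched as follows. This is a minimal illustration and not necessarily the authors' exact procedure: it assumes symmetric, normal-based confidence intervals, backs out standard errors from the 95% bounds reported for the parametric BIO measures in table 5, and uses a normal approximation for the test of equal means. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def se_from_ci(lower, upper, level=0.95):
    """Back out a standard error from a symmetric normal confidence interval."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    return (upper - lower) / (2.0 * z)


def nonoverlap_p_value(mean_a, se_a, mean_b, se_b):
    """Smallest alpha at which the two (1 - alpha) intervals stop overlapping.

    Intervals mean +/- z * se do not overlap once
    |mean_a - mean_b| > z * (se_a + se_b), so the boundary z is the ratio below.
    """
    z_star = abs(mean_a - mean_b) / (se_a + se_b)
    return 2.0 * (1.0 - STD_NORMAL.cdf(z_star))


def normal_test_p_value(mean_a, se_a, mean_b, se_b):
    """Two-sided test of equal means for independent samples (normal approximation)."""
    z = abs(mean_a - mean_b) / (se_a ** 2 + se_b ** 2) ** 0.5
    return 2.0 * (1.0 - STD_NORMAL.cdf(z))


# Parametric BIO measures from table 5: mean and 95% interval bounds.
c_mean, c_low, c_high = 18.21, 14.21, 22.21      # choice model (C)
rc_mean, rc_low, rc_high = 22.82, 17.20, 28.44   # recoded ranking model (RC)

se_c = se_from_ci(c_low, c_high)
se_rc = se_from_ci(rc_low, rc_high)

p_nonoverlap = nonoverlap_p_value(c_mean, se_c, rc_mean, se_rc)
p_ttest = normal_test_p_value(c_mean, se_c, rc_mean, se_rc)
print(f"nonoverlapping p = {p_nonoverlap:.3f}, t-test p = {p_ttest:.3f}")
```

Under these assumptions the two values come out close to the 0.350 and 0.191 reported for BIO in table 5. Because the sum of two standard errors is never smaller than the root of the sum of their squares, the nonoverlapping criterion always gives the larger p-value of the two.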
Alejandro Caparros, Jose L. Oviedo, and Pablo Campos are associate
research professor (Investigador Cientifico), postdoctoral researcher,
and full research professor (Profesor de Investigacion), respectively,
in the Institute for Public Goods and Policies (IPP), Spanish Council
for Scientific Research (CSIC).
Alejandro Caparros and Jose L. Oviedo share the first authorship of
the article. We thank three anonymous referees and especially Stephen
Swallow (journal co-editor) and Lynn Huntsinger for their helpful
comments and suggestions. We would also like to thank participants at
the following conferences: WCERE 2006 (Japan), TIES 2006 (Sweden), AERNA
2006 (Spain), and IX EEA (Spain). The usual disclaimer applies. We
gratefully acknowledge funding provided by the European Commission
(project MEDMONT-QLRT-1999-31031), the Consejeria de Medio Ambiente
(Junta de Andalucia), and the National Institute of Alimentary and
Agrarian Technology Research (INIA).
Table 1. Previous Comparisons Between Independent Samples of Choice
and Recoded Rankings Applied to Environmental Valuation

Authors                     Comparison                    Experimental Design
Boyle et al. (2001)         Rating, ranking, choice,      Random (not all included
                            and recoded ranking           status quo)
Mogas and Riera (2001)      Ranking, choice, and          Different (status quo not
                            recoded ranking               included in ranking)
Holmes and Boyle (2001)     Ranking, choice, and          Random (not all included
                            recoded ranking               status quo)
Morrison and Boyle (2001)   Ranking, choice, and          Not available
                            recoded ranking

Authors                     Alternatives                  Sample
Boyle et al. (2001)         Four for all                  Rating: 287; Ranking: 214;
                                                          Choice: 278
Mogas and Riera (2001)      Three for choice and          Ranking: 626 (a);
                            four for ranking              Choice: 1140
Holmes and Boyle (2001)     Four for all                  Ranking: 212; Choice: 278
Morrison and Boyle (2001)   Three for choice and          Ranking: 268 (b);
                            four for ranking              Choice: 297

Results (Choice Versus Recoded Ranking)
Authors                     Parameters                    Welfare Measures
Boyle et al. (2001)         Statistically significant     No comparison test
                            differences (scale
                            parameter not considered)
Mogas and Riera (2001)      No comparison test            Statistically significant
                                                          differences
Holmes and Boyle (2001)     Statistically significant     No comparison test
                            differences
Morrison and Boyle (2001)   Statistically significant     Statistically significant
                            differences                   differences

(a) The total number of observations for the ranking exercise was 626
and for the choice experiment 4,576.
(b) The total number of observations for the ranking exercise was
1,905 and for the choice experiment 2,068.
(c) In this case, a sub-sample analysis found convergent validity
parameters when including exclusively respondents who stated that the
valuation task was easy.
Table 2. Attributes of the Experiment and Levels

Attributes                               Levels
Biodiversity (a) (BIO)                   1; 2; 3; 4
Technique used (TEC)                     Natural regeneration; artificial plantation
Number of new recreational areas (REC)   0; 2
Additional employees (equivalent         20; 40; 60; 80
  permanent employees) (EMP)
Forest surface area conserved (SUR)      90% of present extent (10% reduction);
                                         100% of present extent (same surface);
                                         120% of present extent (20% increase);
                                         140% of present extent (40% increase)
Increase in taxes for this year (BID)    6 [euro]; 12 [euro]; 24 [euro]; 48 [euro]

Note: The status quo levels were: no trees, no technique, no
additional recreational areas, no employees, 80% of the current
forest surface conserved (20% reduction), and no additional taxes.
(a) Number of native tree species used, always including cork oaks.

Table 3. Socioeconomic and Attitudinal Characteristics of the Subsamples

                                        Choice Sample        Ranking Sample
Variables                               Mean           N     Mean           N     t-statistic (a)
Age                                     33 (9)         449   34 (10)        446   -0.085
Family income ([euro] per month)        1,676 (745)    434   1,615 (790)    427   0.055
Trip cost per person ([euro] per day)   19 (20)        450   19 (22)        450   -0.024
Gender (1 = female; 0 = male)           0.32 (0.47)    450   0.36 (0.48)    450   -0.053
Education (1 = college degree;          0.45 (0.3)     450   0.39 (0.2)     450   0.089
  0 = otherwise)
Cadiz (1 = respondent from Cadiz        0.78 (0.42)    450   0.78 (0.42)    450   0.000
  province; 0 = otherwise) (a)
Reasons for the visit (1 = active       0.31 (0.46)    450   0.31 (0.46)    450   -0.003
  tourism; 0 = otherwise)
Substitutive (1 = respondent knows      0.56 (0.50)    450   0.58 (0.49)    450   -0.029
  substitute for the visited area;
  0 = otherwise)
Attitude (1 = poor; 0 = good) (b)       0.11 (0.31)    446   0.10 (0.31)    444   0.004
Understanding (1 = poor; 0 = good)      0.04 (0.20)    450   0.05 (0.23)    447   -0.038

Standard errors are shown in brackets.
N is the number of observations.
(a) t-statistic at the 5% level = 1.965.
(b) Information provided by the interviewers.

Table 4. Choice and Recoded Ranking Nested Logit Models

Attribute Parameters        Choice Model             Recoded Ranking Model
BIO                         0.4543 *** (0.0281)      0.4197 *** (0.0255)
TEC                         0.4371 *** (0.0401)      0.3070 *** (0.0345)
REC                         0.3909 *** (0.0677)      0.4071 *** (0.0624)
EMP                         0.0155 *** (0.0014)      0.0167 *** (0.0013)
SUR                         0.0224 *** (0.0017)      0.0192 *** (0.0016)
BID                         -0.0249 *** (0.0028)     -0.0184 *** (0.0023)
IV ([[alpha].sub.REF]) (a)  1.4385 *** (0.0752)      1.3050 *** (0.0644)
N                           3,600                    3,594
LogL ([beta])               -2,616.876               -2,656.531
LogL (0)                    -4,906.096               -4,891.350
[[rho].sup.2]               0.467                    0.457

Likelihood Ratio Tests (b)                             [chi square] (C vs. RC)
[H.sub.A]: [[beta].sup.C] = [[beta].sup.RC]            8.428
[H.sub.B]: [[lambda].sup.C] = [[lambda].sup.RC]        0.746
Reject [H.sub.1]: [[beta].sup.C] = [[beta].sup.RC]     No
  and [[lambda].sup.C] = [[lambda].sup.RC]?

C is the choice model, RC is the recoded ranking model. Standard
errors are shown in brackets. N is the number of observations.
IV ([[alpha].sub.REF]) is the inclusive value parameter of the REF
branch. Asterisks (***) denote significance at the 1% level.
(a) Although IV([[alpha].sub.REF]) > 1, the Herriges and Kling (1996)
condition for local utility maximization is fulfilled.
(b) For the hypothesis [H.sub.A], the [chi square] statistic for 8
degrees of freedom at the 5% level is 15.507. For the
hypothesis [H.sub.B], the [chi square] statistic for 1 degree
of freedom at the 5% level is 3.841.
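The test decisions and fit statistics in table 4 can be checked with a short sketch. It assumes that the reported [chi square] values are likelihood ratio statistics with the degrees of freedom given in note (b), and that [[rho].sup.2] is computed as 1 - LogL([beta])/LogL(0), which is consistent with the reported values. The helper names are hypothetical, and the chi-square tail probability uses closed forms that hold only for 1 or an even number of degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (df = 1 or even df only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df % 2 == 0:
        half = x / 2.0
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    raise ValueError("closed form implemented only for df = 1 or even df")


def pseudo_rho2(loglik_model, loglik_null):
    """Likelihood ratio index: 1 - LogL(beta) / LogL(0)."""
    return 1.0 - loglik_model / loglik_null


# Likelihood ratio statistics and degrees of freedom from table 4, note (b).
p_HA = chi2_sf(8.428, 8)   # H_A: equal parameters, scale allowed to differ
p_HB = chi2_sf(0.746, 1)   # H_B: equal scale parameters

# Log-likelihoods from table 4.
rho2_C = pseudo_rho2(-2616.876, -4906.096)
rho2_RC = pseudo_rho2(-2656.531, -4891.350)

print(f"p(H_A) = {p_HA:.3f}, p(H_B) = {p_HB:.3f}")
print(f"rho2 C = {rho2_C:.3f}, rho2 RC = {rho2_RC:.3f}")
```

Both p-values exceed 0.05, which agrees with the table's conclusion that neither hypothesis is rejected at the 5% level.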
Table 5. Welfare Measures from Choice and Recoded Ranking Nested
Logit Models

Parametric
Attributes  C Mean [95% CI]               RC Mean [95% CI]              Nonoverlapping p-value   t-test p-value
BIO         18.21 *** [14.21, 22.21]      22.82 *** [17.20, 28.44]      0.350                    0.191
TEC         17.52 *** [13.55, 21.50]      16.69 *** [11.90, 21.49]      0.853                    0.794
REC         15.67 *** [9.51, 21.83]       22.14 *** [14.04, 30.24]      0.374                    0.212
EMP         0.62 *** [0.46, 0.78]         0.91 *** [0.67, 1.14]         0.153                    0.044 **
SUR         0.90 *** [0.69, 1.10]         1.05 *** [0.77, 1.33]         0.555                    0.400
HSMIN       22.09 *** [16.18, 28.03]      34.83 *** [25.57, 43.87]      0.101                    0.023 **
HSMAX       209.63 *** [169.87, 249.40]   265.91 *** [206.97, 323.85]   0.265                    0.122

Bootstrapping
Attributes  C Mean [95% CI]               RC Mean [95% CI]              Nonoverlapping p-value   t-test p-value   Complete Combinatorial p-value
BIO         18.50 *** [14.88, 22.29]      23.31 *** [18.35, 30.18]      0.340                    0.218            0.090 *
TEC         17.79 *** [14.20, 22.51]      17.05 *** [12.81, 22.88]      0.866                    0.826            0.399
REC         15.74 *** [9.87, 22.41]       22.37 *** [15.01, 31.84]      0.381                    0.235            0.105
EMP         0.63 *** [0.49, 0.81]         0.92 *** [0.72, 1.20]         0.163                    0.067 *          0.021 **
SUR         0.91 *** [0.72, 1.17]         1.07 *** [0.82, 1.45]         0.555                    0.424            0.201
HSMIN       22.44 *** [17.26, 29.73]      35.45 *** [27.31, 47.02]      0.094 *                  0.036 **         0.010 ***
HSMAX       212.72 *** [177.95, 264.96]   270.84 *** [219.84, 351.60]   0.271                    0.161            0.059 *

Note: The table reports parametric and bootstrapping measures and the
corresponding tests of the equality of means. C is the choice model.
RC is the ranking recoded as choice model. Lower and upper bounds of
the 95% confidence interval are shown in brackets; *, **, and ***
denote significance at the 10%, 5%, and 1% level, respectively.
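The attribute-level means in table 5 are consistent with marginal willingness to pay computed as the negative ratio of each attribute coefficient to the BID (tax) coefficient in table 4. The sketch below checks this under that assumption; the ratio formula is the standard one for linear utility and is not quoted from this section, and the names are hypothetical. Small gaps relative to the reported means are expected because the published coefficients are rounded. The scenario measures HSMIN and HSMAX are not reproduced here.

```python
# Coefficients copied from table 4 (C = choice, RC = recoded ranking).
COEFFICIENTS = {
    "C": {"BIO": 0.4543, "TEC": 0.4371, "REC": 0.3909,
          "EMP": 0.0155, "SUR": 0.0224, "BID": -0.0249},
    "RC": {"BIO": 0.4197, "TEC": 0.3070, "REC": 0.4071,
           "EMP": 0.0167, "SUR": 0.0192, "BID": -0.0184},
}

# Parametric means reported in table 5, used only for comparison.
REPORTED_MEANS = {
    "C": {"BIO": 18.21, "TEC": 17.52, "REC": 15.67, "EMP": 0.62, "SUR": 0.90},
    "RC": {"BIO": 22.82, "TEC": 16.69, "REC": 22.14, "EMP": 0.91, "SUR": 1.05},
}


def marginal_wtp(beta_attribute, beta_cost):
    """Marginal WTP as the negative ratio of attribute to cost coefficient."""
    return -beta_attribute / beta_cost


def implied_wtp(coefficients, cost_name="BID"):
    """Marginal WTP for every non-cost attribute of one model."""
    beta_cost = coefficients[cost_name]
    return {name: marginal_wtp(beta, beta_cost)
            for name, beta in coefficients.items() if name != cost_name}


IMPLIED = {model: implied_wtp(coefs) for model, coefs in COEFFICIENTS.items()}

for model, values in IMPLIED.items():
    for name, wtp in values.items():
        reported = REPORTED_MEANS[model][name]
        print(f"{model} {name}: implied {wtp:.2f}, reported {reported:.2f}")
```

Because the BID coefficient is smaller in absolute value in the recoded ranking model, its implied WTP is higher than in the choice model for every attribute except TEC, whose coefficient also falls substantially.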
COPYRIGHT 2008 American Agricultural Economics Association. Reproduced
with permission of the copyright holder. Further reproduction or
distribution is prohibited without permission.