Abstract
Evaluation systematically assesses performance of policies in
meeting goals. The primary purpose of evaluation is to provide
information to help improve programs. There are many approaches to
evaluation, each with its own strengths and weaknesses, and many types
of common errors in evaluation methodology. Keys to successful
evaluation include: using comprehensive evaluation criteria, using a
large enough sample size, evaluating all facets of the program over an
extended time horizon, comparing the program to feasible alternatives,
clearly identifying causality, incorporating the views of stakeholders,
and clearly stating methodological limitations.
Resume
Evaluation systematically examines the performance of policies in
meeting objectives. The main purpose of evaluation is to provide
information that helps improve programs. There are many approaches to
evaluation, each with its own strengths and weaknesses, and many types
of common errors in evaluation methodology. Keys to a successful
evaluation include: using comprehensive evaluation criteria, using a
sufficiently large sample, evaluating all aspects of the program over a
long period of time, comparing the program with feasible alternatives,
precisely identifying causality, incorporating the views of
stakeholders, and precisely stating methodological limitations.
Key Words
Policy evaluation, resource and environmental management
Introduction
Resource and environmental planning can be defined as a decision
making process involving six steps (Figure 1). Evaluation plays a key
role in two of these steps: evaluating options and evaluating outcomes.
Given its importance in planning and policy-making, it is not surprising
that evaluation is institutionalized as a legal requirement in many
jurisdictions and has become a distinct field with its own body of
theory and methodology.
The purpose of this volume of Environments is to explore recent
developments in evaluation applied to resource and environmental
planning through a series of case studies. In this introductory article
we provide an overview of the field of evaluation and discuss some of
the challenges in evaluating resource and environmental planning. We
then review the contributions of the case studies to evaluation and
identify keys to successful evaluation.
Evaluation Theory
In her landmark text on evaluation, Carol Weiss (1998: 4) provides
a definition of evaluation that is commonly accepted in the evaluation
literature. According to Weiss, evaluation is "the systematic
assessment of the operation and/or the outcomes of a program or policy,
compared to a set of explicit or implicit standards, as a means of
contributing to the improvement of the program or policy."
Weiss emphasizes the following four key attributes of evaluation in
her definition.
* Evaluation is systematic, meaning it utilizes methods of analysis
that meet rigorous scientific standards and produces results that can be
replicated by other analysts.
* Evaluation focuses on the operation of a program and/or program
outcomes.
* Evaluation assesses the degree to which the process and/or
outcomes meet plan goals.
* The purpose of evaluation is to inform managers on the strengths
and weaknesses of the program and identify ways in which the program can
be improved and/or whether the program should be continued.
A wide variety of alternative approaches to evaluation exist (Table
1).
Internal versus External
The first decision in evaluation is whether the evaluation should
be conducted internally within the organization that is managing the
program or externally by an independent evaluator. Internal and external
evaluations have their strengths and weaknesses (Clark and Dawson 1999,
Weiss 1998). Internal evaluators may have a bias in favor of the program
that they are managing, which reduces their likelihood of producing an
accurate evaluation. This bias may be strategic, in that internal
evaluators want the program to appear successful, or may simply arise
because their thinking has been affected by close involvement with the
development and implementation of the program. An advantage of using
external evaluators is that they are less likely to have a bias in favor
of the program. However, external evaluations also have potential
weaknesses. External evaluators are less likely to understand the
program and program goals. Results from external evaluations may also be
less likely to be used by the program managers to improve the program
because the external evaluators may not be trusted or believed, and in
any event are not retained within the organization to assist in the
implementation of recommendations. Some researchers suggest a way of
achieving the combined benefits of external and internal evaluations is
to use a hybrid approach, in which external and internal evaluators work
together to assess the program (Suvedi and Morford 2003).
[FIGURE 1 OMITTED]
Purpose
A second decision is determining the purpose of evaluation. Four
purposes are identified in the evaluation literature (Weiss 1998, Rossi
et al. 2004). The most common purpose of evaluation is to assist in the
improvement of a program by identifying strengths and weaknesses. This
type of evaluation is referred to as formative evaluation. A second
purpose is to determine whether the program is justified or not. This
type of evaluation, normally referred to as summative evaluation, is
often undertaken to fulfill accountability requirements for public
spending. A third purpose is pursuit of general knowledge. This type of
evaluation, referred to as theoretical evaluation, is normally
undertaken by academics and is not linked directly or indirectly to
program managers. A fourth purpose of evaluation may be to fulfill a
hidden agenda such as justifying a decision that has already been made
to eliminate a program or initiate a project. In these cases, the
evaluation, which can be referred to as an ulterior evaluation, is
managed to achieve a predetermined outcome.
Timing
A third decision in evaluation is timing. Evaluations can be done
prior to policy or program development to assess options, during program
operation, and/or after program completion. Evaluations can provide
single snapshots of program performance or be ongoing over a longer time
frame to analyze trends and foster continuous improvement. A
comprehensive evaluation process should include evaluations at all key
stages of a program's life (Weiss 1998). Another useful time-frame
for evaluation is to compare the evaluation results after a program has
been implemented (ex-post evaluation) with the forecast results used to
select a program (ex-ante evaluation). Ex-post/ex-ante comparisons are
especially useful for assessing accuracy of initial evaluation
assessments and indicating how ex-ante evaluations can be improved.
While ex-post/ex-ante evaluations are rarely undertaken, the few
studies that have been completed show a systematic optimistic bias in
ex-ante evaluations that contributes to poor program selection and
design (Gunton 2003).
What to Evaluate
The fourth decision in evaluation is deciding what component of the
policy or program to evaluate. Evaluations can focus on one or more of
five program components: program theory, which is the underlying logic
of the program including the causal model of the problem it is intended
to address, program design, program implementation, program outcomes,
and program efficiency, which measures the outcomes per unit of
resources required to operate the program. A comprehensive evaluation
normally should include all five components in the evaluation (Weiss
1998).
Methodology
The fifth decision in evaluation is selecting a methodology. First,
the evaluator must choose between quantitative and qualitative analysis.
Quantitative analysis describes program performance in numerical terms
that can be statistically analyzed. Qualitative analysis evaluates
programs by observation, interviews, and document analysis and assesses
performance in verbal as opposed to numerical terms. Within quantitative
analysis there are also options. The preferred technique is randomized
experiments. Randomized experiments attempt to identify program impacts
by comparing groups or areas where the program is applied to a control
group where all relevant variables other than those related to the
program are held constant. Randomized experiments attempt to overcome
one of the primary challenges in evaluation: distinguishing outcomes due
to the program from outcomes due to other factors. The use of randomized
experiments is constrained in resource and environmental management due
to the complexity and diversity of the systems being evaluated and
political and ethical considerations. There is, however, a range of
alternatives available between pure experimental methods and pure
qualitative methods (Clark and Dawson 1999).
Evaluation Criteria
The final question in evaluation is determining the criteria to use
to assess program performance. A common approach for developing
evaluative criteria is to use the explicit goals and objectives for the
program to assess program performance. The major problem with this
approach is that goals and objectives for environmental programs are
often too vague or incomplete to provide a clear standard for assessing
performance (Bellamy et al. 1999, Gunton and Joseph 2006). Even if the
goals and objectives are clear and comprehensive, assessing a program
relative to its goals and objectives assumes that the goals and
objectives adequately reflect the public interest. Programs may have
important unintended consequences relevant to the public interest that
may not be expressed in a stated goal of the program. Using only stated
goals and ignoring unintended consequences would result in a deficient
evaluation. Using explicit goals and objectives also does not indicate
whether the program is the most effective or efficient way of achieving
the objectives because the program is not being assessed relative to
options.
A second approach is to compare performance of the program being
evaluated to similar programs in other jurisdictions by completing a
cross-sectional analysis. For example, an increasingly common way of
evaluating environmental performance of jurisdictions is to compare
environmental indicators such as greenhouse gas emissions per capita by
jurisdiction (Gunton et al. 2005, Esty et al. 2006). The assumption is
that the relative performance of a jurisdiction measures the
effectiveness of the jurisdiction's plans and programs. For
example, lower per capita greenhouse gas emissions relative to other
jurisdictions indicate the effectiveness of greenhouse gas emission
control strategies. The advantage of this approach is that the data for
comparison are more readily available than other program evaluation data
such as impacts of programs on the state of the environment. However,
this approach does not indicate whether the performance is good or
bad--all jurisdictions may be poor performers--and does not normally
distinguish between outcomes due to the program versus other factors
such as geography and climate.
A third common approach is to construct time series for outcome
indicators to determine if trends are improving or deteriorating. This
approach is used in most environmental monitoring systems, such as the
Canadian government's National Indicator Initiative. The assumption
is that if trends are improving, the existing program is effective. The
problem with this approach is that it does not distinguish between
changes due to the program and those due to other factors (Gunton and
Joseph 2006). Further, trend line analysis does not indicate whether the
performance is good or bad in absolute terms, just whether it is
changing.
Another evaluative criterion is best practice standards based on
theory and/or the performance of other jurisdictions. Best practice
standards are commonly used in process evaluations to assess program
management and planning. For example, best practice analysis has been
successfully used to assess the quality of environmental sustainability
planning in various countries by the OECD, the United Nations and
non-governmental researchers (Gunton and Joseph 2006, IISD 2004). The
underlying assumption in these evaluations is that better processes lead
to better outcomes. Although this approach is useful in identifying
strengths and weaknesses of planning systems, it relies on best practice
criteria that are often not empirically verified.
A final criterion for evaluation is a comprehensive benefit-cost
analysis that assesses program benefits relative to costs to determine
if the program is in the public interest. Although benefit-cost
addresses many of the problems with other methods of evaluation, it also
suffers from major weaknesses. The most significant challenge in
benefit-cost is monetizing intangibles such as pollution and ecological
values that are central to resource and environmental planning.
Benefit-cost also requires identifying impacts attributed to the
program. As discussed above, distinguishing between impacts due to the
program and those due to other factors is difficult. Nonetheless,
benefit-cost is a legally required component of environmental evaluation
in many jurisdictions such as the United States. Cost-effectiveness
analysis, which measures outputs per unit of input, is less demanding in
terms of monetizing intangibles, but is difficult to apply to resource
and environmental plans where outputs are difficult to quantify.
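The difference between the two measures can be sketched numerically. The following is a minimal illustration, not a method taken from the studies cited: it assumes that benefits and costs have already been monetized for each year, that a conventional discounted net present value is the decision metric, and that cost-effectiveness is expressed as physical output per dollar. The function names, the discount rate, and all figures are hypothetical.

```python
def net_present_value(benefits, costs, discount_rate):
    """Discounted sum of (benefit - cost) per year; year 0 is undiscounted.

    Assumes every benefit and cost has already been expressed in dollars,
    which is the step the text identifies as the hardest for intangibles.
    """
    return sum(
        (b - c) / (1 + discount_rate) ** year
        for year, (b, c) in enumerate(zip(benefits, costs))
    )


def cost_effectiveness(output_units, total_cost):
    """Outputs per unit of input, e.g. hectares protected per dollar spent."""
    return output_units / total_cost


# Hypothetical program: 100 spent in year 0, benefits of 60 in years 1 and 2.
npv = net_present_value(benefits=[0, 60, 60], costs=[100, 0, 0], discount_rate=0.05)
ratio = cost_effectiveness(output_units=250, total_cost=100)
print(f"NPV: {npv:.2f}, output per dollar: {ratio:.2f}")
```

A positive net present value suggests that benefits outweigh costs under the stated assumptions. The cost-effectiveness ratio avoids monetizing the output but can only rank programs that produce the same kind of output.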
Common Evaluation Errors
Evaluative Criteria Errors
A common problem in evaluation is using inappropriate evaluative
criteria. A recent evaluation of the effectiveness of environmental
regulations in the U.S. found that the criteria normally used to assess
performance such as number of permits issued, enforcement actions, and
inspections do not indicate the effectiveness of the programs in meeting
environmental objectives (NAPA 2001). Evaluations of the success of land
use planning in British Columbia have used the implementation of
recommendations to increase protected areas as a measure of success (Day
et al. 2003). While this is important, the underlying objectives of
increasing protected areas, such as reducing the number of endangered species,
also need to be assessed. Evaluations of alternative dispute resolution
processes sometimes assess effectiveness by using the single criterion
of whether an agreement was reached, which ignores the relative quality
of the agreement, and excludes other important benefits such as improved
stakeholder relations even when an agreement was not reached (Gunton and
Day 2003). These examples show that care must be taken to develop
evaluative criteria that are comprehensive and reflect the underlying
objectives of the plan. Otherwise the use of inappropriate evaluative
criteria will lead to unjustified conclusions regarding program
performance.
Causation Assumption Errors
Another common error in evaluation is assuming that a correlation
between plan implementation and outcomes reflects a causal link. For
example, evaluation of regional land use planning in British Columbia is
based on monitoring time series trends for key environmental indicators
(Joseph et al. 2007). The assumption is that the trends accurately
assess the impact of the plan. The problem is that there are many
confounding factors that affect environmental trends such as weather
patterns, natural cycles, and human activity that make it difficult if
not impossible to identify impacts of the plan. Also, the impacts of the
plan occur over a long time horizon and may not be detected until many
years later. The challenge is to compare what would have happened in the
absence of the plan with what happened with the plan, holding all other
variables constant over a long enough time horizon to assess impacts
resulting from the plan. Another example of causation assumption error
is a recent evaluation of the effectiveness of mediation processes that
concluded that mediators have little positive impact (Leach et al.
2002). The study compared cases with mediators to cases without
mediators and assumed that all other factors were constant. One apparent
problem is that the cases that opted for independent mediators may have
been more challenging cases. Therefore, differences in outcomes may have
been due to factors other than the presence of a mediator.
Selection Bias Error
Many evaluations suffer from using a biased sample of cases for
evaluation. In social policy, participants in a program may be selected
based on attributes that increase the likelihood of success, instead of
being randomly selected. A positive impact on recipients relative to
non-recipients may be due to these other attributes, not to the program.
In planning, evaluation of performance of dispute resolution techniques
such as consensus-based negotiation may appear artificially high because
negotiation tends to be used in cases where a conflict assessment has
indicated the likelihood of success. Selection bias therefore can
significantly skew results.
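A small simulation can make the mechanism concrete. The sketch below is illustrative only and rests on stated assumptions: each case has a hypothetical latent "tractability" score, the probability of success equals that score, and negotiation itself has no effect at all. The 0.6 selection cut-off and all names are invented for the example.

```python
import random


def simulate(n_cases=20000, seed=0):
    """Compare apparent program effects under biased and random selection.

    The program has NO true effect here: success depends only on each
    case's latent tractability, never on whether negotiation was used.
    """
    rng = random.Random(seed)
    selected = {True: [], False: []}       # negotiation used only for promising cases
    randomized = {True: [], False: []}     # negotiation assigned by coin flip
    for _ in range(n_cases):
        tractability = rng.random()
        success = rng.random() < tractability
        selected[tractability > 0.6].append(success)
        randomized[rng.random() < 0.5].append(success)

    def gap(groups):
        # Success rate with the program minus success rate without it.
        def rate(outcomes):
            return sum(outcomes) / len(outcomes)
        return rate(groups[True]) - rate(groups[False])

    return gap(selected), gap(randomized)


biased_gap, randomized_gap = simulate()
print(f"apparent effect with biased selection:  {biased_gap:+.2f}")
print(f"apparent effect with random assignment: {randomized_gap:+.2f}")
```

Under these assumptions the biased comparison shows a large apparent benefit even though none exists, while random assignment shows a gap close to zero.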
Content Scope Errors
As discussed earlier, evaluation can focus on several different
dimensions of plan performance ranging from implementation effectiveness
to outcome efficiency. A common error is to complete an evaluation of
only a few of the dimensions of the plan and then draw conclusions on
the plan effectiveness based on the limited assessment. For example, it
is common to evaluate plans by assessing whether the recommendations are
implemented. While assessing implementation of recommendations is a
necessary component of evaluation, it is not sufficient. The question of
whether the implemented recommendations are meeting plan objectives
efficiently is also critical to the evaluation. For example, evaluations
of acid rain reduction strategies have extolled the success of policies
in reducing emissions but have not adequately assessed whether the
acidity levels of the environment have returned to acceptable levels or
whether the reductions are being achieved in the most cost-effective
manner (OECD 2004).
Timing Scope Errors
Another common error in evaluation is undertaking the evaluation
only once. Impacts of plans and programs occur over many years and
premature evaluation or single point evaluations can miss many of the
impacts.
Feasible Options Error
Evaluations based on best practice standards compare plans to a
theoretical ideal that may be impossible to achieve. Best practice
evaluations are useful in indicating how the plan can be improved, but
can lead to the erroneous conclusion that the planning model is
deficient and should be rejected. For example, one of the case studies
in this volume evaluates an innovative collaborative planning model that
failed to reach a consensus agreement and was rated as a failure by the
participants. This finding could lead to the conclusion that
collaborative models are deficient. This would be an unfounded
conclusion unless it could be demonstrated that an alternative planning
model would have led to a successful outcome or be more likely to lead
to a successful outcome. It is important to compare a plan to the
feasible alternatives when assessing performance, rather than to a
theoretical ideal.
Case Studies
The five case studies in this volume evaluate various aspects of
environmental and resource planning. Each case study provides a
framework for evaluation and evaluation findings. Two of the case
studies deal with collaborative land use planning processes, two deal
with protected areas planning, and one deals with water policy. The case
studies are categorized by type of evaluation in Table 2.
Collaborative Planning: Lillooet Land and Resource Management Plan
(Gunton et al.)
Collaborative planning has emerged as an increasingly popular model
that is alleged to have significant benefits relative to other planning
models. Collaborative planning delegates the responsibility for planning
to stakeholders who engage in face-to-face negotiations to prepare a
plan by consensus agreement. Despite collaborative planning's
growing popularity, evaluation of collaborative planning is still in its
infancy (Gunton and Day 2003). Without proper evaluation, the merits of
collaborative planning relative to other planning methods and best
practice guidelines for effective implementation of collaborative
planning will remain uncertain.
Gunton et al. address the need for more evaluative research on
collaborative planning by providing a case study evaluation of a
collaborative planning process used to prepare a regional land use plan
for the Lillooet region in British Columbia. The case study is part of a
larger multiyear research program on collaborative planning in the
School of Resource and Environmental Management at Simon Fraser
University. The Lillooet process was chosen for a detailed case study
because it is the only collaborative process under the Land and Resource
Management Planning initiative in British Columbia that did not reach a
consensus-based agreement. The Lillooet process also experimented with a
unique final offer selection technique in an attempt to reach a
decision.
The Lillooet evaluation utilizes an evaluation methodology
developed as part of the larger collaborative planning research program
at Simon Fraser University. The Lillooet evaluation faces several major
challenges common to evaluating resource and environmental planning. The
first challenge is identifying standards for comparison. The official
goals for the collaborative planning process are too vague to provide a
clear foundation for evaluation. Consequently the Lillooet case utilizes
a comprehensive list of fourteen best practices process management
criteria and eleven outcome criteria that were developed based on a
review of the literature on collaborative planning. The importance of
the evaluation criteria was tested by surveying stakeholders.
The next challenge is assessing the degree to which the evaluative
criteria are met. For most of the criteria, such as neutrality of
management staff, objective data for assessing the degree to which the
criteria are met are not available. Consequently, an alternative
approach is required. One approach is for the researchers to make a
judgment using some type of rating scale such as met, partially met, or
not met. The problem with this approach is that in many cases the
researchers do not have a basis for making an accurate judgment.
Consequently, researchers' judgment was rejected in favor of
relying on a survey of stakeholders engaged in the process to rank the
degree to which the criteria are met on a Likert-type scale,
complemented by open-ended questions. Follow-up interviews were also
conducted with stakeholders to elaborate on questionnaire responses. In
this way, the evaluation used external evaluators to design the
methodology and construct the evaluation framework, and internal
evaluators to rate performance of the program.
The evaluation of the Lillooet planning process successfully
identifies strengths and weaknesses and indicates how the process could
be improved. For example, the findings suggest that: the process should
have allowed the stakeholders more time to achieve a consensus outcome
instead of using a final offer selection process; facilitators should
have used a single text planning approach to discourage the preparation
of competing plans from different stakeholder groups; and, increased
effort should have been made to engage key stakeholder groups who were
not part of the process. Therefore, the evaluation provides useful data
for improving the process in the future.
Gunton et al. also point out that the evaluation method they used
has deficiencies. The largest problem is that the evaluation relies on
the perspectives of stakeholders, which may or may not be accurate. For
some criteria, such as the neutrality of staff and the degree to which
the outcome met the interests of each stakeholder group, this is not a
problem because the perception of stakeholders is a sound basis for
measuring performance. For other criteria, particularly outcome criteria
such as the extent to which the process met the public interest and
whether the process was superior to alternative methods of planning,
stakeholder perceptions are less reliable. Stakeholder evaluations are
also constrained by low response rates that make the interpretation of
results challenging. The response rate in the Lillooet case study was
33%, which, given the small sample size, limits the confidence that can
be placed in the results. To remedy
these problems with stakeholder surveys, the authors suggest developing
more objective measures of outcomes to assess success. The survey is
also based on a single snapshot of stakeholder opinion taken after the
completion of a planning process. Accuracy would be improved by
completing evaluation in stages over an extended time horizon.
Another limitation is that the evaluation is based on a single case
study, which is too small a sample to provide reliable generic
conclusions on the merits of the planning model used. The researchers
address this limitation by pointing out that the study is part of a
research program that is based on a much larger number of collaborative
planning processes.
The authors emphasize several important lessons for evaluation from
the case study. First, it is crucial to have multiple evaluative
criteria. Relying on narrowly defined evaluative criteria may exclude
important benefits of the plan and result in unjustified rejection of
the planning model. For example, although the planning process did not
achieve the desired outcome of a consensus agreement on a plan, the
process did achieve other important benefits including improved
stakeholder relations and improved stakeholder skills and knowledge.
Second, the case study illustrates that it is important to compare the
planning model to feasible alternatives as well as best practice ideals
to assess merits. A planning model may not meet best practices criteria,
but it still may be superior to all the alternatives. However, as the
authors caution, comparing alternatives is challenging because the
alternatives can rarely if ever be tested as part of a controlled
experiment where all factors are held constant.
Ontario Resource Stewardship Agreements (Browne et al.)
Conflict between the tourism and forest industries in Ontario led
to the signing of a memorandum of understanding by the Ontario
government, the tourism industry, and the forest industry to use a new
collaborative process called resource stewardship agreements (RSA) to
help resolve land use conflicts. The second paper, by Browne et al., in
this volume summarizes an evaluation of the RSA process.
The RSA evaluation sets out to answer two questions. The first
question is whether the RSA process is meeting goals set by policy
makers and the tourism sector. The second question is whether the
RSA process meets best practice requirements as defined in the academic
literature for collaborative planning processes. Criteria used for
evaluating the RSAs were taken from three sources: government goals for
the process, tourism industry goals, and academic literature on best
practices.
The next step in this evaluation was to assess the degree to which
the evaluative criteria were met. The study relies on two sources:
stakeholder responses based on a mail survey and researcher assessments
based on a review of relevant documents. In some cases, both sources
were used to assess the same evaluative criterion. Combining
stakeholder assessments with researchers' assessments
based on document analysis is an interesting approach that attempts to
offset the deficiencies of relying on just one of the sources.
The RSA survey used a Likert-type scale to assess the opinion of
respondents concerning the extent of agreement with statements
describing the process. Responses were received from 116 stakeholders,
for a response rate of 26%. This is another example of the low response
rates common for stakeholder evaluations. The evaluative criteria were
assessed as met, somewhat met, neutral or not met based on the average
response.
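The step from average responses to a rating can be sketched as follows. This is a hedged illustration rather than the procedure used in the RSA study: it assumes a five-point agreement scale coded 1 to 5, and the cut-off values are hypothetical because the thresholds actually applied are not reported here.

```python
from statistics import mean

# Hypothetical cut-offs on a 1-5 agreement scale (5 = strongly agree).
# These thresholds are assumptions for illustration only.
CUTOFFS = [
    (4.0, "met"),
    (3.5, "somewhat met"),
    (2.5, "neutral"),
]


def classify(responses):
    """Return the average response and the rating category it falls into."""
    avg = mean(responses)
    for lower_bound, label in CUTOFFS:
        if avg >= lower_bound:
            return avg, label
    return avg, "not met"


# Invented responses for one criterion.
avg, label = classify([4, 4, 3, 4])
print(f"average {avg:.2f}: {label}")
```

Because the category depends on where the cut-offs are drawn, an evaluation that reports only the labels is harder to interpret than one that also reports the averages and the thresholds.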
The RSA case study provides useful evaluation results for decision
makers by identifying strengths and weaknesses of the process. The RSA
case study states clearly the limitations of the evaluation. First, the
evaluation is done from the perspective of only one stakeholder group:
the tourism sector. This is a stated objective of the research that is
justified on the grounds that the purpose of the RSA is to meet the
interests of the tourism sector and that the process should therefore be
evaluated from the tourism sector perspective. The researchers suggest that the
government should consider broadening the RSA planning process to
include all relevant stakeholders. The researchers also caution that the
results are a snapshot of stakeholder views assessed early in the
implementation phase. As such, the results help identify how the process
can be improved, but do not provide definitive results on the success of
the program. More evaluations over an extended time horizon are
required.
Canadian Bulk Water Export Policy (MacNab et al.)
The third case study, by MacNab et al., evaluates an innovative
approach termed "guided federalism" to develop bulk water
export policy in Canada. Guided federalism is based on the federal
government encouraging adoption of a national policy by the provinces by
providing a recommended policy framework that provinces are encouraged,
but not required, to adopt. Guided federalism is designed to address the
challenges of developing consistent national policy in a federal state
in which two levels of government have overlapping jurisdictions.
Concern over bulk water exports led to calls for development of a
national bulk water export policy. The challenge is that both the
federal and provincial governments have authority to regulate water
under the Canadian constitution. To address the need for development of
a consistent national policy while recognizing the rights of the two
levels of government, the federal government developed the Accord for
the Prohibition of Bulk Water Removal from Drainage Basins (Accord) in
1999 that outlines a proposed policy that the provinces are encouraged
to adopt. The purpose of the case study evaluation is to assess whether
the guided federalism approach was successful.
The case study uses eight evaluative criteria structured in the
form of questions that are based on the goals of the Accord to regulate
bulk water exports. The questions are defined specifically enough to
allow for yes or no answers. The answers to the questions are based on a
review of legislation, other relevant documents and public statements.
The evaluation concluded that guided federalism did not achieve its
objectives in the bulk water case based on the fact that none of the
jurisdictions meet all eight evaluative criteria. An unavoidable
limitation of the study is that it is not able to compare the guided
federalism policy against feasible alternative strategies--such as a
compulsory national strategy--because no alternative strategy has been
implemented and therefore no alternative can be evaluated. Thus the
study shows that guided federalism has not met the objectives but does
not show that other feasible strategies would have been more successful.
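The decision rule described above, in which a jurisdiction counts as compliant only if it answers yes to all eight questions, can be sketched in a few lines. The jurisdiction names and the yes/no answers below are placeholders invented for illustration. They are not the findings of the case study.

```python
# Placeholder yes/no answers to eight criteria; NOT the study's data.
answers = {
    "Jurisdiction A": [True, True, True, True, True, True, True, False],
    "Jurisdiction B": [True, False, True, True, False, True, True, True],
}


def criteria_met(all_answers):
    """Count how many criteria each jurisdiction satisfies."""
    return {name: sum(flags) for name, flags in all_answers.items()}


def fully_compliant(all_answers):
    """List the jurisdictions that satisfy every criterion."""
    return [name for name, flags in all_answers.items() if all(flags)]


print(criteria_met(answers))
print(fully_compliant(answers))
```

An all-or-nothing rule of this kind is strict: a jurisdiction that misses a single criterion is treated the same as one that misses most of them, so reporting the counts alongside the pass/fail result preserves more information.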
Protected Area Selection (Paridaen et al.)
The case study on protected area selection by Paridaen et al.
evaluates the process for selecting the protected areas that were
designated in British Columbia to achieve a more than doubling (by area)
of the park system between 1990 and 2002. The purpose of the evaluation
is to determine: 1) what criteria for the selection of protected areas
were deemed to be important by stakeholders; and 2) whether the criteria
deemed important were actually used in the selection process.
The first step in the evaluation was to complete a literature
review to identify protected area selection criteria. Twenty-four
criteria were selected and grouped into three categories: environmental,
social, and economic. Next, a survey was developed to test the
significance of the criteria by using a five-point Likert-type scale
ranging from very important to not at all important. The survey
questionnaire was designed as a self-administered, mail-back survey.
The next step was to select survey respondents. Four case study
regions were chosen to represent a cross section of completed land and
resource planning processes that designated protected areas. Surveys
were distributed to all 170 stakeholders who participated in the
protected area selection process in the four regions and responses were
received from 46 participants for a response rate of 27%. Respondents
were asked to rate the generic importance of the criteria, and the
importance of the criteria in the actual protected areas selection
process that they participated in, on the five-point Likert-type scale. The
average respondent rating for each criterion was calculated. The
generic importance ratings were then compared with the actual ratings to
evaluate the extent to which the selection process was based on the
generic importance criteria.
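The arithmetic behind this comparison can be sketched as follows. The response counts (46 of 170) are those reported in the study; the criterion names and individual ratings are hypothetical, and the numeric coding of the scale (5 = very important, 1 = not at all important) is an assumption made for illustration.

```python
from statistics import mean
from typing import Dict, List


def response_rate(returned: int, distributed: int) -> float:
    """Share of distributed surveys that were returned."""
    return returned / distributed


def importance_gap(generic: List[int], actual: List[int]) -> float:
    """Mean actual rating minus mean generic rating for one criterion.

    A negative gap means the criterion played a smaller role in the actual
    selection process than its generic importance would warrant.
    """
    return mean(actual) - mean(generic)


# Counts reported in the study: 46 responses from 170 surveys.
rate = response_rate(46, 170)
print(f"Response rate: {rate:.0%}")

# Hypothetical ratings on an assumed 1-5 coding; NOT the study's data.
ratings: Dict[str, Dict[str, List[int]]] = {
    "criterion X (hypothetical)": {"generic": [5, 4, 5, 4], "actual": [4, 4, 5, 3]},
    "criterion Y (hypothetical)": {"generic": [4, 4, 3, 5], "actual": [2, 3, 2, 1]},
}

for name, r in ratings.items():
    gap = importance_gap(r["generic"], r["actual"])
    print(f"{name}: gap = {gap:+.1f}")
```

A large negative gap for a criterion corresponds to the kind of finding reported for some specific criteria, where the role in the actual process fell well below the generic importance rating.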
The study found that environmental criteria were rated as the most
important for protected area selection, followed by social and
economic criteria. The ranking of the criteria actually used in the selection
process was similar to the generic ranking. However, the use of social
and economic criteria was lower in the actual selection process than
warranted by the generic ranking and some specific criteria such as
increasing employment had a much lower role in the actual selection
process than warranted by their significance ranking. The researchers
recommend that future protected area selection processes be designed to
give due weight to all criteria.
The protected area selection study provides one of the first
attempts to rank the importance of criteria and then assess the degree
to which the criteria were used based on a survey of stakeholders
engaged in the selection process. This evaluative method, in essence,
uses stakeholders to develop the evaluation criteria and then assess the
extent to which the criteria are met. Given that policy is intended to
meet democratically determined goals, the methodology of stakeholder
surveys to set evaluation criteria and assess the degree to which the
criteria are met clearly has merit. However, this approach is limited to
policy cases in which stakeholders are actively engaged and are
therefore well informed. The researchers also observe that their study
may suffer from selection bias because it is based on only four of
more than twenty potential cases.
Protected Area Planning Process (Ronmark et al.)
The objective of the final study in this volume, by Ronmark et al.,
is to develop and test an overall methodology for evaluating planning
processes. The first step is to identify best practices criteria for
planning based on a literature review. Thirty-five best practices
criteria are identified and grouped into three categories: planning
process criteria, planning outcome criteria, and planning implementation
criteria. Next, a survey is used to test the importance of the
evaluative criteria and the degree to which the criteria are met in the
planning process. The importance of each criterion is rated on a four-point
Likert-type scale ranging from not important to very important.
The extent to which the criteria are met is assessed on a five-point
Likert-type scale ranging from strongly agree to strongly disagree with
statements describing the planning process. Multiple statements are used
for each criterion and the average of the responses to the multiple
statements is calculated to assess the degree to which each criterion is
met.
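The scoring step described above, averaging the responses to several statements to obtain one score per criterion, can be sketched as follows. The numeric coding of the agreement scale, the "neutral" midpoint label, the criterion names, and all responses are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean
from typing import Dict, List

# Assumed coding of the five-point agreement scale (higher = criterion better met).
AGREEMENT = {
    "strongly agree": 5,
    "agree": 4,
    "neutral": 3,  # midpoint label is an assumption
    "disagree": 2,
    "strongly disagree": 1,
}


def criterion_score(statements: Dict[str, List[str]]) -> float:
    """Average the coded responses across all statements for one criterion."""
    coded = [
        AGREEMENT[answer]
        for answers in statements.values()
        for answer in answers
    ]
    return mean(coded)


# Hypothetical responses; NOT the study's data.
responses: Dict[str, Dict[str, List[str]]] = {
    "hypothetical criterion A": {
        "statement 1": ["agree", "strongly agree", "neutral"],
        "statement 2": ["agree", "agree", "agree"],
    },
    "hypothetical criterion B": {
        "statement 1": ["disagree", "neutral"],
        "statement 2": ["strongly disagree", "disagree"],
    },
}

for name, statements in responses.items():
    print(f"{name}: score = {criterion_score(statements):.2f}")
```

This sketch pools all responses for a criterion before averaging. Averaging each statement first and then averaging the statement means gives the same result whenever every statement has the same number of responses.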
The case study used to test the methodology is the preparation of
park master plans in British Columbia. The survey was sent to two types
of participants in the park planning process. One group consisted of the
provincial government park planners and the other group consisted of
non-governmental stakeholders with an interest in protected area
planning. Due to logistical constraints, respondents were limited to
eleven park planners representing different regions of the province and
one representative from each of the fifteen non-governmental stakeholder
organizations that have a stated interest in provincial park planning.
The response rates from park planners and non-governmental stakeholder
organizations were 82% and 67%, respectively.
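The study reports the response rates but not the number of responses in each group. As a rough arithmetic check, and assuming the reported percentages are rounded to the nearest whole number, the sketch below lists the whole-number response counts consistent with each rate. The function name is illustrative.

```python
from typing import List


def counts_matching(rate_pct: int, invited: int) -> List[int]:
    """Whole-number response counts whose rate rounds to rate_pct percent."""
    return [
        k
        for k in range(invited + 1)
        if round(100 * k / invited) == rate_pct
    ]


# Invitation counts and rates as reported: 11 planners (82%), 15 organizations (67%).
print("Park planners:", counts_matching(82, 11))
print("Stakeholder organizations:", counts_matching(67, 15))
```

With eleven planners and fifteen organizations invited, the only counts that round to 82% and 67% are 9 and 10 responses respectively, which shows how small the two respondent groups are.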
The results confirmed that all the criteria identified in the
literature review were ranked as important to very important, except for
one (independent facilitation) that was ranked as only somewhat
important. The testing of the degree to which the criteria are met
identified strengths in the planning process as well as weaknesses that
need to be mitigated. An interesting finding in the study is the wide
discrepancy between the ratings of the government planners and the
non-governmental stakeholders, with the park planners providing more
positive ratings for achievement of the best practices criteria. The
researchers point out that while it is not surprising that the planners
would rate the outcomes more highly, the discrepancy illustrates the
need to include all relevant stakeholders in plan evaluation to counter
the "internal evaluator" bias of the planners. Although the
researchers advise that the specific findings in the case study should
be interpreted with caution because of the small sample size, they
conclude that the case study application confirms the feasibility and
utility of the plan evaluation methodology.
Conclusion
Effective environmental planning is contingent on comprehensive
evaluation. Based on the case studies and evaluation theory we can
identify eight keys to successful evaluation.
1. Evaluation should use a comprehensive set of evaluative criteria
that include explicit policy goals and best practice standards that have
been empirically verified.
2. The cases being evaluated should represent a large enough sample
to provide reliable results.
3. Evaluation should include all components of the program
including program theory, design, implementation, outcomes, and
efficiency.
4. Evaluation should occur on an ongoing basis over multiple time
periods at critical steps in the process.
5. If a summative evaluation is being done to assess whether the
program should continue, the program should be evaluated against
feasible alternatives.
6. Care should be taken in concluding causality between programs
and outcomes.
7. Evaluation should incorporate the views of stakeholders in the
assessment, as well as those of external evaluators.
8. Evaluation reports should clearly state the limitations of the
evaluation.
In the real world of evaluation, meeting all eight criteria is
extremely difficult due to resource constraints and methodological
challenges. None of the case studies in this volume meets all eight criteria.
The case studies show that identifying clear and measurable program
outcomes, conducting multiple period evaluations over an extended time
horizon, comparing programs against feasible options, and determining
causality between programs and outcomes are particularly difficult
criteria to meet. However, the case studies in this volume illustrate
how useful evaluations can be conducted in the complex field of resource
and environmental planning in the face of these resource and
methodological constraints. In particular, the case studies illustrate
the techniques and benefits of incorporating stakeholder views to verify
best practices criteria and assess the degree to which best practices
criteria are met. The case studies also illustrate the importance of
using multiple evaluation criteria to assess program performance.
We hope that the case studies in this volume will stimulate ongoing
research in this important field of evaluation in environmental
planning.
Acknowledgements
We would like to thank SSHRC for funding support for this research
and the anonymous referees for their helpful suggestions.
References
Bellamy, J.A., G. T. McDonald, G.J. Syme, and J.E. Butterworth.
1999. Evaluating Integrated Resource Management. Society and Natural
Resources 12: 337-353.
Clark, Alan and Ruth Dawson. 1999. Evaluation Research. Thousand
Oaks, California: Sage Publications.
Day, J.C., Thomas I. Gunton, and T. Frame. 2003. Towards Rural
Sustainability in British Columbia: The Role of Biodiversity
Conservation and Other Factors. Environments 31(2): 21-39.
Esty, Daniel C., Marc A. Levy, Tanja Srebotnjak, Alexander de
Sherbinin, Christine H. Kim, and Bridget Anderson. 2006. Pilot 2006
Environmental Performance Index. New Haven, Conn.: Yale Center for
Environmental Law and Policy.
Gunton, Thomas I. 2003. Natural Resource Megaprojects and Regional
Development: Pathologies in Project Planning. Regional Studies 37(5):
505-519.
Gunton, Thomas I., and J. C. Day. 2003. The Theory and Practice of
Collaborative Planning in Resource and Environmental Management.
Environments 31(2): 5-19.
Gunton, Thomas I., and Chris Joseph. 2006. Toward a National
Sustainability Strategy for Canada: Putting Canada on the Path to
Sustainability within a Generation. Vancouver: David Suzuki Foundation.
Gunton, Thomas I., Ken Calbick, Anita Bedo, Emily Chamberlin,
Andrea Cullen, Krista Englund, Aaron Heidt, Matthew Justice, Gordon
McGee, Sean Moore, Carolyn Pharand, Ian Ponsford, Jennifer Reilly and
Ian Williamson. 2005. The Maple Leaf in the OECD: Comparing Progress
Toward Sustainability. Vancouver: David Suzuki Foundation.
International Institute for Sustainable Development (IISD). 2004.
National Strategies for Sustainable Development: Challenges, Approaches
and Innovations in Strategic and Co-ordinated Action. Winnipeg: IISD.
Joseph, Chris, Thomas I. Gunton and J.C. Day. 2007. Planning
Implementation: An Evaluation of the Strategic Land Use Planning
Framework in British Columbia. Journal of Environmental Management (in
press).
Leach, W.D., N. Pelkey and Paul Sabatier. 2002. Stakeholder
Partnerships as Collaborative Policymaking: Evaluation Criteria Applied
to Watershed Management in California and Washington. Journal of Policy
Analysis and Management 21(4): 645-670.
National Academy of Public Administration (NAPA). 2001. Evaluating
Environmental Progress: How EPA and the States can improve the Quality
of Enforcement and Compliance Information. A Report by a Panel of the
National Academy of Public Administration.
[Accessed on 15 June, 2007].
Organization for Economic Cooperation and Development (OECD). 2004.
OECD Environmental Performance Reviews: Canada. Paris: OECD.
Rossi, Peter, Mark Lipsey and Howard Freeman. 2004. Evaluation: A
Systematic Approach. Thousand Oaks, California: Sage Publications.
Suvedi, Murari, and Shawn Morford. 2003. Conducting Program and
Project Evaluations: A Primer for Natural Resource Program Managers in
British Columbia. FORREX-Forest Research Extension Partnership.
Kamloops, B.C. FORREX Series 6.
[Accessed on 18 April, 2007].
Weiss, Carol. 1998. Evaluation Methods for Studying Programs and
Policies. Second Edition. Upper Saddle River, New Jersey: Prentice-Hall.
Thomas Gunton is a professor in the School of Resource and
Environmental Management and Director of the Resource and Environmental
Planning Program at Simon Fraser University. He has held numerous senior
positions in government including Assistant Deputy Minister of Energy
and Mines for the government of Manitoba and Deputy Minister of
Environment, Lands, and Parks for the government of British Columbia.
His research focuses on environmental mediation and dispute resolution
and resource and environmental planning. He can be contacted at
tgunton@shaw.ca
Murray Rutherford is an Assistant Professor in the School of
Resource and Environmental Management at Simon Fraser University. He is
a policy scientist and planner whose research focuses on policy analysis
and evaluation, ecosystem-based management, and human values and
attitudes toward nature and the conservation of biological diversity. He
can be contacted at mbr@sfu.ca
Peter Williams is a professor in the School of Resource and
Environmental Management and Director of the University Centre for
Tourism Policy and Research at Simon Fraser University. His research
relates to the use of land and resources for sustainable tourism. He can
be contacted at peterw@sfu.ca
Chad Day is professor emeritus and founding director of the School
of Resource and Environmental Management at Simon Fraser University. His
research focuses on institutions for integrated land and water
management and environmental planning. He can be contacted at
jday@sfu.ca
Table 1. Evaluation Options

Issue                     Option
Who conducts evaluation   1. Internal
                          2. External
Purpose of evaluation     1. Program improvement (formative evaluation)
                          2. Program justification (summative evaluation)
                          3. Generic knowledge (theoretical evaluation)
                          4. Hidden agenda (ulterior evaluation)
Timing                    1. Before implementation
                          2. During program operation
                          3. After program completion
Timing scope              1. Single snapshot
                          2. Multiple period
Content scope             1. Program theory
                          2. Program design
                          3. Program implementation
                          4. Program outcomes
                          5. Program efficiency
Methodology               1. Qualitative
                          2. Quantitative
Evaluative criteria       1. Program goals
                          2. Best practices
                          3. Social welfare (benefit-cost)
                          4. Efficiency (cost-effectiveness)
                          5. Time series trend
                          6. Cross sectional comparison

Table 2. Case Study Categorization

Descriptor                   Case Study: Gunton et al., Browne et al., MacNab et al.
Who
  Internal
  External                   [check] [check] [check]
Purpose
  Program improvement
  Program justification
  Generic knowledge          [check] [check] [check]
  Hidden agenda
Timing
  Before implementation
  During operation           [check] [check]
  After program completion   [check]
Timing scope
  Single snapshot            [check] [check] [check]
  Multiple period
Content scope
  Program theory
  Program design
  Program implementation     [check] [check]
  Program outcomes           [check] [check] [check]
  Program efficiency
Methodology
  Quantitative
  Qualitative                [check] [check] [check]
Evaluative criteria
  Program goals              [check] [check] [check]
  Best practices             [check] [check]
  Social welfare
  Time series trends
  Cross sectional

                             Case Study
Descriptor                   Paridaen et al.   Ronmark et al.
Who
  Internal
  External                   [check]           [check]
Purpose
  Program improvement
  Program justification
  Generic knowledge          [check]           [check]
  Hidden agenda
Timing
  Before implementation
  During operation
  After program completion   [check]           [check]
Timing scope
  Single snapshot            [check]           [check]
  Multiple period
Content scope
  Program theory
  Program design
  Program implementation     [check]           [check]
  Program outcomes           [check]           [check]
  Program efficiency
Methodology
  Quantitative
  Qualitative                [check]           [check]
Evaluative criteria
  Program goals              [check]           [check]
  Best practices             [check]           [check]
  Social welfare
  Time series trends
  Cross sectional

COPYRIGHT 2006 Wilfrid Laurier University. Reproduced with permission
of the copyright holder. Further reproduction or distribution is
prohibited without permission.
NOTE: All illustrations and photos have been removed from this article.