Introduction: evaluation in resource and environmental planning.


by Gunton, Thomas I.; Rutherford, M.B.; Williams, Peter W.; Day, J.C.
Environments • Dec, 2006

Abstract

Evaluation systematically assesses performance of policies in meeting goals. The primary purpose of evaluation is to provide information to help improve programs. There are many approaches to evaluation, each with its own strengths and weaknesses, and many types of common errors in evaluation methodology. Keys to successful evaluation include: using comprehensive evaluation criteria, using a large enough sample size, evaluating all facets of the program over an extended time horizon, comparing the program to feasible alternatives, clearly identifying causality, incorporating the views of stakeholders, and clearly stating methodological limitations.

Résumé

L'évaluation permet d'examiner systématiquement le rendement des politiques relativement à l'atteinte des objectifs. Le principal objet de l'évaluation est de fournir de l'information qui permette d'améliorer les programmes. Il existe de nombreuses approches à l'évaluation, chacune possédant ses propres forces et faiblesses et de nombreux types d'erreurs communes en matière de méthodologie d'évaluation. Les clés d'une évaluation réussie sont, notamment : l'utilisation de critères d'évaluation complets, l'utilisation d'un échantillonnage suffisamment grand, l'évaluation de tous les aspects du programme sur une longue période de temps, la comparaison du programme avec les alternatives réalisables, l'identification précise de la causalité, l'intégration du point de vue des intervenants et l'énoncé précis des limites méthodologiques.

Key Words

Policy evaluation, resource and environmental management

Introduction

Resource and environmental planning can be defined as a decision-making process involving six steps (Figure 1). Evaluation plays a key role in two of these steps: evaluating options and evaluating outcomes. Given its importance in planning and policy-making, it is not surprising that evaluation is institutionalized as a legal requirement in many jurisdictions and has become a distinct field with its own body of theory and methodology.

The purpose of this volume of Environments is to explore recent developments in evaluation applied to resource and environmental planning through a series of case studies. In this introductory article we provide an overview of the field of evaluation and discuss some of the challenges in evaluating resource and environmental planning. We then review the contributions of the case studies to evaluation and identify keys to successful evaluation.

Evaluation Theory

In her landmark text on evaluation, Carol Weiss (1998: 4) provides a definition of evaluation that is commonly accepted in the evaluation literature. According to Weiss, evaluation is "the systematic assessment of the operation and/or the outcomes of a program or policy, compared to a set of explicit or implicit standards, as a means of contributing to the improvement of the program or policy."

Weiss emphasizes the following four key attributes of evaluation in her definition.

* Evaluation is systematic, meaning it utilizes methods of analysis that meet rigorous scientific standards and produces results that can be replicated by other analysts.

* Evaluation focuses on the operation of a program and/or program outcomes.

* Evaluation assesses the degree to which the process and/or outcomes meet plan goals.

* The purpose of evaluation is to inform managers on the strengths and weaknesses of the program and identify ways in which the program can be improved and/or whether the program should be continued.

A wide variety of alternative approaches to evaluation exist (Table 1).

Internal versus External

The first decision in evaluation is whether the evaluation should be conducted internally within the organization that is managing the program or externally by an independent evaluator. Internal and external evaluations have their strengths and weaknesses (Clark and Dawson 1999, Weiss 1998). Internal evaluators may have a bias in favor of the program that they are managing, which reduces their likelihood of producing an accurate evaluation. This bias may be strategic, in that internal evaluators want the program to appear successful, or may simply arise because their thinking has been affected by close involvement with the development and implementation of the program. An advantage of using external evaluators is that they are less likely to have a bias in favor of the program. However, external evaluations also have potential weaknesses. External evaluators are less likely to understand the program and program goals. Results from external evaluations may also be less likely to be used by the program managers to improve the program because the external evaluators may not be trusted or believed, and in any event are not retained within the organization to assist in the implementation of recommendations. Some researchers suggest a way of achieving the combined benefits of external and internal evaluations is to use a hybrid approach, in which external and internal evaluators work together to assess the program (Suvedi and Morford 2003).

[FIGURE 1 OMITTED]

Purpose

A second decision is determining the purpose of evaluation. Four purposes are identified in the evaluation literature (Weiss 1998, Rossi et al. 2004). The most common purpose of evaluation is to assist in the improvement of a program by identifying strengths and weaknesses. This type of evaluation is referred to as formative evaluation. A second purpose is to determine whether the program is justified or not. This type of evaluation, normally referred to as summative evaluation, is often undertaken to fulfill accountability requirements for public spending. A third purpose is pursuit of general knowledge. This type of evaluation, referred to as theoretical evaluation, is normally undertaken by academics and is not linked directly or indirectly to program managers. A fourth purpose of evaluation may be to fulfill a hidden agenda such as justifying a decision that has already been made to eliminate a program or initiate a project. In these cases, the evaluation, which can be referred to as an ulterior evaluation, is managed to achieve a predetermined outcome.

Timing

A third decision in evaluation is timing. Evaluations can be done prior to policy or program development to assess options, during program operation, and/or after program completion. Evaluations can provide single snapshots of program performance or be ongoing over a longer time frame to analyze trends and foster continuous improvement. A comprehensive evaluation process should include evaluations at all key stages of a program's life (Weiss 1998). Another useful time-frame for evaluation is to compare the evaluation results after a program has been implemented (ex-post evaluation) with the forecast results used to select a program (ex-ante evaluation). Ex-post/ex-ante comparisons are especially useful for assessing accuracy of initial evaluation assessments and indicating how ex-ante evaluations can be improved. While ex-post/ex-ante evaluations are rarely undertaken, the limited studies that have been completed show a systematic optimistic bias in ex-ante evaluations that contributes to poor program selection and design (Gunton 2003).
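
The ex-post/ex-ante comparison can be made concrete by computing the average forecast error across programs. The following is a minimal sketch under stated assumptions: the figures are hypothetical and are not drawn from Gunton (2003), and the function name `optimism_bias` is introduced here only for illustration.

```python
from statistics import mean

def optimism_bias(forecast, actual):
    """Mean relative forecast error across programs.

    Positive values mean the ex-ante forecasts overstated the
    benefits that were actually realized ex post.
    """
    errors = [(f - a) / a for f, a in zip(forecast, actual)]
    return mean(errors)

# Hypothetical ex-ante forecasts and ex-post results (same units)
ex_ante = [120.0, 80.0, 50.0]
ex_post = [100.0, 64.0, 50.0]

bias = optimism_bias(ex_ante, ex_post)
print(f"Mean relative forecast error: {bias:.1%}")
```

A persistently positive value across many programs would be consistent with the systematic optimistic bias described above; a single program's error says little on its own.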

What to Evaluate

The fourth decision in evaluation is deciding what component of the policy or program to evaluate. Evaluations can focus on one or more of five program components: program theory, which is the underlying logic of the program, including the causal model of the problem it is intended to address; program design; program implementation; program outcomes; and program efficiency, which measures the outcomes per unit of resources required to operate the program. A comprehensive evaluation should normally include all five components (Weiss 1998).

Methodology

The fifth decision in evaluation is selecting a methodology. First, the evaluator must choose between quantitative and qualitative analysis. Quantitative analysis describes program performance in numerical terms that can be statistically analyzed. Qualitative analysis evaluates programs by observation, interviews, and document analysis and assesses performance in verbal as opposed to numerical terms. Within quantitative analysis there are also options. The preferred technique is randomized experiments. Randomized experiments attempt to identify program impacts by comparing groups or areas where the program is applied to a control group where all relevant variables other than those related to the program are held constant. Randomized experiments attempt to overcome one of the primary challenges in evaluation: distinguishing outcomes due to the program from outcomes due to other factors. The use of randomized experiments is constrained in resource and environmental management due to the complexity and diversity of the systems being evaluated and political and ethical considerations. There are, however, a range of alternatives available between pure experimental methods and pure qualitative methods (Clark and Dawson 1999).

Evaluation Criteria

The final question in evaluation is determining the criteria to use to assess program performance. A common approach for developing evaluative criteria is to use the explicit goals and objectives for the program to assess program performance. The major problem with this approach is that goals and objectives for environmental programs are often too vague or incomplete to provide a clear standard for assessing performance (Bellamy et al. 1999, Gunton and Joseph 2006). Even if the goals and objectives are clear and comprehensive, assessing a program relative to its goals and objectives assumes that the goals and objectives adequately reflect the public interest. Programs may have important unintended consequences relevant to the public interest that may not be expressed in a stated goal of the program. Using only stated goals and ignoring unintended consequences would result in a deficient evaluation. Using explicit goals and objectives also does not indicate whether the program is the most effective or efficient way of achieving the objectives because the program is not being assessed relative to options.

A second approach is to compare performance of the program being evaluated to similar programs in other jurisdictions by completing a cross-sectional analysis. For example, an increasingly common way of evaluating environmental performance of jurisdictions is to compare environmental indicators such as greenhouse gas emissions per capita by jurisdiction (Gunton et al. 2005, Esty et al. 2006). The assumption is that the relative performance of a jurisdiction measures the effectiveness of the jurisdiction's plans and programs. For example, lower per capita greenhouse gas emissions relative to other jurisdictions indicate the effectiveness of greenhouse gas emission control strategies. The advantage of this approach is that the data for comparison are more readily available than other program evaluation data such as impacts of programs on the state of the environment. However, this approach does not indicate whether the performance is good or bad--all jurisdictions may be poor performers--and does not normally distinguish between outcomes due to the program versus other factors such as geography and climate.

A third common approach is to construct time series for outcome indicators to determine if trends are improving or deteriorating. This approach is used in most environmental monitoring systems, such as the Canadian government's National Indicator Initiative. The assumption is that if trends are improving, the existing program is effective. The problem with this approach is that it does not distinguish between changes due to the program and those due to other factors (Gunton and Joseph 2006). Further, trend line analysis does not indicate whether the performance is good or bad in absolute terms, just whether it is changing.
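
The limits of trend analysis can be illustrated with a short calculation. The sketch below fits an ordinary least-squares slope to an indicator series and then checks the latest value against an absolute standard. The series, the target value, and the function name are all hypothetical and are not taken from any monitoring system named in the text.

```python
def trend_slope(years, values):
    """Ordinary least-squares slope of an indicator against time."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx

# Hypothetical emissions index by year (lower is better)
years = [2000, 2001, 2002, 2003, 2004]
index = [100.0, 98.0, 97.0, 95.0, 92.0]
TARGET = 80.0  # hypothetical absolute standard

slope = trend_slope(years, index)
improving = slope < 0                 # the trend is downward
meets_target = index[-1] <= TARGET    # but the level may still be poor

print(f"slope per year: {slope:.2f}")
print(f"improving: {improving}, meets target: {meets_target}")
```

In this hypothetical series the trend is improving while the indicator remains well short of the standard, and nothing in the calculation attributes the change to the program rather than to other factors.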

Another evaluative criterion is best practice standards based on theory and/or the performance of other jurisdictions. Best practice standards are commonly used in process evaluations to assess program management and planning. For example, best practice analysis has been successfully used to assess the quality of environmental sustainability planning in various countries by the OECD, the United Nations and non-governmental researchers (Gunton and Joseph 2006, IISD 2004). The underlying assumption in these evaluations is that better processes lead to better outcomes. Although this approach is useful in identifying strengths and weaknesses of planning systems, it relies on best practice criteria that are often not empirically verified.

A final criterion for evaluation is a comprehensive benefit-cost analysis that assesses program benefits relative to costs to determine if the program is in the public interest. Although benefit-cost addresses many of the problems with other methods of evaluation, it also suffers from major weaknesses. The most significant challenge in benefit-cost is monetizing intangibles such as pollution and ecological values that are central to resource and environmental planning. Benefit-cost also requires identifying impacts attributed to the program. As discussed above, distinguishing between impacts due to the program and those due to other factors is difficult. Nonetheless, benefit-cost is a legally required component of environmental evaluation in many jurisdictions such as the United States. Cost-effectiveness analysis, which measures outputs per unit of input, is less demanding in terms of monetizing intangibles, but is difficult to apply to resource and environmental plans where outputs are difficult to quantify.
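
The arithmetic behind benefit-cost and cost-effectiveness analysis can be sketched as follows. This is an illustration only: the annual flows, the discount rate, and the physical output are hypothetical, and the example sidesteps the hard part identified above, namely assigning monetary values to intangibles and attributing impacts to the program.

```python
def present_value(flows, rate):
    """Discount a list of annual flows (year 0 first) to present value."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

# Hypothetical program: monetized benefits and costs by year
benefits = [0.0, 60.0, 60.0, 60.0]
costs = [100.0, 10.0, 10.0, 10.0]
RATE = 0.05  # hypothetical discount rate

pv_benefits = present_value(benefits, RATE)
pv_costs = present_value(costs, RATE)

net_present_value = pv_benefits - pv_costs   # > 0 suggests net benefit
bc_ratio = pv_benefits / pv_costs            # > 1 suggests net benefit

# Cost-effectiveness uses a physical output instead of monetized benefits
tonnes_reduced = 500.0  # hypothetical output
output_per_unit_cost = tonnes_reduced / pv_costs

print(f"NPV: {net_present_value:.2f}, B/C ratio: {bc_ratio:.2f}")
print(f"Output per unit of cost: {output_per_unit_cost:.2f}")
```

The result is only as reliable as the monetized benefit figures and the choice of discount rate, both of which are assumptions in this sketch.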

Common Evaluation Errors

Evaluative Criteria Errors

A common problem in evaluation is using inappropriate evaluative criteria. A recent evaluation of the effectiveness of environmental regulations in the U.S. found that the criteria normally used to assess performance, such as number of permits issued, enforcement actions, and inspections, do not indicate the effectiveness of the programs in meeting environmental objectives (NAPA 2001). Evaluations of the success of land use planning in British Columbia have used the implementation of recommendations to increase protected areas as a measure of success (Day et al. 2003). While this is important, the underlying objectives of increasing protected areas, such as a reduction in the number of endangered species, also need to be assessed. Evaluations of alternative dispute resolution processes sometimes assess effectiveness by using the single criterion of whether an agreement was reached, which ignores the relative quality of the agreement, and excludes other important benefits such as improved stakeholder relations even when an agreement was not reached (Gunton and Day 2003). These examples show that care must be taken to develop evaluative criteria that are comprehensive and reflect the underlying objectives of the plan. Otherwise the use of inappropriate evaluative criteria will lead to unjustified conclusions regarding program performance.

Causation Assumption Errors

Another common error in evaluation is assuming that a correlation between plan implementation and outcomes reflects a causal link. For example, evaluation of regional land use planning in British Columbia is based on monitoring time series trends for key environmental indicators (Joseph et al. 2007). The assumption is that the trends accurately assess the impact of the plan. The problem is that many confounding factors affect environmental trends, such as weather patterns, natural cycles, and human activity, and these make it difficult if not impossible to identify impacts of the plan. Also, the impacts of the plan occur over a long time horizon and may not be detected until many years later. The challenge is to compare what would have happened in the absence of the plan with what happened with the plan, holding all other variables constant over a long enough time horizon to assess impacts resulting from the plan. Another example of causation assumption error is a recent evaluation of the effectiveness of mediation processes that concluded that mediators have little positive impact (Leach et al. 2002). The study compared cases with mediators to cases without mediators and assumed that all other factors were constant. One apparent problem is that the cases that opted for independent mediators may have been more challenging cases. Therefore, differences in outcomes may have been due to factors other than the presence of a mediator.
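
Where a controlled experiment is not possible, one common way to approximate the "with plan versus without plan" comparison is a difference-in-differences calculation. This technique is brought in here for illustration and is not the method used in the studies cited. It rests on the assumption that, absent the plan, the plan region and the comparison region would have followed parallel trends. All values below are hypothetical.

```python
def difference_in_differences(plan_before, plan_after, comp_before, comp_after):
    """Change in the plan region minus change in a comparison region.

    Valid only under the parallel-trends assumption: without the plan,
    both regions would have changed by the same amount.
    """
    return (plan_after - plan_before) - (comp_after - comp_before)

# Hypothetical indicator values (e.g. a habitat quality index)
naive_change = 70.0 - 60.0  # plan region alone
effect = difference_in_differences(60.0, 70.0, 55.0, 62.0)

print(naive_change, effect)  # 10.0 3.0
```

In this hypothetical case most of the observed improvement also occurred in the region without the plan, so attributing the full change to the plan would overstate its impact.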

Selection Bias Error

Many evaluations suffer from using a biased sample of cases for evaluation. In social policy, participants in a program may be selected based on attributes that increase the likelihood of success, instead of being randomly selected. A positive impact on recipients relative to non-recipients may be due to these other attributes, not to the program. In planning, evaluation of performance of dispute resolution techniques such as consensus-based negotiation may appear artificially high because negotiation tends to be used in cases where a conflict assessment has indicated the likelihood of success. Selection bias therefore can significantly skew results.
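
A small numerical example shows how selection can produce an apparent program effect where none exists. The counts below are hypothetical and constructed so that negotiation makes no difference within easy cases or within hard cases; negotiation is simply used more often in easy cases.

```python
from collections import namedtuple

Group = namedtuple("Group", "easy negotiated n successes")

# Hypothetical case counts: success is 75% for easy cases and 25% for
# hard cases, whether or not negotiation is used.
groups = [
    Group(easy=True,  negotiated=True,  n=8, successes=6),
    Group(easy=False, negotiated=True,  n=4, successes=1),
    Group(easy=True,  negotiated=False, n=4, successes=3),
    Group(easy=False, negotiated=False, n=8, successes=2),
]

def success_rate(rows):
    rows = list(rows)
    return sum(g.successes for g in rows) / sum(g.n for g in rows)

# Pooled comparison ignores how cases were selected
pooled_gap = (success_rate(g for g in groups if g.negotiated)
              - success_rate(g for g in groups if not g.negotiated))

# Comparison within each type of case removes the selection effect
easy_gap = (success_rate(g for g in groups if g.easy and g.negotiated)
            - success_rate(g for g in groups if g.easy and not g.negotiated))
hard_gap = (success_rate(g for g in groups if not g.easy and g.negotiated)
            - success_rate(g for g in groups if not g.easy and not g.negotiated))

print(f"pooled gap: {pooled_gap:.3f}, easy: {easy_gap:.3f}, hard: {hard_gap:.3f}")
```

The pooled comparison suggests negotiation raises the success rate, while the within-group comparisons show no difference at all.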

Content Scope Errors

As discussed earlier, evaluation can focus on several different dimensions of plan performance ranging from implementation effectiveness to outcome efficiency. A common error is to complete an evaluation of only a few of the dimensions of the plan and then draw conclusions on the plan effectiveness based on the limited assessment. For example, it is common to evaluate plans by assessing whether the recommendations are implemented. While assessing implementation of recommendations is a necessary component of evaluation, it is not sufficient. The question of whether the implemented recommendations are meeting plan objectives efficiently is also critical to the evaluation. For example, evaluations of acid rain reduction strategies have extolled the success of policies in reducing emissions but have not adequately assessed whether the acidity levels of the environment have returned to acceptable levels or whether the reductions are being achieved in the most cost-effective manner (OECD 2004).

Timing Scope Errors

Another common error in evaluation is undertaking the evaluation only once. Impacts of plans and programs occur over many years and premature evaluation or single point evaluations can miss many of the impacts.

Feasible Options Error

Evaluations based on best practice standards compare plans to a theoretical ideal that may be impossible to achieve. Best practice evaluations are useful in indicating how the plan can be improved, but can lead to the erroneous conclusion that the planning model is deficient and should be rejected. For example, one of the case studies in this volume evaluates an innovative collaborative planning model that failed to reach a consensus agreement and was rated as a failure by the participants. This finding could lead to the conclusion that collaborative models are deficient. This would be an unfounded conclusion unless it could be demonstrated that an alternative planning model would have led to a successful outcome or would have been more likely to lead to one. It is important to compare a plan to the feasible alternatives when assessing performance, rather than to a theoretical ideal.

Case Studies

The five case studies in this volume evaluate various aspects of environmental and resource planning. Each case study provides a framework for evaluation and evaluation findings. Two of the case studies deal with collaborative land use planning processes, two deal with protected areas planning, and one deals with water policy. The case studies are categorized by type of evaluation in Table 2.

Collaborative Planning: Lillooet Land and Resource Management Plan (Gunton et al.)

Collaborative planning has emerged as an increasingly popular model that is alleged to have significant benefits relative to other planning models. Collaborative planning delegates the responsibility for planning to stakeholders who engage in face-to-face negotiations to prepare a plan by consensus agreement. Despite collaborative planning's growing popularity, evaluation of collaborative planning is still in its infancy (Gunton and Day 2003). Without proper evaluation, the merits of collaborative planning relative to other planning methods and best practice guidelines for effective implementation of collaborative planning will remain uncertain.

Gunton et al. address the need for more evaluative research on collaborative planning by providing a case study evaluation of a collaborative planning process used to prepare a regional land use plan for the Lillooet region in British Columbia. The case study is part of a larger multiyear research program on collaborative planning in the School of Resource and Environmental Management at Simon Fraser University. The Lillooet process was chosen for a detailed case study because it is the only collaborative process under the Land and Resource Management Planning initiative in British Columbia that did not reach a consensus-based agreement. The Lillooet process also experimented with a unique final offer selection technique in an attempt to reach a decision.

The Lillooet evaluation utilizes an evaluation methodology developed as part of the larger collaborative planning research program at Simon Fraser University. The Lillooet evaluation faces several major challenges common to evaluating resource and environmental planning. The first challenge is identifying standards for comparison. The official goals for the collaborative planning process are too vague to provide a clear foundation for evaluation. Consequently the Lillooet case utilizes a comprehensive list of fourteen best practices process management criteria and eleven outcome criteria that were developed based on a review of the literature on collaborative planning. The importance of the evaluation criteria was tested by surveying stakeholders.

The next challenge is assessing the degree to which the evaluative criteria are met. For most of the criteria, such as neutrality of management staff, objective data for assessing the degree to which the criteria are met are not available. Consequently, an alternative approach is required. One approach is for the researchers to make a judgment using some type of rating scale such as met, partially met, or not met. The problem with this approach is that in many cases the researchers do not have a basis for making an accurate judgment. Consequently, researchers' judgment was rejected in favor of relying on a survey of stakeholders engaged in the process to rate the degree to which the criteria are met on a Likert-type scale, complemented by open-ended questions. Follow-up interviews were also conducted with stakeholders to elaborate on questionnaire responses. In this way, the evaluation used external evaluators to design the methodology and construct the evaluation framework, and internal evaluators to rate performance of the program.

The evaluation of the Lillooet planning process successfully identifies strengths and weaknesses and indicates how the process could be improved. For example, the findings suggest that: the process should have allowed the stakeholders more time to achieve a consensus outcome instead of using a final offer selection process; facilitators should have used a single text planning approach to discourage the preparation of competing plans from different stakeholder groups; and, increased effort should have been made to engage key stakeholder groups who were not part of the process. Therefore, the evaluation provides useful data for improving the process in the future.

Gunton et al. also point out that the evaluation method they used has deficiencies. The largest problem is that the evaluation relies on the perspectives of stakeholders, which may or may not be accurate. For some criteria, such as the neutrality of staff and the degree to which the outcome met the interests of each stakeholder group, this is not a problem because the perception of stakeholders is a sound basis for measuring performance. For other criteria, particularly outcome criteria such as the extent to which the process met the public interest and whether the process was superior to alternative methods of planning, stakeholder perceptions are less reliable. Stakeholder evaluations are also constrained by low response rates that make the interpretation of results challenging. The response rate in the Lillooet case study was 33%, which, given the small sample size, provides only low statistical confidence in the results. To remedy these problems with stakeholder surveys, the authors suggest developing more objective measures of outcomes to assess success. The survey is also based on a single snapshot of stakeholder opinion taken after the completion of a planning process. Accuracy would be improved by completing evaluation in stages over an extended time horizon.
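
The effect of a small sample on statistical confidence can be sketched with the standard margin-of-error formula for a proportion, adjusted with a finite population correction. The article does not report how many Lillooet stakeholders were surveyed, so the counts below are hypothetical, chosen only to match a 33% response rate. The calculation also assumes that respondents are a random sample of stakeholders, which non-response may violate.

```python
import math

def margin_of_error(n, population, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Includes a finite population correction, which matters when the
    stakeholder group being surveyed is small.
    """
    standard_error = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

# Hypothetical: 60 stakeholders surveyed, 20 respond (33% response rate)
moe = margin_of_error(n=20, population=60)
print(f"+/- {moe:.1%}")
```

With these hypothetical counts the margin of error is roughly 18 percentage points, wide enough that many differences between criteria could not be distinguished from sampling noise.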

Another limitation is that the evaluation is based on a single case study, which is too small a sample to provide reliable generic conclusions on the merits of the planning model used. The researchers address this limitation by pointing out that the study is part of a research program that is based on a much larger number of collaborative planning processes.

The authors emphasize several important lessons for evaluation from the case study. First, it is crucial to have multiple evaluative criteria. Relying on narrowly defined evaluative criteria may exclude important benefits of the plan and result in unjustified rejection of the planning model. For example, although the planning process did not achieve the desired outcome of a consensus agreement on a plan, the process did achieve important other benefits including improved stakeholder relations and improved stakeholder skills and knowledge. Second, the case study illustrates that it is important to compare the planning model to feasible alternatives as well as best practice ideals to assess merits. A planning model may not meet best practices criteria, but it still may be superior to all the alternatives. However, as the authors caution, comparing alternatives is challenging because the alternatives can rarely if ever be tested as part of a controlled experiment where all factors are held constant.

Ontario Resource Stewardship Agreements (Browne et al.)

Conflict between the tourism and forest industries in Ontario led to the signing of a memorandum of understanding by the Ontario government, the tourism industry, and the forest industry to use a new collaborative process called resource stewardship agreements (RSA) to help resolve land use conflicts. The second paper, by Browne et al., in this volume summarizes an evaluation of the RSA process.

The RSA evaluation sets out to answer two questions. The first question is whether the RSA process is meeting goals set by policy makers and the tourism sector. The second question is whether the RSA process meets best practice requirements as defined in the academic literature for collaborative planning processes. Criteria used for evaluating the RSAs were taken from three sources: government goals for the process, tourism industry goals, and academic literature on best practices.

The next step in this evaluation was to assess the degree to which the evaluative criteria were met. The study relies on two sources: stakeholder responses based on a mail survey and researcher assessments based on a review of relevant documents. In some cases, both sources were used to assess the same evaluative criterion. Combining stakeholder assessments with researchers' assessments based on document analysis is an interesting approach that attempts to offset the deficiencies of relying on just one of the sources.

The RSA survey used a Likert-type scale to assess the opinion of respondents concerning the extent of agreement with statements describing the process. Responses were received from 116 stakeholders, for a response rate of 26%. This is another example of the low response rates common for stakeholder evaluations. The evaluative criteria were assessed as met, somewhat met, neutral or not met based on the average response.
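
The step from average survey responses to a rating label can be sketched as a simple threshold rule. The article does not report the cut-offs or the numeric coding used in the RSA study, so the 1 to 5 coding (5 = strongly agree), the thresholds, and the function name below are all assumptions made for illustration.

```python
from statistics import mean

def classify(responses):
    """Map the mean of 1-5 agreement scores to a rating label.

    The cut-offs are hypothetical; the study does not report them.
    """
    avg = mean(responses)
    if avg >= 4.0:
        return "met"
    if avg >= 3.5:
        return "somewhat met"
    if avg >= 2.5:
        return "neutral"
    return "not met"

# Hypothetical responses for four criteria
print(classify([5, 4, 4, 5]))  # mean 4.5
print(classify([4, 3, 4, 3]))  # mean 3.5
print(classify([3, 3, 2, 3]))  # mean 2.75
print(classify([1, 2, 2, 1]))  # mean 1.5
```

Because the label depends entirely on where the cut-offs are placed, reporting the thresholds alongside the results makes this kind of rating easier to interpret and replicate.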

The RSA case study provides useful evaluation results for decision makers by identifying strengths and weaknesses of the process. The RSA case study states clearly the limitations of the evaluation. First, the evaluation is done from the perspective of only one stakeholder group: the tourism sector. This is a stated objective of the research that is justified on the grounds that the purpose of the RSA is to meet the interests of the tourism sector and therefore should be evaluated from the tourism sector perspective. The researchers suggest that the government should consider broadening the RSA planning process to include all relevant stakeholders. The researchers also caution that the results are a snapshot of stakeholder views assessed early in the implementation phase. As such, the results help identify how the process can be improved, but do not provide definitive results on the success of the program. More evaluations over an extended time horizon are required.

Canadian Bulk Water Export Policy (MacNab et al.)

The third case study, by MacNab et al., evaluates an innovative approach termed "guided federalism" to develop bulk water export policy in Canada. Guided federalism is based on the federal government encouraging adoption of a national policy by the provinces by providing a recommended policy framework that provinces are encouraged, but not required, to adopt. Guided federalism is designed to address the challenges of developing consistent national policy in a federal state in which two levels of government have overlapping jurisdictions.

Concern over bulk water exports led to calls for development of a national bulk water export policy. The challenge is that both the federal and provincial governments have authority to regulate water under the Canadian constitution. To address the need for development of a consistent national policy while recognizing the rights of the two levels of government, the federal government developed the Accord for the Prohibition of Bulk Water Removal from Drainage Basins (Accord) in 1999 that outlines a proposed policy that the provinces are encouraged to adopt. The purpose of the case study evaluation is to assess whether the guided federalism approach was successful.

The case study uses eight evaluative criteria structured in the form of questions that are based on the goals of the Accord to regulate bulk water exports. The questions are defined specifically enough to allow for yes or no answers. The answers to the questions are based on a review of legislation, other relevant documents and public statements.

The evaluation concluded that guided federalism did not achieve its objectives in the bulk water case based on the fact that none of the jurisdictions meet all eight evaluative criteria. An unavoidable limitation of the study is that it is not able to compare the guided federalism policy against feasible alternative strategies--such as a compulsory national strategy--because no alternative strategy has been implemented and therefore no alternative can be evaluated. Thus the study shows that guided federalism has not met the objectives but does not show that other feasible strategies would have been more successful.

Protected Area Selection (Paridaen et al.)

The case study on protected area selection by Paridaen et al. evaluates the process for selecting the protected areas that were designated in British Columbia to more than double the area of the park system between 1990 and 2002. The purpose of the evaluation is to determine: 1) what criteria for the selection of protected areas were deemed to be important by stakeholders; and 2) whether the criteria deemed important were actually used in the selection process.

The first step in the evaluation was to complete a literature review to identify protected area selection criteria. Twenty-four criteria were selected and grouped into three categories: environmental, social, and economic. Next, a survey was developed to test the significance of the criteria by using a five point Likert-type scale ranging from very important to not at all important. The survey questionnaire was designed as a self-administered, mail back survey.

The next step was to select survey respondents. Four case study regions were chosen to represent a cross section of completed land and resource planning processes that designated protected areas. Surveys were distributed to all 170 stakeholders who participated in the protected area selection process in the four regions, and responses were received from 46 participants, a response rate of 27%. Using the five point Likert scale, respondents rated both the generic importance of each criterion and its importance in the actual protected area selection process in which they participated. The average respondent rating for each criterion was calculated. The generic importance ratings were then compared with the actual ratings to evaluate the extent to which the selection process reflected the criteria that respondents considered important.
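The arithmetic behind this step can be illustrated with a short Python sketch. The response rate uses the figures reported above (46 of 170). The ratings, the criterion labels, and the numeric coding of the scale (5 = very important, 1 = not at all important) are assumptions made for illustration and are not data from the study.

```python
from statistics import mean


def response_rate(returned, distributed):
    """Response rate as a percentage of surveys distributed."""
    return 100 * returned / distributed


# Figures reported in the text: 46 responses from 170 surveys.
rate = response_rate(46, 170)
print(f"Response rate: {rate:.0f}%")

# Hypothetical ratings (assumed coding: 5 = very important,
# 1 = not at all important); criterion labels are placeholders.
RATINGS = {
    "Criterion A": {"generic": [5, 4, 5, 4], "actual": [3, 3, 4, 2]},
    "Criterion B": {"generic": [4, 4, 3, 5], "actual": [4, 4, 3, 5]},
}


def rating_gaps(ratings):
    """Mean generic rating minus mean actual rating, per criterion."""
    return {
        name: mean(r["generic"]) - mean(r["actual"])
        for name, r in ratings.items()
    }


gaps = rating_gaps(RATINGS)
for name, gap in gaps.items():
    print(f"{name}: gap = {gap:+.2f}")
```

A positive gap indicates a criterion that respondents rated as more important in general than it was in the actual selection process; a gap near zero indicates that the process gave the criterion roughly the weight respondents thought it deserved.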

The study found that environmental criteria were ranked as the most important for protected area selection, followed by social and then economic criteria. The ranking of the criteria actually used in the selection process was similar to the generic ranking. However, social and economic criteria played a smaller role in the actual selection process than their generic ranking warranted, and some specific criteria, such as increasing employment, had a much smaller role than their significance ranking warranted. The researchers recommend that future protected area selection processes be designed to give due weight to all criteria.

The protected area selection study provides one of the first attempts to rank the importance of criteria and then assess the degree to which the criteria were used based on a survey of stakeholders engaged in the selection process. This evaluative method, in essence, uses stakeholders to develop the evaluation criteria and then assess the extent to which the criteria are met. Given that policy is intended to meet democratically determined goals, the methodology of stakeholder surveys to set evaluation criteria and assess the degree to which the criteria are met clearly has merit. However, this approach is limited to policy cases in which stakeholders are actively engaged and are therefore well informed. The researchers also observe that their study may suffer from selection bias because it is based on only four of more than twenty potential cases.

Protected Area Planning Process (Ronmark et al.)

The objective of the final study in this volume, by Ronmark et al., is to develop and test an overall methodology for evaluating planning processes. The first step is to identify best practices criteria for planning based on a literature review. Thirty-five best practices criteria are identified and grouped into three categories: planning process criteria, planning outcome criteria, and planning implementation criteria. Next, a survey is used to test the importance of the evaluative criteria and the degree to which the criteria are met in the planning process. The importance of each criterion is ranked on a four point Likert-type scale ranging from not important to very important. The extent to which the criteria are met is assessed by a five point Likert scale ranging from strongly agree to strongly disagree with statements describing the planning process. Multiple statements are used for each criterion and the average of the responses to the multiple statements is calculated to assess the degree to which each criterion is met.
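The scoring step described above, in which responses to several statements are averaged to give one score per criterion, can be sketched as follows. This is an illustration under stated assumptions: the numeric coding of the agreement scale, the intermediate scale labels, the criterion and statement names, and the responses are all hypothetical, and the sketch pools every response to a criterion's statements before averaging, which is one of several reasonable readings of the method.

```python
from statistics import mean

# Assumed numeric coding of the five point agreement scale; the
# intermediate labels are placeholders because the text names only
# the two endpoints.
AGREEMENT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Hypothetical responses: criterion -> statement -> list of answers.
RESPONSES = {
    "Criterion A": {
        "Statement 1": ["agree", "strongly agree", "agree"],
        "Statement 2": ["neutral", "agree", "agree"],
    },
    "Criterion B": {
        "Statement 1": ["disagree", "neutral", "disagree"],
        "Statement 2": ["strongly disagree", "disagree", "neutral"],
    },
}


def criterion_scores(responses):
    """Average the coded answers across all statements for each criterion."""
    scores = {}
    for criterion, statements in responses.items():
        coded = [AGREEMENT[a] for answers in statements.values() for a in answers]
        scores[criterion] = mean(coded)
    return scores


scores = criterion_scores(RESPONSES)
for criterion, score in scores.items():
    print(f"{criterion}: mean agreement = {score:.2f}")
```

Under this coding, a higher score means respondents agreed more strongly that the criterion was met. Running the same function separately on each respondent group would allow the two sets of scores to be compared.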

The case study used to test the methodology is the preparation of park master plans in British Columbia. The survey was sent to two types of participants in the park planning process. One group consisted of the provincial government park planners and the other group consisted of non-governmental stakeholders with an interest in protected area planning. Due to logistical constraints, respondents were limited to eleven park planners representing different regions of the province and one representative from each of the fifteen non-governmental stakeholder organizations that have a stated interest in provincial park planning. The response rates from park planners and non-governmental stakeholder organizations were 82% and 67%, respectively.

The results confirmed that all the criteria identified in the literature review were ranked as important to very important, except for one (independent facilitation) that was ranked as only somewhat important. Testing the degree to which the criteria are met identified strengths in the planning process as well as weaknesses that need to be mitigated. An interesting finding in the study is the wide discrepancy between the ratings of the government planners and the non-governmental stakeholders, with the park planners providing more positive ratings for achievement of the best practices criteria. The researchers point out that while it is not surprising that the planners would rank the outcomes more highly, the discrepancy illustrates the need to include all relevant stakeholders in plan evaluation to counter the "internal evaluator" bias of the planners. Although the researchers caution that the specific findings of the case study should be interpreted carefully because of the small sample size, they conclude that the case study application confirms the feasibility and utility of the plan evaluation methodology.

Conclusion

Effective environmental planning is contingent on comprehensive evaluation. Based on the case studies and evaluation theory we can identify eight keys to successful evaluation.

1. Evaluation should use a comprehensive set of evaluative criteria that include explicit policy goals and best practice standards that have been empirically verified.

2. The cases being evaluated should represent a large enough sample to provide reliable results.

3. Evaluation should include all components of the program including program theory, design, implementation, outcomes, and efficiency.

4. Evaluation should occur on an ongoing basis over multiple time periods at critical steps in the process.

5. If a summative evaluation is being done to assess whether the program should continue, the program should be evaluated against feasible alternatives.

6. Care should be taken in concluding causality between programs and outcomes.

7. Evaluation should incorporate the views of stakeholders in the assessment, as well as those of external evaluators.

8. Evaluation reports should clearly state the limitations of the evaluation.

In the real world of evaluation, meeting all eight criteria is extremely difficult due to resource constraints and methodological challenges. None of the case studies in this volume meets all criteria. The case studies show that identifying clear and measurable program outcomes, conducting multiple period evaluations over an extended time horizon, comparing programs against feasible options, and determining causality between programs and outcomes are particularly difficult criteria to meet. However, the case studies in this volume illustrate how useful evaluations can be conducted in the complex field of resource and environmental planning in the face of these resource and methodological constraints. In particular, the case studies illustrate the techniques and benefits of incorporating stakeholder views to verify best practices criteria and assess the degree to which best practices criteria are met. The case studies also illustrate the importance of using multiple evaluation criteria to assess program performance. We hope that the case studies in this volume will stimulate ongoing research in this important field of evaluation in environmental planning.

Acknowledgements

We would like to thank SSHRC for funding support for this research and the anonymous referees for their helpful suggestions.

References

Bellamy, J.A., G. T. McDonald, G.J. Syme, and J.E. Butterworth. 1999. Evaluating Integrated Resource Management. Society and Natural Resources 12: 337-353.

Clark, Alan and Ruth Dawson. 1999. Evaluation Research. Thousand Oaks, California: Sage Publications.

Day, J.C., Thomas I. Gunton, and T. Frame. 2003. Towards Rural Sustainability in British Columbia: The Role of Biodiversity Conservation and Other Factors. Environments 31(2): 21-39.

Esty, Daniel C., Marc A. Levy, Tanja Srebotnjak, Alexander de Sherbinin, Christine H. Kim, and Bridget Anderson. 2006. Pilot 2006 Environmental Performance Index. New Haven, Conn.: Yale Centre for Environmental Law and Policy.

Gunton, Thomas I. 2003. Natural Resource Megaprojects and Regional Development: Pathologies in Project Planning. Regional Studies 37(5): 505-519.

Gunton, Thomas I., and J.C. Day. 2003. The Theory and Practice of Collaborative Planning in Resource and Environmental Management. Environments 31(2): 5-19.

Gunton, Thomas I., and Chris Joseph. 2006. Toward a National Sustainability Strategy for Canada: Putting Canada on the Path to Sustainability within a Generation. Vancouver: David Suzuki Foundation.

Gunton, Thomas I., Ken Calbick, Anita Bedo, Emily Chamberlin, Andrea Cullen, Krista Englund, Aaron Heidt, Matthew Justice, Gordon McGee, Sean Moore, Carolyn Pharand, Ian Ponsford, Jennifer Reilly and Ian Williamson. 2005. The Maple Leaf in the OECD: Comparing Progress Toward Sustainability. Vancouver: David Suzuki Foundation.

International Institute for Sustainable Development (IISD). 2004. National Strategies for Sustainable Development: Challenges, Approaches and Innovations in Strategic and Co-ordinated Action. Winnipeg: IISD.

Joseph, Chris, Thomas I. Gunton and J.C. Day. 2007. Planning Implementation: An Evaluation of the Strategic Land Use Planning Framework in British Columbia. Journal of Environmental Management (in press).

Leach, W.D., N. Pelkey and Paul Sabatier. 2002. Stakeholder Partnerships as Collaborative Policymaking: Evaluation Criteria Applied to Watershed Management in California and Washington. Journal of Policy Analysis and Management 21(4): 645-670.

National Academy of Public Administration (NAPA). 2001. Evaluating Environmental Progress: How EPA and the States can improve the Quality of Enforcement and Compliance Information. A Report by a Panel of the National Academy of Public Administration. [Accessed on 15 June, 2007].

Organization for Economic Cooperation and Development (OECD). 2004. OECD Environmental Performance Reviews: Canada. Paris: OECD.

Rossi, Peter, Mark Lipsey and Howard Freeman. 2004. Evaluation: A Systematic Approach. Thousand Oaks, California: Sage Publications.

Suvedi, Murari, and Shawn Morford. 2003. Conducting Program and Project Evaluations: A Primer for Natural Resource Program Managers in British Columbia. FORREX-Forest Research Extension Partnership. Kamloops, B.C. FORREX Series 6. [Accessed on 18 April, 2007].

Weiss, Carol. 1998. Evaluation Methods for Studying Programs and Policies. Second Edition. Upper Saddle River, New Jersey: Prentice-Hall.

Thomas Gunton is a professor in the School of Resource and Environmental Management and Director of the Resource and Environmental Planning Program at Simon Fraser University. He has held numerous senior positions in government including Assistant Deputy Minister of Energy and Mines for the government of Manitoba and Deputy Minister of Environment, Lands, and Parks for the government of British Columbia. His research focuses on environmental mediation and dispute resolution and resource and environmental planning. He can be contacted at tgunton@shaw.ca

Murray Rutherford is an Assistant Professor in the School of Resource and Environmental Management at Simon Fraser University. He is a policy scientist and planner whose research focuses on policy analysis and evaluation, ecosystem-based management, and human values and attitudes toward nature and the conservation of biological diversity. He can be contacted at mbr@sfu.ca

Peter Williams is a professor in the School of Resource and Environmental Management and Director of the University Centre for Tourism Policy and Research at Simon Fraser University. His research relates to the use of land and resources for sustainable tourism. He can be contacted at peterw@sfu.ca

Chad Day is professor emeritus and founding director of the School of Resource and Environmental Management at Simon Fraser University. His research focuses on institutions for integrated land and water management and environmental planning. He can be contacted at jday@sfu.ca

Table 1. Evaluation Options

Issue                      Option
Who conducts evaluation    1. Internal
                           2. External
Purpose of evaluation      1. Program Improvement (formative evaluation)
                           2. Program justification (summative evaluation)
                           3. Generic knowledge (theoretical evaluation)
                           4. Hidden agenda (ulterior evaluation)
Timing                     1. Before implementation
                           2. During program operation
                           3. After program completion
Timing Scope               1. Single snap shot
                           2. Multiple period
Content Scope              1. Program theory
                           2. Program design
                           3. Program implementation
                           4. Program outcomes
                           5. Program efficiency
Methodology                1. Qualitative
                           2. Quantitative
Evaluative Criteria        1. Program goals
                           2. Best practices
                           3. Social welfare (benefit-cost)
                           4. Efficiency (cost-effectiveness)
                           5. Time series trend
                           6. Cross sectional comparison

Table 2. Case Study Categorization

Descriptor                   Case Study: Gunton et al., Browne et al., MacNab et al.
Who
  Internal
  External                   [check] [check] [check]
Purpose
  Program Improvement
  Program justification
  Generic knowledge          [check] [check] [check]
  Hidden agenda
Timing
  Before implementation
  During operation           [check] [check]
  After program completion   [check]
Timing Scope
  Single snap shot           [check] [check] [check]
  Multiple period
Content Scope
  Program theory
  Program design
  Program implementation     [check] [check]
  Program outcomes           [check] [check] [check]
  Program efficiency
Methodology
  Quantitative
  Qualitative                [check] [check] [check]
Evaluative criteria
  Program goals              [check] [check] [check]
  Best practices             [check] [check]
  Social welfare
  Time series trends
  Cross sectional

Descriptor                   Paridaen et al.   Ronmark et al.
Who
  Internal
  External                   [check]           [check]
Purpose
  Program Improvement
  Program justification
  Generic knowledge          [check]           [check]
  Hidden agenda
Timing
  Before implementation
  During operation
  After program completion   [check]           [check]
Timing Scope
  Single snap shot           [check]           [check]
  Multiple period
Content Scope
  Program theory
  Program design
  Program implementation     [check]           [check]
  Program outcomes           [check]           [check]
  Program efficiency
Methodology
  Quantitative
  Qualitative                [check]           [check]
Evaluative criteria
  Program goals              [check]           [check]
  Best practices             [check]           [check]
  Social welfare
  Time series trends
  Cross sectional


COPYRIGHT 2006 Wilfrid Laurier University Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.
NOTE: All illustrations and photos have been removed from this article.