CHAPTER 13 PROGRAM EVALUATION IN HEALTH CARE
Program evaluation is concerned with finding out how well programs work by using social and behavioral science research techniques to assess information of importance to program administrators and public policy makers. The fundamental purpose of program evaluation is to provide information for decision making. Ultimately, evaluation is a judgment of merit or worth about a particular person, place, or thing.
The term research refers to systematic inquiry that leads one to discover or revise knowledge about a particular subject. Basic research is generally focused on discovering facts, relationships, behaviors, and underlying principles. Applied research often deals with the same phenomena, but the focus is usually less on the discovery of basic knowledge and more on the development of tools or the application of knowledge to develop solutions to actual problems. Evaluation is an example of applied research. Administrators, educators, policy makers, and others face questions (problems) about designing, implementing, continuing, and improving social, educational, health, and other programs. Evaluators assess or evaluate those programs to discover or revise knowledge about them and the problems they were designed to address so that informed judgments can be made, modifications can be implemented, and solutions can be achieved.
As researchers, program evaluators engage in scientific inquiry. They use tests, questionnaires, and other measurement devices. They collect and analyze data systematically by using common statistical procedures. Finally, they typically describe their findings in formal reports.1
An important difference between basic research and evaluation research is the generality of the findings. Ideally, the basic scientist is searching for basic knowledge; the more fundamental, the better. Fundamental facts and principles, such as Einstein’s theory of relativity, have broad applicability. They generalize across wide areas of knowledge. Most applied scientists, and program evaluators in particular, usually deal with specific problems in specific settings. Their findings or conclusions can seldom be generalized to “similar” problems.
To elaborate on this distinction between the basic science researcher and the evaluator, consider the role each individual might play in the testing of a fluoride rinse. In examining the value of fluoride rinse, the basic science researcher would probably be concerned with the effects of fluoride on teeth, the strength of the solution necessary to produce a reduction in caries, and whether the conclusions could be generalized across the population. The evaluator would be more concerned with determining whether the actual mouth-rinse program, initiated to test the researcher’s conclusion, was run correctly and met its stated objectives. The evaluator’s interest in the fluoride rinse itself is only secondary. Once the evaluator has judged whether the program is an accurate test of the fluoride rinse, secondary findings might then address the positive or negative effects of the rinse. In other words, the particular program’s operation is of prime importance to the evaluator, and the effect of fluoride is important only in terms of its results as applied to a realistic, closely monitored program.
Determining the value of things is another difference between evaluation and basic research. Evaluation eventually comes down to making a decision about what should be done or which course of action is best. Basic researchers strive only to obtain accurate, truthful information. There is no requirement to attach assessments of merit to the discovered knowledge.1 Theoretically, the basic scientist’s task does not involve making value judgments. The evaluator walks a fine line when it comes to value judgments. By its nature, evaluation research is based in a value context: the ultimate question, after all, is whether the subject (program) being studied is “of value.” The evaluator must understand the value context within which he or she works. The best evaluation studies are those in which the evaluator is fully cognizant of this value context and is then able to “do objective science” that addresses critical questions.
Evaluation studies ultimately focus on the goals, objectives, or intent of the program or activity being studied. At the simplest level we ask, Does this program do what it was designed to do? There are, of course, many other facets to evaluation. One of the most useful frameworks for looking at the evaluation research task has been put forward by Donabedian.2 He suggests that assessment or evaluation can profitably look at structure, process, and outcome.
Structure refers to the program setting and logistics (i.e., facilities, equipment, financing, human resources). Process refers to the techniques or methods employed in the provision of program services (i.e., delivering health care, educating children). Outcome refers to the “real world” impacts, effects, and changes brought about as a result of the program being evaluated.
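As a concrete sketch of how an evaluator might organize findings under Donabedian's three categories (a hypothetical illustration only; the class and field names are invented, not part of Donabedian's framework):

```python
from dataclasses import dataclass, field

# Hypothetical record grouping evaluation findings by Donabedian's
# three categories: structure, process, and outcome.
@dataclass
class EvaluationFindings:
    structure: dict = field(default_factory=dict)  # setting and logistics
    process: dict = field(default_factory=dict)    # service-delivery methods
    outcome: dict = field(default_factory=dict)    # real-world effects

# Invented example entries for a fluoride rinse program:
findings = EvaluationFindings()
findings.structure["staffing"] = "2 trained paraprofessionals per site"
findings.process["protocol_adherence"] = 0.92   # fraction of sessions done correctly
findings.outcome["caries_reduction"] = 0.15     # observed change vs. baseline
```

Keeping the three categories separate in this way makes it easier to see where a problem lies: a weak outcome figure can be traced back to a structural or process deficiency rather than treated in isolation.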
Donabedian rightly sees structure, process, and outcomes as inextricably linked: the interrelationships are critical to the program’s ability to meet its goals or fulfill its intent. Examining structure, process, and outcomes allows the evaluator to identify more clearly where problems and program liabilities lie and, hence, where corrections can be made if goals are to be met. Looking at goals, structure, process, and outcomes should be the primary focus for the evaluator. A second set of concerns also exists, however. These questions might be classified as “client” questions; that is, for whom and why is the evaluation research being conducted? These are not trivial questions. The researcher must understand, for example, the hierarchy of authority in the organization involved, what its interests and objectives are in requesting an evaluation, and what sorts of questions need to be asked. Often, one of the evaluator’s biggest contributions lies in his or her ability to help administrators clarify their thinking about the need for and use of evaluation research.
By way of illustration, consider a situation in which a dental school implements a new curriculum for its students. An evaluator who is brought in designs and carries out a carefully planned study to determine if the program has the resources it needs (structure), how well the program is running (process), and how successful the graduates are (outcomes). Such an evaluation is appropriate if the client’s interest is to determine if the curriculum is functioning properly and meeting its goals. The design would not be appropriate, however, if the client wanted to know if the graduates of the new curriculum were better-trained professionals than those of the old curriculum. The evaluator must understand the client’s focus. Without such an understanding, valuable time and resources may be wasted on a study that answers neither the fundamental questions nor the client’s needs.
Individuals interested in the results of evaluation may include program developers, program staff, program directors, policy makers (state or federal bureaucrats), program directors in other similar agencies, or epidemiologists.3 Different groups of people have different needs and thus seek different information. Program developers seek information about ways to improve specific parts of programs that affect them directly. The director of the program is usually interested in knowing the overall effectiveness of the basic program, although he or she is generally more concerned with finding out what specific modifications will be needed to improve the organization and operation of the program. Financial issues are usually of concern to policy makers, who question whether a program should be continued as is, given more resources, or canceled. Costs and benefits are of paramount concern to them. Staff from other programs are interested in whether the program can be generalized for possible adaptation or adoption. Epidemiologists may seek to compare the effect of different program principles and generalize about the factors responsible for success.
Clearly, the evaluator faces a number of potentially competing interests. In responding to those interests the researcher must distinguish between different types of evaluation. As we have seen, Donabedian’s framework allows us to focus on the critical features or components that make up a program. These factors must be taken into account if evaluation efforts are to be successful and useful. At the same time, Scriven4 draws our attention to the fact that evaluation research may be one of two types. He uses the terms formative and summative to describe these types.
Formative evaluation refers to the internal evaluation of a program. It is an examination of the processes or activities of a program as they are taking place. It is usually carried out to aid in the development of a program in its early phases.
The following situation is one in which a formative evaluation is appropriate: a fluoride rinse program is initiated at a neighborhood health center in which paraprofessionals are trained to administer three types of fluoride rinses under a strict sequence of procedures. After 3 days of operation, the work of the paraprofessionals is observed to determine the extent of adherence to that sequence. The observation and determination of correct or incorrect procedure sequence provide an example of examining the activities of a program as they are occurring (formative evaluation). If the sequence is incorrect, formative evaluation allows the program to make remedial changes at that point and thereby improve performance. Such a strategy is much better than waiting until the program is completed and then announcing that there were procedural errors. Formative evaluation is used primarily by program developers and program staff members concerned with whether various components of a program are workable or whether changes should be made to improve program activities.
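The adherence check in this example can be sketched in code (a hypothetical illustration; the step names and observed data are invented): each observed sequence of steps is compared against the required protocol, and deviations are flagged for remedial training.

```python
# Hypothetical required protocol for administering a fluoride rinse:
REQUIRED_SEQUENCE = ["dispense_rinse", "timed_swish", "expectorate"]

def check_adherence(observed_steps):
    """Return a list of (position, required, observed) tuples where the
    observed sequence deviates from the protocol; an empty list means
    full adherence."""
    deviations = []
    for i, required in enumerate(REQUIRED_SEQUENCE):
        observed = observed_steps[i] if i < len(observed_steps) else None
        if observed != required:
            deviations.append((i, required, observed))
    return deviations

# A formative check after a few days of operation (invented observations):
ok = check_adherence(["dispense_rinse", "timed_swish", "expectorate"])
bad = check_adherence(["dispense_rinse", "expectorate"])  # skipped a step
```

Run while the program is still in progress, a report like `bad` identifies exactly which step was missed, so retraining can happen immediately rather than after the program ends.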
Summative evaluation, by contrast, judges the merit or worth of a program after it has been in operation. It is an attempt to determine whether a fully operational program is meeting the goals for which it was developed. Summative evaluation is aimed at program decision makers, who will decide whether to continue or terminate a program, and also at decision makers from other programs who might be considering adoption of the program.
Different evaluation designs are needed to carry out these two types of evaluation. Different types of measures and time schedules also are required. Because most programs are ongoing, with changes often being made “on the fly,” a discernible end point or completion date may not exist. In such cases the dichotomy between formative and summative evaluation may not be as precise as described here, and formative evaluation may continue to be important as the program develops and matures.
Most health programs can be divided into four phases of implementation, which should occur in sequence: (1) the pilot phase, the development of which proceeds on a trial-and-error basis; (2) the controlled phase, in which a model of a particular program strategy is run under regulated conditions to judge its effectiveness; (3) the actualization phase, in which a model of the program strategy is subjected to realistic operating conditions; and (4) the operational phase, in which the program is an ongoing part of the organizational structure. Often this ideal progression from phase 1 to phase 4 does not occur, and a program becomes lodged at one stage of development. Each phase has different objectives to be met and thus different evaluation designs by which to best assess achievement of program objectives. Formative evaluation plays an important part in both the pilot phase and the controlled phase of program implementation. Summative and formative evaluations are used during the actualization phase, whereas the final operational phase is evaluated with a summative evaluation design.5
One generalization that can be made of health program evaluation is that it is primarily concerned with how well a program is meeting its goals, either at some formative stage (so that the information can be fed back into the program) or at the end. The first step in evaluation, then, is to discover what the program goals are and to then restate them as clear, specific objectives written in measurable terms.
This first step is often a formidable task. Many program directors and staff members develop only general goals expressed as vague abstractions. They find it difficult to translate them into concrete specifications of the changes in behavior, attitude, knowledge, or health outcome that they hope to effect. In addition, programs often have multiple goals. Some are more important than others, some are more immediate (as opposed to long range), some are easier to study, and some may be incompatible with others. Yet all program directors and staff members must establish a sense of goal priorities if they, or external evaluators, are to assess the operation of their program. In many instances directors and staff members are unable to sort out goals, objectives, and priorities clearly, and they find it useful to bring in outside evaluators or administrative consultants to assist in this process.
Because goal statements are so often ambiguous and poorly stated, many observers have been led to speculate about the underlying reasons for this state of affairs. One view is that it usually requires support from diverse groups and individuals to get a program accepted. Program goals must be formulated in ways that satisfy the diversity of interests represented. Another speculation is that program planners lack experience with expressing their thoughts in measurable terms and concentrate mainly on the specifics of program operation. In one sense ambiguous goal statements serve a useful function: they hide differences among diverse groups by allowing for a variety of interpretations. However, such differences between groups and staff or within the staff can be disruptive when the program is implemented. Once a program has been initiated, if there is lack of true consensus as to what the program is specifically attempting to achieve, progress is difficult. Each staff member may be pulling in a different direction and trying to implement a different interpretation of the goal. As an outside agent or more objective observer, the evaluation study director can make a substantial contribution to program planning and administration in formulating goals, clarifying priorities, and reconciling divergent viewpoints related to program direction.
Ultimately, of course, evaluation attempts to measure the outcomes of a particular program. If a program’s goals cannot be operationalized (stated in a precise, measurable manner), it becomes nearly impossible to determine whether the desired outcomes of a program have been achieved. In other words, without clearly stated goals and objectives, evaluation becomes an imprecise tool of questionable usefulness.
One common difficulty in specifying desired objectives is that objectives are often long range in nature, making it extremely difficult to measure success in meeting them. In the interim, evaluation is conducted by relying on surrogate measures of attitudes, knowledge, skills, or behaviors that presumably are related to the ultimate objectives.
Often, it is not until an evaluation study is started that the depth of the problem is discovered. That is, the program was implemented on the basis of important but nonetheless vaguely expressed goals that cannot be addressed effectively until they are reworked, a process that may involve administrators, boards of directors, advocacy organizations, and others. In some cases, programs may be designed to produce certain intermediate changes on the assumption that they are necessary for the attainment of ultimate goals. Probably the best that evaluation can do in such a situation is to discover whether intermediate goals are being met. Only after the more global “goals” are clearly identified and articulated can one begin the larger and more intensified research effort needed to determine the relation between these goals and desired final outcomes.
To evaluate the effectiveness of health programs, specific measurement instruments must be set up for systematic collection of data on the attainment of each program objective and program goal. These procedures follow accepted principles of biostatistical and research design, which are discussed in Chapters 14 and 15.
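To illustrate the final point (a minimal sketch with invented objectives and data, not a procedure from the chapter): once each objective has been stated with a measurable target, attainment can be tabulated directly from the systematically collected data.

```python
# Hypothetical measurable objectives for a school fluoride rinse program.
# Each maps to a target value and the value actually measured during the
# evaluation period (all figures invented for illustration).
objectives = {
    "children screened": {"target": 500, "measured": 520},
    "rinse-protocol adherence rate": {"target": 0.80, "measured": 0.92},
    "caries reduction": {"target": 0.15, "measured": 0.11},
}

def attainment_report(objs):
    """Mark each objective met or unmet by comparing measured value to target."""
    return {name: vals["measured"] >= vals["target"] for name, vals in objs.items()}

report = attainment_report(objectives)
```

Here the first two objectives are met and the third is not, which is precisely the kind of per-objective finding that clearly operationalized goals make possible; with vague goals, no such tabulation could be constructed.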