Chapter 12. A Primer to Biostatistics for Busy Clinicians
Michael Glick, D.M.D., and Barbara L. Greenberg, Ph.D., M.Sc.
In This Chapter:
Research Design and Clinical Interpretation
Absolute Risk Reduction and Relative Risk Reduction
Hypothesis and Significance Testing
Probability and the Normal Curve
Introduction
Scientific literacy requires an understanding of the appropriate use of statistics and statistical concepts, as well as recognition of their incorrect use.1 In today’s world of rapidly communicated health information, among and by both health care professionals and the lay public, a lack of understanding of statistical concepts is troubling and has even been equated to scientific illiteracy.2
Statistics has its jargon, a language that enables data to be translated into useful information and knowledge that can be communicated among health care professionals and between health care professionals and patients. Statistics also provides evidence that can inform patient care. It is important to realize that many statistical terms are the same as everyday words, but their connotation may be different. Successfully navigating the professional literature requires an understanding of basic statistical concepts. Although some of these concepts have previously been addressed in the oral health literature,3 this chapter will provide a primer on commonly used statistical concepts and relevant research study design issues.
Research Design and Clinical Interpretation
Applied epidemiologic and clinical research can broadly be divided into experimental research, in which exposure is assigned to a participant, and observational research, in which exposure is not assigned but is instead “observed” as being present or absent (Figure 12.1).4 If there is a comparison group in an observational study, it is characterized as analytical, and when no comparison group is included, as descriptive. The appropriate research design is a function of the question being asked and logistics. In some instances, it is constrained by available data and/or resources.
Exposure is a term used to describe a factor that is thought to be associated with or predictive of an outcome, such as a disease or a condition. For example, examining the association between sugar (the exposure) and the risk of developing caries (the outcome) may be the aim of a study.
Adapted from Grimes DA, Schulz KF. “An overview of clinical research: the lay of the land.” The Lancet 2002;359(9300):57-61
Experimental Trial
A randomized controlled trial (RCT) is considered the gold standard for answering questions of therapy (that is, determining the magnitude of the beneficial and harmful effects of health care interventions) and is the most rigorous study design. The hallmark of an RCT is the random allocation or assignment of study participants to treatment, intervention, or exposure groups. The main purpose of randomization (that is, randomly allocating trial participants) is to minimize selection bias on the part of the investigator (see Chapters 3 and 13). In addition, randomization increases comparability of the treatment groups for variables we can measure, as well as those we are not aware of or cannot measure, thus minimizing the impact of potential confounders. (A confounder is a factor that is associated with both the exposure and the outcome but does not lie in the causative pathway.) However, randomization does not ensure the study groups are indeed similar for all known confounders, and investigators should always assess comparability of the study groups at baseline for known relevant clinical and demographic characteristics (risk factors that are likely related to the exposure and outcome of interest). If the study groups are not comparable for all important risk factors that could affect the relationship of the exposure and the outcome, any observed association or difference could be due to a third factor, a confounder, that is linked to the exposure and the outcome. Another important design element of RCTs is blinding or masking. In this situation, study participants, clinicians, researchers, outcome adjudicators, and analysts can be unaware of which treatment group a particular patient has been assigned to. When both the investigators and the study participants are unaware of the group assignment, this is sometimes referred to as double blinding. Blinding is an important design strategy to reduce participant and investigator bias. 
A well-designed and implemented RCT can therefore minimize selection bias, information bias, and confounding (see Chapter 13). One of the advantages of an RCT is the certainty of the temporal relationship (which one comes first) between an exposure (for example, a treatment or an intervention) and an outcome. A potential concern with RCTs is the often-restrictive inclusion criteria for participant selection. RCT participant selection usually targets one specific condition among a select demographic who, other than the condition of interest, are considered healthy. Therefore, results from RCTs may sometimes be difficult to generalize or apply (external validity) to the total population from which the study participants were selected, because that population is likely to have many characteristics, risk factors, or other conditions that were excluded from the study population. For example, applicability is threatened when a study comparing the success rates of immediately versus nonimmediately placed dental implants excludes participants on the basis of factors that could affect the outcome and are commonly found in the general population, such as smoking, systemic diseases, medications, or periodontal disease, or when it excludes certain genders or age groups.
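The random allocation described above can be illustrated with a minimal sketch. The function name `randomize`, the arm labels, and the seed are illustrative assumptions, and the approach shown (shuffle the participants, then deal them out to the arms in turn) is only one simple way to allocate at random; it is not a description of how any particular trial was run.

```python
import random


def randomize(participant_ids, arms=("treatment", "control"), seed=None):
    """Shuffle participants, then deal them out to the arms in turn.

    The shuffle makes the assignment independent of any participant
    characteristic; dealing in turn keeps the arms equal in size.
    """
    rng = random.Random(seed)          # seeded only so the example is repeatable
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    allocation = {arm: [] for arm in arms}
    for position, participant in enumerate(shuffled):
        allocation[arms[position % len(arms)]].append(participant)
    return allocation


# Hypothetical participant IDs 1 to 20
allocation = randomize(range(1, 21), seed=12)
for arm, ids in allocation.items():
    print(arm, sorted(ids))
```

Because the investigator does not choose who goes into which arm, the allocation cannot reflect investigator preference. Baseline comparability of the resulting groups still needs to be checked, as noted above.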
Observational Studies
Analytical observational studies include cohort studies where a group of individuals with and without the exposure of interest are followed prospectively (forward in time), case-control studies where individuals with or without the outcome of interest (cases and controls, respectively) are traced backward in time to determine possible exposure, and cross-sectional studies where exposure and outcome are measured at the same time (Figure 12.1). Unlike RCTs, the exposure in observational studies is not assigned but is observed in groups of interest as it happens naturally.
In case-control studies, researchers will observe an outcome and then retrospectively try to determine the presence of past exposure. In this study design, the cases are those with the outcome of interest and the controls are a comparable group without the outcome of interest but, importantly, with the same characteristics as the cases. Although the selection and source of cases and appropriate controls are critical elements in case-control studies, it is beyond the scope of this chapter to discuss this concern. Using an example similar to the one above, cases of children with caries (those with the outcome) are compared with children without caries (those without the outcome) to determine if an exposure, such as consumption of sugar-sweetened beverages (SSBs), is associated with the presence of caries. Information about prevalence rates or incidence rates cannot be determined by a case-control study design as the cases and the controls are not measured from a population-based sample and there is no information on the temporal relationship between exposure and outcome.
A cross-sectional study will assess the presence or absence of an exposure and the presence or absence of an outcome at a particular time (that is, the prevalence of the exposure and the prevalence of the outcome at the same point in time). Researchers may determine, at one particular time, the presence of children with or without caries who drink or do not drink SSBs. As this is a snapshot in time, it is not possible to know if the consumption of the SSBs occurred prior to the development of caries (a temporal relationship), and accordingly, it is not possible to determine whether drinking SSBs leads to the development of caries. Cross-sectional studies cannot be used to claim any causative relationships and are generally used to help guide development of research questions.
Case reports and case series are purely descriptive and may, in a similar manner to other observational studies, generate hypotheses about exposure and outcomes that need to be tested with more complex study designs of greater rigor. Descriptive studies can be used to monitor the health of populations but cannot be used to assess associations.
Measures of Association
Measures of association quantify the relationship (an analysis of comparison) between exposure(s) and outcome(s) among groups. There are several different measures of association, such as mean difference (MD), standardized mean difference (SMD), absolute risk (AR), relative risk (RR), odds ratio (OR), and hazard ratio (HR). Effect size quantifies a measure of association as the size of the difference between groups (for example, the MD in number of teeth between two groups) or an estimate of a treatment’s efficacy as a proportion of the reduction or increase in the outcome of interest in the intervention and control group (for example, the relative increase or decrease in developing caries after consuming or not consuming SSBs). An effect size can be standardized by dividing the measure of effect by the standard deviation (SD) of their difference (see below for a description of SD).
Mean Difference
The MD, or the “difference in means,” measures the absolute difference between the mean values in two study groups. It quantifies the average amount by which the study intervention changes the outcome in the study (treatment or intervention) group compared with the control group. Because this estimate is created by subtracting the mean of one group from the mean of the other group, an MD of 0 indicates no difference between the experimental and control groups.
Standardized Mean Difference
The SMD is a summary statistic often used in meta-analyses when the studies all assess the same outcome but measure it on different scales (for example, measuring pain with two different types of visual analog scales). In this situation, the results of the different studies must be standardized to a uniform scale before they can be combined and compared and the results summarized. The SMD quantifies the intervention effect in each study relative to the variability observed in that particular study. In meta-analyses, the SMD is calculated for each study in the meta-analysis and then pooled to get an overall SMD. An SMD of 0 indicates there is no difference between groups.
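The following sketch shows why standardizing helps when the same outcome is measured on different scales. The pain scores are invented for illustration, and dividing by the pooled standard deviation is an assumption: it is one common way to standardize, and the chapter does not specify which formula is used.

```python
from math import sqrt
from statistics import mean, stdev


def mean_difference(treatment, control):
    """MD: mean of the treatment group minus mean of the control group."""
    return mean(treatment) - mean(control)


def standardized_mean_difference(treatment, control):
    """SMD: the MD divided by the pooled standard deviation (an assumed choice)."""
    n1, n2 = len(treatment), len(control)
    s1, s2 = stdev(treatment), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return mean_difference(treatment, control) / pooled_sd


# Hypothetical pain scores on a 0-10 scale
treatment_10 = [2, 3, 4, 3, 2, 4]
control_10 = [5, 6, 4, 5, 6, 5]

# The same hypothetical scores expressed on a 0-100 scale
treatment_100 = [score * 10 for score in treatment_10]
control_100 = [score * 10 for score in control_10]

md_10 = mean_difference(treatment_10, control_10)
md_100 = mean_difference(treatment_100, control_100)
smd_10 = standardized_mean_difference(treatment_10, control_10)
smd_100 = standardized_mean_difference(treatment_100, control_100)

print(f"MD:  {md_10:.2f} (0-10 scale) vs {md_100:.2f} (0-100 scale)")
print(f"SMD: {smd_10:.2f} (0-10 scale) vs {smd_100:.2f} (0-100 scale)")
```

The MD changes with the measurement scale, whereas the SMD is the same on both scales, which is what allows results from differently scaled studies to be combined.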
Absolute Risk
Understanding the difference between probability and odds is essential in order to be able to interpret AR, RR, and OR. A probability is the chance of an event occurring as a ratio of all events. (For example, the probability of getting a 4 when tossing a six-sided die is the ratio of the event occurring [tossing a 4] to all possible events [tossing a 1, 2, 3, 4, 5, or 6], which equals 1/6). A probability can be any number between 0 and 1.
The odds is the chance that a particular event occurs versus the chance that it does not occur, or the ratio of the number with the event to the number without the event. For example, the odds of tossing a 4 is the ratio of the chance (probability) of getting a 4 (1/6) to the probability of not getting a 4 (5/6) [ (1/6)/(5/6) ], which equals 1/5. In other words, it is the probability of an event occurring to the probability of that event not occurring. Odds can be any number between 0 and infinity.
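The die example can be checked with a short sketch. The function names are illustrative, and exact fractions are used so that the results match the text without rounding.

```python
from fractions import Fraction


def probability_to_odds(p):
    """Odds = probability of the event / probability of no event (undefined for p = 1)."""
    return p / (1 - p)


def odds_to_probability(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)


p_four = Fraction(1, 6)                  # probability of tossing a 4
odds_four = probability_to_odds(p_four)  # (1/6) / (5/6)

print(p_four, odds_four)                 # 1/6 1/5
print(odds_to_probability(odds_four))    # 1/6
```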
As an example, we want to know the relationship between consuming SSBs and the development of caries. In a hypothetical study, one group of 1,000 children who are not consuming SSBs is followed for two years, and another group of 1,000 children, with the same risk factors for developing caries as the first group but who are consuming SSBs, is also followed for two years (Table 12.1). The AR is the number of children who develop caries in each group divided by the total number of children in the group during the designated study period (Table 12.1a). Using the data from Table 12.1, we can state that “not consuming SSBs is associated with an AR of developing caries of 15% (150 out of 1,000) at some point during two years” and “the AR of developing caries when consuming SSBs is 65% (650 out of 1,000) over a time span of two years.”
Absolute risk (AR) of developing caries when drinking SSBs (risk with exposure) | 650/1,000 = 0.65 (65%)
AR of developing caries when not drinking SSBs (risk without exposure) | 150/1,000 = 0.15 (15%)
Absolute risk reduction (ARR) (the risk reduction of developing caries when switching from drinking to not drinking SSBs) | 0.65 - 0.15 = 0.50 (50%)
Relative Risk
The relative risk (RR), also known as the risk ratio, is the proportion of participants who developed the outcome in the cohort with the exposure as a ratio of the proportion of participants who developed the outcome in the cohort without the exposure (Table 12.1b). It can also be defined as the probability of an outcome occurring in a treatment, or intervention, group divided by the probability of an outcome occurring in a comparison, or control, group, or vice versa. In other words, the RR is the incidence of the outcome in the exposed group relative to the incidence of the outcome in the nonexposed group and provides a measure of the risk of developing disease if exposed. The RR is the measure of association for cohort studies and clinical trials (Table 12.2). Using the data and the formula in Table 12.1b, we can state, “There is an RR of developing caries of 4.33, over a period of two years, if consuming SSBs compared with not consuming SSBs,” or, “People consuming SSBs have 4.33 times the risk of developing caries compared with those not consuming SSBs, over a period of two years,” or conversely, “People not consuming SSBs have 0.23 times the risk of developing caries compared with those consuming SSBs, over a period of two years.” An RR of 1 suggests no difference in risks, an RR of more than 1 indicates an increased risk, and an RR of less than 1 indicates reduced risk.
Relative risk (RR), or risk ratio, of developing caries when drinking SSBs compared with developing caries when not drinking SSBs = 0.65/0.15 = 4.33
Relative risk, or risk ratio, of developing caries when not drinking SSBs compared with developing caries when drinking SSBs = 0.15/0.65 = 0.23
Relative risk reduction (RRR) if not drinking SSBs = 1 - 0.23 = 0.77 (77%)
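The AR and RR figures quoted for the hypothetical SSB study can be reproduced with a few lines. The function names are illustrative; the counts are those given in the text (650 of 1,000 children with caries among SSB consumers, 150 of 1,000 among nonconsumers).

```python
def absolute_risk(events, total):
    """AR: number who develop the outcome divided by the number in the group."""
    return events / total


def relative_risk(risk_a, risk_b):
    """RR: risk in group A relative to risk in group B."""
    return risk_a / risk_b


ar_ssb = absolute_risk(650, 1000)       # consuming SSBs
ar_no_ssb = absolute_risk(150, 1000)    # not consuming SSBs

rr_ssb_vs_none = relative_risk(ar_ssb, ar_no_ssb)
rr_none_vs_ssb = relative_risk(ar_no_ssb, ar_ssb)

print(f"AR: {ar_ssb:.2f} with SSBs, {ar_no_ssb:.2f} without")   # 0.65 and 0.15
print(f"RR: {rr_ssb_vs_none:.2f} and {rr_none_vs_ssb:.2f}")      # 4.33 and 0.23
```

Which group is placed in the numerator determines whether the RR is reported as 4.33 or 0.23, so the comparison direction should always be stated alongside the number.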
Odds Ratio
Because case-control studies do not have a true denominator of “at risk” individuals and the temporal relationship of exposure to an outcome is not clearly established, case-control studies cannot use the RR as a measure of association and will instead use the odds ratio (OR) as a measure of association (Table 12.2).
The OR in a case-control study is the ratio of the odds of individuals in the disease group having the exposure divided by the odds of individuals in the comparison group having the exposure (Table 12.1c); in other words, it is the odds of having the exposure in the cases compared with the odds of having the exposure in the controls.
When warranted, odds can be converted to risks and subsequently to RR (Table 12.3). An OR approximates an RR when the prevalence of disease is low, typically below 10%.5 (This is illustrated in Table 12.3, which shows how closely low odds approximate the corresponding risk.) An RR is an inappropriate measure in a case-control study.
Cohort studies can also use an OR as the measure of association; in this case, the OR is the odds of experiencing the outcome or disease in the group exposed to a risk factor compared with the odds of experiencing the outcome or disease in the group not exposed to the same risk factor (Table 12.1c). Results from RCTs are usually reported as an RR or as an OR. In RCTs, ORs are interpreted similarly to ORs in cohort studies (Table 12.2).
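As a sketch of the cohort-study OR described above, the calculation below applies the definition to the hypothetical SSB cohort (650 of 1,000 versus 150 of 1,000). The second, low-prevalence data set is an invented rescaling (65 and 15 events per 10,000) used only to illustrate how the OR approaches the RR when the outcome is rare. The function names are illustrative.

```python
def odds_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """OR: odds of the outcome in the exposed group / odds in the unexposed group."""
    odds_exposed = events_exposed / (n_exposed - events_exposed)
    odds_unexposed = events_unexposed / (n_unexposed - events_unexposed)
    return odds_exposed / odds_unexposed


def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """RR: risk of the outcome in the exposed group / risk in the unexposed group."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)


# Hypothetical SSB cohort from the text: the outcome is common
common = (650, 1000, 150, 1000)
# Invented low-prevalence variant: same RR, but the outcome is rare
rare = (65, 10_000, 15, 10_000)

for label, data in (("common outcome", common), ("rare outcome", rare)):
    print(f"{label}: RR = {risk_ratio(*data):.2f}, OR = {odds_ratio(*data):.2f}")
```

With the common outcome the OR (about 10.5) is far larger than the RR (4.33), whereas with the rare outcome the two are close (about 4.36 versus 4.33). This is why an OR should not be read as an RR when the outcome is frequent.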
Table 12.1c. Odds ratio in a case-control study and in a cohort study (table contents not available)
Study Design | Relative Risk | Odds Ratio
Experimental Studies
Randomized Trial (for comparing treatments or interventions) | The risk of developing disease among those who are exposed relative to the risk of developing disease among those who are not exposed; the ratio of the incidence of new disease among the exposed relative to the non-exposed. An RR > 1 suggests the exposure is a risk factor for developing the disease, an RR < 1 suggests an exposure is protective against developing the disease, and an RR = 1 suggests there is no association between the exposure and disease. | The odds that an exposed person develops disease relative to the odds that a non-exposed person develops disease. An OR > 1 suggests the exposure is positively associated with the disease, an OR < 1 suggests the exposure is negatively associated with the disease, and an OR = 1 suggests no association between exposure and disease.
Observational Studies
Cohort Study | The risk of developing disease among those who are exposed relative to the risk of developing disease among those who are not exposed; the ratio of the incidence of new disease among the exposed relative to the non-exposed. An RR > 1 suggests the exposure is a risk factor for developing the disease, an RR < 1 suggests an exposure is protective against developing the disease, and an RR = 1 suggests there is no association between the exposure and disease. | The odds that an exposed person develops disease relative to the odds that a non-exposed person develops disease. An OR > 1 suggests the exposure is positively associated with the disease, an OR < 1 suggests the exposure is negatively associated with the disease, and an OR = 1 suggests no association between exposure and disease.
Case-Control Study | Cannot be calculated directly | The odds of those with disease having been exposed relative to the odds of those without disease having been exposed. An OR > 1 suggests the exposure is positively associated with the disease, an OR < 1 suggests the exposure is negatively associated with the disease, and an OR = 1 suggests no association.
Cross-Sectional Study | Cannot be calculated | Cannot be calculated unless there is a comparison group; in that case, similar to interpretation for a case-control study.
Case Report/Case Series | Cannot be calculated | Cannot be calculated.
Absolute Risk Reduction and Relative Risk Reduction
Understanding the difference between AR and RR, and absolute risk reduction (ARR) and relative risk reduction (RRR), is important in order to make appropriate clinical decisions. In the example in Table 12.1, there is a different AR for having caries among children who do not consume SSBs compared with the AR for having caries among children who consume SSBs. The ARR is the difference of the AR in the test group and the control group. As seen from the data in Table 12.1a, not consuming SSBs is associated with an ARR of having caries of 0.50 (0.65 minus 0.15), or stated differently, “not consuming SSBs will reduce the AR for having caries from 650 in 1,000 (65%) to 150 in 1,000 (15%)” or “500 fewer cases of caries can be expected among 1,000 patients who do not consume SSBs compared with 1,000 patients who consume SSBs over a period of two years.”
Looking again at Table 12.1, there is a relationship between the two ARs that can be quantified with RR: the proportion, or relative change, between the AR for caries among children who do not consume SSBs and the AR among children who consume SSBs (Table 12.1b).
In the hypothetical study depicted in Table 12.1, the RRR is the risk reduction for developing caries associated with not consuming SSBs (Table 12.1b). As the RR of not consuming SSBs compared with consuming SSBs is 0.23, the RRR of not consuming SSBs is 77% (1 minus 0.23), or a 77% reduction in the risk of developing caries in the group that is not consuming SSBs compared with the group that is consuming SSBs.
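The contrast between the ARR and the RRR can be made concrete with the same hypothetical risks (0.65 when consuming SSBs, 0.15 when not). The function names are illustrative.

```python
def absolute_risk_reduction(risk_exposed, risk_unexposed):
    """ARR: difference between the two absolute risks."""
    return risk_exposed - risk_unexposed


def relative_risk_reduction(risk_exposed, risk_unexposed):
    """RRR: 1 minus the RR of the unexposed group relative to the exposed group."""
    return 1 - risk_unexposed / risk_exposed


risk_ssb, risk_no_ssb = 0.65, 0.15

arr = absolute_risk_reduction(risk_ssb, risk_no_ssb)
rrr = relative_risk_reduction(risk_ssb, risk_no_ssb)

print(f"ARR = {arr:.2f}")                   # 0.50, or 500 fewer cases per 1,000 children
print(f"RRR = {rrr:.0%}")                   # 77%
print(f"ARR / risk with SSBs = {arr / risk_ssb:.0%}")  # the same 77%
```

The last line shows that the RRR is the ARR expressed as a fraction of the risk in the SSB-consuming group. The same RRR can therefore correspond to a large or a small ARR, depending on how common the outcome is to begin with.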
Hazard Ratio
The HR is another measure of association that deals with time-to-event data, also known as survival data. Hazard is the instantaneous event rate, which is expressed as the probability for an individual to have an event of interest at a particular time (assuming they are event-free up to that time). The HR quantifies risk as the ratio of the hazards in the treatment group and the control group at a particular point in time. It is the hazard of developing an event in the intervention group relative to the hazard of developing the event in the control group at any particular time along the follow-up period. An HR of 1.0 means the event rates are the same in both groups; an HR of 2.0 means that, at any particular time during the study follow-up, proportionally twice as many patients in the treatment group as in the control group are having an event. An HR of 0.5 means that, at any particular time, proportionally half as many patients in the treatment group as in the control group are experiencing the event. For example, a reported HR of 0.45 in a hypothetical clinical study means that, at any point in time along the follow-up, patients in the treatment group are 55% less likely to experience the event than patients in the control group. Whereas the HR takes into account not only the total number of events but also the timing of each event (that is, the event rate), the RR measures the cumulative risk over the total time period of interest.
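A simplified sketch can show where an HR such as 0.45 might come from. It assumes, purely for illustration, that the hazard in each group is constant over follow-up, so that each hazard can be estimated as events per unit of person-time. The event counts and person-years are invented, and actual survival analyses estimate the HR from the full time-to-event data rather than from this shortcut.

```python
def constant_hazard(events, person_time):
    """Event rate per unit of person-time (equals the hazard only if it is constant)."""
    return events / person_time


def hazard_ratio(hazard_treatment, hazard_control):
    """HR: hazard in the treatment group relative to the control group."""
    return hazard_treatment / hazard_control


# Hypothetical follow-up data (assumed, not from any study)
hazard_treatment = constant_hazard(events=9, person_time=200)    # 0.045 per person-year
hazard_control = constant_hazard(events=20, person_time=200)     # 0.100 per person-year

hr = hazard_ratio(hazard_treatment, hazard_control)

print(f"HR = {hr:.2f}")                                # 0.45
print(f"Reduction in hazard = {1 - hr:.0%}")           # 55%
```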
Hypothesis and Significance Testing
When researchers are trying to determine whether an association exists between two factors (for example, consuming or not consuming SSBs), or between patients’ characteristics (for example, patients with low education levels or high education levels), and the presence of an outcome, the ideal situation would be to recruit into the study the whole population to whom the results would be applied. It is not difficult to understand that such an approach would have serious implementation issues, and a massive amount of resources and time would need to be allocated. As a way to solve this conundrum, researchers take a sample, a portion of the whole population, expecting that this sample will provide good representation of the individuals, factors, or characteristics under study. Extrapolating the study sample findings to the whole population is called inferential statistics. (“Population” is a term used in statistics to describe the entire “universe” of individuals from which researchers draw their study sample.) Inferential statistics differ from descriptive statistics, where collected data are only used to describe the study sample without making inferences to a population.
Users of the dental literature will find that there are two types of hypotheses: the null hypothesis (H0) and the research (or alternative) hypothesis (Ha). The null hypothesis states that there is no (that is, null) association between the predictor or exposure and the outcome variable, and therefore there is no difference in the outcome between the study groups; further, any observed difference is due to chance alone (Box 12.1 and Figure 12.2).
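The idea of using a sample to learn about a population can be illustrated with a small simulation. The population size, the 15% proportion with caries, the sample size, and the seed are all assumptions chosen for illustration.

```python
import random


def proportion(values):
    """Share of individuals with the outcome (coded 1 = caries, 0 = no caries)."""
    return sum(values) / len(values)


# Hypothetical population of 100,000 children, 15% of whom have caries
population = [1] * 15_000 + [0] * 85_000

rng = random.Random(2024)                 # seeded only so the example is repeatable
sample = rng.sample(population, 500)      # simple random sample without replacement

print(f"Population proportion: {proportion(population):.3f}")
print(f"Sample estimate:       {proportion(sample):.3f}")
```

The sample estimate will usually be close to, but not exactly equal to, the population value. Judging how far such a difference can plausibly be attributed to chance is the task of inferential statistics.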
Box 12.1. Examples of Hypotheses
Null hypothesis (H0)
The incidence of caries in the group of children consuming sugar-sweetened beverages (SSBs) compared with those children not consuming SSBs is the same.
Research (or alternative) hypothesis (Ha)
1. The incidence of caries in the group of children consuming SSBs compared with those children not consuming SSBs is different.
2. The incidence of caries in the group of children consuming SSBs is higher compared with those children not consuming SSBs.
3. The incidence of caries in the group of children consuming SSBs is lower compared with those children not consuming SSBs.