Statistical testing against baseline

There is abundant evidence that biomedical research can be compromised by both methodologic and reporting limitations, which may lead to flawed inferences and misinterpretation of study findings. Statistical analyses are often misapplied; common errors include failure to account for clustering effects, inappropriate choice of statistical methods, overreliance on P values, and infrequent reporting of confidence intervals.

Another important statistical problem that received little attention until recently is statistical testing against baseline, that is, within-group comparisons with baseline values. It has been identified mainly in observational research; however, clinical trials are also affected by this flawed practice, although the extent of the problem in trials is not yet widely recognized.

The problems associated with within-group testing against baseline include confounding of the outcome by natural improvement or other changes over time, inflated false-positive (type I) error rates, and regression toward the mean. In the first instance, investigators may claim a statistically significant difference between the groups under comparison merely because a within-group comparison reached statistical significance. Subsequently, this may lead to inappropriate interpretation and, ultimately, to inappropriate implementation of the findings in practice.
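Regression toward the mean can be illustrated with a minimal Python sketch under purely hypothetical assumptions: each patient has a stable true overjet, every measurement adds independent error, and only patients whose baseline measurement exceeds an arbitrary 8 mm cutoff are enrolled. All parameter values are invented for illustration and do not come from any study. No treatment and no growth are modelled, yet the mean of the enrolled group falls at the second measurement, a change that a test against baseline could wrongly attribute to an intervention.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, true_mean=6.0, true_sd=2.0,
                                error_sd=1.5, cutoff=8.0, seed=1):
    """Hypothetical overjet data (mm); all parameters are illustrative assumptions.

    Each simulated patient has a stable true overjet; each of the two
    measurements adds independent error. Nothing changes between visits.
    """
    rng = random.Random(seed)
    baseline, followup = [], []
    for _ in range(n):
        true_value = rng.gauss(true_mean, true_sd)
        first = true_value + rng.gauss(0, error_sd)
        second = true_value + rng.gauss(0, error_sd)
        # Enrol only patients with a pronounced baseline value,
        # as trials often recruit patients with marked problems.
        if first > cutoff:
            baseline.append(first)
            followup.append(second)
    return statistics.mean(baseline), statistics.mean(followup), len(baseline)


base, follow, enrolled = simulate_regression_to_mean()
print(f"{enrolled} enrolled; mean baseline {base:.2f} mm, "
      f"mean follow-up {follow:.2f} mm (no treatment modelled)")
```

The apparent "improvement" arises only because selection on a high first measurement also selects for positive measurement error, which does not recur at follow-up.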

As an orthodontic example, consider a hypothetical clinical trial comparing the effectiveness of headgear and activator for overjet correction over a 12-month interval in young Class II Division 1 patients. It would be inappropriate to test for changes from baseline to 12 months within each group separately: both appliances are likely to reduce overjet, and overjet could also decrease through natural growth over time. The correct approach to this research question is a statistical test between the 2 groups that compares their treatment effects directly.
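The headgear versus activator example can be sketched in Python with invented data; the overjet values below are hypothetical and chosen only for illustration. Both within-group paired t statistics exceed the two-sided 5% critical value of about 2.36 for 7 degrees of freedom, yet the between-group t statistic for the difference in mean change is small. Only the between-group comparison addresses whether one appliance outperforms the other, and because growth affects both groups it largely cancels out of that comparison. A Welch t test on change scores is simply the between-group analysis assumed here; adjusting for baseline overjet (analysis of covariance) is another commonly used option.

```python
import math
import statistics

# Hypothetical overjet (mm) at baseline and 12 months; illustrative values only.
headgear_baseline = [8, 7, 9, 10, 6, 8, 7, 9]
headgear_12m = [4, 5, 6, 5, 5, 5, 5, 5]
activator_baseline = [9, 7, 8, 7, 6, 9, 8, 10]
activator_12m = [6, 6, 4, 5, 4, 6, 7, 6]


def changes(before, after):
    """Per-patient change from baseline (negative = overjet reduced)."""
    return [a - b for b, a in zip(before, after)]


def paired_t(diffs):
    """Within-group paired t statistic: mean change divided by its standard error."""
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


def welch_t(x, y):
    """Between-group Welch t statistic for the difference in mean change."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


hg = changes(headgear_baseline, headgear_12m)
act = changes(activator_baseline, activator_12m)

# Flawed approach: two separate tests against baseline.
print(f"headgear within-group t   = {paired_t(hg):.2f}")
print(f"activator within-group t  = {paired_t(act):.2f}")

# Approach that answers the research question: compare the groups directly.
print(f"difference in mean change = {statistics.mean(hg) - statistics.mean(act):.2f} mm")
print(f"between-group Welch t     = {welch_t(hg, act):.2f}")
```

Both groups show a large, "significant" change from baseline, which says nothing about which appliance is more effective; the between-group statistic is the one that answers the trial's question.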

A recent publication examined this methodologic flaw in dentistry, focusing on leading high-impact-factor journals from several dental specialties, such as orthodontics, maxillofacial surgery, pediatric dentistry, periodontology, and endodontics. The findings suggest that in nearly a quarter of the studies (about 23%), the authors based their interpretation of the outcomes solely on within-group comparisons against baseline, and most of these authors analyzed their data only through testing against baseline. The orthodontic journal assessed followed the general trend, with this flawed practice found in 22% of its studies.

Statistical testing against baseline in isolation therefore remains an important methodologic limitation to consider in both medical and dental research. It may affect how the outcomes of individual studies are placed in context and how treatment effects are interpreted in practice. Study design and reporting guidelines should address this type of error in their guidance on statistical analysis, and researchers, reviewers, and editors are advised to be more aware of the potential implications of the problem so that correct and informative inferences are drawn.

