“… the burden of proof lies with those who advise the introduction of a new technique, or the adaptation or modification of an older method.” N. Robertson
The opportunity to discuss the relative benefits and burdens of nasoalveolar molding (NAM) as part of the overarching treatment protocol for infants born with complete clefts of the lip and palate is timely. Rather than approaching this as an adversarial “either/or” debate, we view this as a chance to challenge one another to critically examine the evidence currently available to make sound decisions based on agreed standards regarding the effectiveness of the various treatment options available. With adequate evidence, most debates evaporate in favor of facts derived from studies designed to achieve the highest strength of evidence.
The philosopher Bertrand Russell (1872-1970) articulated: “The most savage controversies are those about matters as to which there is no good evidence either way.” That we are still hotly debating the outcomes to be expected from various approaches, including NAM, shows that not all “evidence” is created equal. The explanation for this is the false assumption that findings from retrospective, unblinded, uncontrolled, nonrandomized, single-center studies with small sample sizes and inadequate follow-up constitute true evidence. To the contrary, the hierarchy of the strength of evidence clearly emphasizes the relative weakness of evidence from this type of research, putting it just slightly ahead of anecdotal case reports and case series. David Sackett, arguably the “father” of evidence-based practice, referred to such evidence as “subexperimental.” The likelihood of uncontrolled biases is a major confounder that undermines the reliability and comparability of these investigations. This was emphasized by the World Health Organization in a 2002 report, “Global Strategies to Reduce the Healthcare Burden of Craniofacial Anomalies.” It states that “Differences arising from the [many sources of] biases are likely to exceed actual differences attributable to the procedures,” and “Bias favoring innovative procedures is a major cause for concern with historical control studies.”
Control of bias and strength of evidence are directly related. As the ability to control bias increases, so does the strength of evidence, with randomized controlled clinical trials (RCTs) providing the best opportunity to determine the effectiveness of various treatment options. In many instances when trials are possible, the claims of treatment effectiveness from prior uncontrolled studies have been shown to be significantly overinflated, resulting in treatment protocols that have often gotten ahead of scientific support. Examples of this in medicine are numerous. In cleft palate treatment, the use of primary bone grafting, first heralded as an important breakthrough in retrospective studies and case series reports, was later shown in 2 RCTs to be harmful to facial growth. Similarly, the initial enthusiasm for infant presurgical orthopedics based on case series reports and uncontrolled, nonrandomized retrospective studies was shown to be unwarranted when evaluated in an RCT. Understandably, however, the expense, time considerations, sample size limitations, and ethical concerns inherent in conducting RCTs make it impossible to rely on them alone to provide the evidence necessary to sort out the relative effectiveness of the bewildering array of protocols and procedures currently espoused for clefts and craniofacial anomalies.
Fortunately, another approach to comparative effectiveness research can be used. To be valid with maximum control of bias, these intercenter comparisons require prospectively planned standardized records with samples of patients consecutively treated and strict inclusion criteria ensuring equivalency across centers, blinded evaluation of outcomes by panels of judges using outcome measures proven to be valid and reliable, and robust and appropriate statistical analyses. Although these studies cannot identify the role of unique individual features as can be done with RCTs, they can identify features and total protocols of interest from outcomes that are more favorable, as favorable, or less favorable than those obtained by other collaborating centers.
In 2006, the American Cleft Palate-Craniofacial Association created, and has since supported, the Americleft Project. Americleft’s goal is not to advocate or discredit treatment protocols currently in use but, rather, to blindly and without bias compare the outcomes of those protocols relative to one another so that we as clinicians can make decisions based on the best evidence available short of clinical trials.
Clearly, NAM is an innovative procedure, and much credit goes to Dr Grayson and his colleagues, who have been relentless pioneers. NAM is undoubtedly the next and most important question to be answered in cleft care in terms of craniofacial growth and development, nasolabial esthetics, and the burden of care. So, we ask ourselves, “Does NAM work?” and if so, “Shouldn’t everyone agree to do it in the best interest of the infant with a cleft?” We owe it to patients and families to answer these questions with our best attempts at scientific certainty. Undoubtedly, hundreds of parents of children with clefts treated with NAM would answer in the affirmative. There are, as our colleagues state, dozens of publications on the value of NAM, but at what level of science?
In a recently published systematic review of NAM, van der Heijden et al concluded that there is limited evidence for the effect of NAM on nasal symmetry in unilateral cleft patients. Although they cited a trend toward a positive effect, they noted conflicting conclusions based on “subexperimental” studies, representing only level 3 evidence, which is basically retrospective observational studies with controls. A meta-analysis could not be performed because of the heterogeneity of the studies and the inadequate data reporting. Short of RCTs, which are the gold standard of clinical evidence, would this not seem to support the need for more immediate well-controlled intercenter comparisons?
Our colleagues have given a thorough historical review of presurgical infant orthopedics and cited the Dutchcleft randomized controlled trial as not applicable to NAM because NAM was not included in that trial. We certainly agree on that point. However, it is precisely that type and level of scientific inquiry that lead to data that can then positively impact decision making in cleft treatment protocols with certainty. We cited the Dutchcleft study as an example of the difference between conclusions based on level 1 or 2 evidence as opposed to the more “subexperimental” level 3 studies available on NAM to date.
Unfortunately, after over 20 years of NAM, we are not there yet. Clearly, there may be value to NAM if it could significantly reduce the number of secondary surgeries while producing an equal or a better nasolabial result. From now until we have RCTs on NAM outcomes, we also agree on our need to rely on the next-best evidence. However, we propose that the next-best evidence does not come from the retrospective unblinded, single-center studies cited but, rather, through direct intercenter outcome comparisons by using strict inclusion criteria and established methodologies endorsed by a national organization committed to identifying best practices in cleft care.
To date, the Americleft Project has benefitted from the participation of 2 centers using NAM as part of their infant management protocols. In those studies of nasolabial outcomes of protocols using NAM, we have found the benefits of NAM to be equivocal, with no significant differences compared with infant orthopedics only and 2 other protocols that included secondary lip/nose revisions. In their review of the literature, our colleagues made no mention of the potential benefit of blinded intercenter comparisons such as these as a possible resource for all of us in pursuit of the next-best evidence. Long et al and Long and Deacon recently addressed monitoring and improving outcomes through multicenter collaborations.
Of the references cited by our colleagues, the 2 studies given special attention in support of NAM exemplify the “subexperiments” described by Sackett et al. As far as can be discerned, the article by Barillas et al does not have adequate statistical power (as outlined by the World Health Organization for unilateral cleft studies) in terms of sample size and does not document the initial equivalency and the consecutiveness of its samples. The study by Garfinkle et al, while reporting a large (77 subjects) consecutive sample of bilateral cleft lip and palate patients with an admirable attempt at long-term follow-up (12.5 years), in fact reports on the results of only 9 of the 77 patients at the longest follow-up; this clearly undermines the claim of consecutively treated patients. It is also confusing because, of the 77 patients consecutively treated with NAM, only 37 were seen at their initial presentation and only 34 were seen before lip repair, presumably after the NAM treatment. It is also noteworthy that the authors rejected the idea of comparison with the results from other non-NAM centers because “most of the children being treated with other protocols have had additional nasal reconstruction by age 12.” In fact, since surgical lip/nose revisions are the current treatment alternative to NAM, and NAM enthusiasts base their choice of NAM on eliminating the need for these additional surgeries, it is exactly this type of treatment outcome comparison that should lead us in the direction of intercenter collaborative outcome studies as the next-best evidence. These studies are specifically designed to look at outcomes in the context of the different total treatment protocols.
An additional benefit of intercenter comparisons with standardized records, a common rating method, and a “yardstick” for calibration is the possibility of conducting cross-investigation comparisons with other studies that have followed the same research methodology. Although we agree that the possible benefits and primary objectives of NAM are most likely not to be found in improvement of dental-arch relationship outcomes, the following illustrates the potential power of multicenter comparisons. In a 2013 presentation to the 12th International Congress on Cleft Lip/Palate and Related Craniofacial Anomalies, we reported on the Goslon yardstick ratings of dental-arch relationships in 465 patients with complete unilateral cleft lip and palate in the mixed dentition from 6 European and 10 North American centers, carried out in 6 studies. The mean Goslon scores were reported (Table). Three of the top 4 centers with the best dental-arch relationship outcomes had infant treatment protocols that involved primary lip and palate surgery only, with no presurgical infant orthopedics or NAM. Clearly, additional procedures were not necessary to create a more favorable outcome. Although this might not be true of nasolabial appearance, until we agree to pursue higher levels of evidence, we will remain mired in unanswerable controversies such as this.