Introduction
During the decision-making process, physicians rely on heuristics that consist of simple, useful procedures for solving problems, intuitive shortcuts that produce reliable decisions based on limited information. In clinical situations characterized by a high degree of uncertainty such as those encountered in orthodontics, cognitive biases and judgment errors related to heuristics are not uncommon. This study aimed at promoting trust in the effective interface between the intuitive reasoning of the orthodontic practitioner and the computational heuristics emerging from simple statistical models.
Methods
We propose an integrative model based on the interaction between clinical reasoning and 2 computational tools, cluster analysis and fast-and-frugal trees, to extract a structured craniofacial representation of untreated subjects with Class III malocclusion and to forecast the worsening of the malocclusion over time.
Results
Cluster analysis of cephalometric values from 144 growing subjects with Class III malocclusion followed longitudinally (T1: mean age, 10.2 ± 1.9 years; T2: mean age, 13.8 ± 2.7 years) produced 3 morphologic subgroups with predominant sagittal, vertical, and slight maxillomandibular imbalances. Fast-and-frugal trees applied to different subgroups extracted heuristics that improved the prediction of key features associated with adverse craniofacial growth.
Conclusions
Provided that cephalometric values are placed in the appropriate framework, the matching between simple and fast computational approaches and clinical reasoning could help the intuitive logic, perception, and cognitive inferences of orthodontic practitioners on the outcome of patients affected by Class III disharmony, decreasing errors associated with flawed judgments and improving the accuracy of decision making.
Highlights
- Heuristics related to Class III prognostic judgments are context-dependent.
- The orthodontic practitioner could resort to multiple adaptive decision heuristics.
- Fast-and-frugal trees improve the accuracy of growth forecasting in Class III subjects.
The cognitive processes that underlie medical reasoning are complex; such processes can be affected by some amount of “irreducible uncertainty” that easily results in failure. Rational medical thinking states that doctors must collect, weigh, and summarize all relevant information to arrive at a conclusive clinical judgment. Physicians do so automatically by integrating clinical perceptions within the context of their medical knowledge base, involving the recognition of patterns and schemes that they have previously seen. Practitioners must make decisions in an environment of uncertainty and under the constraint of the limited time available. Emergency physicians generate 75% of diagnostic hypotheses in the first 5 minutes of the clinical encounter, in a domain with over 10,000 known medical diagnoses, and over 40 cognitive biases that may impact the diagnostic outcome.
We conceive medical intelligence as an activity guided by the laws of logic, with each deviation seen as bias. However, much diagnostic reasoning is unconscious and is based on processes unrelated to logic. The mind economizes its efforts by relying on heuristics, which are unconscious shortcuts or rules that enable a person to discover or learn something on their own. Such unconscious intuitions overcome rational and statistical approaches with sufficient strength to induce us to act. These mentally inductive shortcuts use limited knowledge to make fast inferences when there is no time to integrate a wide variety of clinical information.
Heuristics avoid the intractable sequence of surveying all conceivable diagnostic alternatives, probabilities, and utilities for the possible outcomes associated with each alternative. Counterintuitively, these judgment shortcuts often perform well in real medical situations, regardless of their adherence to formal statistical inference methods such as multiple regression, neural networks, and Bayes’ theorem. However, these useful cognitive shortcuts sometimes lead to severe and even disastrous errors.
The orthodontist is forced to handle clinical and cephalometric information under the pressure of time, heavy workload, and insufficient knowledge. The clinician is surrounded by admonitions that dictate “treat the patient, not the numbers,” “achieve diagnostic and prognostic certainties,” as well as “operate early to avoid greater risk of skeletal imbalance in the future.” Nevertheless, scientific data cannot be expected to guide most orthodontic decisions directly.
The sample sizes available from orthodontic epidemiologic studies frequently are not large enough to specify how even a few patient factors (ie, age, sex, comorbidity, level of compliance) alter the benefits of a treatment. Questions about optimal treatment timing or the precise severity threshold at which to start treatment in a given patient are answered incompletely by these studies. In this incompletely formalizable domain, the goal of making optimal orthodontic treatment choices is not always attainable. Consequently, usual orthodontic choices are intrinsically heuristic. In their landmark article, Hicks and Kluemper emphasized the need for further research into the relevance of heuristics in daily orthodontic practice, both in fostering useful insights and in avoiding severely erroneous clinical evaluations.
Over the last 20 years, a family of computational tools has been developed to reduce errors associated with failures in perception and failed heuristics (“cognitive de-biasing”). Fast-and-frugal trees (FFTs) are simplifications of classical statistical tools that can detect and eliminate irrelevant attributes from data. FFTs have been shown to make better predictions than more complex tools, particularly in domains with limited samples, when the progression of a system is highly uncertain, and when there is a dependency between features.
FFTs have a very simple and transparent structure that does not need to search through all available information to reach a decision; they partition the data into increasingly precise and specific subsegments. FFTs establish a ranking of characteristics and choose the most significant; for each one, they indicate a decision-making choice and a way to proceed deeper. There is a point at which more information or computation becomes detrimental; the search is stopped after finding the first characteristic that enables an inference to be made. The application of these procedures to the biomedical domain allows operators to deepen the diagnostic process while considering only 1 or a few characteristics of the patient (the “take the best” approach). Because FFTs use information in sequential order, they guide health care providers in gathering only relevant information and ignoring the rest.
In this study, we propose the use of a sequence of descriptive computational models applied to cephalometric data of 144 growing untreated subjects affected by Class III malocclusion, to investigate whether and when intuitive heuristics commonly used in decision making by the practicing orthodontist could be supported by computational heuristic strategies in improving the accuracy of prognostic judgments.
Material and methods
The sample consisted of semi-longitudinal cephalometric data of white subjects with untreated Class III malocclusion. It was derived in part (41 subjects: 17 males, 24 females) from a database of untreated subjects with Class III malocclusion described by Levin et al and in part from a database reported by Alexander et al (104 subjects: 48 males, 56 females), for a total of 145 subjects. These patients were left untreated because they refused treatment or because they came from historical samples taken from growth center studies conducted in the United States.
Subjects originally were selected by orthodontists in the United States and Canada in their private practice, or from university-affiliated orthodontic clinics, from growth center studies (including the Bolton-Brush Growth Study, Burlington Growth Center, University of Michigan Elementary and Secondary School Growth Study, and Denver Child Growth Study), from the Orthodontic Clinic of the University of Florence and the Orthodontic Clinic of the University of Michigan.
Each subject had at least 2 cephalometric records taken at 2 different ages at least 1 year apart. Cephalometric magnification varied originally from 0% to 12.9% and was then standardized to 0%. The sample that met the inclusion criteria (Table I) consisted of 145 white patients with untreated Class III malocclusion. Of these subjects, 1 showed incorrect cephalometric data and was eliminated.
Criteria |
---|
1. European or American ancestry (white ethnicity) |
2. No orthodontic or orthopedic treatment before the first cephalometric record, or between the records, has been performed |
3. Initial diagnosis (T1) of:
|
4. Class III skeletal relationship having 1 or both:
|
5. No congenitally missing or extracted teeth |
6. No craniofacial syndromes |
7. Not less than 9 months and not more than 30 months between consecutive cephalometric films |
The final sample consisted of 144 subjects (65 males, 79 females). The sample age ranged from 4.0 years to 19.7 years for the cephalograms at T1 and from 5.7 years to 21.7 years for the cephalograms at T2. For each subject, a lateral cephalometric record at T1 (time of the patient’s first observation: mean, 10.0 ± 3.7 years) and at T2 (patient’s last observation: mean, 13.8 ± 2.7 years) was available. The method error for the cephalometric measurements is reported in Alexander et al and Levin et al.
Cephalometric analysis
The cephalometric analysis comprised the following 8 variables: SNA, SNB, Wits appraisal, Ar-Go-Me, palatal plane to S-N (PP-SN), palatal plane to mandibular plane (PP-MP), the difference between GoGn and SN, and the difference between NMe and CoGo. We adopted the sagittal skeletal imbalance (SSI) method to describe the sagittal imbalance during the growth process. The SSI was defined as the linear difference between Co-Gn (effective mandibular length) and Co-A (effective midfacial length), calculated both at T1 and T2 for all 144 subjects. To evaluate the progression of imbalance, the individual SSI of each subject with Class III malocclusion was compared with the SSI standard values in a normal population (matched for age and gender) derived from the cephalometric atlas by Bhatia and Leighton. “Good growers” were defined as those subjects who approached normal values during the growth process (T1-T2 change), whereas “bad growers” were defined as those subjects who showed an increase in the difference with respect to normal values during the T1-T2 change. Of the 144 subjects, 39 (27%) were found to be “good growers” and 105 (73%) “bad growers.”
The sample from Bhatia and Leighton comprised British subjects of white origin observed between 1952 and 1993. About one-third of our sample was derived from the growth center studies that were collected in North America during about the same period. Our sample also included subjects with Class III malocclusion of North American (n = 70) and Italian origin (n = 28) collected between 1970 and 2000. Therefore, the 2 samples were considered comparable.
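The SSI definition and the grower labels can be illustrated with a minimal Python sketch. It assumes that "approaching normal values" means a smaller absolute gap between the subject's SSI and the age- and sex-matched norm at T2 than at T1, and that a tie counts as a bad grower; both are assumptions of this sketch, not rules stated in the study. The function names and all numeric values are hypothetical.

```python
def ssi(co_gn_mm: float, co_a_mm: float) -> float:
    """Sagittal skeletal imbalance: Co-Gn minus Co-A, in millimeters."""
    return co_gn_mm - co_a_mm


def classify_grower(ssi_t1: float, norm_t1: float,
                    ssi_t2: float, norm_t2: float) -> str:
    """Label a subject by how the gap to the matched norm changes from T1 to T2.

    Assumption of this sketch: the gap is the absolute difference, and a
    subject whose gap does not shrink is labeled a bad grower.
    """
    gap_t1 = abs(ssi_t1 - norm_t1)
    gap_t2 = abs(ssi_t2 - norm_t2)
    return "good grower" if gap_t2 < gap_t1 else "bad grower"


# Hypothetical subject: measurements and norms are made up for illustration.
subject_t1 = ssi(co_gn_mm=110.0, co_a_mm=82.0)   # 28.0 mm
subject_t2 = ssi(co_gn_mm=122.0, co_a_mm=88.0)   # 34.0 mm
label = classify_grower(subject_t1, 22.0, subject_t2, 25.0)
print(label)  # the gap grows from 6 mm to 9 mm, so "bad grower"
```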
This study was exempted from review by the Medical School Institutional Review Board of the University of Michigan (HUM00160284).
Statistical analysis
Clusters
Clustering is the process of partitioning a finite collection of n elements into classes so that items in the same class are as similar as possible, which is achieved by minimizing the intraclass variance. Cluster analysis is a form of unsupervised learning (no information on the class variable is assumed). The cluster analysis was applied to the cephalometric data of the 144 subjects by using the Konstanz Information Miner software (KNIME Desktop version 2.7.4).
The following 5 variables provided the best phenotyping grouping of subjects: Co-A, Co-Gn, Ar-Go-Me, PP-MP, and overjet. Cluster analysis defined 3 morphologic clusters: subjects with horizontal maxillomandibular imbalance (hypermandibular [HM]), subjects with increased divergence (hyperdivergent [HD]), and subjects with intermediate characteristics that were balanced between the first 2 (balanced [Bal]).
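To make the idea of minimizing intraclass variance concrete, the following is a minimal sketch of k-means clustering in plain Python. The text does not state which clustering algorithm was run in KNIME, so the choice of k-means here is an assumption, as are the function name, the starting centroids, and the toy 2-variable data (the study used 5 variables, which in practice would usually be standardized first).

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, n_iter=20):
    """Lloyd iterations: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Toy data with three visibly separate groups (hypothetical values).
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0), (9.2, 1.2)]
labels, centroids = kmeans(points, centroids=[points[0], points[2], points[4]])
print(labels)  # [0, 0, 1, 1, 2, 2]
```

With k = 3, each subject receives one of three labels, which is analogous to the assignment of subjects to the HM, HD, and Bal clusters.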
FFTs
A tree is a combination of mathematical and computational techniques that can aid in the description, categorization, and generalization of a given set of data. A tree is a useful tool for identifying the variables that are likely to be important for the prediction of particular outcomes. Trees can be used to help identify the strategy that is most likely to reach a goal. FFT models are decision strategies that aim for a simple but satisfactory solution to an uncertain prediction problem when the assumptions about the model are unknown. In an uncertain dataset, when patient samples are small, features are abundant, and predictability is moderate, an ideal statistical tool must find ways to extract the important features of the domain and ignore the rest. FFTs are based on statistically optimal splitting of the patients into pairs of smaller subgroups. Splits produce maximum separation between the 2 subgroups and minimum variability within them. FFTs allow different importance to be assigned to different predictor variables by ordering them sequentially; they can make inferences about future events, such as medical diagnosis and prognosis.
FFTs choose the best option for the aspect that is regarded as most important. At each partition (division into subgroups), there are 2 branches under each node of the tree: an exit that yields a classification (“leaf node”) and a branch that leads to the consultation of the next feature. In this study, FFTs were applied to find a search rule to predict prognosis (bad or good growth) for new patients not yet classified. One part of the data (training set) was used for learning, and another part (test set) was used to determine the accuracy of the model (cross-validation). By repeatedly analyzing the data set, FFTs learn how things have gone in the past (training set), and, on the basis of the learned rule, they make predictions on new, unseen subjects (test set). The choice of a stopping criterion depends on the problem at hand; the success of FFTs seems to be due to the characteristics that they ignore.
This strategy has been studied and applied in disparate domains, including weather forecasting, finance, psychology, law, ecology, outcome of sport competitions, and political elections. Compared with the ideal sequential sampling models such as multiple regression and Bayesian tools, FFTs exhibit high robustness and a well-adapted capacity to determine the structure of the system and to avoid overfitting. In statistics and informatics, an overfitted model is one that corresponds too closely to a particular set of data; this dependency leads to a higher error rate when predicting future observations. When overfitting occurs, the model has difficulty generalizing (ie, transferring the rule learned from patients already known to new patients). The model recognizes characteristics that are specific to the training set but are not present in the rest of the cases.
Trees were built using the FFTrees R package (https://github.com/ndphillips/FFTrees).
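The sequential structure described above, in which each node either exits with a classification or passes the patient on to the next feature, can be sketched in a few lines of Python. This is an illustration of the general FFT logic only, not the trees fitted in this study: the cue order, the thresholds, and the exit labels below are hypothetical, and the study itself built its trees with the FFTrees R package.

```python
# Each node: (cue name, threshold, direction, label assigned if the node exits).
# Cue names are cephalometric variables; thresholds and labels are hypothetical.
TREE = [
    ("Wits", -6.0, "<", "bad grower"),
    ("Ar-Go-Me", 125.0, "<", "good grower"),
]
DEFAULT = "bad grower"  # second exit of the final node


def fft_predict(patient, tree=TREE, default=DEFAULT):
    """Walk the cues in order and stop at the first one that allows a decision.

    Returns the predicted label and the number of cues that were inspected.
    """
    for n_used, (cue, threshold, direction, label) in enumerate(tree, start=1):
        value = patient[cue]
        exits = value < threshold if direction == "<" else value > threshold
        if exits:
            return label, n_used
    return default, len(tree)


print(fft_predict({"Wits": -8.0, "Ar-Go-Me": 130.0}))  # ('bad grower', 1)
print(fft_predict({"Wits": -3.0, "Ar-Go-Me": 120.0}))  # ('good grower', 2)
print(fft_predict({"Wits": -3.0, "Ar-Go-Me": 130.0}))  # ('bad grower', 2)
```

The first patient is classified after a single cue, which shows how the tree ignores the remaining information once an inference can be made.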
The accuracy of the forecast is given by the ratio between the number of correct predictions and the number of possible predictions. Algorithms exhibited the following 5 measures of decision accuracy:
1. Sensitivity: the probability of correctly identifying a true positive case.
2. Specificity: the probability of correctly identifying a true negative case.
3. Accuracy: the probability of correctly identifying any case.
4. Weighted accuracy: weighted average of sensitivity and specificity dictated by a sensitivity weighting parameter.
5. Balanced accuracy: the average of sensitivity and specificity (with weighted parameter = 0.5).
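The 5 measures listed above can be computed from the 4 cells of a confusion matrix. The sketch below is a minimal Python illustration; the function name, the argument names, and the counts are hypothetical and are not results from this study.

```python
def decision_accuracy(hits, misses, false_alarms, correct_rejections,
                      sens_weight=0.5):
    """Compute the 5 accuracy measures from confusion-matrix counts.

    hits and misses are the true positive cases that were and were not
    identified; correct_rejections and false_alarms are the true negative
    cases that were and were not identified.
    """
    total = hits + misses + false_alarms + correct_rejections
    sensitivity = hits / (hits + misses)
    specificity = correct_rejections / (correct_rejections + false_alarms)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (hits + correct_rejections) / total,
        "weighted_accuracy": sens_weight * sensitivity
                             + (1 - sens_weight) * specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical counts for illustration only.
measures = decision_accuracy(hits=80, misses=20,
                             false_alarms=10, correct_rejections=30)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With the default weight of 0.5, weighted accuracy and balanced accuracy coincide; a larger weight favors sensitivity over specificity.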
Results
Table II shows the distribution of subjects with Class III malocclusion (“bad growers” and “good growers”) after cluster analysis. Compared with the HD and Bal clusters, HM subjects showed the highest probability of dentoskeletal imbalance worsening during the growth process: 89% compared with 64% and 68% in HD and Bal subjects, respectively.