Intrarater agreement about the etiology of Class II malocclusion and treatment approach


The management of patients with Class II malocclusion has been an ongoing discussion in orthodontics. The aim of this study was to determine whether orthodontists are consistent with themselves (intrarater agreement) and with each other (interrater agreement) about the etiology, timing, and difficulty of treating subjects with Class II malocclusion.


The initial records of 159 Class II subjects were sent to 8 orthodontists. In this sample, duplicate records of 18 subjects were dispersed. A questionnaire was sent with the records.


The intrarater consistency values were 65% when determining the type of malocclusion, 60% when deciding which arch was at fault, and 81% when determining the need for immediate treatment. Consistency values were 33% regarding case difficulty and 77% regarding phase 2 treatment need. There was a significant positive correlation between the consistency of the orthodontists’ responses and the peer assessment rating score.


We found that practitioners had only moderate agreement among themselves when diagnosing a patient’s type of malocclusion and which arch was at fault when a skeletal discrepancy was noted. Intrarater agreement improved as the peer assessment rating score increased, but the correlation was weak, and this was not consistent for all examiners. Because of insufficient intrarater agreement, interrater agreement was not examined.

Many factors go into the decision-making process for determining the etiology, diagnosis, prognosis, and treatment planning of patients with Class II malocclusion. Some of these factors are subjective. A Class II malocclusion can originate from discrepancies in the skeleton, the dentition, or both. The clinician determines whether the cause of the imbalance is in the maxillary arch, the mandibular arch, or both. For many years, it was widely accepted that, if a Class II patient had a skeletal imbalance, it could be corrected in the mixed dentition with early or phase 1 treatment, and then the dental correction could be completed during phase 2 treatment after all teeth had erupted. There is little doubt that phase 1 treatment is important for the correction of certain orthodontic problems that could cause further harm, such as crossbites, impinging overbites, severe crowding, or disfiguring malocclusions, but clinicians continue to disagree on treatment timing and approach for patients with a Class II malocclusion, even though these questions have been studied with well-controlled randomized clinical trials. The authors of these studies concluded that there are benefits to early treatment, but there is no long-term skeletal effect.

Randomized clinical trials are considered by many to be the gold standard of clinical research, so the Class II trials should answer practitioners’ questions on how we ought to approach the treatment of a patient with a Class II malocclusion. Many clinicians, however, continue to perform early treatment to achieve what they believe is a unique orthopedic effect. Several commentaries have questioned the validity of the Class II trials completed at academic institutions. One argument raised was that clinical decisions are not made randomly, so the randomization of subjects into the phase 1 treatment groups, which was done in these studies, is not a valid study design. Although they are considered the gold standard for establishing treatment efficacy, the validity of randomized clinical trials and whether one can directly correlate their findings to the real world has been challenged in the past. There is much individual variation in terms of response to treatment, and it is the responsibility of clinicians to identify patients who can be affected positively by the proposed treatment plan. In other words, practitioners must be able to categorize different patients with a Class II malocclusion based on its etiology and then make the appropriate treatment decision.

Clinicians are faced with making diagnosis and treatment planning decisions with every patient, and there are various reasons that practitioners choose 1 treatment modality over another. Some might have strong personal beliefs about appropriate treatments, whereas others might not be up to date with the current literature; also, there are inconsistent reports in the literature. It can be difficult to be consistent with each other and among ourselves, especially with the vast amount of orthodontic literature that is published each year. Baumrind et al completed a study that addressed agreement among orthodontists about the decision of whether to extract teeth and found that the clinicians agreed with each other almost 66% of the time. Lee et al assessed intrarater and interrater agreement between 10 orthodontists planning treatment for 60 patients. They found that the intrarater agreement was moderate, and the interrater agreement was poor. It appears likely that when clinicians diagnose a complex problem such as a Class II malocclusion, consistency could be a concern. The purpose of this study was to determine whether orthodontists from around the United States could agree on the phenotype of the Class II malocclusion and the treatment approach for a patient.

Material and methods

This study was retrospective; we used the subjects who were treated in a previous longitudinal randomized clinical trial at the University of Florida. The original study was designed to determine the effectiveness of treating subjects with a Class II malocclusion via early treatment with a bionator or headgear and a biteplane. An observation group was included as well to compare data with the treatment groups. A total of 277 subjects began the study: 95 in the bionator group, 100 in the headgear and biteplane group, and 82 in the observation group. The 2 treatment groups comprised 40% girls, and the observation group was 36% female. The average ages for the headgear and biteplane, bionator, and control groups were 9.7, 9.6, and 9.5 years, respectively. Inclusion criteria required the subjects to have at least a half-cusp Class II molar relationship bilaterally or greater than a half-cusp Class II molar relationship unilaterally. The subjects also must have fully erupted first molars, positive overjet and overbite, no more than 3 permanent canines or premolars, and good overall dental and general health.

For this study, baseline orthodontic records of 159 subjects were sent to 8 orthodontists at various academic institutions throughout the United States. Only 159 of the 277 subjects were used because of the marginal quality of the initial records of some subjects. In the set of 159 subjects, the records of 18 were duplicated and peppered throughout the sample randomly; this was not disclosed to the participating orthodontists. Peer assessment rating (PAR) scores were calculated on all subjects in the study according to methods previously described. The PAR is a scoring system that summarizes all of the occlusal discrepancies. The duplicate subjects were selected based on pretreatment PAR scores to ensure an equal distribution of low, moderate, and high scores. Furthermore, the sample of 18 duplicate subjects was similar to the total group in terms of SNA, SNB, and ANB angles. The only measurement that was significantly different was FMA, where standard deviation, and minimum and maximum values were less extreme for the duplicate sample compared with the total group. The records consisted of intraoral and extraoral photos; panoramic, lateral cephalometric, and hand-wrist radiographs; and photos of plaster models. The records were arranged in a PowerPoint (Microsoft, Redmond, Wash) presentation, loaded onto a password-protected flash drive, and mailed to the different orthodontists. Along with the records, a questionnaire was sent to be filled out on each patient. Each practitioner received the questionnaire and the flash drive and then was instructed to contact an author (D.A.M.) via e-mail for the password to begin. The questionnaire ( Fig 1 ) included questions aimed at pinpointing the etiology of the Class II malocclusion and whether early treatment was necessary in the eyes of the examiner. The institutional review board for the protection of human subjects at the University of Florida reviewed and approved the research protocol.

Fig 1
Questionnaire sent to orthodontists with the records of 159 patients.

Seven of the 8 investigators who participated in this study were board certified in the United States or the country where they received their orthodontic training. Three investigators completed their orthodontic education at the University of Florida. Three others received their degrees from the University of Washington, the University of Indiana, and the University of Michigan. Two investigators completed programs in Korea and Brazil. These 8 investigators held teaching or adjunct faculty positions at various academic institutions throughout the United States.

The questions were written to identify what each orthodontist viewed as the etiology of the malocclusion and, subsequently, how he or she would treat the patient (Fig 1). The respondents were told that extractions of deciduous teeth without the placement of appliances were not considered immediate orthodontic treatment. Extractions of permanent teeth, with or without the placement of appliances, were considered immediate orthodontic treatment (phase 1). Any fixed appliance in the mixed dentition was also considered immediate orthodontic treatment (phase 1). Phase 2 treatment was defined as fixed appliances in the permanent dentition, so a patient could have phase 2 treatment even without phase 1 treatment.

Statistical analysis

Summary statistics were generated for overall responses and the intrarater subsample. Intrarater agreement was characterized by the percentage of agreement and the kappa statistic. The kappa statistic takes chance agreement into account, and it is commonly interpreted as follows: less than 0.20 corresponds to slight agreement, 0.20 to 0.40 corresponds to fair agreement, 0.40 to 0.60 corresponds to moderate agreement, 0.60 to 0.80 corresponds to substantial agreement, and 0.80 to 1.00 corresponds to almost perfect agreement. Weighted kappa statistics were computed for ordinal responses, so that near agreements received partial credit and larger discrepancies were penalized more heavily. To quantify intrarater disagreement for each duplicated subject, a discrepancy score was calculated, based on the amount of disagreement for each question, summed over all questions. Thus, a discrepancy score of 0 indicated exact agreement for all responses, and higher scores corresponded to less agreement. Spearman correlation coefficient estimates were used to examine the relationship between the amount of disagreement, represented by the discrepancy score, and the severity of the malocclusion, represented by the PAR score. A P value of less than 0.05 was considered statistically significant.
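The discrepancy score and its rank correlation with the PAR score can be sketched as follows. This is a minimal illustration, not the authors' procedure: the text does not say how each question was scored, so the sketch assumes that nominal questions contribute 0 or 1 and that ordinal questions contribute the absolute difference between the two reviews. The question names, the `ORDINAL` set, and all data values are hypothetical.

```python
from statistics import mean

# Assumption: only these questions are ordinal; the rest are treated as nominal.
ORDINAL = {"difficulty"}

def discrepancy_score(first, second):
    """Sum per-question disagreement between two reviews of one subject."""
    total = 0
    for question, answer in first.items():
        if question in ORDINAL:
            total += abs(answer - second[question])   # size of the gap
        else:
            total += int(answer != second[question])  # 0 if same, 1 if not
    return total

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical PAR scores and discrepancy scores for five duplicated subjects.
par_scores = [12, 18, 25, 31, 40]
discrepancies = [4, 3, 3, 1, 0]
rho = spearman(par_scores, discrepancies)
```

With this coding, a negative coefficient means that disagreement falls as malocclusion severity rises, which corresponds to better intrarater agreement for more severe cases.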


Results

We summarized the overall responses using only the first review for duplicate subjects. The investigators most often identified the subjects as having a combination of skeletal and dental malocclusion; this answer was chosen 47.5% of the time. In terms of the arch that each investigator would treat, the mandible was chosen 38% of the time, but 37% of the time the investigators would not treat either arch. Difficulty ratings varied greatly; a score of 7 on the 10-point scale was the most commonly chosen, approximately 25% of the time.

The intrarater agreement is depicted in Table I, based on the results from the duplicated subjects in the sample. Overall, the raters were consistent among themselves 65% of the time when determining what type of malocclusion the subjects had, 60% of the time when deciding which arch was at fault when a skeletal imbalance was noted, and 81% of the time when determining the need for immediate orthodontic treatment. The raters were consistent 33% and 77% of the time in regard to determining case difficulty and phase 2 treatment need, respectively. The question regarding case difficulty was answered on a 10-point scale, with 1 the easiest and 10 the most difficult. The raters’ answers on case difficulty were exact or within ±1 point on the difficulty scale 75% of the time and were off by more than 1 point 25% of the time. Raters 4 and 8 were the least consistent of all raters in any category with the question regarding case difficulty, with intrarater exact agreement of only 18%. Raters 4 and 5 were the most consistent of all the raters in any category when it came to determining phase 2 treatment need, agreeing with themselves 100% of the time. However, this is somewhat misleading because both raters classified all subjects as needing treatment.

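Table I
Intrarater percentage agreement comparison (%)

| Question | Overall | Rater 1 | Rater 2 | Rater 3 | Rater 4 | Rater 5 | Rater 6 | Rater 7 | Rater 8 |
|---|---|---|---|---|---|---|---|---|---|
| Type | 65 | 78 | 62 | 61 | 67 | 47 | 61 | 82 | 65 |
| Arch | 60 | 72 | 62 | 67 | 50 | 47 | 22 | 82 | 71 |
| Need | 81 | 78 | 94 | 78 | 67 | 76 | 78 | 88 | 88 |
| Difficulty, exact | 33 | 44 | 20 | 22 | 18 | 53 | 44 | 41 | 18 |
| Difficulty, ±1 | 42 | 44 | 53 | 39 | 35 | 24 | 44 | 47 | 47 |
| Difficulty, >1 | 25 | 11 | 27 | 39 | 47 | 24 | 11 | 12 | 35 |
| Phase 2, exact | 77 | 67 | 69 | 65 | 100 | 100 | 89 | 71 | 59 |
| Phase 2, ±1 | 19 | 11 | 31 | 29 | 0 | 0 | 6 | 29 | 41 |
| Phase 2, >1 | 4 | 22 | 0 | 6 | 0 | 0 | 6 | 0 | 0 |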
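The exact, ±1, and >1 rows of Table I split each rater's duplicate pairs by how far the second answer fell from the first. A minimal sketch of that breakdown is given below. The function name and the 18 pairs of difficulty scores are hypothetical and are not the study data; rounding to whole percentages is also an assumption.

```python
def agreement_breakdown(first, second):
    """Percentage of paired ordinal answers that match exactly,
    differ by exactly 1 point, or differ by more than 1 point."""
    diffs = [abs(a - b) for a, b in zip(first, second)]
    n = len(diffs)
    exact = sum(d == 0 for d in diffs)
    within_one = sum(d == 1 for d in diffs)
    more_than_one = n - exact - within_one
    return {
        "exact": round(100 * exact / n),
        "within_1": round(100 * within_one / n),
        "more_than_1": round(100 * more_than_one / n),
    }

# Hypothetical first and second difficulty ratings for 18 duplicated subjects.
first_review = [7, 5, 6, 8, 4, 7, 3, 9, 6, 5, 7, 8, 2, 6, 5, 7, 4, 8]
second_review = [7, 5, 6, 8, 4, 7, 3, 9, 7, 4, 8, 7, 3, 5, 6, 6, 6, 5]
breakdown = agreement_breakdown(first_review, second_review)
```

Because each percentage is rounded separately, the three values for a rater need not sum to exactly 100.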

The kappa statistic not only measures the agreement of the investigators but also takes the chance of agreement into account (Table II). Overall, the investigators had moderate agreement for determining the subjects’ type of malocclusion, for deciding which arch was at fault if a skeletal imbalance was detected, and for determining whether phase 2 treatment was necessary. The kappa statistic ranged from 0.18 to 0.55, with an overall value of 0.48 for the raters when determining the type of malocclusion, from 0.13 to 0.60 with an overall value of 0.43 for the raters when determining which arch was at fault, and from −0.10 to 0.82 with an overall value of 0.55 for determining whether phase 2 treatment was necessary. When determining the need for immediate treatment, the investigators substantially agreed among themselves. The agreement ranged from 0.33 to 0.87, with an overall value of 0.62 among the raters. When determining the difficulty of each case, the investigators had fair agreement. The unweighted values ranged from 0.02 to 0.39, with an overall value of 0.21. Raters 4 and 8 were the least consistent of all the raters in any category with the question regarding case difficulty with a score of 0.02, or slight agreement. Overall, the raters were the most consistent when determining whether there was an immediate need for orthodontic treatment and the least consistent when quantifying case difficulty.

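Table II
Intrarater agreement comparison (kappa statistic)

| Question | Overall | Rater 1 | Rater 2 | Rater 3 | Rater 4 | Rater 5 | Rater 6 | Rater 7 | Rater 8 |
|---|---|---|---|---|---|---|---|---|---|
| Type | 0.48 | 0.55 | 0.33 | 0.35 | 0.33 | 0.18 | 0.46 | 0.30 | 0.47 |
| Arch | 0.43 | 0.47 | 0.36 | 0.44 | 0.34 | 0.26 | 0.13 | 0.30 | 0.60 |
| Need | 0.62 | 0.56 | 0.87 | 0.51 | 0.33 | 0.61 | 0.56 | 0.72 | 0.67 |
| Difficulty, unweighted | 0.21 | 0.21 | 0.05 | 0.12 | 0.02 | 0.39 | 0.34 | 0.12 | 0.02 |
| Difficulty, weighted | 0.55 | 0.32 | 0.34 | 0.39 | 0.15 | 0.31 | 0.68 | 0.26 | 0.48 |
| Phase 2, unweighted | 0.55 | 0.42 | 0.39 | 0.55 |  |  | 0.82 | −0.10 | 0.27 |
| Phase 2, weighted | 0.62 | 0.47 | 0.39 | 0.55 |  |  | 0.82 | −0.10 | 0.33 |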
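To make the chance correction concrete, the following is a minimal sketch of Cohen's kappa for two reviews by the same rater, together with a helper that maps a value to the agreement bands quoted in the Statistical analysis section. Several points are assumptions rather than details from the study: the weighted version uses linear weights (the weighting scheme is not stated), each band is treated as including its lower bound, and the function names are hypothetical.

```python
from collections import Counter

def cohen_kappa(first, second, categories, weighted=False):
    """Cohen's kappa for paired ratings; `categories` lists the possible
    answers in order. With weighted=True, linear disagreement weights are
    used (an assumption). Returns NaN when chance disagreement is zero,
    for example when every answer in both reviews is the same category."""
    n = len(first)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)   # 0 on the diagonal, 1 at the extremes
        return 0.0 if i == j else 1.0

    row = Counter(index[a] for a in first)    # marginal counts, first review
    col = Counter(index[b] for b in second)   # marginal counts, second review
    observed = sum(disagreement(index[a], index[b])
                   for a, b in zip(first, second)) / n
    expected = sum(disagreement(i, j) * row[i] * col[j]
                   for i in range(k) for j in range(k)) / (n * n)
    if expected == 0:
        return float("nan")
    return 1.0 - observed / expected

def agreement_band(kappa):
    """Label a kappa value using the bands quoted in the text."""
    if kappa < 0.20:
        return "slight"
    if kappa < 0.40:
        return "fair"
    if kappa < 0.60:
        return "moderate"
    if kappa < 0.80:
        return "substantial"
    return "almost perfect"

# Hypothetical yes/no answers on immediate treatment need for four subjects.
kappa_need = cohen_kappa(["yes", "yes", "no", "no"],
                         ["yes", "no", "no", "no"],
                         categories=["no", "yes"])
```

Applied to the overall values in Table II, `agreement_band` returns "moderate" for 0.48, 0.43, and 0.55, "substantial" for 0.62, and "fair" for 0.21, in line with the descriptions in the text.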