Posttreatment tooth movement is inevitable, but its degree depends on a variety of factors that are both iatrogenic and innate to each patient. Although various retention techniques have been developed to minimize posttreatment movement, it is unrealistic to believe that the entire dentition can be retained in all dimensions. Relapse is usually considered an adverse phenomenon, but some dimensions of posttreatment tooth movement might actually enhance occlusal function and esthetics. Favorable movement is often termed “settling” when that aspect of the occlusion improves over time.
In 1998, the American Board of Orthodontics (ABO) published an objective method of evaluating posttreatment results by using 7 cast measurements and 1 radiographic measurement. Formerly called the Objective Grading System, the ABO now refers to this as the Cast Radiograph Evaluation (C-R Eval). This evaluation technique is used to score final casts and the panoramic radiograph produced within 12 months of debanding. The 8 scoring parameters are alignment/rotations, marginal ridges, buccolingual inclination, overjet, occlusal contacts, occlusal relationships, interproximal contacts, and root angulation. The C-R Eval was developed to increase objectivity in evaluation of treatment results; it uses a point system with precise criteria for cast and radiographic grading to score the case result in terms of finishing and detailing. Each case is scored after individual and group calibration of examiners in an effort to secure equity in grading among all examiners. Hence, 8 aspects of the finished result can be quantified at the time of final records production.
Currently, all 8 parameters of the C-R Eval are weighted equally with 1 point for each deficiency. The deficiencies are clearly defined to the accuracy of 0.5 mm. It became apparent, however, that parameters that tend to improve after treatment should perhaps be scored less heavily than those that tend to deteriorate. In other words, treatment deficiencies that tend to relapse unfavorably should be scored more heavily than those that improve after debanding. The difference in scoring between improving and deteriorating parameters is referred to herein as weighting. This information can also help the orthodontic practitioner predict which aspects of the final treatment result might settle toward the ideal and which will deteriorate over time.
The purpose of this article was to determine posttreatment tooth position changes in orthodontic cases with the intent to predict favorable vs unfavorable movement. This information is to be used to weight certain parameters of the finished result as scored by the C-R Eval. A secondary purpose of this investigation was to compare tooth movement between 2 forms of retention.
Our 2 null hypotheses were that all 7 (cast assessment) parameters of the C-R Eval have equal magnitudes of posttreatment changes, and that fixed and removable retention methods show equal amounts of posttreatment movement.
It is well established that posttreatment changes occur in orthodontic patients, but prediction of favorable vs unfavorable movement is difficult. Little et al demonstrated that more deterioration of alignment occurs between 10 and 20 years after treatment than from debanding to 10 years posttreatment. Only 10% of their 100-patient sample had acceptable alignment after 10 years. Fidler et al noted that posttreatment Class II Division 1 patients relapsed as much as 3.5 mm at first molar and canine relationships, 3 mm in overjet, and 4.5 mm in overbite after treatment. On a more positive note, Razdolsky et al assessed 40 patients at up to 21 months posttreatment and found continuous increases in interarch tooth contacts by 3 months, but buccolingual relationships showed minimal changes. Nett and Huang used the ABO’s cast radiographic evaluation (called the OGS in that article). They assessed 100 random subjects at least 10 years postretention and found that although adverse alignment changes occurred after debanding, significant improvement in marginal ridges, buccolingual relationships, occlusal contacts, and overjet occurred. Occlusal relationships improved insignificantly. None of these studies proposed a weighting factor for the C-R Eval scoring instrument.
Tooth movement during fixed and removable retention is also recognized. Although few studies compared fixed vs removable retention methods with regard to amount of relapse or settling, 1 such study was conducted by Atack et al, whose 58-patient sample demonstrated no difference in mandibular incisor irregularity between bonded canine-to-canine retainers and removable (spring aligner) appliances. They found that movement occurred with either retainer design. They speculated that the movement in the fixed retention group could be related to activity in the bonded wire or wire deformation while the retainer was in place. In a study conducted by Sari et al, 50 orthodontic patients were evaluated by assessment of interocclusal records produced by an impression material technique to record changes in occlusal contacts over time. Twenty-five subjects received maxillary and mandibular Hawley retainers, and 25 used maxillary and mandibular bonded anterior retainers. They also evaluated a control group of 20 subjects. The retained patients were assessed between 14 and 15 months posttreatment, and the control group was assessed at 12 months. These authors concluded that although there were insignificant changes in the control sample, both retention groups had greater numbers of occlusal contacts. More occlusal contacts were noted in the fixed retainer group than in the removable retainer group.
Material and methods
One hundred twenty-six diplomates certified by the ABO in February 2006 and 2007 were contacted by form letter and requested to produce a set of casts of the cases that were successfully displayed at the ABO Clinical Examination. Sixteen diplomates responded by sending at least 1 case to the ABO central office. These diplomates were informed only that the ABO was assessing case stability and that the casts would be evaluated anonymously. They were also assured that the findings of the investigation would remain anonymous and hence would not affect their status as an ABO diplomate. The diplomates were asked to produce casts of these patients between 12 and 24 months after debanding. The diplomates were directed to trim the casts to maximum intercuspation and mail them to the ABO central office in St Louis. These casts were called settling casts. Radiographs for assessment of the root paralleling parameter on the C-R Eval were not requested.
The 2008 directors of the American Board of Orthodontics were calibrated and instructed to individually score the settling casts using the C-R Eval system. These directors will hereafter be called the examiners in this article. The cast assignments of each examiner were randomized. These scores were compared with the scores recorded for these same cases at the February 2006 and 2007 ABO Clinical Examinations.
If fixed (bonded) retention was used in either arch, this was noted. There was no other assessment of retention method.
A total of 50 sets of settling casts were returned to the ABO for scoring. A summary of the examiner distribution for scoring is as follows: 41 cases were scored by 2 examiners, 7 cases were scored by 3 examiners, and 2 cases were scored by 4 examiners, with an average of 2.2 examiners assessing each settling case.
Settling time was defined as the interval from the date when the final treatment casts presented in the ABO Examination Case Displays were obtained to the date when the settling casts were obtained. These times can be summarized as follows: shortest, 9 months; longest, 116 months (9 years 8 months); average, 39.6 months (3 years 4 months).
Note that 50% of the cases had a settling time of 35 months (2 years 11 months) or less.
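The settling-time summary above can be sketched in a few lines of Python. This is only an illustration: the dates are hypothetical and are not study data, and counting whole calendar months is an assumption, because the article does not state how partial months were handled.

```python
from datetime import date
from statistics import mean, median


def months_between(final_cast_date: date, settling_cast_date: date) -> int:
    """Whole calendar months from the final treatment casts to the settling casts.

    Assumption: days within the month are ignored.
    """
    return (settling_cast_date.year - final_cast_date.year) * 12 + (
        settling_cast_date.month - final_cast_date.month
    )


# Hypothetical (final cast date, settling cast date) pairs, not study data.
cases = [
    (date(2005, 3, 1), date(2005, 12, 1)),
    (date(2004, 6, 1), date(2007, 5, 1)),
    (date(1998, 1, 1), date(2007, 9, 1)),
]

settling_months = [months_between(final, settling) for final, settling in cases]

summary = {
    "shortest": min(settling_months),
    "longest": max(settling_months),
    "average": mean(settling_months),
    "median": median(settling_months),  # half of the cases fall at or below this value
}
print(summary)
```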
Each case was scored by 2 to 4 examiners during the investigation. For the purpose of clarity here, the original C-R Eval (cast) score will be titled the examination score (ES). This is the scoring of the case when it was originally presented at the ABO Clinical Examination in 2006 or 2007. The subsequent rescoring conducted by the ABO examiners for this investigation will be called the settling score (SS).
Determination of examiner variability and agreement
The scoring of the settling casts was randomized because each examiner scored according to the availability of the casts and time constraints. The internal agreement of the SS was first evaluated from the range of scores that the examiners assigned to each case. To evaluate the interexaminer agreement on scoring, the maximum absolute difference between the SS values was calculated for each case. The lowest SS was subtracted from the highest SS, and this difference reflected the range of scores. For example, if 3 examiners gave scores of 3, 4, and 6, the score range was 6 – 3 = 3. Smaller ranges reflect closer values between examiners and, hence, less variation.
After the range was calculated for each SS, this information was summarized among the 50 cases. The smallest range was zero for every parameter, indicating that the examiners reached absolute agreement on at least 1 case. For other cases, however, there were greater differences between examiners. For example, SS in the overjet parameter differed by 13 for 1 case and by 20 for another. The average ranges across 50 cases are reported in the bottom row of Table I.
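|Measure||Alignment/rotations||Marginal ridges||Buccolingual inclination||Overjet||Occlusal contacts||Occlusal relationships||Interproximal contacts||Total|
|Average range across 50 cases||2.1||1.8||3.5||2.0||1.8||1.5||0.1||6.1|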
The SS difference describes the degree of examiner disagreement despite calibration among examiners. This difference is sensitive to outliers and the clinical range of each parameter. Measurements that were larger in range were thus more likely to display more variation than those of smaller range.
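The range calculation described above is simple enough to illustrate directly. The following Python sketch uses hypothetical scores, not study data; the case labels and the `score_range` helper are names introduced here for illustration only. The first case reproduces the worked example of examiners scoring 3, 4, and 6.

```python
def score_range(scores):
    """Highest settling score minus lowest settling score for one case."""
    return max(scores) - min(scores)


# Hypothetical settling scores for one parameter; each case has 2 to 4 examiners.
settling_scores = {
    "case_A": [3, 4, 6],     # the worked example from the text
    "case_B": [5, 5],        # absolute agreement, so the range is zero
    "case_C": [2, 7, 4, 3],
}

ranges = {case: score_range(scores) for case, scores in settling_scores.items()}

# Average range across cases, analogous to the bottom row of Table I.
average_range = sum(ranges.values()) / len(ranges)
print(ranges, round(average_range, 1))
```

Because the range depends only on the two most extreme scores, a single outlying examiner determines the value for a case, which is why the measure is sensitive to outliers.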
The second method used to evaluate the internal agreement between examiners and their SS was the intraclass correlation (ICC). Since the examinees and examiners were a random sample, the 2-way random ICC assessment was chosen as the appropriate agreement measure for this study. For cases evaluated by 2 examiners, both examiners’ data were used to calculate an ICC estimate. For cases with more than 2 examiners, 2 examiners’ scores were randomly selected from the 3 or 4 examiners’ scores. Finally, all cases were combined, and an overall agreement was estimated for each parameter. Table II gives the ICC values of the examiners.
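|Measure||Alignment/rotations||Marginal ridges||Buccolingual inclination||Overjet||Occlusal contacts||Occlusal relationships||Interproximal contacts||Total|
|Agreement (ICC) for cases by 2 examiners (41 cases)||.86||.46||−.72||.64||.47||.80||.87||.69|
|Agreement (ICC) for cases by 3 examiners (9 cases)||.81||.89||.63||.53||.82||.85||.94||.85|
|Overall agreement (ICC) for all cases||.80||.57||−.28||.49||.51||.79||.87||.67|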
An ICC of approximately .60 or greater would indicate sufficient agreement. Using this standard, the examiners achieved agreement on the following measures: alignment/rotations, occlusal relationships, interproximal contacts, and total. The agreement was moderate for marginal ridges, overjet, and occlusal contacts. The agreement was low for buccolingual inclination.
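As a rough illustration of the ICC procedure, the Python sketch below selects 2 examiners' scores at random for cases scored by more than 2 examiners and then computes a 2-way random ICC. The article does not state which form of the 2-way random ICC was used, so the single-rater, absolute-agreement form, commonly written ICC(2,1), is an assumption here. The function names `icc_2_1` and `pick_two` are hypothetical, as are the scores.

```python
import random


def icc_2_1(ratings):
    """Two-way random, absolute-agreement, single-rater ICC.

    ratings: one row per case; every row holds the same number of scores.
    Assumption: this is the ICC form intended; the article does not say.
    """
    n = len(ratings)            # number of cases
    k = len(ratings[0])         # number of examiners per case
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-case mean square
    ms_cols = ss_cols / (k - 1)                 # between-examiner mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def pick_two(scores, rng):
    """Keep both scores for 2-examiner cases; otherwise sample 2 at random."""
    return list(scores) if len(scores) == 2 else rng.sample(list(scores), 2)


# Hypothetical settling scores for one parameter, not study data.
scores_by_case = [[3, 4], [6, 5, 8], [2, 2], [7, 9, 6, 8], [4, 5]]

rng = random.Random(0)  # fixed seed so the random selection is repeatable
pairs = [pick_two(scores, rng) for scores in scores_by_case]
print(round(icc_2_1(pairs), 2))
```

Under this formula the estimate becomes negative whenever the residual mean square exceeds the between-case mean square, which is how negative values such as those for buccolingual inclination in Table II can arise.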
When the ES was compared with the SS, the average SS was used despite the varying levels of agreement because the average SS still provided a good summary of all examiners and allowed a straightforward assessment of changes over time.
Assessment of tooth position changes over time
The ES and SS were compared regardless of the duration of settling time or mode of retention. The SS was obtained by averaging the scores of all examiners that reviewed the settling cases. The scores were compared by using a paired-sample t test, as shown in Table III.
Paired t test results confirmed that the scores for alignment, buccolingual inclination, and total changed significantly during settling, regardless of settling time. The other parameters had relative stability without respect to settling time.
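The comparison can be sketched as follows: each case's SS is averaged across its examiners, and the paired t statistic is then computed from the case-by-case differences between SS and ES. The numbers are hypothetical, not study data, and `paired_t` is an illustrative helper that returns only the t statistic and degrees of freedom. Obtaining a P value would additionally require the t distribution, for example from a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(es, ss):
    """Paired-sample t statistic and degrees of freedom for SS minus ES."""
    diffs = [s - e for e, s in zip(es, ss)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))  # stdev uses the n - 1 denominator
    return t, n - 1


# Hypothetical scores for one parameter in 4 cases, not study data.
examination_scores = [2, 3, 1, 4]                          # ES, one per case
settling_by_examiner = [[4, 4], [5, 5, 5], [1, 3], [6, 8]]  # SS from 2 or 3 examiners

# Average the examiners' settling scores so each case has a single SS.
settling_scores = [mean(scores) for scores in settling_by_examiner]

t_stat, df = paired_t(examination_scores, settling_scores)
print(round(t_stat, 3), df)
```

A positive t statistic here means that the scores increased from ES to SS. Because C-R Eval points are assigned for deficiencies, an increase corresponds to deterioration of that parameter.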
Graphically, the Figure shows the ES average of the 50 cases as the left bar and the SS average as the right bar for each parameter. Note the relatively large discrepancy in the alignment/rotation and buccolingual inclination parameters. The other parameters displayed few changes between ES and SS. The whisker extension on each solid bar indicates the upper boundary of measurement.
Comparison of settling over 2-year time intervals
As previously stated, the diplomates had been asked to submit settling casts produced between 12 and 24 months after debanding, but many casts were received that were produced more than 2 years after debanding. These data were used in the following analysis to determine settling activity over defined time intervals.
The ES and SS were stratified by settling time into the periods listed in Table IV. Standard deviations are given in parentheses.
|Parameter||Less than 2 years (n = 13), ES||Less than 2 years (n = 13), SS||(n = 23), ES||(n = 23), SS||(n = 14), ES||(n = 14), SS|
|Alignment/rotations||2.62 (1.98)||4.96 (2.80)||2.91 (1.88)||5.87 (2.85)||2.07 (1.64)||4.91 (2.95)|
|Marginal ridges||3.00 (1.58)||2.77 (1.30)||3.48 (1.93)||3.08 (1.72)||3.57 (1.34)||3.25 (1.65)|
|Buccolingual inclination||2.69 (2.72)||4.12 (1.65)||2.70 (2.14)||3.77 (1.61)||2.36 (1.74)||3.84 (2.15)|
|Overjet||2.15 (2.23)||1.94 (2.43)||2.52 (2.33)||1.83 (1.69)||2.07 (2.20)||3.02 (3.32)|
|Occlusal contacts||1.62 (1.98)||2.01 (1.85)||2.04 (1.69)||2.38 (1.85)||2.93 (2.89)||1.46 (1.26)|
|Occlusal relationships||1.69 (2.14)||1.65 (1.66)||2.43 (2.50)||2.32 (2.78)||2.14 (1.88)||1.45 (1.97)|
|Interproximal contacts||0.15 (0.55)||0.54 (0.90)||0.35 (0.78)||0.12 (0.35)||0.57 (1.16)||0.00 (0.00)|
|Total||13.92 (6.18)||17.99 (7.84)||16.43 (5.13)||19.37 (6.37)||15.86 (6.49)||17.93 (6.71)|