Orthodontics in China has developed rapidly, but there is no standard index of treatment outcomes. We assessed the validity of the American Board of Orthodontics Objective Grading System (ABO-OGS) for the classification of treatment outcomes in Chinese patients.
We randomly selected 108 patients who completed treatment between July 2005 and September 2008 in 6 orthodontic treatment centers across China. Sixty-nine experienced Chinese orthodontists made subjective assessments of the end-of-treatment casts for each patient. Three examiners then used the ABO-OGS to measure the casts. Pearson correlation analysis and receiver operating characteristic curve analysis were conducted to evaluate the correspondence between the ABO-OGS cast measurements and the orthodontists’ subjective assessments.
The average subjective grading scores were highly correlated with the ABO-OGS scores (r = 0.7042). Four of the 7 study cast components of the ABO-OGS score—occlusal relationship, overjet, interproximal contact, and alignment—were statistically significantly correlated with the judges’ subjective assessments. Together, these 4 accounted for 58% of the variability in the average subjective grading scores. The ABO-OGS cutoff score for cases that the judges deemed satisfactory was 16 points; the corresponding cutoff score for cases that the judges considered acceptable was 21 points.
The ABO-OGS is a valid index for the assessment of treatment outcomes in Chinese patients. By comparing the objective scores on this modification of the ABO-OGS with the mean subjective assessment of a panel of highly qualified Chinese orthodontists, a cutoff point for satisfactory treatment outcome was defined as 16 points or fewer, with scores of 16 to 21 points denoting less than satisfactory but still acceptable treatment. Cases that scored greater than 21 points were considered unacceptable.
Various orthodontic indexes that aim to assess orthodontic treatment outcomes objectively have been developed since the 1970s. Derived from prior subjective evaluations by groups of authorities, objective rating or categorizing systems generally assign numeric scores and provide a threshold for evaluating successful treatment. In 1994, the American Board of Orthodontics (ABO) began to develop its Objective Grading System (OGS) to standardize and increase the precision and reliability of dental cast and panoramic radiograph measurements after treatment. This system was introduced in 1999 as a component of the examination to determine whether completed cases met the ABO standard. The ABO-OGS is now widely accepted and has recently been renamed the Cast/Radiograph Evaluation tool by the ABO.
As used by the ABO, the Cast/Radiograph Evaluation scores the results of objective measurements of the final study casts and radiographs of patients who have completed treatment. The cast measurements are made using a physical measuring tool that was devised on the basis of evaluations by groups of experienced ABO examiners in previous tests. The casts are scored in 7 categories (alignment, marginal ridges, buccolingual inclinations, occlusal relationships, occlusal contacts, overjet, and interproximal contacts), and panoramic radiographs are scored according to the single category of root angulation.
In each category, points are scored characterizing discrepancies from a standard developed by the ABO. Limits apply both to the total number of discrepancy points that can be scored against a case in each category and to the number that can be scored against each tooth in each category. The ABO score for the case is calculated by summing the scores for the 8 categories. If fewer than 20 points are scored overall, the case is considered to meet the ABO standard. If 20 to 30 points are scored, then the standard of work is undetermined. If more than 30 points are scored, the case is considered unacceptable. In a study that assessed how well the OGS measured the quality of treatment in a sample of adult orthodontic patients, the cutoff value for a case that met the ABO standard was 27 points when the score for root angulation was excluded.
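The scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the ABO's own procedure: the function names and the example category scores are hypothetical, the per-category and per-tooth limits are omitted because their values are not given here, and placing a total of exactly 30 points in the undetermined band is an assumption.

```python
def abo_total(category_scores):
    """Sum the discrepancy points over all scored categories."""
    return sum(category_scores.values())


def abo_band(total):
    """Map a total score to the three bands described in the text.

    Assumption: a total of exactly 30 is treated as undetermined.
    """
    if total < 20:
        return "meets ABO standard"
    if total <= 30:
        return "undetermined"
    return "unacceptable"


# Hypothetical case: 7 cast categories plus root angulation (8 in total).
example_case = {
    "alignment": 4,
    "marginal ridges": 3,
    "buccolingual inclination": 2,
    "occlusal relationship": 5,
    "occlusal contacts": 2,
    "overjet": 3,
    "interproximal contacts": 0,
    "root angulation": 2,
}
example_total = abo_total(example_case)
example_band = abo_band(example_total)
```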
When cutoff values are determined by an aggregation of professional opinions, the diagnostic specificity and sensitivity of any index used for evaluation are optimized. Thus, the validity of any orthodontic treatment index is influenced by local conditions of treatment and judging. Hence, any objective index requires a comparison with subjective evaluations made by a group of experienced orthodontists in a specific geographic region to determine the optimal threshold for treatment standards in that region.
This consideration is particularly relevant in a region as large and diffuse as China. Orthodontics has developed rapidly in China over the past 20 to 30 years. As the number of patients grows, it is important to evaluate the effectiveness of orthodontic treatment provided by the various orthodontic services. The aims of this study were to assess the validity of the ABO-OGS tool as an index of treatment outcomes in China and to investigate the optimum cutoff scores for the Chinese population.
Material and methods
This article was based on a multicenter study involving 6 orthodontic treatment centers in different parts of China. The participating centers were the Peking University School of Stomatology, the West China College of Stomatology at Sichuan University, the School of Stomatology at the Fourth Military Medical University, the Beijing Stomatological Hospital and School of Stomatology at the Capital Medical University, the Stomatological Hospital at Nanjing Medical University, and the Hospital of Stomatology at Wuhan University. Each center collected the complete medical records of at least 300 patients who had completed treatment between July 2005 and September 2008. From the combined total of 2383 patients’ records, a stratified random sample of 108 subjects was drawn and balanced to include 18 from each collaborating center, consisting of equal numbers of Angle Class I, Class II, and Class III subjects. This sample was then randomly allocated to 9 groups of 12 subjects each. Each group included 4 Class I subjects, 4 Class II subjects, and 4 Class III subjects. Seventy-two of the 108 patients were less than 18 years of age; the remaining 36 were 18 years or older. There were 30 male and 78 female subjects. All markings that could identify the patient, the clinician, and the treatment center of origin were removed from the casts.
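The sampling scheme can be illustrated with a short Python sketch. It is not the procedure actually used in the study. It assumes that "equal numbers" means 6 subjects per Angle class within each center, it draws from a synthetic pool of placeholder records, and all names (`draw_sample`, `allocate_groups`, the record fields) are hypothetical.

```python
import random
from collections import defaultdict

CENTERS = [f"center_{i}" for i in range(1, 7)]  # placeholder center labels
CLASSES = ["I", "II", "III"]


def draw_sample(records, per_stratum=6, seed=0):
    """Draw per_stratum records at random from every center x class stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["center"], rec["angle_class"])].append(rec)
    sample = []
    for center in CENTERS:
        for cls in CLASSES:
            sample.extend(rng.sample(strata[(center, cls)], per_stratum))
    return sample


def allocate_groups(sample, n_groups=9, per_class=4, seed=0):
    """Randomly split the sample into groups with per_class cases of each class."""
    rng = random.Random(seed)
    groups = [[] for _ in range(n_groups)]
    for cls in CLASSES:
        members = [r for r in sample if r["angle_class"] == cls]
        rng.shuffle(members)
        for g in range(n_groups):
            groups[g].extend(members[g * per_class:(g + 1) * per_class])
    return groups


# Synthetic pool: 300 placeholder records per center, classes cycling I/II/III.
records = [
    {"id": f"{center}-{i}", "center": center, "angle_class": CLASSES[i % 3]}
    for center in CENTERS
    for i in range(300)
]
sample = draw_sample(records)
groups = allocate_groups(sample)
```

With 6 centers, 3 classes, and 6 subjects per stratum, the sketch yields 108 cases (36 per class), which divide evenly into 9 groups of 4 + 4 + 4.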
A panel of judges was formed to make subjective assessments of the 108-patient sample. It consisted of 69 experienced orthodontic specialists recommended by the 6 participating treatment centers to represent the different districts of mainland China. The criteria for the inclusion of each judge were (1) more than 10 years of clinical experience in orthodontics, (2) an MS or a PhD degree in orthodontics or experience as a research supervisor of orthodontic postgraduates, and (3) an academic rank of associate professor or above. Thirty-eight judges were men, and 31 were women.
To standardize the responses of the judges, a pilot examination was conducted in each center. Each judge evaluated 4 groups of cases treated locally over a dedicated period of 2 days. Two to 4 months later, the entire sample of 108 cases was evaluated over a 3-day period by all judges gathered at 1 location in Beijing.
For each case, each judge was invited to examine the physical upper and lower study casts individually and in occlusion. For each group of records, 2 separate assessments were made. In the first assessment (ranking), each judge ordered the 12 study casts in the group from 1 (most favorable) to 12 (least favorable) with respect to treatment outcome. In the second assessment (grading), the judge identified the highest-numbered cast in the ranking whose treatment outcome was still considered satisfactory and then, beyond that cast, the highest-numbered cast whose outcome was still considered acceptable. Cases ranked beyond the highest-numbered acceptable cast were considered unacceptable. This procedure helped control for the chance aggregation of more or fewer acceptable cases in any group of 12 cases. Satisfactory cases were assigned 1 point, acceptable cases 2 points, and unacceptable cases 3 points. Over the entire sample, the cutoff points for satisfactory, acceptable, and unacceptable were based on the average scores of all 69 judges.
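The conversion from one judge's ranking to grading points, and the averaging over judges, might look as follows. This is a minimal sketch with hypothetical names and two invented judges. It only shows how the two cut positions in a ranking turn into 1, 2, or 3 points per case.

```python
def grade_group(ranking, last_satisfactory, last_acceptable):
    """Turn one judge's ranking of a group into grading points.

    ranking: case ids ordered from rank 1 (most favorable) onward.
    last_satisfactory: highest rank number still judged satisfactory.
    last_acceptable: highest rank number still judged acceptable.
    """
    grades = {}
    for rank, case_id in enumerate(ranking, start=1):
        if rank <= last_satisfactory:
            grades[case_id] = 1  # satisfactory
        elif rank <= last_acceptable:
            grades[case_id] = 2  # acceptable
        else:
            grades[case_id] = 3  # unacceptable
    return grades


def mean_grades(per_judge_grades):
    """Average the grading points of each case over all judges."""
    n_judges = len(per_judge_grades)
    return {
        case_id: sum(g[case_id] for g in per_judge_grades) / n_judges
        for case_id in per_judge_grades[0]
    }


# Invented example: 12 cases, 2 judges who agree on the order but not the cuts.
order = [f"c{i:02d}" for i in range(1, 13)]
judge_a = grade_group(order, last_satisfactory=5, last_acceptable=9)
judge_b = grade_group(order, last_satisfactory=3, last_acceptable=9)
average = mean_grades([judge_a, judge_b])
```

In this toy example, case `c04` is satisfactory for one judge and acceptable for the other, so its average subjective grading score falls between 1 and 2.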
Three second-year postgraduate students (W.-Z.W., C.R., X.-R.W.) were invited to measure the study casts in strict adherence to the ABO-OGS guidelines and to record the measurements in the 7 ABO-OGS cast assessment categories. In the first session, a random set of 10 cases was measured by the 3 students for standardization. Four weeks later, in the second session, each student assessed all 108 cases, including the 10 cases graded previously. The grades of the 3 examiners were averaged.
All statistical analyses were performed with SPSS software (version 20.0; SPSS, Chicago, Ill). Spearman correlation coefficients and kappa coefficients were calculated to assess the reliability between judges who undertook the subjective evaluations. Intraclass correlation coefficients (ICC) were computed to evaluate the intraexaminer and interexaminer reliabilities of the examiners who undertook the objective assessments. Stepwise linear regression and Pearson correlation analyses were conducted to assess validity. Receiver operating characteristic (ROC) curves were created to assess the sensitivity and specificity of the ABO-OGS tool and to determine the cutoff points for satisfactory, acceptable, and unacceptable cases. One-way analysis of variance (ANOVA) was used to determine whether the ABO-OGS scores differed systematically between Angle Class I, Class II, and Class III cases. Graphs were generated using MATLAB (R2011b; MathWorks, Natick, Mass), Excel (Excel for Mac 2011; Microsoft, Redmond, Wash), and SPSS software.
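To make the ROC step concrete, the sketch below computes sensitivity and specificity at every candidate threshold and picks one cutoff. The selection criterion is not stated in the text, so maximizing Youden's J (sensitivity + specificity - 1) is an assumption here. The scores and labels are invented toy values, not study data, and the function names are hypothetical.

```python
def roc_points(scores, is_positive):
    """Return (threshold, sensitivity, specificity) for each observed score.

    A case is predicted positive (for example, satisfactory) when its
    score is at or below the threshold, because lower scores are better.
    """
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, p in zip(scores, is_positive) if p and s <= t)
        fp = sum(1 for s, p in zip(scores, is_positive) if not p and s <= t)
        points.append((t, tp / n_pos, 1 - fp / n_neg))
    return points


def youden_cutoff(points):
    """Pick the threshold that maximizes Youden's J (assumed criterion)."""
    return max(points, key=lambda p: p[1] + p[2] - 1)[0]


# Invented toy data: objective scores and whether judges found the case satisfactory.
toy_scores = [6, 9, 11, 13, 14, 17, 19, 23, 26, 31]
toy_satisfactory = [True, True, True, True, True, False, True, False, False, False]
toy_cutoff = youden_cutoff(roc_points(toy_scores, toy_satisfactory))
```

Repeating the same search with "acceptable or better" as the positive class would give the second cutoff, between acceptable and unacceptable cases.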
Results
The ABO-OGS scores of the 108 cases ranged from 5 to 45, with a mean value of 19.13 ± 8.40. The results of the 1-way ANOVA showed no statistically significant differences in the ABO-OGS scores among Class I, Class II, and Class III cases (Fig 1, Table I). The subjective grading scores of the 108 cases ranged from 1.07 to 3.00, with a mean value of 1.90 ± 0.54.
|Pretreatment Angle classification|n|ABO-OGS score (mean ± SD)|F value|P value|
|---|---|---|---|---|
|Class I|36|17.13 ± 6.21|1.585|0.210|
|Class II|36|20.56 ± 8.40| | |
|Class III|36|19.53 ± 10.02| | |
|Total|108|19.13 ± 8.40| | |
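A 1-way ANOVA F statistic can be recomputed from group sizes, means, and standard deviations alone. The sketch below does this for the three Angle classes in the table. The function name is hypothetical. Because the tabulated means and SDs are rounded, the result should be close to, but not necessarily identical with, the reported F value.

```python
def anova_from_summary(groups):
    """One-way ANOVA from summary statistics.

    groups: list of (n, mean, sd) tuples, one per group.
    Returns (F, df_between, df_within).
    """
    total_n = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    # Within-group sum of squares: pooled from the group variances.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, total_n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# (n, mean, SD) for Class I, Class II, and Class III as tabulated.
table_groups = [(36, 17.13, 6.21), (36, 20.56, 8.40), (36, 19.53, 10.02)]
f_value, df_between, df_within = anova_from_summary(table_groups)
```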
The mean value of the Spearman correlation coefficient for the ranking scores, taken over all pairs of judges, was 0.64 ± 0.10. The mean value of the kappa coefficient was 0.58 ± 0.06 for the subjective grading results of the 69 judges. An assessment of interexaminer reliability found that the ICC of the ABO-OGS scores of the 3 examiners was 0.74. For intraexaminer reliability, the ICC values of the ABO-OGS scores of the 3 examiners were 0.79, 0.81, and 0.77.
The average subjective grading scores correlated strongly with the ABO-OGS scores (r = 0.70, P < 0.05; Fig 2). Validity testing selected the highly correlated categories and determined the weights of the components (Table II). Among the 7 categories, “occlusal relationship” was the first to enter into the regression equation, accounting for an R² value of 0.4291. “Overjet” entered next, adding an R² value of 0.0953. “Interproximal contacts” then added an R² value of 0.0313, followed finally by “alignment,” which added a small but statistically significant increment of 0.0278. The overall R² value was 0.5835, implying that 58% of the variability in the average subjective grading scores was accounted for by the 4 categories of ABO-OGS scores.
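The stepwise R² increments reported above add up to the overall R². The short Python check below makes that arithmetic explicit. The variable names are illustrative only, and the values are those given in the text.

```python
# R² increment contributed by each category, in order of entry.
r2_increments = [
    ("occlusal relationship", 0.4291),
    ("overjet", 0.0953),
    ("interproximal contacts", 0.0313),
    ("alignment", 0.0278),
]

# Cumulative R² after each step of the stepwise regression.
cumulative_r2 = []
running_total = 0.0
for category, increment in r2_increments:
    running_total += increment
    cumulative_r2.append((category, round(running_total, 4)))

overall_r2 = cumulative_r2[-1][1]          # 0.5835
percent_explained = round(overall_r2 * 100)  # about 58%
```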