Introduction
Manual linear assessment, a classic method for evaluating orthodontic external root resorption (OERR), has limitations: unreliable dento-osseous junction identification and operator variability, time-consuming measurements, and inability to capture 3-dimensional (3D) morphologic changes.
Methods
We developed OERR-Net, a deep learning system for objective, real-time OERR linear assessment and 3D visualization using pre- and postorthodontic treatment cone-beam computed tomography scans. First, leveraging the Transformer architecture, inspired by ChatGPT (OpenAI, San Francisco, Calif), the Swin-UNETR model was adopted for apex-aware tooth segmentation and 3D reconstruction. Second, a novel algorithm (ToothLM) was proposed for automatic tooth length measurement. Third, the system achieved simultaneous grading and 3D morphologic visualization. An end-to-end validation workflow was established, covering segmentation to grading, with Swin-UNETR’s superiority demonstrated through qualitative, saliency, and quantitative analyses. Length accuracy was validated via difference and Bland-Altman analyses, and grading performance was compared with orthodontists’ evaluations.
Results
The study included 100 paired cone-beam computed tomography scans (1560 incisors). First, Swin-UNETR outperformed U-Net and UNETR, achieving the highest agreement with the ground truth (dice similarity score = 90.98%). Second, ToothLM showed excellent agreement with expert manual measurements (intraclass correlation coefficient = 0.999). Finally, OERR-Net achieved higher grading accuracy than the orthodontists’ subjective assessments (maxillary incisors: 97.37% vs 74.34%; mandibular incisors: 96.82% vs 94.27%), captured subtle morphologic changes, and reduced assessment time by approximately 50%.
Conclusions
The proposed automatic OERR assessment system aligns with classic practices, clarifies resorption patterns, and helps treatment selection based on severity. Current validation is single-center; broader applicability requires future validation.
Highlights
• Adapted Swin-UNETR to model 3D apex–alveolar relations for precise OERR.
• Proposed a novel 3D dental model length algorithm for accurate OERR grading.
• The OERR system outperformed orthodontists and enabled precise 3D morphology.
Orthodontic external root resorption (OERR) commonly occurs during orthodontic treatment. It is caused by pressure and tension within the periodontal ligament as teeth are moved. Mild OERR, with a prevalence of 98.1%, does not have long-term detrimental effects on teeth. However, clinically significant OERR (defined as a length loss of ≥2 mm) poses risks, such as increased tooth mobility and potential loss if alveolar bone loss occurs. Once OERR exceeds 2 mm, clinicians must ensure long-term tooth stability. Measures include patient oral hygiene guidance, maintaining passive retention devices, and other appropriate interventions. Therefore, a timely and accurate assessment of OERR is crucial for the affected tooth’s stability.
Radiographic examination is a noninvasive tool for assessing OERR. Cone-beam computed tomography (CBCT) provides detailed 3-dimensional (3D) morphology and spatial relationships of bony structures. It eliminates distortions and superimpositions that are seen in 2-dimensional (2D) images, such as periapical and panoramic radiographs. The sensitivity of CBCT for detecting OERR is approximately 89%, which is 20% higher than that of periapical radiographs.
A major challenge lies in the subjective assessment of root length changes using CBCT 2D slices. This assessment depends heavily on the clinician’s attention and experience, often leading to oversight or misjudgment. To address this issue, semiautomatic methods are often employed for linear changes or to obtain 3D changes in OERR through tooth segmentation and 3D reconstruction. However, the time-consuming nature of the process hinders clinical adoption. Therefore, achieving both precision and efficiency remains a critical challenge in current research.
Given these limitations, many researchers have sought automatic approaches for accurate OERR assessment. Deep learning methods have shown effectiveness in automatic morphology analysis and can enhance the accuracy and efficiency of OERR assessment. Xu et al pioneered automatic OERR assessment using image-based classification of paired 2D CBCT images. However, inconsistent reference plane selection compromised its diagnostic validity during root evaluation.
The 3D morphologic assessment provides a more comprehensive understanding of tooth shapes than 2D images. Previous studies extracted 3D dental structures using convolutional neural network (CNN)–based segmentation algorithms, then applied crown-root separation algorithms to isolate root morphology. This enabled calculation of OERR volume changes during orthodontic treatment. Although these pioneering studies laid important groundwork for 3D OERR assessment, future research can expand in 3 key areas: (1) although volumetric analysis provides valuable insights, the development of linear assessment algorithms is essential, because linear measurements better align with clinicians’ established practices and serve as the primary basis for OERR management in clinical guidelines; (2) CNN-based segmentation architectures, limited by fixed-size kernels, struggle to capture long-range anatomic relationships, particularly between root apex morphology and alveolar bone; and (3) existing validation protocols overlooked the propagation of errors arising from upstream segmentation inaccuracies. Even when upstream modeling is highly accurate, the cumulative error in calculating downstream parameters can be significant.
As the core architecture of ChatGPT, Transformer is known for extracting global features through self-attention. Notably, Swin-UNETR, combining a Swin Transformer encoder with a U-Net decoder, addresses CNN limitations by capturing long-range dependencies and complex spatial relationships. In addition, Swin-UNETR has proven to have higher accuracy in segmenting oral surgery-related tissues and periapical lesion detection compared with CNNs. Leveraging these advantages, this approach was applied to address OERR assessment challenges, focusing on fine root apex structures.
To bridge the gap between automatic assessment and clinical decision-making, we proposed OERR-Net. It integrates Swin-UNETR’s apex-aware segmentation with ToothLM’s length quantification capabilities to enhance clinicians’ understanding of OERR severity. This innovative framework calculates OERR grades based on length changes before and after treatment. It also provides a comprehensive 3D visualization of resorptive patterns. This assists orthodontists in timely interventions to preserve tooth stability.
The significant contributions of this study are summarized as follows:
• Inspired by Transformer successes, such as ChatGPT, we adopted Swin-UNETR to overcome CNN limitations. This improved modeling of long-range anatomic dependencies, especially 3D relationships between root apex and alveolar bone, enabling precise OERR assessment.
• We proposed an original algorithm called ToothLM for length measurement on 3D dental models. This addresses the critical gap in automatic linear quantification, providing a foundation for accurate grading of OERR.
• Experimental results demonstrated that our OERR system outperformed orthodontists’ subjective OERR-grading in double-blind trials. In addition, the 3D visualization analysis provided further insights into the patterns of OERR, enhancing clinicians’ understanding. These precise results were achieved through an end-to-end validation process, minimizing cumulative error in every step and ensuring accuracy across the entire workflow.
Material and methods
The protocol of this retrospective study was reviewed and approved by the ethics committee at the Stomatological Hospital of Chongqing Medical University (LSNo. 95) and was reported according to the Checklist for Artificial Intelligence in Medical Imaging. Details are listed in Supplementary Table I.
The study included pre- and postorthodontic treatment CBCT scans of patients who completed orthodontic treatment at the Department of Orthodontics, Stomatological Hospital of Chongqing Medical University. All participants underwent CBCT examinations as part of their diagnostic records for the following purposes: planning for orthognathic surgeries, temporary skeletal anchorage device placement, digital planning, or comprehensive assessments of the temporomandibular joint, alveolar boundary, and airway. The CBCT scans were performed with the participants in a natural head position, the horizontal plane parallel to the floor, and with maximum intercuspation.
All CBCT scans were conducted using the KAVO 3D EXAM (Biberach, Germany) with the following parameters: a field of view of 16 × 11 cm, 120 kVp, 5 mA, voxel size of 0.3-0.4 mm, an exposure duration of 3.7 seconds, and a total scan time of 8.9 seconds. For de-identification, the format of all CBCT scans was converted from Digital Imaging and Communications in Medicine files to Neuroimaging Informatics Technology Initiative files.
The inclusion criteria were as follows: (1) paired CBCT scans at baseline (T0) and endpoint (T1) of orthodontic treatment and (2) fixed orthodontic treatment or clear aligner treatment. OERR was assessed for the maxillary and mandibular incisors (UI and LI) between T0 and T1. The exclusion criteria were as follows: (1) unpaired CBCT scans; (2) previous orthodontic treatment; and (3) metal or motion artifacts that severely influenced the quality of the regions of interest. The dataset was divided, allocating 80% of patients to the training set and 20% to the testing set.
To improve annotation efficiency, the ground truth was constructed using our team’s validated deep learning method, followed by expert manual refinement. Orthodontists with at least 3 years of clinical experience performed the annotation. This ensured accurate delineation of tooth morphology, especially in the root region, before and after orthodontic treatment. All experts were systematically trained in CBCT image segmentation with ITK-SNAP (version 4.0.0; www.itksnap.org) before participating in this study. They had also participated in the team’s previous CBCT annotation projects, including training sessions, manual labeling, and quality review. Their solid image interpretation and analytical skills ensured the reliability of the final annotations.
The initial segmentation labeled the 8 incisors in different colors, providing a preliminary boundary. Manual refinements focused on the root region and were conducted in 2 rounds: 2 experts (with 3 years of tooth segmentation experience) manually corrected the initial segmentation results. Then, the third expert (with 5 years of tooth segmentation experience) reviewed the segmentation results slice by slice and made corrections. To further verify the reliability of this annotation method, each expert reannotated the data 2 months after the initial annotation. A random selection of 20 teeth was used for intragroup consistency evaluation. The average overall consistency rate among annotators was 95.28%.
Two experienced orthodontists independently measured the same dataset. The average of their measurements was used as the gold standard for tooth length. The manual measurement process was as follows. First, the manual segmentation files were converted into standard tessellation language models and imported into Geomagic Wrap (version 2021; 3D Systems, Rock Hill, SC). Second, following established methodologies in previous studies, the measurements were taken in 3 steps: (1) the midpoint of the bilateral incisal edge points of the incisor was identified, and the tooth’s long axis was determined by connecting this midpoint to the root apex point; (2) the length of the tooth at T0 was measured as the perpendicular distance between 2 planes, both perpendicular to the tooth’s long axis, with one plane tangent to the crown surface and the other tangent to the root surface; and (3) for the tooth at T1, its 3D model was superimposed onto the model at T0. The long axis and coronal reference plane from the preorthodontic tooth were retained, a new root section plane was defined based on the root morphology of the posttreatment model, and the length of the postorthodontic tooth was then measured.
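The plane-to-plane definition in step (2) is equivalent to projecting the model onto the long axis and taking the range of the projections. The following is a minimal Python sketch of that geometric idea only, assuming the model is available as a list of (x, y, z) vertices in millimeters. The function and argument names are illustrative and are not part of the Geomagic Wrap workflow used in the study.

```python
import math


def tooth_length_along_axis(vertices, incisal_mid, apex):
    """Distance between the two planes that are perpendicular to the long axis
    and tangent to the model (hypothetical helper, not the authors' code).

    vertices: iterable of (x, y, z) surface points in mm.
    incisal_mid, apex: the two landmarks that define the long axis.
    """
    axis = [a - b for a, b in zip(apex, incisal_mid)]
    norm = math.sqrt(sum(c * c for c in axis))
    if norm == 0:
        raise ValueError("incisal midpoint and apex must be distinct points")
    unit = [c / norm for c in axis]
    # Signed position of every vertex along the long axis.
    projections = [
        sum((v[i] - incisal_mid[i]) * unit[i] for i in range(3)) for v in vertices
    ]
    # The tangent planes sit at the two extreme projections.
    return max(projections) - min(projections)
```

For a T1 model superimposed onto T0, passing the T0 landmarks again keeps the long axis fixed, so the coronal extreme stays approximately where it was and the change in length comes from the apical end.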
Resorptive changes were calculated by subtracting the tooth length at T1 from T0. Length loss was then correlated with OERR grading: OERR <2 mm and OERR ≥2 mm.
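As a minimal sketch, the length loss and the 2-grade split described above can be written as follows. The function names and grade labels are illustrative assumptions; only the subtraction and the 2 mm threshold come from the text.

```python
def length_loss_mm(l0_mm, l1_mm):
    """Length loss: tooth length at T0 minus tooth length at T1, in mm."""
    return l0_mm - l1_mm


def grade_oerr(l0_mm, l1_mm, threshold_mm=2.0):
    """Return the OERR grade implied by the length loss (labels are illustrative)."""
    if length_loss_mm(l0_mm, l1_mm) >= threshold_mm:
        return "OERR >=2 mm"
    return "OERR <2 mm"
```

For example, a tooth measuring 23.5 mm at T0 and 21.0 mm at T1 has a length loss of 2.5 mm and falls in the OERR ≥2 mm grade.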
Orthodontists often rely on visual inspection and experience to grade OERR based on length loss. This subjective approach lacks precise automatic tools and can delay the detection of OERR ≥2 mm. To enhance OERR grading accuracy and provide 3D visualization of resorptive changes, we developed OERR-Net, illustrated in Figure 1, A. This system includes a Swin-UNETR–based segmentation model and an automatic length measurement algorithm named ToothLM. It ensures accurate tooth segmentation, especially in the apical region, enabling reliable quantification and visualization of resorptive changes. At the same time, ToothLM enables exact length measurement for grading based on length loss.
Overview of the OERR-Net: A, The workflow of the OERR-Net. OERR-Net consists of a segmentation model based on Swin-UNETR and an automatic measurement model; B, Automatic segmentation network architecture based on Swin-UNETR; C, The automatic measurement pipeline. L0, length at T0; L1, length at T1; △L, length loss; transpose conv, transposed convolution.
CNN-based architectures, such as U-Net, are widely used for CBCT tooth segmentation. However, they rely on local receptive fields, limiting their ability to model long-range dependencies. This limitation is critical in CBCT, in which crowns and apices are distant, yet functionally connected. Transformer-based models address this issue using self-attention to capture global context. Nevertheless, their high computational cost in 3D volumes hinders practical application. In contrast, Swin-UNETR addresses this by introducing a hierarchical window self-attention mechanism. It models both local and global information at multiple scales, balancing representational power and computational efficiency. This makes it well-suited for structures with long-range dependencies, such as tooth segmentation.
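The cost argument can be made concrete by counting the token pairs that attention has to score. The sketch below compares global self-attention with attention restricted to non-overlapping cubic windows. The token grid and window size are hypothetical values chosen for illustration, not the configuration used in this study, and the count ignores shifted windows and padding.

```python
def global_attention_pairs(grid):
    """Token pairs scored when every token attends to every other token."""
    n_tokens = grid[0] * grid[1] * grid[2]
    return n_tokens * n_tokens


def windowed_attention_pairs(grid, window):
    """Token pairs scored when each token attends only within its own
    non-overlapping window of window**3 tokens."""
    if any(g % window for g in grid):
        raise ValueError("each grid dimension must be divisible by the window size")
    n_tokens = grid[0] * grid[1] * grid[2]
    return n_tokens * window ** 3


grid = (48, 48, 48)  # hypothetical token grid for a 3D volume
window = 6           # hypothetical window edge length, in tokens
ratio = global_attention_pairs(grid) / windowed_attention_pairs(grid, window)
print(ratio)  # 512.0, i.e. 48**3 / 6**3
```

The windowed count grows linearly with the number of tokens, whereas the global count grows quadratically, which is why windowing matters more as the volume gets larger.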
Employing a multistage architecture for effective feature extraction, the Swin-UNETR model segments 8 incisors from CBCT images. Figure 1, B shows the construction of this model. The encoder has 5 stages, each with 2 Transformer blocks (10 layers total) to enhance feature representation and spatial understanding. A linear embedding layer first generates tokens for input features. Each level uses a patch merging layer to halve feature resolution and capture multiscale context. Thanks to this design, the encoder can efficiently capture high-level and fine-grained information. In the decoder stage, skip connections and concatenation preserve spatial details from the encoder. Residual blocks refine feature maps to improve accuracy by integrating multiscale information. Deconvolution layers restore features to their original resolution. Convolutional layers with sigmoid activation generate the final segmentation output as a clear probabilistic map of the incisors. The details of Swin-UNETR are presented in the Supplementary Material.
Previous studies relied on semiautomatic methods to measure length loss, which were time-consuming. To automate this process, we proposed the first 3D dental model length measurement algorithm. This original algorithm, named ToothLM, was inspired by simulating the manual measurement of extracted incisors. Figure 1, C depicts the ToothLM pipeline, employing the tooth’s centroid as the pivot point. The tooth is first rotated on the x-axis, followed by the y-axis. This process quantifies the maximum variation along the z-axis. This maximum value is output as the tooth length. The formula for ToothLM is presented in the Supplementary Material.
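Because the exact ToothLM formula is given only in the Supplementary Material, the following Python code is a sketch of the description above rather than the authors' implementation. It assumes the model is a list of (x, y, z) points in millimeters and uses a brute-force grid of rotation angles, whose step size is an illustrative choice.

```python
import math


def z_extent_after_rotation(points, angle_x, angle_y):
    """z-range of the points after rotating about the x-axis by angle_x,
    then about the y-axis by angle_y (both in radians)."""
    cx, sx = math.cos(angle_x), math.sin(angle_x)
    cy, sy = math.cos(angle_y), math.sin(angle_y)
    zs = []
    for x, y, z in points:
        z_after_x = sx * y + cx * z               # z after the x-axis rotation
        zs.append(-sy * x + cy * z_after_x)       # z after the y-axis rotation
    return max(zs) - min(zs)


def tooth_length_sketch(points, step_deg=1.0):
    """Largest z-extent found over a grid of x- and y-axis rotations about
    the centroid (hypothetical stand-in for ToothLM)."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centred = [tuple(p[i] - centroid[i] for i in range(3)) for p in points]
    steps = int(round(180 / step_deg))
    best = 0.0
    for i in range(steps):
        angle_x = math.radians(i * step_deg)
        for j in range(steps):
            angle_y = math.radians(j * step_deg)
            best = max(best, z_extent_after_rotation(centred, angle_x, angle_y))
    return best
```

A half-turn range on each axis is enough here because the extent along a direction equals the extent along its opposite. A finer step reduces the small underestimate caused by the angle grid, at a cost that grows with the square of the number of steps.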
Statistical analysis
Segmentation performance was assessed using standard evaluation metrics. These included the dice similarity coefficient (DSC), intersection over union (IoU), and average symmetrical surface distance (ASSD). The formulas for the evaluation metrics are provided in the Supplementary Material. To validate the precision of Swin-UNETR in delineating dento-osseous junctions, we performed a qualitative evaluation based on CBCT segmentation. Accurate extraction of the apical region was emphasized. As U-Net is widely used for tooth segmentation, it was selected as a baseline for comparison. UNETR, as a transformer-based extension of U-Net, was also included to further highlight the advantages introduced by Swin-UNETR.
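For reference, the two overlap metrics follow their standard definitions, DSC = 2|A∩B| / (|A| + |B|) and IoU = |A∩B| / |A∪B|. The sketch below assumes each segmentation is given as a collection of voxel indices belonging to the tooth. That representation and the function name are illustrative, and ASSD is omitted because it requires surface extraction.

```python
def dice_and_iou(pred_voxels, truth_voxels):
    """Return (DSC, IoU) for two segmentations given as voxel-index collections."""
    pred, truth = set(pred_voxels), set(truth_voxels)
    intersection = len(pred & truth)
    total = len(pred) + len(truth)
    if total == 0:
        # Both masks empty: treated as perfect agreement by convention.
        return 1.0, 1.0
    union = total - intersection
    return 2 * intersection / total, intersection / union
```

For a single case, the two values are linked by IoU = DSC / (2 − DSC), so they always rank a pair of segmentations in the same order.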
ToothLM results were compared with manual length measurements on the same 3D dental models constructed from manual segmentation. The discrepancies were calculated using paired-samples T tests, Wilcoxon signed-rank tests, and the Bland-Altman plot. Intraclass correlation coefficients (ICC) were used to assess the interoperator reliability of the 2 experts and the consistency between ToothLM and the manual measurement.
Grading based on pre- and posttreatment tooth length is common in OERR assessment, with losses ≥2 mm requiring clinical attention. Therefore, we evaluated the grading performance of OERR-Net using standard metrics, including accuracy, specificity, recall, F1-score, and precision. The formulas for the evaluation metrics are provided in the Supplementary Material. In addition, grading accuracy near the 2 mm threshold was evaluated, given its clinical sensitivity to minor measurement deviations. We stratified all patients based on the distance between the ground truth and the 2 mm threshold into 3 intervals, using 1 voxel size (0.4 mm) as the interval width: (1) near-boundary (1.6-2.4 mm); (2) intermediate ([1.2, 1.6] ∪ [2.4, 2.8] mm); and (3) far-from-boundary (≤1.2 mm ∪ ≥2.8 mm).
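The stratification can be expressed as a function of the distance between the ground-truth length loss and the threshold. In this sketch, the function name and labels are illustrative, and the assignment of values lying exactly on a shared interval edge (which the stated intervals leave open) is an assumption.

```python
def boundary_stratum(length_loss_mm, threshold_mm=2.0, width_mm=0.4):
    """Classify a ground-truth length loss by its distance from the threshold.

    Edge values are assigned to the stratum closer to the threshold, which is
    an assumption made for this sketch.
    """
    distance = abs(length_loss_mm - threshold_mm)
    if distance <= width_mm:
        return "near-boundary"
    if distance <= 2 * width_mm:
        return "intermediate"
    return "far-from-boundary"
```

With the default values, a length loss of 1.9 mm is near-boundary, 2.6 mm is intermediate, and 0.5 mm is far-from-boundary.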
Furthermore, a blinded comparative study was conducted using 50% of randomly selected patients. This included 40 patients from the training set and 10 from the testing set. Both OERR-Net and clinicians’ traditional evaluations were independently compared against ground-truth measurements. This approach ensured an unbiased validation of diagnostic superiority. More importantly, we showed that OERR-Net offers intuitive 3D visualization of resorptive areas, supporting clinical assessment of OERR.
To compare the efficiency between OERR-Net and clinician grading, we recorded the time taken by both methods. For OERR-Net, time measurement began when the system received the raw files and ended once the final grading results were produced. This duration included all steps: data preprocessing, segmentation, length calculation, and grading output. For orthodontists, time was measured from receiving the CBCT images to completing the subjective grading assessment.
The quantification of OERR is based on grading. ToothLM evaluation is independent of Swin-UNETR. By contrast, assessment of OERR-Net’s overall quantification accuracy was dependent on the influence of Swin-UNETR in tooth segmentation. Therefore, the grading accuracy was evaluated by comparing OERR-Net’s measurements with the ground truth from manual segmentation and measurement. Paired-samples t tests and Wilcoxon signed-rank tests were used for statistical analysis. All statistical analyses were performed using SPSS Statistics (version 26.0; IBM, New York, NY). Graphs were generated with GraphPad Prism software (version 10.0; GraphPad Software, Calif).
Results
Sample size estimation ensured sufficient power to detect differences between OERR-Net and the ground truth. After completing the manual annotation and measurement work for 50 patients, the mean and standard deviation of the changes in length between the 2 groups were calculated. With a significance level of 0.05 and a power of 0.90, the statistical results indicated that at least 80 patients were needed. The sample size estimation was performed using PASS (version 2021; NCSS, LLC, Utah). A total of 100 paired CBCT scans were randomly selected. We further excluded unsuitable tooth samples based on the specific requirements of this experiment (Fig 2). The details of patient characteristics are provided in the Table.
Data collection results. N, number of incisors (counting the same tooth at different time points as a whole).
Table
Patient characteristics
| Variables | Clinical cohort: training set | Clinical cohort: testing set |
|---|---|---|
| No. of patients | 80 | 20 |
| Age, mean (minimum-maximum), y | 17.16 (10-38) | 18.10 (10-31) |
| Tooth-level (OERR prevalence) | | |
| OERR <2 mm | 593 (95.49) | 143 (92.86) |
| OERR ≥2 mm | 28 (4.51) | 11 (7.14) |
| Patient-level (extraction/nonextraction) | | |
| Extraction | 60 (75.00) | 18 (90.00) |
| Nonextraction | 20 (25.00) | 2 (10.00) |
| Treatment protocol | | |
| Clear aligner | 5 (6.25) | 3 (15.00) |
| Fixed orthodontic treatment | 75 (93.75) | 17 (85.00) |
| Gender | | |
| Male | 25 (31.25) | 7 (35.00) |
| Female | 55 (68.75) | 13 (65.00) |
| Dental classification | | |
| Class I | 36 (45.00) | 10 (50.00) |
| Class II | 41 (51.25) | 9 (45.00) |
| Class III | 3 (3.75) | 1 (5.00) |
| Skeletal classification | | |
| Class I | 47 (58.75) | 10 (50.00) |
| Class II | 25 (31.25) | 9 (45.00) |
| Class III | 8 (10.00) | 1 (5.00) |
Note. Data are presented as number (percentage), unless otherwise specified.
Swin-UNETR effectively captured global and local features, especially in the apical region, in which the target closely resembles alveolar bone. Saliency maps (Fig 3) reveal that Swin-UNETR exhibits a denser area of concentrated attention in the apical region. In contrast, UNETR’s attention was less consistent in this area. U-Net mainly focused on the boundaries of the incisor, limiting its ability to capture the overall tooth structure.
Examples of saliency maps from the testing set were randomly selected. The region covered by the square represents the apical area of the tooth and its surrounding structure.
Figure 4 shows that Swin-UNETR precisely segments incisors in the sagittal plane by localizing the root apex and subtle apical structures. This provides a robust foundation for subsequent quantification of length changes. In axial plane evaluations, Swin-UNETR showed higher sensitivity in delineating apical morphology than UNETR. Notably, U-Net performed the poorest, missing segmentation in 3 slices and underestimating apical length by approximately 1 mm.
Qualitative evaluation of the segmentation results of different deep learning methods on the testing sets.
Figure 5 presents the quantitative analysis results of incisor segmentation using 3 advanced models. Swin-UNETR demonstrated the best performance, achieving a mean DSC of 90.98%, a mean ASSD of 0.1910 mm, and a mean IoU of 86.33% on the testing set. U-Net ranked second, with a mean DSC of 90.50%, a mean ASSD of 0.1916 mm, and a mean IoU of 84.20%. UNETR had the lowest performance, with a mean DSC of 88.34%, a mean ASSD of 0.2041 mm, and a mean IoU of 83.73%. Notably, Swin-UNETR performed even better when segmenting UI compared with LI, achieving a mean DSC of 94.07%, a mean ASSD of 0.0305 mm, and a mean IoU of 91.04%.
Quantitative evaluation of the segmentation results on the testing set.
For manual length measurement, interoperator reliability between the two experts showed a high ICC (0.998) and a low discrepancy between assessments (0.0750 ± 0.131; P <0.001). Figure 6 reveals high consistency between the ToothLM and manual length measurements of 3D dental models. The 4 groups—UI at T0, UI at T1, LI at T0, and LI at T1—showed normal distribution for both automatic and manual measurements. Paired-samples t tests comparing ToothLM and manual measurements showed differences of 0.067 ± 0.129 mm, 0.069 ± 0.091 mm, 0.032 ± 0.096 mm, and 0.037 ± 0.081 mm (all P <0.001).
Statistical comparison between ToothLM and manual length measurements. ∗ P <0.05; ∗∗ P <0.01; ∗∗∗ P <0.001. n, number of incisors (counting the same tooth separately at different time points).
The Bland-Altman plots (Fig 7) show that the ToothLM demonstrated high consistency and stability across different groups. For UI at T0, the mean difference (MD) was 0.0665 mm, with a 95% confidence interval (CI) of –0.186 mm to 0.319 mm. For UI at T1, the MD was 0.0716 mm, with a 95% CI ranging from –0.0824 to 0.226 mm. For LI at T0, the MD was 0.0274 mm, with a 95% CI of –0.116 to 0.170 mm, and at T1, the MD was 0.0386 mm, with a 95% CI ranging from –0.103 to 0.180 mm.
Bland-Altman plots for comparing the ToothLM and manual length measurements (red, a 95% CI provided for the limits of agreement; blue, the mean of the difference). The y-axis represents the difference between the 2 methods, and the x-axis represents the mean length measured by the 2 methods. MD, mean difference; SD, standard deviation; n, number of incisors (counting the same tooth separately at different time points).
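The interval reported for UI at T0 is consistent with the usual Bland-Altman limits of agreement, MD ± 1.96 SD, when the SD of 0.129 mm from the paired comparison is used. The sketch below shows that calculation. The function names and the direction of the difference (automatic minus manual) are assumptions made for illustration.

```python
import statistics


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """Lower and upper 95% limits of agreement from a mean difference and its SD."""
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


def bland_altman(auto_mm, manual_mm):
    """Mean difference and limits of agreement for paired length measurements."""
    if len(auto_mm) != len(manual_mm) or len(auto_mm) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - m for a, m in zip(auto_mm, manual_mm)]  # assumed direction
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    return mean_diff, limits_of_agreement(mean_diff, sd_diff)


# Reported summary values for UI at T0: MD = 0.0665 mm, SD = 0.129 mm.
lower, upper = limits_of_agreement(0.0665, 0.129)
print(round(lower, 3), round(upper, 3))  # -0.186 0.319
```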
Moreover, the ICCs for all 4 groups were 0.999 ( P <0.001), demonstrating excellent consistency with manual length measurements. Therefore, the ToothLM showed minimal discrepancies compared with manual methods on the same set of models.
Based on the strong performance of the segmentation and length measurement algorithms, we integrated them into OERR-Net for grading OERR by length loss. The OERR-Net demonstrated good grading results for UI (Fig 8, A) in the testing set, achieving an accuracy of 98.70%, a specificity of 98.65%, a recall of 100.00%, an F1-score of 85.71%, and a precision of 75.00%.
OERR-Net’s grading performance according to length loss: A, Evaluation of OERR-Net’s grading performance; B, Performance of OERR-Net in different resorption length intervals relative to the 2-mm threshold; C, Comparison between orthodontists and OERR-Net in grading. acc, accuracy; spc, specificity; rec, recall; F1, F1-score; prec, precision; S+J, senior and junior orthodontists (all orthodontists); N, numbers of incisors (treating the same tooth at different time points as a whole).
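The grading metrics follow the standard confusion-matrix definitions, with OERR ≥2 mm taken as the positive class. The counts in the sketch below are not reported in the article. They are one set of values that happens to reproduce the UI testing-set percentages and are used purely as an illustration.

```python
def grading_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    No guard against empty denominators is included in this sketch.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts consistent with the reported UI testing-set results.
metrics = grading_metrics(tp=3, fp=1, tn=73, fn=0)
print({name: round(100 * value, 2) for name, value in metrics.items()})
# {'accuracy': 98.7, 'specificity': 98.65, 'recall': 100.0, 'precision': 75.0, 'f1': 85.71}
```

The example illustrates why precision and F1-score can look modest despite high accuracy: when the OERR ≥2 mm class is rare, a single false positive changes precision by a large step.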
To evaluate grading accuracy near the critical 2 mm threshold, patients were stratified by their distance from it. For UI, accuracy remained high near the threshold and reached 100% farther away (Fig 8, B). LI showed reduced accuracy near the threshold (66.67%), but improved to 96.92% farther away. This indicates strong UI performance overall, with LI grading near the threshold needing improvement.
In addition, the accuracy of the OERR-Net substantially surpassed that of the orthodontists (senior: 76.97%, junior: 71.71%, and all orthodontists: 74.34%), achieving 97.37% on the testing set for grading UI (Fig 8, C). The average time for outputting results from OERR-Net was 132.78 seconds, whereas the average time for the orthodontists’ subjective grading was 279.74 seconds. The performance of OERR-Net in the length measurement of resorptive changes is presented in Supplementary Table II.
OERR-Net consistently reconstructed morphology in longitudinal CBCT scans and captured subtle apical changes, aiding visualization of resorption. Based on its outstanding performance, the visualization results of resorptive changes using surface deviation analysis were constructed. Figure 9 shows that the crown remained unchanged during orthodontic treatment, highlighting OERR-Net’s ability to maintain consistent morphologic reconstruction in longitudinal CBCT scans. At the same time, it accurately detected subtle morphologic changes in the roots.
Three-dimensional visualization of root resorption morphology using OERR-Net. The images show the surface with the most evident resorption sites.
Discussion
To support diagnosis, guide clinical decisions, and improve understanding of OERR severity, we developed a transformer-based assessment system. The comprehensive validation pipeline confirmed OERR-Net’s superior performance in segmentation accuracy, measurement reliability, and grading consistency. Through comparative validation with clinicians of varying experience levels, the system demonstrated superior accuracy in UI OERR grading. By establishing an automatic assessment pipeline, the system reduced assessment time by approximately 50% (an average of 132.78 seconds vs 279.74 seconds for the orthodontists), enhancing efficiency. Because the grading is length-based, its output maps directly onto the length criteria that guide clinical intervention in OERR management.
Swin-UNETR’s hierarchical windowed self-attention improved apical segmentation accuracy, which is crucial for precise OERR length measurement. Experimental results showed that Swin-UNETR (DSC = 90.98%, ASSD = 0.1910 mm, and IoU = 86.33%) outperformed both U-Net and UNETR in quantitative analyses. It also demonstrated better attention to the apical region. The Swin Transformer’s hierarchical attention improves the capture of long-range dependencies and fine details. Although U-Net, a traditional CNN, struggles with long-range dependencies (Fig 3), Swin-UNETR effectively captures global context. Compared with UNETR, Swin-UNETR focuses on smaller regions, enhancing apical accuracy through precise root apex localization.
Most current automatic OERR systems focus on volumetric changes. Yet, existing grading standards lack clinical guidance, limiting practical use. By aligning with clinical practice and most guidelines, length-based OERR grading underscores the value of automatic linear assessment. However, there is still a significant gap in automatic research in this area. Xu et al achieved 0.94 grading accuracy by classifying OERR from paired 2D CBCT slices based on root length changes. However, projection errors arose because of misaligned tooth axes. In contrast, our system uses 3D anatomic features for precise linear measurement, eliminating the need for reorientation.
Adding 3D visualization in OERR-Net helped overcome the key limitations of traditional linear measurements in assessing OERR patterns. Figure 9 shows that OERR exhibited not only horizontal but also slanted and middle resorption patterns according to the modified Malmgren classification. Notably, slanted resorption leads to greater tissue loss, but traditional methods using the most distal point often underestimate it. Therefore, to accurately characterize OERR patterns, the OERR-Net system enhances the assessment by integrating advanced 3D visualization of OERR.
To evaluate error accumulation in OERR-Net, this study developed an end-to-end validation process. Results showed lower accuracy for the integrated system than ToothLM, with errors of 0.72 ± 0.75 mm vs 0.067 ± 0.129 mm for UI at T0 ( P <0.001). Despite Swin-UNETR’s high segmentation accuracy (DSC = 90.98%), the segmentation model was the primary source of error. This likely reflects CBCT’s limitations, in which 1-3 voxel deviations can cause 0.3-1.2 mm errors, equal to 15%-60% of the 2 mm OERR clinical threshold. Thus, although many segmentation algorithms achieved excellent metrics, their direct application to OERR assessment requires caution.
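The voxel-level error budget quoted above follows from simple arithmetic. The sketch below pairs the smallest deviation (1 voxel) with the smallest voxel size of the acquisition protocol (0.3 mm) and the largest deviation (3 voxels) with the largest voxel size (0.4 mm). That pairing is an assumption about how the 0.3-1.2 mm range was obtained.

```python
def length_error_mm(n_voxels, voxel_size_mm):
    """Linear error produced by a boundary deviation of n_voxels."""
    return n_voxels * voxel_size_mm


def share_of_threshold(error_mm, threshold_mm=2.0):
    """Error expressed as a percentage of the clinical OERR threshold."""
    return 100 * error_mm / threshold_mm


best = length_error_mm(1, 0.3)   # 1 voxel at 0.3 mm
worst = length_error_mm(3, 0.4)  # 3 voxels at 0.4 mm
print(f"{best:.1f}-{worst:.1f} mm, "
      f"{share_of_threshold(best):.0f}%-{share_of_threshold(worst):.0f}%")
# 0.3-1.2 mm, 15%-60%
```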
OERR-Net achieved superior diagnostic accuracy for incisors compared with subjective clinician assessments (97.37% vs 74.34% for UIs and 96.82% vs 94.27% for LIs). This is notable, as up to one-third of patients with root resorption remain undetected. The 2 mm threshold, as defined by clinical OERR guidelines, marks the point at which clinical intervention becomes necessary. Therefore, this study adopted 2 mm as the grading threshold. Figure 8, B shows that grading accuracy moderately declines when the UI length change is 1.6-2.4 mm. For LI specifically, a similar trend is observed in the 1.2-2.8 mm range. In these challenging patients, slight deviations may affect the grading results. These findings suggest that this boundary region is particularly sensitive to small variations and should be interpreted with caution.
To minimize unnecessary radiation exposure, current clinical guidelines do not recommend the use of CBCT for routine monitoring of OERR. Importantly, we applied OERR-Net exclusively to ethically approved cases. These involved patients with skeletal malocclusion requiring orthognathic surgery, temporomandibular joint disorders, or other medically indicated conditions requiring CBCT imaging. Furthermore, all scans were acquired using low-resolution protocols to further reduce radiation exposure. OERR-Net also shows promise for retrospective CBCT analysis, enabling rapid large-scale OERR screening. Since there are no effective treatments for OERR, timely detection provides an opportunity for targeted recommendations to maintain tooth stability, such as better oral hygiene and short-term splinting for patients with multiple affected teeth.
This study has several limitations that require further investigation and refinement in future work. First, using CBCT data from only 1 hospital device limits the model’s generalizability. Future studies should assess the model on CBCT data from multiple devices and parameters, such as tube voltage, current, voxel size, and spacing. In addition, the model requires testing across more patients and challenging conditions, such as metal artifacts or motion blur. Second, improving CBCT resolution through super-resolution processing is crucial for enhancing OERR assessment accuracy. Enhancing image resolution can capture finer details of dental structures, providing essential support for accurate CBCT image segmentation and quantitative analysis.
Conclusions
This study introduces OERR-Net, an artificial intelligence system for assessing root resorption after orthodontic treatment. It surpasses orthodontists in accuracy and provides 3D visualization of resorptive changes. This study proposes a novel automatic assessment approach that aligns with clinical guidelines, while offering a comprehensive assessment of 3D morphologic changes. The success of OERR-Net is attributed to the Transformer-based Swin-UNETR segmentation model and a highly accurate automatic tooth length measurement algorithm. Importantly, it highlights the significance of end-to-end validation in analyzing cumulative system errors, especially segmentation. Despite the advanced models demonstrating excellent quantitative metrics, they remain a primary source of error in OERR assessment tasks that demand an extremely narrow precision range. Therefore, enhancing the resolution of CBCT scans without increasing radiation exposure could be a critical step toward improving the accuracy of OERR assessment in future research.
Author credit statement
Dan Yang contributed to conceptualization, data curation, formal analysis, methodology, validation, visualization, writing–original draft, and writing–review and editing; Shuangjiang Yu contributed to conceptualization, formal analysis, methodology, validation, visualization, and writing–review and editing; Yue Zhao contributed to conceptualization, formal analysis, and writing–review and editing; Yao Hu contributed to conceptualization, formal analysis, and writing–review and editing; Jinlin Song contributed to conceptualization, formal analysis, supervision, and writing–review and editing; Leilei Zheng contributed to conceptualization, formal analysis, and writing–review and editing; Yang Liu contributed to conceptualization, data curation, formal analysis, and writing–review and editing. All authors gave final approval of the version to be submitted.