The study aimed to assess the diagnostic value of high-resolution ultrasonography (HR-US) in the detection of anterior disc displacement (ADD) of the temporomandibular joint. Relevant trials reported in MEDLINE, the Chinese National Knowledge Infrastructure Database, the Chinese Biomedical Literature Database, and Embase were identified. A manual search was also performed. The quality of retrieved data was evaluated using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) criteria. Data were extracted and cross-checked, and a statistically rigorous meta-analysis was performed using a hierarchical summary receiver operating characteristic model (HSROC). The clinical utility of results was assessed using Fagan nomograms (Bayes’ theorem). All data were evaluated using Stata software. A total of 11 studies including 1096 subjects were included in the analysis; all reported the utility of HR-US for the diagnosis of ADD with reduction (ADDWR) and without reduction (ADDWoR). For ADDWR, the weighted sensitivity and specificity were 0.83 (95% confidence interval (CI) 0.78–0.88) and 0.85 (95% CI 0.76–0.92), respectively. The lambda value was 3.41 (95% CI 2.37–4.46) and the Fagan nomogram pre-test probability 58%, with a positive likelihood ratio (LR) of 6.01. The positive post-test probability was 89%, with a negative LR of 0.20. The negative post-test probability was 21%. The positive increase in diagnostic utility was 31% and the negative decrement in that value 37%. For ADDWoR, the weighted sensitivity and specificity values were 0.72 (95% CI 0.59–0.81) and 0.90 (95% CI 0.86–0.93), respectively. The lambda value was 3.69 (95% CI 2.39–4.99) and the Fagan nomogram pre-test probability 38%, with a positive LR of 7.00. The positive post-test probability was 82%, with a negative LR of 0.32. The negative post-test probability was 16%. The positive increase in diagnostic utility was 44% and the negative decrement in that value 22%.
HR-US delivers acceptable performance when used to diagnose ADD; it is better at detecting ADDWoR than ADDWR, but has a lower negative diagnostic value for ADDWoR than for ADDWR. HR-US may serve as a new method for the rapid diagnosis of ADD. The method has the advantages of simplicity and low cost. Given the uncertainty in some of the estimated values, more high-quality studies are needed to assess its diagnostic efficacy.
Temporomandibular disorders (TMD) and temporomandibular joint dysfunction syndrome exhibit similar clinical manifestations, but the pathogenesis of the conditions remains unclear. TMD affects 10–70% of the population. In terms of the pathophysiology of TMD, movement of the articular disc is a prime focus of research. Anterior disc displacement (ADD) occurs in a high proportion of TMD patients, and can be divided into ADD with reduction (ADDWR) and ADD without reduction (ADDWoR). ADD is presently diagnosed using magnetic resonance imaging (MRI) and arthrography, but MRI is expensive and arthrography invasive. Thus, both techniques have limitations. An inexpensive non-invasive diagnostic test is required.
MRI is very effective when used for the early diagnosis of TMD and in monitoring treatment. MRI allows specific and sensitive interpretation of soft tissue problems and inflammatory conditions in joints, and serves as the gold standard diagnostic test for ADD. In recent years, several techniques have been developed to diagnose conditions of the temporomandibular joint (TMJ). These include thermography, the use of jaw-tracking devices, and electromyography. The first research regarding ultrasound in TMJ imaging was published in 1989. Ultrasonography employs ultrasonic waves emitted by a transducer. The waves enter tissue and reflected echoes are translated into images, allowing disease evaluation.
We have found the use of high-resolution ultrasonography (HR-US) for the detection of ADD of the TMJ to be of diagnostic value. Standardized examination of the TMJ via ultrasonography is gaining increasing attention. Thus, we collected all of the relevant literature to evaluate the diagnostic utility of HR-US in detecting ADD of the TMJ.
Materials and methods
Our inclusion criteria were the following: (1) diagnostic research studies; (2) ADD patients without systemic disease; (3) interventional use of HR-US to plan patient treatment, with MRI or arthrography serving as the diagnostic gold standard in the control groups; (4) ultrasonographic diagnosis by physicians blinded to the gold standard diagnosis; and (5) calculation of sensitivities, specificities, and accuracies, or the availability of data allowing such calculations to be made. Exclusion criteria were the following: (1) duplicate work; (2) animal experiments; (3) reviews of the literature; and (4) incomplete data.
The Bayes Library of Diagnostic Studies and Reviews was consulted. Medical subject heading (MeSH) terms in combination with free terms were used to search MEDLINE/PubMed (1966 to February 2014), the Chinese National Knowledge Infrastructure Database (1994 to February 2014), the Chinese Biomedical Literature Database (1978 to February 2014), and Embase (1974 to February 2014). The retrieval cut-off was February 2014. No language restriction was imposed. The MeSH terms ‘ultrasonography’, ‘diagnosis’, and ‘temporomandibular joint’ served as inputs. Two researchers independently evaluated all hits and any disagreement was resolved by discussion. The following data were extracted: first author’s name, publication date, basic subject characteristics, US transducer used, the diagnostic gold standard employed, the diagnostic purpose, sample size, and the true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) values. Again, any disagreement was resolved by discussion.
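The extracted TP, FP, FN, and TN counts determine the per-study accuracy measures. The following is a minimal Python sketch of those standard definitions. The class name `TwoByTwo` and the counts in the example are hypothetical and are not taken from any included study. The pooled estimates reported in this paper come from the HSROC model, not from this per-study arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study's 2x2 table (index test vs. gold standard).

    Hypothetical helper for illustration. No continuity correction is
    applied, so a zero cell can raise ZeroDivisionError.
    """
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def positive_lr(self) -> float:
        # LR+ = sensitivity / (1 - specificity)
        return self.sensitivity / (1 - self.specificity)

    @property
    def negative_lr(self) -> float:
        # LR- = (1 - sensitivity) / specificity
        return (1 - self.sensitivity) / self.specificity

    @property
    def diagnostic_odds_ratio(self) -> float:
        # DOR = (TP * TN) / (FP * FN), equivalent to LR+ / LR-
        return (self.tp * self.tn) / (self.fp * self.fn)


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=80, fp=10, fn=20, tn=90)
print(f"sensitivity={example.sensitivity:.2f}, "
      f"specificity={example.specificity:.2f}, "
      f"accuracy={example.accuracy:.2f}")
print(f"LR+={example.positive_lr:.2f}, LR-={example.negative_lr:.2f}, "
      f"DOR={example.diagnostic_odds_ratio:.1f}")
```

Studies with a zero cell would need a continuity correction or separate handling before the likelihood ratios and DOR can be computed.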
Assessment of the risk of bias was conducted using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) tool, which assesses 14 items. Each bias risk was categorized as ‘low’, ‘unclear’, or ‘high’. The risk of study bias was categorized as ‘low’ if all items were low risk, as ‘unclear’ if any item was of unclear risk, and as ‘high’ if any item was high risk. Items 3–7, 10–12, and 14 address bias; items 1 and 2 address variation; items 8, 9, and 13 address the quality of reporting. Quality assessment results were analyzed statistically with reference to these criteria.
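The study-level rule described above can be sketched in Python as follows. The function name `overall_risk` is hypothetical. The text does not state which label applies when a study has both ‘unclear’ and ‘high’ items, so this sketch assumes that ‘high’ takes precedence.

```python
from typing import Iterable

ALLOWED_RATINGS = {"low", "unclear", "high"}


def overall_risk(item_ratings: Iterable[str]) -> str:
    """Combine per-item QUADAS ratings into one study-level rating.

    Assumption (not stated in the text): 'high' outranks 'unclear'
    when both occur in the same study.
    """
    ratings = [rating.strip().lower() for rating in item_ratings]
    if not ratings:
        raise ValueError("at least one item rating is required")
    unknown = set(ratings) - ALLOWED_RATINGS
    if unknown:
        raise ValueError(f"unrecognized ratings: {sorted(unknown)}")
    if "high" in ratings:
        return "high"
    if "unclear" in ratings:
        return "unclear"
    return "low"  # every item was rated 'low'


# Hypothetical ratings for the 14 QUADAS items of a single study.
ratings_for_one_study = ["low"] * 12 + ["unclear", "low"]
print(overall_risk(ratings_for_one_study))
```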
Statistical analysis strategy
Statistical analyses were performed using Stata 12.0 (StataCorp LP, College Station, TX, USA). A hierarchical summary receiver operating characteristic (HSROC) statistical model was established using default parameters. Classical parameters were evaluated to reveal diagnostic efficiency. These were the HSROC parameters of lambda, theta, and beta; the closer the lambda value to 5, the higher the efficiency of HR-US. Bayes’ theorem was applied using Fagan nomograms, which relate the following quantities: the pre-test probability and the pre-test odds (pre-test probability/(1 − pre-test probability)); the post-test odds (pre-test odds × the likelihood ratio (LR)); and the post-test probability (post-test odds/(1 + post-test odds)). We report the sensitivity, specificity, lambda, diagnostic odds ratio (DOR), pre-test probability, and post-test probability values, together with the increase in the positive diagnostic measure (positive post-test probability minus pre-test probability) and the decrement in the negative diagnostic measure (pre-test probability minus negative post-test probability). Any change in a diagnostic value was used to assess diagnostic accuracy. Funnel plots were used to screen for publication bias, with the aid of rank correlation.
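The arithmetic behind a Fagan nomogram is the odds form of Bayes’ theorem: pre-test odds = p/(1 − p), post-test odds = pre-test odds × LR, and post-test probability = odds/(1 + odds). The Python sketch below illustrates that calculation. The function names are hypothetical, and the code is not the Stata routine used in the analysis. The inputs in the example are the ADDWR values reported in this paper (pre-test probability 58%, positive LR 6.01, negative LR 0.20).

```python
def probability_to_odds(probability: float) -> float:
    """Convert a probability strictly between 0 and 1 to odds."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    """Convert non-negative odds back to a probability."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def post_test_probability(pre_test_probability: float,
                          likelihood_ratio: float) -> float:
    """Post-test odds = pre-test odds x LR, returned as a probability."""
    pre_test_odds = probability_to_odds(pre_test_probability)
    return odds_to_probability(pre_test_odds * likelihood_ratio)


def diagnostic_change(pre_test_probability: float,
                      positive_lr: float,
                      negative_lr: float) -> dict:
    """Post-test probabilities and their distance from the pre-test value."""
    positive = post_test_probability(pre_test_probability, positive_lr)
    negative = post_test_probability(pre_test_probability, negative_lr)
    return {
        "positive_post_test": positive,
        "negative_post_test": negative,
        "positive_increase": positive - pre_test_probability,
        "negative_decrement": pre_test_probability - negative,
    }


# ADDWR values reported in this paper.
addwr = diagnostic_change(0.58, positive_lr=6.01, negative_lr=0.20)
for name, value in addwr.items():
    print(f"{name}: {value:.3f}")
```

With these rounded inputs the sketch gives a positive post-test probability of about 0.89 and a negative post-test probability of about 0.22, close to the reported 89% and 21%. The small difference in the negative value is plausibly due to rounding of the published likelihood ratios.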
A total of 328 records were identified. Duplicate articles were removed by screening of titles and abstracts, after which a total of 56 articles remained. Twenty-five of these were comprehensive. A further six were excluded because of a lack of focus on disc problems, three because they were systematic reviews or (simply) reviews, two because data were incomplete, one because US was not used, and one because subjects with certain diseases, such as arthritis, were excluded. Thus, 11 articles were included in the final evaluation; these articles included 1096 subjects.
Essential characteristics of included studies
|Author||Characteristics of the subjects; number of M/F and age range||Transducer||US diagnostic criteria||Gold standard||Target||Initial positioning||Ultrasonic type|
|Yang et al., 2012||M/F 15/20||12 MHz||Hyperechoic||MRI||ADDWR||Vertical||Static and dynamic|
|Cui et al., 2009||M/F 15/25||14 MHz||Hyperechoic||Arthrography||ADDWR, ADDWoR||60° to Frankfort plane||Static and dynamic|
|Landes et al., 2006||M/F 44/24||8–12.5 MHz||Hyperechoic||MRI||ADDWR, ADDWoR||Horizontal||Dynamic, 2D and 3D|
|Emshoff et al., 2002||M/F 9/55
|Emshoff et al., 2003||M/F 8/40||12.5 MHz||Hyperechoic to isoechoic||MRI||DC, DD||Vertical||Dynamic|
|Jank et al., 2005||100 subjects including 200 TMJs||12.5 MHz||Hyperechoic||MRI||DC, JE, DD||Unclear||Static|
|Sinha et al., 2012||10 subjects||10 MHz||Hyperechoic||MRI||DD||Sagittal to frontal plane||Dynamic|
|Emshoff et al., 2002||M/F 49/159
|Jank et al., 2001||M/F 15/51||12 MHz||Hyperechoic||MRI||DD||Horizontal and vertical||Static|
|Landes et al., 2000||M/F 30/25; 28 subjects <30 years, 27 subjects >30 years||5–10 MHz||Hyperechoic||MRI||ADDWR||Horizontal and vertical||Dynamic|
|Landes et al., 2007||M/F 10/23||8–12.5 MHz||Hyperechoic||MRI||DD, DC||Horizontal||Static 2D and 3D|
|Yang et al., 2012||24||1||6||9||19||1||4||16|
|Cui et al., 2009||21||5||6||8||6||0||1||33|
|Landes et al., 2006||29||22||18||36||9||13||12||73|
|Emshoff et al., 2002||22||5||5||96||50||3||10||65|
|Emshoff et al., 2003||41||5||2||48||–||–||–||–|
|Jank et al., 2005||127||5||11||57||66||11||10||113|
|Sinha et al., 2012||10||0||0||10||–||–||–||–|
|Emshoff et al., 2002||176||16||45||104||97||15||45||184|
|Jank et al., 2001||68||10||19||35||33||9||21||69|
|Landes et al., 2000||45||1||5||6||–||–||–||–|
|Landes et al., 2007||20||17||10||19||8||11||7||40|
Literature quality assessment
Most publications appeared to meet all 11 quality-assessment criteria (Table 3); we found no obvious bias in or extensive variation among the studies included. No publication met quality assessment criteria 12, 13, and 14. We speculate that not all patients had given reasonable explanations for why they withdrew. Also, not all papers reported unexplained test results. Quality assessment item 4 data showed that the intervals between the reference and index tests were sufficiently short to eliminate the possibility that the nature of disease had changed. Thus, disease progression was unlikely. Items 10 and 11 of the quality assessment revealed that blinding had been strictly implemented. Generally, all studies reviewed were of good quality.
|1. Was the spectrum of patients representative of the patients who will receive the test in practice?||22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32|
|2. Were selection criteria clearly described?||22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32|
|3. Is the reference standard likely to correctly classify the target condition?||22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32|
|4. Is the time period between reference standard and index test short enough to be reasonably sure that the target condition did not change between the two tests?||22, 24, 25, 26, 27, 28, 29, 30, 32||23, 31|
|5. Did the whole sample or a random selection of the sample, receive verification using a reference standard of diagnosis?||22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32|
|6. Did patients receive the same reference standard regardless of the index test result?||22, 24, 25, 26, 27, 28, 29, 30, 31, 32||23|
|7. Was the reference standard independent of the index test (i.e. the index test did not form part of the reference standard)?||22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32|
|8. Was the execution of the index test described in sufficient detail to permit replication of the test?||22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32|
|9. Was the execution of the reference standard described in sufficient detail to permit its replication?||22, 24, 25, 26, 27, 28, 29, 30, 31, 32||23|
|10. Were the index test results interpreted without knowledge of the results of the reference standard?||22, 24, 25, 26, 29, 31, 32||23, 27, 28, 30|
|11. Were the reference standard results interpreted without knowledge of the results of the index test?||22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32|
|12. Were the same clinical data available when test results were interpreted as would be available when the test is used in practice?||22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32|
|13. Were uninterpretable/intermediate test results reported?||22, 23||24, 25, 26, 27, 28, 29, 30, 31, 32|
|14. Were withdrawals from the study explained?||22, 23||24, 25, 26, 27, 28, 29, 30, 31, 32|
On meta-analysis of the utility of HR-US to diagnose ADDWR (Table 4), the weighted sensitivity and specificity were 0.83 (95% confidence interval (CI) 0.75–0.88) and 0.85 (95% CI 0.76–0.92), respectively. Lambda was 3.41 (95% CI 2.37–4.46). The Fagan nomogram pre-test probability was 58%, with a positive LR of 6.01. The positive post-test probability was 89%, with a negative LR of 0.20. The negative post-test probability was 21%. Thus, for a patient with a 58% pre-test probability of ADDWR, the probability of a definitive diagnosis of ADDWR rose to 89% if the HR-US test was positive. When the test was negative, the probability of ADDWR fell to 21%. A positive HR-US result therefore raised the probability of ADDWR by 31 percentage points, and a negative result lowered it by 37 percentage points.