This study compared the diagnostic ability of computed tomography (CT), magnetic resonance imaging (MRI), ultrasonography (US), and positron emission tomography/CT (PET/CT) for extracapsular spread (ECS). MEDLINE, EMBASE, China National Knowledge Infrastructure, Chinese Biomedical Literature Database, and Sciencepaper Online databases were searched. The mean sensitivity of CT was 0.77, specificity was 0.85, positive likelihood ratio (LR+) was 4.839, negative likelihood ratio (LR−) was 0.287, diagnostic odds ratio (DOR) was 19.239, area under the summary receiver operating characteristic curve (AUC) was 0.8615, and Q* was 0.7922. The mean sensitivity of MRI was 0.85, specificity was 0.84, LR+ was 4.615, LR− was 0.191, DOR was 60.270, AUC was 0.9454, and Q* was 0.8844. The sensitivity and specificity of PET/CT were both 0.86. The mean sensitivity of US was 0.87 and specificity was 0.75. Overall, CT had the lowest sensitivity (P = 0.0355); specificity was similar for all methods (P = 0.1159). CT and MRI had equivalent summary diagnostic efficacy (AUC and Q*) (P > 0.05). This evidence indicates that CT might have a relatively lower sensitivity when diagnosing ECS, and that CT and MRI may be similarly effective in diagnosing ECS. MRI showed positive trends in diagnosing ECS. Evidence was lacking for PET/CT and US diagnosis. More related studies are required to confirm these inconclusive results.
Extracapsular spread (ECS) is the spread of cancer cells beyond the capsule of a metastatic lymph node into the surrounding tissues. ECS is a common phenomenon in head and neck cancer patients and is considered to be a sign of tumour invasion, metastasis, and a poor prognosis.
ECS can be visualized using multiple imaging modalities, which help to determine tumour staging, plan treatment, and predict the prognosis. Commonly used ECS imaging methods include computed tomography (CT), magnetic resonance imaging (MRI), ultrasonography (US), and positron emission tomography/CT (PET/CT). Although several trials have reported the diagnostic efficacy of these imaging methods, the results have been inconsistent. Furthermore, none of these trials compared all of these imaging methods simultaneously in terms of diagnostic test accuracy. Therefore, the present systematic review and meta-analysis was conducted to determine and compare the diagnostic efficacies of these commonly used imaging methods (CT, MRI, US, and PET/CT) in detecting ECS secondary to head and neck cancers.
Materials and methods
This systematic review was granted an exemption by the local institutional review board. A protocol was finalized a priori, and the review steps listed below were conducted in compliance with the protocol. The study inclusion, risk of bias assessment, and data extraction were conducted by two independent authors, and any discrepancies were resolved by discussion.
The inclusion criteria were as follows: (1) Study type: studies included were diagnostic test accuracy studies that were designed as cohort studies; (2) Participants: head and neck cancer patients with a diagnosis confirmed by pathology; (3) Index test: different imaging modalities including CT, MRI, PET/CT, and US; (4) Reference standard: pathology; and (5) Outcome: true-positive (TP), false-positive (FP), false-negative (FN), and true-negative (TN), or other data that could help to calculate these four outcomes. All of these data were further used to calculate the sensitivity, specificity, positive likelihood ratio (LR+), and negative likelihood ratio (LR−).
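The four accuracy measures named above follow directly from the 2×2 counts. The following is a minimal sketch of that arithmetic using the standard definitions; the class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of an index test against the pathology reference standard."""
    tp: int  # true-positive
    fp: int  # false-positive
    fn: int  # false-negative
    tn: int  # true-negative

    def sensitivity(self) -> float:
        # Proportion of ECS-positive units that the index test detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of ECS-negative units that the index test clears.
        return self.tn / (self.tn + self.fp)

    def lr_positive(self) -> float:
        # LR+ = sensitivity / (1 - specificity); undefined when specificity is 1.
        return self.sensitivity() / (1.0 - self.specificity())

    def lr_negative(self) -> float:
        # LR- = (1 - sensitivity) / specificity; undefined when specificity is 0.
        return (1.0 - self.sensitivity()) / self.specificity()


# Hypothetical counts for illustration only.
example = TwoByTwo(tp=40, fp=10, fn=10, tn=40)
print(example.sensitivity(), example.specificity())
print(example.lr_positive(), example.lr_negative())
```

A table with no false positives or no true negatives leaves one of the ratios undefined; meta-analysis software commonly handles such cells with a continuity correction, which this sketch does not attempt.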
The strategy was to collect all related studies using electronic and manual searches. The following databases were searched electronically: MEDLINE (via OVID, 1948 to February 12, 2015), EMBASE (via OVID, 1980 to February 12, 2015), Chinese Biomedical Literature Database (1978 to February 12, 2015), and China National Knowledge Infrastructure (1994 to February 12, 2015). The grey literature (studies that have not been formally published) was also searched via Sciencepaper Online (to February 12, 2015). The search strategies were designed with reference to the Cochrane Handbook for Diagnostic Accuracy Reviews, draft version 0.4, which suggests a combination of medical subject heading (MeSH) terms and free text words. The MeSH terms used included ‘lymphatic metastasis’, ‘head and neck neoplasms’, ‘ultrasonography’, ‘magnetic resonance imaging’, ‘positron emission tomography’, and ‘computed tomography’, and the key words used included ‘extracapsular spread’, ‘extranodal spread’, ‘extracapsular invasion’, and ‘extranodal invasion’.
Two reviewers scanned the titles and abstracts in duplicate to identify any possibly eligible studies. The full texts of the possibly eligible studies were retrieved for additional evaluation. For the study scanning and final inclusion phases, consistency between reviewers was assessed via the kappa value.
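The text does not state which kappa statistic was used. Assuming it was Cohen's kappa for two reviewers making include/exclude decisions, the calculation can be sketched as follows; the function name, argument names, and counts are hypothetical.

```python
def cohen_kappa(both_include: int, only_a: int, only_b: int, both_exclude: int) -> float:
    """Cohen's kappa for two reviewers making binary include/exclude decisions.

    Assumption: the 'kappa value' in the text is Cohen's kappa; the source
    does not name the variant.
    """
    n = both_include + only_a + only_b + both_exclude
    # Observed agreement: both reviewers made the same decision.
    observed = (both_include + both_exclude) / n
    # Marginal inclusion rates of reviewer A and reviewer B.
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    # Agreement expected by chance from the marginal rates.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    return (observed - expected) / (1 - expected)


# Hypothetical screening counts for illustration only.
print(cohen_kappa(both_include=20, only_a=5, only_b=10, both_exclude=15))
```

Kappa is 1 when the reviewers agree on every record and 0 when agreement is no better than chance.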
The Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2) was used for quality assessment. This includes four domains: patient selection, index test, reference standard, and flow and timing. Each domain was assessed in terms of risk of bias, and the first three domains were also assessed regarding applicability. Signalling questions were included to help determine the risk of bias.
In accordance with the QUADAS-2 instructions, the tool was tailored to the study by omitting two signalling questions: ‘If a threshold was used, was it pre-specified?’ and ‘Did all the patients receive the same reference standard?’
The signalling questions that remained in the QUADAS-2 for this meta-analysis were as follows: (1) Patient selection: Was a consecutive or random sample of patients enrolled? Was a case–control design avoided? Did the study avoid inappropriate exclusions? (2) Index test: Were the index test results interpreted without knowledge of the reference standard results? (3) Reference standard: Was the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the index test results? (4) Flow and timing: Was an appropriate interval allowed between the index tests and reference standard? Did all the patients receive a reference standard? Were all patients included in the analysis?
All the studies were classified as ‘A’ (low risk of bias), ‘B’ (unclear risk of bias), or ‘C’ (high risk of bias).
The data extraction form used in this study was similar to that used in a previous systematic review reported by Li et al. The form included the following items: re-evaluation of eligibility, basic information from the study (i.e. authors, title, publication date, and correspondence), participant characteristics (i.e. age, sex, inclusion criteria, tumour types, tumour location, clinical examination of the cervical lymph node, surgery types, number of patients included, and follow-up), study location (i.e. country, source of patients), index test and reference standard (i.e. CT details and pathological diagnosis, diagnostic criteria, blinding and consistency of the radiologists), study design (i.e. study types and study), and outcomes (TP, FP, FN, and TN results).
Meta-DiSc version 1.4 (Unit of Clinical Biostatistics, Ramón y Cajal Hospital, Madrid, Spain) and Stata version 11.0 (StataCorp LP, College Station, TX, USA) were used to perform the meta-analysis. The studies were pooled when no significant clinical or methodological heterogeneity was found. When more than 10 studies were included, meta-regression was used to detect slight heterogeneity. Considering the nature of human cervical lymph nodes, the unit chosen for analysis was a single lymph node, neck level, or individual patient. The measures for diagnostic efficacy were sensitivity, specificity, LR+, LR−, and the diagnostic odds ratio (DOR). A summary receiver operating characteristic (SROC) curve was drawn for each meta-analysis, and the area under the curve (AUC) and Q* (the point on the curve where sensitivity and specificity are equal) were calculated. P < 0.05 was considered to be statistically significant.
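To make the DOR and Q* concrete, the sketch below computes the DOR from a single 2×2 table and the Q* implied by a constant DOR. The second step assumes a symmetric SROC curve, which is an illustrative simplification; the SROC curves reported in this review were fitted by the software, so reported Q* values need not match this closed form. Function names and counts are hypothetical.

```python
import math


def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """DOR = (TP * TN) / (FP * FN), equivalently LR+ / LR-."""
    return (tp * tn) / (fp * fn)


def q_star_symmetric(dor: float) -> float:
    """Q* under the ASSUMPTION of a symmetric SROC curve (constant DOR).

    Setting sensitivity = specificity = q gives DOR = (q / (1 - q)) ** 2,
    so q = sqrt(DOR) / (1 + sqrt(DOR)).
    """
    root = math.sqrt(dor)
    return root / (1.0 + root)


# Hypothetical counts for illustration only.
dor = diagnostic_odds_ratio(tp=40, fp=10, fn=10, tn=40)
print(dor, q_star_symmetric(dor))
```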
When comparing different imaging strategies, GraphPad Prism 5 software was used (GraphPad Software, San Diego, CA, USA). The Z-test was used for pair-wise comparisons, with the following formula: $Z = (\mathrm{VAL}_1 - \mathrm{VAL}_2)/\sqrt{\mathrm{SE}_1^2 + \mathrm{SE}_2^2}$, where the variable VAL is the mean sensitivity, specificity, AUC, or Q*, and SE is the standard error of the corresponding variable. For comparisons among all imaging modalities, one-way analysis of variance (ANOVA) was used.
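The pair-wise Z-test can be sketched in a few lines. The Z statistic follows the formula given above; the two-sided P value from the standard normal distribution is an assumption, since the text does not say whether the test was one- or two-sided. The function name and input values are hypothetical.

```python
import math


def z_test(val1: float, se1: float, val2: float, se2: float) -> tuple[float, float]:
    """Compare two pooled estimates (e.g. sensitivity of two modalities).

    Returns the Z statistic and a two-sided P value (two-sidedness is an
    assumption, not stated in the source).
    """
    z = (val1 - val2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical estimates and standard errors for illustration only.
z, p = z_test(val1=0.85, se1=0.03, val2=0.77, se2=0.04)
print(z, p)
```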
Because the included studies reported outcomes in different ways, the unit of analysis in this systematic review was divided into ‘neck/node level’ (NL), in which the neck level or lymph node was taken as the unit when detecting ECS, and ‘patient level’ (PL), in which an individual patient was taken as the unit when detecting ECS. The two were kept separate because NL analyses involve many more units than the actual number of patients included, which could influence the estimated diagnostic efficacy.
Search and study inclusion
A total of 3971 records were retrieved in the initial search. After screening, 3929 records were excluded and 42 remained for further evaluation. After the full texts had been retrieved and a more detailed assessment performed, 15 studies (15 search records) were finally included (Fig. 1). The consistency between the two reviewers during the study scanning phase and the final inclusion phase was acceptable, with kappa values of 0.89 and 1.00, respectively.
Characteristics of the studies included
A total of 15 studies with 1155 patients and 957 neck levels were involved. All patients underwent CT, MRI, US, or PET/CT and were accounted for in the meta-analysis. The details are listed in Table 1.
|Study ID||Country||Study type||Number (M/F)||Age, years, mean (range)||Tumour location||Unit||Imaging used|
|Kimura 2008||Japan||RS||109 (89/20)||66 (56–76)||HP, OP, larynx, oral floor, tongue, gingiva, cheek, NP, palate, unknown sites||NL ( n = 140)||MRI|
|King 2004||China||RS||17 (16/1)||62.4 (50–85)||HP, HP/OP, larynx, tongue, oral cavity||NL ( n = 51)||MRI|
|Lodder 2013||Netherlands||RS||39 (25/14)||63 (46–85)||Tongue, HP, OP, gingiva, thyroid, maxillary sinus, parotid gland, submandibular gland, NP, palate, larynx, nasal cavity, unknown site||NL ( n = 60)||MRI|
|Steinkamp 2002||Germany||RS||69||58.2||Tongue, floor of mouth, retromolar trigone, cheek, gingiva||NL ( n = 79)||MRI|
|Sumi 2011||Japan||RS||43 (37/6)||62 (37–82)||Larynx, HP, OP, oral floor, tongue, gingiva, buccal mucosa, nasopharynx, palate, unknown sites||NL ( n = 54)||MRI|
|Dhanda 2014||UK||RS||83||–||Tongue, floor of mouth, cheek, gingiva||PL ( n = 83)||MRI|
|Carvalho 1991||UK||RS||28||–||Thyroid, larynx, NP, HP, OP, tongue, parotid gland, submaxillary gland, maxillary sinus, superior segmental oesophagus, mammary gland||NL ( n = 21)||CT|
|King 2004||China||RS||17 (16/1)||62.4 (50–85)||HP, HP/OP, larynx, tongue, oral cavity||NL ( n = 51)||CT|
|Luo 1997||China||RS||60 (35/25)||16–75||Thyroid, HP, larynx, oral cavity, sinus, OP, parotid gland||NL ( n = 101)||CT|
|Souter 2009||New Zealand||RS||127||–||Tongue, larynx, NP, oral cavity, pharynx, unknown site||NL ( n = 149)||CT|
|Steinkamp 1999||Germany||RS||165 (136/29)||57.5||Head and neck||NL ( n = 97)||CT|
|Chai 2013||USA||RS||100 (79/21)||62 (37–89)||Head and neck||PL ( n = 100)||CT|
|Url 2013||Austria||RS||49 (44/5)||60 (49–71)||Oral cavity, OP, larynx, skin, unknown site||PL ( n = 49)||CT|
|Luo 1997||China||RS||60 (35/25)||16–75||Thyroid||NL ( n = 76)||US|
|Steinkamp 2003||Germany||RS||97||58.2||Thyroid, larynx, NP, HP, OP, tongue, parotid gland, submaxillary gland, maxillary sinus, superior segmental oesophagus||NL ( n = 97)||US|
|Chun 2014||Korea||RS||89 (80/9)||62.5 (32–91)||Larynx||NL ( n = 62)||PET/CT|
|Joo 2013||Korea||RS||80 (55/25)||54 (23–83)||Tongue, floor of mouth, retromolar trigone, cheek, gingiva||NL ( n = 71)||PET/CT|
Risk of bias of the studies included
The risk of bias and applicability of the studies included were assessed using the QUADAS-2 tool. All of the studies had good applicability. The results of the risk of bias assessment showed that one study had a high risk of bias and the remaining 14 studies had an unclear risk of bias (Table 2).
|Study ID||Questions a||Risk of bias b||Applicability (high or low)|
a Patient selection: (1) Was a consecutive or random sample of patients enrolled? (2) Was a case–control design avoided? (3) Did the study avoid inappropriate exclusions? Index test: (4) Were the index test results interpreted without knowledge of the reference standard results? Reference standard: (5) Was the reference standard likely to correctly classify the target condition? (6) Were the reference standard results interpreted without knowledge of the index test results? Flow and timing: (7) Was an appropriate interval allowed between the index tests and reference standard? (8) Did all the patients receive a reference standard? (9) Were all patients included in the analysis?
Evaluation of diagnostic ability
Six studies reported MRI results. For the neck/node level (NL), MRI had a mean sensitivity of 0.85 (95% confidence interval (CI) 0.80–0.89), specificity of 0.84 (95% CI 0.77–0.90), LR+ of 4.615 (95% CI 2.255–9.447), LR− of 0.191 (95% CI 0.072–0.509), and DOR of 60.270 (95% CI 9.314–390.00) (Fig. 2). The AUC was 0.9454 and the Q* was 0.8844. For the patient level (PL), only one study was included; the sensitivity was 0.08 (95% CI 0.02–0.22) and the specificity was 1.00 (95% CI 0.92–1.00).
The effectiveness of different MRI diagnostic criteria was also assessed (Table 3). Fourteen criteria were gathered. The criterion ‘short-axis diameter >15 mm’ had the highest sensitivity of 0.93. The criteria ‘infiltration of adjacent planes’, ‘time–signal intensity curve (TIC) (>44% nodal area with type 2 TIC pattern)’, and ‘short-axis diameter >25 mm’ had specificities of 100%. The criteria ‘shaggy margin (CET1WI)’, ‘TIC (>44% nodal area with type 2 TIC pattern)’, and ‘short-axis diameter >25 mm’ had the highest accuracies (89%). TIC, which can be obtained from MRI images processed using the software ImageJ (National Institutes of Health, Bethesda, MD, USA) and Mathematica (Wolfram Research, Champaign, IL, USA), was classified automatically on the basis of the increment ratio, the time to peak enhancement ($T_{\mathrm{peak}}$), and the washout ratio (WR) into four types (types 1–4). The type 2 TIC pattern, which is used as one of the comprehensive diagnostic criteria for ECS, has the following characteristics: increment ratios greater than 20% and peak times equal to or longer than 120 s.
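The type 2 rule and the ‘>44% nodal area’ criterion can be expressed as a simple threshold check. The sketch below is illustrative only: it assumes that the TIC is evaluated pixel by pixel, that the nodal area is measured as a fraction of pixels, and that the increment ratio is stored as a fraction (0.20 for 20%). It does not reproduce the ImageJ/Mathematica processing, does not use the washout ratio, and does not classify types 1, 3, or 4, whose definitions are not given here. All function names are hypothetical.

```python
from typing import Iterable, Tuple

# Each pixel is (increment_ratio, t_peak_seconds); this layout is an assumption.
Pixel = Tuple[float, float]


def is_type2_tic(increment_ratio: float, t_peak_s: float) -> bool:
    """Type 2 pattern as described: increment ratio > 20% and T_peak >= 120 s."""
    return increment_ratio > 0.20 and t_peak_s >= 120.0


def type2_area_fraction(pixels: Iterable[Pixel]) -> float:
    """Fraction of nodal pixels showing the type 2 pattern."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("at least one nodal pixel is required")
    return sum(is_type2_tic(ir, tp) for ir, tp in pixels) / len(pixels)


def meets_tic_criterion(pixels: Iterable[Pixel], threshold: float = 0.44) -> bool:
    """True when more than 44% of the nodal area shows the type 2 pattern."""
    return type2_area_fraction(pixels) > threshold


# Hypothetical pixel values for illustration only.
node = [(0.30, 150.0), (0.25, 120.0), (0.10, 200.0), (0.40, 60.0)]
print(type2_area_fraction(node), meets_tic_criterion(node))
```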