## Abstract

## Objectives

Micro-invasive treatment (sealing, infiltration) seems more efficacious to arrest early (non-cavitated) proximal carious lesions than non-invasive treatment (NI). Uncertainty remains as to the efficacy of sealing versus infiltration and the robustness of the evidence. We aimed to review and synthesize this evidence using pairwise and network meta-analysis (NMA) and to perform trial sequential analysis (TSA).

## Sources

Searching three electronic databases (Medline, Embase, Cochrane Central) was complemented by hand searches and cross-referencing.

## Study selection

Randomized controlled trials comparing micro-invasive strategies against each other, NI or placebo for managing proximal carious lesions were included. The primary outcome was radiographically assessed lesion progression. Pairwise and Bayesian network meta-analyses as well as TSA were used for synthesis.

## Data

Thirteen split-mouth studies (486 participants, mean age 15 years) were included. Mean follow-up was 25 months (min/max 12/36 months). Firm evidence on the superior efficacy of sealing/infiltration over NI (OR; 95% CI: 0.25; 0.18–0.32) was reached. Firm evidence was also reached on the superior efficacy of sealing (OR; 95% CI: 0.29; 0.18–0.46, 7 studies) and infiltration (OR; 95% CI: 0.22; 0.15–0.33, 7 studies) over NI. One study compared infiltration versus sealing and found no significant difference (0.70; 0.34–1.47). Based on Bayesian NMA, infiltration was ranked first in 80% of the simulations (sealing 20%, NI 0%). The surface-under-the-cumulative-ranking (SUCRA) values were 0.90 for infiltration, 0.60 for sealing and 0.00 for NI. We did not detect significant inconsistency (p = 0.89, node-split).

## Conclusions

Sealing or infiltration are likely to be more efficacious for arresting early (non-cavitated) proximal lesions than NI.

## Clinical significance

Practitioners should strive to perform micro-invasive treatment instead of NI for early proximal lesions. The decision between sealing or infiltration should be guided by practical concerns beyond efficacy.

## 1 Introduction

Dental caries is the most prevalent disease worldwide, affecting billions of individuals. Traditionally, carious lesions have been treated by removing all carious dental hard tissue and replacing it with restorations. Because most restorations eventually need replacement, with escalating loss of dental hard tissue and rising costs at each restorative cycle, modern dentistry focuses on controlling caries and carious lesions.

Non-invasive strategies (NI) remove no carious tissue at all and include dietary control, biofilm control or control of de- and remineralization (via fluorides etc.), often combined with each other. Micro-invasive strategies remove a few micrometers of tissue during application, usually when conditioning the tooth surface with acids, and install a diffusion barrier onto (lesion sealing) or within (lesion infiltration) the carious tissue. This barrier (of resins or glass ionomer cements) impedes acid diffusion into the hard tissue and further mineral loss from it, thereby arresting the lesion.

Early proximal carious lesions are highly prevalent. Restoring them sacrifices an especially large amount of sound hard tissue given the need to access these lesions through the marginal ridge. Moreover, proximal restorations show lower survival rates than non-proximal restorations, i.e. they need replacement at even shorter intervals. Hence, non- or micro-invasive treatments are especially relevant for such early, i.e. non-cavitated, proximal lesions.

A Cochrane review, published in 2015, concluded that micro-invasive treatments for proximal carious lesions are supported by sound evidence, and recommended these over NI or no therapy. However, only a limited number of studies (eight) was included, five on sealing and three on infiltration. In the three years since, more studies have been published. Moreover, neither this review nor subsequent reviews and meta-analyses have performed a comparative analysis, for example using network meta-analysis (NMA), that allows conclusions on the relative efficacy of sealing versus infiltration versus alternative strategies. Also, no quantitative analysis has been performed to show that the conclusion that micro-invasive therapies are superior to non-invasive alternatives is truly robust. Showing such robustness is important, as meta-analyses building on only a few early trials are prone to risk of bias and erroneous conclusions, as are updates of meta-analyses due to repeated testing. Trial sequential analysis (TSA) is a technique which addresses both issues.

We aimed to systematically review randomized trials involving micro-invasive treatments of proximal carious lesions, and to synthesize these studies using pairwise meta-analysis and NMA. Moreover, TSA was used to assess whether the accrued evidence is qualitatively and quantitatively sufficient for making robust conclusions. Given that the majority of dentists worldwide continue to manage early proximal lesions restoratively, such confirmation, and clear guidance as to which micro-invasive strategy to use, seem needed.

## 2 Methods

This review was registered a priori (CRD42018080895). Reporting follows the PRISMA statement and its extension for NMA.

## 2.1 Search

Three electronic databases (Medline via PubMed, Embase via Ovid, and Cochrane Central) were searched independently by two reviewers (SR, FS) to identify potentially eligible studies. A three-pronged search strategy using the Boolean operator AND was used, combining disease-centered terms (caries, decay, carious), intervention-centered terms (sealing, infiltration) and trial-centered terms (randomized, random, trial, participants, clinical studies, lesion progression). No controlled vocabularies (MeSH etc.) were employed. Reviewers screened the titles and abstracts of identified records against the inclusion criteria. If either reviewer found a record potentially eligible, full texts were assessed, again independently and in duplicate. Inclusion was decided by the two reviewers in consensus, or in consultation with a third reviewer (GG). Further articles were identified by examining the references of retrieved full-text studies. The search was not restricted by language. Reviewers were not blinded to authors or journals.
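The three-pronged strategy can be sketched as a simple query builder. This is an illustration only: the text states that the prongs were combined with AND, whereas joining the terms within each prong with OR is an assumption, and the function name `build_query` is hypothetical. Database-specific field tags and syntax are not reproduced.

```python
# Minimal sketch of a three-pronged Boolean search string.
# Assumption: terms within a prong are OR-combined (not stated in the text).

def build_query(*term_groups):
    """Join terms within each group with OR, and the groups with AND."""
    def quote(term):
        # Quote multi-word terms so they are treated as phrases.
        return f'"{term}"' if " " in term else term

    prongs = ["(" + " OR ".join(quote(t) for t in group) + ")"
              for group in term_groups]
    return " AND ".join(prongs)


disease_terms = ["caries", "decay", "carious"]
intervention_terms = ["sealing", "infiltration"]
trial_terms = ["randomized", "random", "trial", "participants",
               "clinical studies", "lesion progression"]

query = build_query(disease_terms, intervention_terms, trial_terms)
print(query)
```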

## 2.2 Selection

Study design and interventions: Parallel-group and split-mouth randomized controlled clinical trials (RCTs) comparing micro-invasive strategies against each other, against NI or against placebo were included. We excluded cross-over trials from this review, as the condition, dental caries, cannot return to baseline level following the initial intervention, and the treatment cannot be reversed either (i.e. sealing or infiltration etc. cannot be undone).

Participants: Participants with permanent or primary teeth and proximal carious lesions on posterior teeth with intact surface status (non-cavitated), presumed clinically (visual-tactile if possible) or by radiographic diagnosis (with a lesion extension at which cavitation is unlikely). A sensitivity analysis to evaluate the impact of different lesion depths (into enamel or into dentin radiographically) was planned, but not conducted given insufficient reporting of lesion depth. Treatment of buccal/vestibular lesions using micro-invasive means was not considered.

Outcome measures: Our primary outcome was lesion progression, assessed using digital subtraction radiography (DSR) or, if not available, pairwise reading or, if not available, lesion staging (for example according to radiographic enamel depth – E1, E2 outer/inner enamel half – or dentin depth – D1-3 in dentin thirds). We chose to prioritize outcome measures this way as this implies a ranking of sensitivity (DSR being most sensitive). Our secondary outcomes were:

- subjective evaluation of the treatments by participants and dentists;

- efficiency (time needed for the intervention), costs or cost-effectiveness (regardless of how effectiveness was defined); and

- any safety issues (e.g., allergies) that are noted related to the interventions.

## 2.3 Data extraction and management

Data extraction was performed independently and in duplicate by two calibrated reviewers (GG, FS). Disagreements were resolved through discussion. We did not contact study authors. Only data from the most recent publication of a study (longest follow-up) were extracted. Data were recorded according to guidelines outlined by the Cochrane Collaboration. For the present evaluation, the following items were extracted:

- Study details – design, year of publication, number randomized/analysed, study setting (e.g. school, practice);

- Population – age, gender and number of participants, baseline caries experience or caries risk;

- Potentially important effect modifiers (dentition; lesion depth if given);

- Interventions – detailed description of the interventions, including number of teeth treated per participant;

- Outcome data – details of the outcomes reported and the outcome measures, including method of assessment and timepoint(s) assessed;

- Funding sources, declarations/conflicts of interest.

## 2.4 Risks of bias, heterogeneity and transitivity

Two review authors (SR, FS) independently assessed the risks of bias of each included study using the domain-based tool of the Cochrane Handbook for Systematic Reviews of Interventions. The presence of heterogeneity was assessed by evaluating the trial and study population characteristics across trials. The assumption of transitivity was assessed by evaluating the distribution of potential effect modifiers across the trials and trial arms forming the network for NMA.

## 2.5 Treatment effects

Our primary outcome (lesion progression) was analysed by calculating the odds ratio (OR). No analysis of continuous outcomes was performed. Results from NMA were presented as summary relative effect sizes (OR) for each possible pair of interventions. Mean ranks and the cumulative ranking curve (SUCRA) were used to rank strategies according to their efficacy for arresting proximal lesions.

Where multiple lesions within a person were evaluated, the individual was considered to be the statistical unit and the lesions to be clustered within an individual. To account for clustering we computed the *design effect*. The size of each trial was thereby reduced to its *effective sample size*, sometimes referred to as Donner or C adjustment. The design effect is given by $1 + (M - 1) \times ICC$, where *M* is the average cluster size and *ICC* is the intracluster correlation coefficient. As all included studies were split-mouth studies, we computed the average cluster size as $M = \frac{\text{lesions} \times 0.5}{\text{patients}}$.

According to Masood et al. we set the *ICC* to 0.2. The number of lesions and the number experiencing the event were then divided by the design effect.
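The clustering adjustment can be illustrated with a short sketch. The trial figures below are hypothetical, the function names are illustrative, and "lesions" is assumed to mean the total number of lesions across both arms of a split-mouth trial.

```python
# Sketch of the design-effect adjustment for a split-mouth trial.
# Assumption: `lesions` is the total lesion count over both arms.

def design_effect(lesions, patients, icc=0.2):
    """Design effect 1 + (M - 1) * ICC with M = lesions * 0.5 / patients."""
    m = lesions * 0.5 / patients
    return 1 + (m - 1) * icc


def effective_counts(lesions, events, patients, icc=0.2):
    """Divide lesion and event counts by the design effect."""
    deff = design_effect(lesions, patients, icc)
    return lesions / deff, events / deff


# Hypothetical trial: 40 patients, 160 lesions in total, 24 progressed.
deff = design_effect(lesions=160, patients=40)           # M = 2, so 1 + 1 * 0.2
eff_lesions, eff_events = effective_counts(160, 24, 40)
print(f"design effect = {deff:.2f}")
print(f"effective lesions = {eff_lesions:.1f}, effective events = {eff_events:.1f}")
```

With one lesion per arm and patient, *M* equals 1 and the design effect is 1, so the counts remain unchanged.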

Multi-arm studies were treated as multiple independent two-arm studies in pairwise meta-analyses. A continuity correction of +1 was used for trial arms with zero events. Missing data were not imputed.
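As a rough illustration of how a zero-event arm can be handled, the sketch below computes a simple unadjusted OR with a Wald-type 95% confidence interval. How exactly the +1 correction was applied is not described in the text; adding 1 to every cell of the 2×2 table is an assumption made here. The counts are hypothetical, and this is not the Becker-Balagtas computation used for the split-mouth data.

```python
import math

# Sketch: unadjusted odds ratio with a continuity correction.
# Assumption: the correction is added to all four cells of the 2x2 table
# whenever one arm has zero events (the text does not specify this).
# Arms in which every lesion progresses are not handled here.

def odds_ratio(events_t, n_t, events_c, n_c, correction=1.0):
    a, b = events_t, n_t - events_t      # treatment: progressed / not progressed
    c, d = events_c, n_c - events_c      # control: progressed / not progressed
    if events_t == 0 or events_c == 0:
        a, b, c, d = (x + correction for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se))


# Hypothetical arm sizes: 0/20 progressed under treatment, 5/20 under control.
or_est, ci_low, ci_high = odds_ratio(0, 20, 5, 20)
print(f"OR = {or_est:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```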

## 2.6 Data synthesis

Pairwise random-effects meta-analyses for direct treatment comparisons were implemented using the *metafor* package in R (R Foundation for Statistical Computing, 2017). As described, all included studies used a split-mouth design, but reported data only in marginal form. To properly compute ORs we applied the Becker-Balagtas method, again setting the *ICC* at 0.2.

Indirect and mixed comparisons were performed using Bayesian random-effects modelling and Markov Chain Monte Carlo simulations using JAGS, implemented in the R package *gemtc* 0.8-2. Networks of interventions were constructed by plotting different treatments (as nodes) and comparisons (as edges). A binomial likelihood was used to model the data. To fit the model, we used non-informative priors: a normal distribution N(0,1000) for the basic parameters and a uniform prior U(0,4) for the random-effects standard deviation. The first 20,000 iterations were discarded as ‘burn-in’, and a further 80,000 iterations were then run for 4 chains with a thinning of 1. Convergence was assessed based on the Brooks-Gelman-Rubin criteria and inspection of trace plots.
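The idea behind the convergence check can be sketched with the basic potential scale reduction factor, which compares between-chain and within-chain variance. This is a simplified illustration on simulated draws, not the gemtc/JAGS output of the review. It uses the original Gelman-Rubin form without the later degrees-of-freedom refinements, and the function name and simulated chains are illustrative.

```python
import random
import statistics

# Sketch: basic potential scale reduction factor (R-hat) for one parameter.
# Values close to 1 suggest that the chains sample the same distribution.

def gelman_rubin(chains):
    n = len(chains[0])                                    # draws per chain
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled_var = (n - 1) / n * within + between / n
    return (pooled_var / within) ** 0.5


rng = random.Random(0)
# Four simulated chains drawn from the same distribution (well mixed).
mixed = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(4)]
# Same setup, but one chain is stuck around a different value.
stuck = mixed[:3] + [[rng.gauss(3, 1) for _ in range(2000)]]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print(f"R-hat, well-mixed chains: {rhat_mixed:.3f}")
print(f"R-hat, one divergent chain: {rhat_stuck:.3f}")
```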

Median ORs and their 95% credible intervals (95% CrI) were reported. Credible intervals are the range of estimated parameters after exclusion of extreme values. Different strategies were ranked according to their probability of having the lowest versus the highest odds of arresting lesions, and the average rank was calculated. The surface under the cumulative ranking (SUCRA) line was plotted and the area under the plot (SUCRA value) calculated.
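For a network of three strategies, the SUCRA value follows directly from the rank probabilities. The sketch below uses the first-rank probabilities reported in the abstract (infiltration 80%, sealing 20%, NI 0%). The probabilities for ranks two and three are not reported and are filled in under the assumption that NI was always ranked last.

```python
# Sketch: SUCRA from rank probabilities.
# SUCRA = sum of cumulative rank probabilities over ranks 1..a-1, divided by a-1.
# Assumption: rank-2 and rank-3 probabilities are inferred (NI always last).

def sucra(rank_probs):
    """rank_probs[k] is the probability of being ranked (k + 1)-th best."""
    cumulative = 0.0
    total = 0.0
    for p in rank_probs[:-1]:
        cumulative += p
        total += cumulative
    return total / (len(rank_probs) - 1)


rank_probabilities = {
    "infiltration": [0.80, 0.20, 0.00],
    "sealing":      [0.20, 0.80, 0.00],
    "NI":           [0.00, 0.00, 1.00],
}

sucra_values = {name: sucra(p) for name, p in rank_probabilities.items()}
for name, value in sucra_values.items():
    print(f"{name}: SUCRA = {value:.2f}")
```

Under this assumption the sketch yields 0.90, 0.60 and 0.00, which agrees with the SUCRA values given in the abstract.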

## 2.7 Statistical heterogeneity

In pairwise meta-analyses we estimated different heterogeneity variances for each pairwise comparison. In NMA we assumed a common estimate for the heterogeneity variance across the different comparisons. For pairwise meta-analysis, the $\chi^2$ test was used to assess heterogeneity, with p < 0.1 indicating significant heterogeneity. The $I^2$ statistic and its 95% confidence interval were used to quantify heterogeneity. For NMA, a total $I^2$ value for heterogeneity in the network was calculated. In order to evaluate the level of (in)consistency, we applied node-splitting, which evaluates one comparison at a time by separating the direct evidence on that comparison from the network of indirect evidence. Comparison-adjusted funnel plots were assessed to check for the existence of publication bias.
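The pairwise heterogeneity statistics can be sketched as follows: Cochran's Q is the statistic underlying the $\chi^2$ test, and $I^2$ is derived from Q and its degrees of freedom. The log ORs and variances are hypothetical. The p-value and the confidence interval of $I^2$ are omitted, as they require a $\chi^2$ distribution function that is not in the Python standard library.

```python
# Sketch: Cochran's Q and I^2 for one pairwise comparison.
# Inputs are study-level log odds ratios and their variances (hypothetical).

def heterogeneity(effects, variances):
    weights = [1 / v for v in variances]                 # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # percent
    return q, df, i2


log_ors = [-1.6, -1.2, -0.9, -1.4]
variances = [0.10, 0.08, 0.12, 0.09]
q, df, i2 = heterogeneity(log_ors, variances)
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
```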

## 2.8 Trial sequential analysis (TSA)

Conventional meta-analysis uses Z-values to compare two interventions, with Z = 0.0 indicating no difference between intervention groups. If Z exceeds ±1.96, a difference is traditionally assumed to be statistically significant (p ≤ 0.05, two-sided test). TSA plots the series of Z-values against the number of events, patients or information accumulated over time. This ‘cumulative Z-curve’ is assessed regarding its relation to the conventional significance boundaries (Z = ±1.96), the required information size (RIS), and the trial sequential monitoring boundaries (TSMB) for benefit, harm, or futility.

The RIS was calculated based on a type I error risk of α = 0.05, a type II error risk of β = 0.20 (equivalent to a power of 0.80) and the control event proportion. The relative risk reduction (RRR) was based on an a priori defined worthwhile interventional effect of 20%. It should be noted that smaller intervention effects might well be relevant. This, however, would increase the RIS even further. The RIS was further adjusted for the diversity in the meta-analysis (diversity-adjusted RIS: DARIS). The Lan-DeMets version of the O’Brien–Fleming function was used for calculating the TSMBs. Results in which the cumulative Z-value crossed the conventional boundary of significance (Z = ±1.96) but not the TSMBs for benefit or harm were defined as spuriously significant. Firm evidence was assumed to be reached when the Z-curve crossed the TSMB for benefit or harm before the DARIS was reached. Firm evidence of futility was confirmed by the Z-curve crossing the TSMB for futility. TSA 0.9 (Copenhagen Trial Unit, Copenhagen, Denmark) was used.
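The quantities involved can be sketched with the formulas commonly used for TSA. These are assumed here rather than taken from the TSA 0.9 software itself: the RIS for a dichotomous outcome, its diversity adjustment, and the Lan-DeMets O'Brien–Fleming alpha-spending function. The control event proportion (0.40) and the diversity (0.25) are hypothetical values.

```python
from statistics import NormalDist

# Sketch of TSA building blocks; the formulas are the commonly cited ones,
# not extracted from the TSA software. Input values are hypothetical.

def required_information_size(control_rate, rrr=0.20, alpha=0.05, beta=0.20):
    """RIS = 4 * (z_(1-alpha/2) + z_(1-beta))^2 * p(1 - p) / delta^2."""
    p_c = control_rate
    p_e = p_c * (1 - rrr)                  # event proportion under intervention
    p_bar = (p_c + p_e) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(1 - beta)
    return 4 * (z_a + z_b) ** 2 * p_bar * (1 - p_bar) / (p_c - p_e) ** 2


def diversity_adjusted_ris(ris, diversity):
    """DARIS = RIS / (1 - D^2), with diversity D^2 given as a proportion."""
    return ris / (1 - diversity)


def obrien_fleming_alpha_spent(information_fraction, alpha=0.05):
    """Cumulative two-sided alpha spent at a given information fraction."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / information_fraction ** 0.5))


ris = required_information_size(control_rate=0.40)
daris = diversity_adjusted_ris(ris, diversity=0.25)
print(f"RIS = {ris:.0f}, DARIS = {daris:.0f}")
for t in (0.25, 0.50, 0.75, 1.00):
    print(f"information fraction {t:.2f}: alpha spent = "
          f"{obrien_fleming_alpha_spent(t):.5f}")
```

The spending function allocates very little of the type I error to early looks, which is why a Z-value just above 1.96 from a few early trials does not cross the monitoring boundary.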

## 2.9 Grading of evidence

The certainty of the accrued evidence was graded according to the GRADE working group using Grade Profiler 3.6, following guidelines for rating the certainty in estimates from network meta-analyses.
