Confounding is a bias that threatens the validity of causal inferences in a study. Rothman and Greenland described a defining property of a confounder: “A confounding factor must not be affected by the exposure or disease. In particular, it cannot be an intermediate step in the causal pathway between the exposure and the disease.”

Controlling for confounders is particularly vital for observational studies, where subjects are not randomly assigned to different groups. A recent cross-sectional study of 4 major orthodontic journals reported that only 17% of the studies adjusted for confounders.

There are 3 principal methods for assessing confounders: the traditional approach (also called the standard approach), the noncollapsibility approach, and causal diagrams in the form of directed acyclic graphs (DAGs).

In the traditional approach, a variable is a confounder if it is associated with the exposure (E), is an independent risk factor for the outcome (D), and is not on the causal pathway between E and D. The problem with this approach is that some variables meet these criteria without being genuine confounders. Adjusting for such variables can distort the estimated association between E and D and introduce bias.

The noncollapsibility concept states that a confounder is identified when the crude or unadjusted estimate for the relationship between E and D is different from the estimate calculated after adjusting for potential confounders. This concept can also lead to inappropriate adjustment and invalid inferences, since the adjusted estimate is not always a less biased one than the unadjusted estimate. For instance, the change in estimate could still occur when an adjustment is implemented for a covariate that is not a confounder because it is on the causal pathway between E and D. This inappropriate adjustment is likely to result in overestimation or underestimation of the true association.
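
The comparison of crude and adjusted estimates described above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended analysis: the stratified counts are hypothetical, the Mantel-Haenszel odds ratio is one common choice of adjusted estimate (the text does not name a specific estimator), and the function names are ours.

```python
# Hypothetical 2x2 counts, stratified by a binary covariate C.
# Each tuple: (exposed cases, exposed noncases, unexposed cases, unexposed noncases)
strata = {
    "C=0": (10, 90, 40, 360),
    "C=1": (120, 280, 30, 70),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables):
    """Collapse all strata into one table, ignoring the covariate."""
    a, b, c, d = (sum(column) for column in zip(*tables))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


tables = list(strata.values())
crude = crude_odds_ratio(tables)
adjusted = mantel_haenszel_odds_ratio(tables)
relative_change = abs(crude - adjusted) / adjusted

print(f"crude OR = {crude:.2f}, adjusted OR = {adjusted:.2f}")
print(f"relative change in estimate = {relative_change:.0%}")
```

With these made-up counts, the odds ratio is 1.0 within each stratum and about 2.16 in the collapsed table, so the estimate changes markedly after adjustment. As the text cautions, such a change alone does not show that the covariate is a confounder: the same arithmetic would produce a change if the covariate lay on the causal pathway between E and D.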

Selection bias is a systematic, nonrandom error that can occur when the methods of enrolling subjects into a study or allocating them to treatment groups differ systematically among the groups. For example, an investigator plans a retrospective study of the effectiveness of functional appliances in the treatment of adolescents with Class II Division 1 malocclusion. By deciding to include only successfully treated patients, the investigator introduces selection bias into the study.
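
The effect of including only successfully treated patients can be shown with simple arithmetic. The sketch below uses entirely hypothetical records (100 treated patients, 60 of them treated successfully); the numbers and names are assumptions for illustration and do not come from any study.

```python
# Hypothetical records: 100 treated patients, the first 60 treated successfully.
all_treated = [{"id": i, "success": i < 60} for i in range(100)]


def success_rate(records):
    """Proportion of records flagged as successfully treated."""
    return sum(record["success"] for record in records) / len(records)


# Biased selection: only successfully treated patients enter the study.
selected_only = [record for record in all_treated if record["success"]]

print(f"success rate, all treated patients: {success_rate(all_treated):.0%}")
print(f"success rate, selected patients:    {success_rate(selected_only):.0%}")
```

Because inclusion depends on the outcome itself, the selected sample reports a 100% success rate whatever the true rate is among all treated patients.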

The third approach to assessing confounders is the use of causal diagrams or DAGs. DAGs are directed (each arrow points in 1 direction), acyclic (no cycles; a path cannot circle back to the variable it started from), graphical (diagrammatic) representations of the causal relationships among variables in a model.

Both the traditional and noncollapsibility approaches depend only on statistical associations among variables and pay no attention to background knowledge of the subject matter. Failure to consider background knowledge can result in either incomplete adjustment for confounding or overadjustment through the inclusion of unnecessary variables in the analysis.

Causal diagrams (DAGs) address the above limitations by combining statistical associations (quantitative approach) with the a priori subject-matter knowledge of the causal relationships among E, D, and the confounders (qualitative approach). DAGs provide a quick and simple visual representation of the structural associations at the design stage of a study, aid in determining which variables to adjust for, and distinguish a genuine confounder from nonconfounders.

However, DAGs are not well suited to assessing effect modification (also known as statistical interaction among causes) or the strength of associations between variables (eg, correlation coefficients), or to parametric structural modeling.

## Definition

A DAG is a nonparametric structural method for identifying potential confounders. It encodes assumptions (subject-matter knowledge) about the causal effects in a study and describes the associations and causal relationships among variables in the form of a graph. A DAG is nonparametric because it specifies neither how the relationships among variables should be estimated nor the probability distribution of the variables; it presents these relationships graphically instead.

A DAG consists of 3 main elements: nodes (variables); directed arrows (also called edges) that connect the nodes and represent the temporal relationship between the variables (arrows move in 1 direction, from left to right only, since the future cannot cause the past); and the researcher’s subject-matter knowledge about the causal relationships among the variables drawn in the DAG.
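
As a rough sketch, the nodes and directed arrows can be stored as a list of (cause, effect) pairs, and the "acyclic" requirement can be checked with a topological sort. The example below uses Python's standard `graphlib` module (Python 3.9 or later). The 3 arrows (C to E, C to D, and E to D) are an assumed textbook confounding structure chosen for illustration, not a diagram taken from this article, and `causal_order` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter


def causal_order(edges):
    """Return the nodes ordered so that every cause precedes its effects.

    `edges` is an iterable of (cause, effect) pairs. Returns None when the
    arrows form a cycle, meaning the graph is not a valid DAG.
    """
    predecessors = {}
    for cause, effect in edges:
        predecessors.setdefault(cause, set())
        predecessors.setdefault(effect, set()).add(cause)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


# Assumed structure: confounder C affects both exposure E and outcome D,
# and E affects D.
dag_edges = [("C", "E"), ("C", "D"), ("E", "D")]

# Adding an arrow from D back to C creates a cycle (the future causing the past).
cyclic_edges = dag_edges + [("D", "C")]

print(causal_order(dag_edges))     # ['C', 'E', 'D']
print(causal_order(cyclic_edges))  # None
```

The ordering returned for the valid graph corresponds to the left-to-right layout described above: each variable appears only after all of its causes.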

## Basic DAG terminology

In this example, we investigate the effect of full fixed orthodontic appliances (the exposure) on the occurrence of labial gingival recession (the outcome) in a case-control study. The cases are patients who have labial gingival recession, and the controls are those who do not; the exposure is full fixed orthodontic treatment vs no orthodontic treatment.

The first step is to identify the variables of interest. In the above example, we identified 4 binary variables: “OT” denotes the exposure of interest, full fixed orthodontic appliance treatment; “GLR” denotes the outcome, gingival labial recession; “C” denotes the confounder, which in this study is the patient’s age; and “S” denotes selection bias, that is, bias introduced when patients are selected into the case or control group.

The second step is to draw a DAG to model the causal assumptions presented (DAGitty software was used in this example [www.dagitty.net/], but other software programs are also available). We now explain the key concepts behind DAGs using our example.

1. In Figure 1, the omission of an arrow connecting the exposure “OT” to the disease “GLR” indicates that the probability of developing the disease is the same in the treated and untreated patients. In other words, “OT” does not have a causal association with “GLR,” and the 2 variables are independent.