# Logistic regression: Part 1

In the article discussing the chi-square test, I used a clinical trial scenario with the objective of assessing the clinical alignment efficiency of 2 types of wires. These wires (A and B) were used for 6 months in 2 patient groups, and the outcome recorded was binary: reaching complete alignment (success) or not reaching complete alignment (failure).

Table I shows the tabulation of alignment successes and failures for each wire after 6 months of treatment and the calculation of risks and odds of success, and risk and odds ratios of alignment success vs failure.

Table I
Tabulation of alignment success and failure after 6 months of treatment by wire type, with calculation of risk and odds of success and the risk and odds ratios of alignment success vs failure

| Alignment | Wire A | Wire B | Total |
|-----------|--------|--------|-------|
| Yes       | a = 23 | b = 19 | 42    |
| No        | c = 8  | d = 11 | 19    |
| Total     | 31     | 30     | 61    |

- Wire A: risk = 23/31 = 0.74; odds = 23/8 = 2.88
- Wire B: risk = 19/30 = 0.63; odds = 19/11 = 1.73
- Risk ratio = 0.74/0.63 = 1.17
- Odds ratio (OR) = 2.88/1.73 = 1.66
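The risks, odds, and ratios in Table I can be checked with a few lines of arithmetic; this sketch uses only the cell counts from the table (the variable names are my own).

```python
# Cell counts from Table I (a, b, c, d as labeled in the table)
a, b, c, d = 23, 19, 8, 11

risk_A = a / (a + c)   # risk of alignment with wire A: 23/31
risk_B = b / (b + d)   # risk of alignment with wire B: 19/30
odds_A = a / c         # odds of alignment with wire A: 23/8
odds_B = b / d         # odds of alignment with wire B: 19/11

risk_ratio = risk_A / risk_B   # risk ratio for A vs B
odds_ratio = odds_A / odds_B   # odds ratio for A vs B

print(round(risk_A, 2), round(risk_B, 2))          # 0.74 0.63
print(round(risk_ratio, 2), round(odds_ratio, 2))  # 1.17 1.66
```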

The chi-square test showed no evidence of a difference in the success of alignment after 6 months between the 2 wire groups; the P value was 0.36.

The same result can be obtained using a special type of regression analysis called logistic regression, which is used when the outcome is binary (alignment: yes/no); recall that linear regression is used when the outcome is continuous (eg, millimeters of crowding alleviation). In logistic regression, we can obtain effect estimates, P values, and confidence intervals directly from the regression output. The effect estimates are handled as log odds ratios (log OR; log = natural logarithm) because these have appropriate mathematical properties (they can range from −∞ to +∞). We can convert the log ORs to odds ratios (ORs), which are more interpretable, by exponentiating them (OR = exp[log OR]). Logistic regression has a similar form to the linear regression model in the sense that the components (y = a + bx) are linearly related on the logarithmic scale (when using log ORs). However, in logistic regression, the response or dependent variable y is the log odds, log(p/(1 − p)), which is called the logit:
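As a check that logistic regression reproduces the OR from the 2 × 2 table, here is a minimal sketch in plain Python (the variable names are my own) that fits the model by Newton-Raphson; with a single binary predictor, the fitted coefficient has the closed form log(ad/bc) from the table cells.

```python
import math

# Table I data as (x, y, n): x = 1 for wire A, 0 for wire B; y = 1 if aligned
cells = [(1, 1, 23), (1, 0, 8), (0, 1, 19), (0, 0, 11)]

a_hat, b_hat = 0.0, 0.0                  # coefficients: intercept and slope
for _ in range(25):                      # Newton-Raphson iterations
    g0 = g1 = 0.0                        # gradient of the log-likelihood
    h00 = h01 = h11 = 0.0                # information matrix (X'WX) entries
    for x, y, n in cells:
        p = 1 / (1 + math.exp(-(a_hat + b_hat * x)))   # fitted probability
        g0 += n * (y - p)
        g1 += n * (y - p) * x
        w = n * p * (1 - p)
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01          # invert the 2x2 information matrix
    a_hat += (h11 * g0 - h01 * g1) / det
    b_hat += (h00 * g1 - h01 * g0) / det

print(round(b_hat, 2))                   # log OR: log(23*11/(19*8)) ≈ 0.51
print(round(math.exp(b_hat), 2))         # OR = 1.66, as in Table I
```

In practice, one would fit this with standard software (eg, the glm function in R with a binomial family); the hand-rolled fit simply makes explicit that the regression coefficient is the log OR from Table I.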

$$\log\left(\frac{p}{1-p}\right) = a + b \cdot x \qquad (1)$$

where a is the intercept (constant), b is the regression coefficient of x, and x is the categorical predictor, which has 2 levels in our example (wire A or wire B).

Specifically, in equation 1, a is the log odds of reaching alignment for patients in the control group, which we take here to be the wire B group (the reference), and b is the log OR of reaching alignment for patients fitted with wire A vs patients fitted with wire B.

In a bit more detail, we have groups A and B: the risk (proportion) of the event is pA for wire A and pB for wire B. The odds of the event are pA/(1 − pA) for the wire A group and pB/(1 − pB) for the wire B group, and their natural logarithms are log(pA/(1 − pA)) = logit(pA) and log(pB/(1 − pB)) = logit(pB), respectively. Then the OR of the event in group A compared with group B would be

$$\frac{p_A/(1-p_A)}{p_B/(1-p_B)} \qquad (2)$$

and the logarithm of the OR would be

$$\log\left(\frac{p_A/(1-p_A)}{p_B/(1-p_B)}\right) = \log\left(\frac{p_A}{1-p_A}\right) - \log\left(\frac{p_B}{1-p_B}\right) = \operatorname{logit}(p_A) - \operatorname{logit}(p_B) \qquad (3)$$

If we code wires B and A with the values 0 and 1, respectively, then after appropriate substitutions in equation 1 and using equations 2 and 3, we arrive at the following:

For wire B (x = 0):

$$\log\left(\frac{p}{1-p}\right) = a + b \cdot 0 = a, \quad \text{that is,} \quad a = \log\left(\frac{p_B}{1-p_B}\right)$$

and for wire A (x = 1):

$$\log\left(\frac{p}{1-p}\right) = a + b \cdot 1 = a + b$$

so that

$$b = \log\left(\frac{p}{1-p}\right) - a = \log\left(\frac{p_A}{1-p_A}\right) - \log\left(\frac{p_B}{1-p_B}\right) = \log\left(\frac{p_A/(1-p_A)}{p_B/(1-p_B)}\right)$$
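These substitutions can be verified numerically with the risks from Table I; a short sketch (the logit helper is my own):

```python
import math

def logit(p):
    """Log odds of a proportion p."""
    return math.log(p / (1 - p))

p_A = 23 / 31                    # risk of alignment with wire A
p_B = 19 / 30                    # risk of alignment with wire B

a = logit(p_B)                   # intercept: log odds for wire B (x = 0)
b = logit(p_A) - logit(p_B)      # coefficient: log OR for wire A (x = 1)

# Exponentiating b recovers the OR computed directly in Table I
print(round(math.exp(b), 2))     # 1.66
```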