In a previous article, I discussed how we can practically construct a confidence interval that is likely to include the true population mean from our single sample by taking advantage of the properties of the sampling distribution. In many cases, we can make a skewed distribution symmetrical by transforming the data; this allows us to use the properties of the sampling distribution.
The standard error (SE) of the sample mean is the estimated standard deviation of the sampling distribution; because we carry out the study only once, we have only 1 sample. Even though we do not observe the sampling distribution, since this would require multiple samples, we can calculate the standard deviation of the sampling distribution from the standard deviation and size of our 1 sample (σ/√n).
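The calculation above can be sketched in a few lines of Python. The ages below are invented for illustration; only the formula SE = σ/√n comes from the text.

```python
import math

# Hypothetical sample of patient ages in years (illustrative values only)
ages = [14.2, 15.1, 13.8, 14.9, 15.4, 14.0, 14.7, 15.2, 13.9, 14.6]

n = len(ages)
mean = sum(ages) / n

# Sample standard deviation (n - 1 in the denominator)
sd = math.sqrt(sum((x - mean) ** 2 for x in ages) / (n - 1))

# Standard error of the mean: the estimated standard deviation
# of the sampling distribution, sd / sqrt(n)
se = sd / math.sqrt(n)
```

Note that the standard error shrinks as the sample size grows, which is why larger studies yield narrower confidence intervals.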
Again, using information from this 1 sample, along with knowledge about the sampling distribution of a mean, we can construct a confidence interval. As long as the size of our sample is reasonably large (>30), we know that the sampling distribution of a mean is normal. This allows us to infer a range of values within which the true population mean is likely to lie. This range of values is called a “confidence interval” because we can be reasonably confident that it contains the true mean.
I described earlier that the confidence interval of the mean follows the general formula: sample mean ± (critical value × SE); for a 95% confidence interval based on the normal distribution, this is x̄ ± 1.96 × SE.
If we took thousands of samples and for each sample calculated the mean and associated 95% confidence interval, we would expect 95% of these confidence intervals to include the population mean.
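This coverage property can be checked directly by simulation. In the sketch below, the population mean, standard deviation, sample size, and number of trials are all assumed values chosen for illustration; the 1.96 multiplier is the standard normal critical value for a 95% interval.

```python
import math
import random

random.seed(42)  # fixed seed so the simulation is reproducible

TRUE_MEAN, TRUE_SD = 15.3, 2.0   # hypothetical population parameters
N, TRIALS = 50, 2000             # assumed sample size and number of repeats

covered = 0
for _ in range(TRIALS):
    # Draw one sample from the (normally distributed) population
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = sum(sample) / N
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (N - 1))
    se = sd / math.sqrt(N)
    # 95% confidence interval: mean +/- 1.96 * SE
    lo, hi = mean - 1.96 * se, mean + 1.96 * se
    if lo <= TRUE_MEAN <= hi:
        covered += 1

coverage = covered / TRIALS  # expected to be close to 0.95
```

Running this, the fraction of intervals containing the true mean comes out near 0.95, exactly as the coverage interpretation predicts.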
When we are making statistical comparisons, we state the null (H₀) and the alternative (Hₐ) hypotheses.
The null hypothesis usually states that there is no difference when comparing groups, and we look for evidence to disprove the null hypothesis and accept the alternative. In our previous example of adolescent patients seeking orthodontic therapy, our sample’s mean age was 14.5 years, and we hypothesized that our true population mean (μ) was 15.3 years. We are interested in assessing whether our assumption about the population mean holds. In other words, we would like to test whether our sample mean differs from the hypothesized true population mean.
To accomplish this, we compare 2 hypotheses: (1) the null hypothesis that the population mean is equal to μ, H₀: μ₀ = μ; and (2) the alternative hypothesis that the population mean is any value except μ, H₁: μ₁ ≠ μ.
The statistical process to assess the null hypothesis is called the statistical test. Since the alternative hypothesis is defined as 2-sided, the statistical test is a 2-sided test. The 2-sided test is applied more frequently than the “strict” 1-sided test (eg, H₁: μ₁ > μ).
This step allows us to calculate a statistical formula known as the test statistic: a measure of how extreme our data are. The general form of the test statistic compares the observed value estimated from the sample, x̄, and the population mean under the null hypothesis, μ₀ = μ. It also takes into account the population variability by using the standard error.
We saw that sampling distributions are approximately normal when the sample size is large. The letter “z” is often used in association with the normal distribution, so the value of the test statistic is called a “z value.” It is equal to z = (x̄ − μ₀)/SE = (x̄ − μ₀)/(σ/√n).
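Using the article's running example, the z value can be computed as follows. Only the two means (14.5 and 15.3 years) appear in the text; the standard deviation and sample size below are assumed values added to make the example complete.

```python
import math

sample_mean = 14.5   # observed mean age in years (from the example)
mu_0 = 15.3          # hypothesized population mean (from the example)
sd = 2.0             # assumed sample standard deviation (illustrative)
n = 100              # assumed sample size (illustrative)

# Standard error of the mean
se = sd / math.sqrt(n)

# z value: how many standard errors the observed mean lies
# from the hypothesized population mean
z = (sample_mean - mu_0) / se   # roughly -4.0 with these assumed numbers
```

A z value this far from 0 would be very unlikely under the null hypothesis, so with these assumed numbers we would reject H₀; with a smaller sample or a larger standard deviation, the same difference in means could easily fail to reach significance.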