Trying to decide whether to use a one-sample z-test or a t-test? The answer depends on your sample size, whether the data follows a normal distribution, and whether the population standard deviation is known or unknown. Selecting the appropriate statistical test ensures that the underlying assumptions are satisfied and that the resulting conclusions are statistically valid. Use the quick guide below to determine which test fits your specific data requirements:
| Population Data Normality | Population Standard Deviation | Sample Size | Appropriate Test |
|---|---|---|---|
| Normal | Known | Any | Z-test |
| Normal | Unknown | Any | T-test |
| Not Normal | Known or Unknown | Large (\( n > 30 \)) | Z-test |
| Not Normal | Known or Unknown | Small (\( n \leq 30 \)) | Wilcoxon Signed-Rank Test |
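The decision table above can also be written as a small R function. This is only a sketch: `choose_test` and its arguments are hypothetical names introduced for illustration, and the \( n > 30 \) cutoff is the rule of thumb used in this guide rather than a hard limit. Following Case 3 below, a large non-normal sample is routed to the z-test whether or not the standard deviation is known.

```r
# Hypothetical helper that mirrors the decision table in this guide.
# is_normal:   TRUE if the population can be assumed normal
# sigma_known: TRUE if the population standard deviation is known
# n:           sample size
choose_test <- function(is_normal, sigma_known, n) {
  if (is_normal) {
    # Normal population: the choice depends only on whether sigma is known
    if (sigma_known) "z-test" else "t-test"
  } else if (n > 30) {
    # Non-normal population, large sample: z-test via the Central Limit Theorem
    "z-test"
  } else {
    # Non-normal population, small sample: non-parametric alternative
    "Wilcoxon signed-rank test"
  }
}

choose_test(is_normal = TRUE,  sigma_known = FALSE, n = 12)
choose_test(is_normal = FALSE, sigma_known = FALSE, n = 100)
choose_test(is_normal = FALSE, sigma_known = TRUE,  n = 15)
```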
Case 1: The population is normally distributed and the population standard deviation is known
If the population is normally distributed and the population standard deviation is known, the z-test should be applied. Here’s the proof:
Assume we have a random sample \( X_1, X_2, \dots, X_n \) drawn from a normal distribution \( \mathcal{N}(\mu, \sigma^2) \), where \( \mu \) is the population mean and \( \sigma^2 \) is the known population variance. The sample mean is defined as:
\( \bar{X} =\frac{X_1 + X_2 + \dots + X_n}{n}\)
Since the \( X_i \) are independent and normally distributed, and a linear combination of independent normally distributed variables is also normally distributed, the sample mean \( \bar{X} \) follows a normal distribution.
Finding the expected value of the sample mean:
\( \mathbb{E}[\bar{X}] = \mathbb{E}\left[\frac{X_1 + X_2 + \cdots + X_n}{n}\right] = \frac{1}{n} \mathbb{E}(X_1 + X_2 + \cdots + X_n) = \frac{1}{n} (\mathbb{E}(X_1) + \mathbb{E}(X_2) + \cdots + \mathbb{E}(X_n)) = \frac{1}{n} (\mu + \cdots + \mu) = \frac{1}{n} (n\mu) = \mu \)
By the linearity of expectation, we can express the expected value of a sum as a sum of expected values, and then substitute \( \mu \) for \( \mathbb{E}(X_i) \), since each \( X_i \) was drawn from a normal distribution with expected value \( \mu \).
Finding the variance of the sample mean:
\( \text{Var}(\bar{X}) = \text{Var}\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{1}{n^2} \text{Var}(X_1 + X_2 + \cdots + X_n) = \frac{1}{n^2} (\text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n)) = \frac{1}{n^2} (n\sigma^2) = \frac{\sigma^2}{n} \)
We use two key properties of variance: first, that the variance of a constant times a random variable is the constant squared times the variance of the random variable (i.e., \( \text{Var}(aX) = a^2\text{Var}(X) \)), and second, that the variance of the sum of independent random variables is the sum of their individual variances. Then, we substitute \( \sigma^2 \) for \( \text{Var}(X_i) \), since each \( X_i \) was drawn from a normal distribution with variance \( \sigma^2 \).
Since we have shown that the sample mean \( \bar{X} \) is normally distributed with mean \( \mu \) and variance \( \frac{\sigma^2}{n} \), it follows that \( \bar{X} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right) \). We can therefore standardize \( \bar{X} \) by subtracting the mean \( \mu \) and dividing by the standard deviation \( \sigma / \sqrt{n} \), resulting in the variable \( Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \), which follows a standard normal distribution.
Therefore, we can use the one-sample z-test with the test statistic \( z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \), with \( \mu \) set to the hypothesized mean, when the data is normally distributed and the population standard deviation is known.
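The result can be checked numerically. The sketch below uses illustrative values for \( \mu \), \( \sigma \), and \( n \) (they are assumptions, not part of the proof) and simulates many samples to compare the mean and variance of \( \bar{X} \) with \( \mu \) and \( \sigma^2 / n \).

```r
set.seed(1)        # arbitrary seed, for reproducibility only
mu    <- 2         # illustrative population mean
sigma <- 0.5       # illustrative (known) population standard deviation
n     <- 5         # sample size
reps  <- 10000     # number of simulated samples

# Draw `reps` normal samples of size n and keep each sample mean
x_bars <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

# Standardize each sample mean using the known sigma
z <- (x_bars - mu) / (sigma / sqrt(n))

mean(x_bars)       # should be close to mu
var(x_bars)        # should be close to sigma^2 / n
sigma^2 / n

# Share of |z| values beyond the 5% two-tailed normal critical value;
# should be close to 0.05 if z is standard normal
reject_rate <- mean(abs(z) > qnorm(0.975))
reject_rate
```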
Case 2: The population is normally distributed, but the population standard deviation is unknown
If the population is normally distributed, but the population standard deviation is unknown, the t-test should be applied. Here’s the proof:
Assume we have a random sample \( X_1, X_2, \dots, X_n \) drawn from a normal distribution \( \mathcal{N}(\mu, \sigma^2) \), where \( \mu \) is the population mean and \( \sigma^2 \) is the unknown population variance.
Since the population variance \( \sigma^2 \) is unknown, we use the sample variance \( s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \) as an estimate. Unlike \( \sigma^2 \), \( s^2 \) is a random variable: for a normal sample, the scaled quantity \( \frac{(n-1)s^2}{\sigma^2} \) follows a chi-square distribution with \( n - 1 \) degrees of freedom.
We’ll consider the following test statistic:
\( T = \frac{\bar{X} - \mu}{s / \sqrt{n}} = \frac{\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\frac{(n-1)s^2}{\sigma^2}}/\sqrt{n-1}} \)
In this expression, the numerator \( \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \) follows the standard normal distribution, as shown in Case 1. In the denominator, the expression \( \frac{(n-1)s^2}{\sigma^2} \) has a chi-square distribution with \( n-1 \) degrees of freedom. For a normal sample, \( \bar{X} \) and \( s^2 \) are independent, so the numerator and denominator are independent as well. Finally, the ratio of a standard normal variable to the square root of an independent chi-square variable divided by its degrees of freedom follows a t-distribution with \( n-1 \) degrees of freedom.
Therefore, we can use the one-sample t-test with the test statistic \( t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \) when the data is normally distributed and the population standard deviation is unknown.
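A short simulation can illustrate why the t-distribution, and not the normal distribution, is the right reference here. The values of \( \mu \), \( \sigma \), and \( n \) below are illustrative assumptions. Each simulated sample is drawn with the null hypothesis true, so a valid 5% test should reject about 5% of the time.

```r
set.seed(2)        # arbitrary seed, for reproducibility only
mu    <- 2         # illustrative true mean (the null hypothesis holds)
sigma <- 0.5       # illustrative sd, treated as unknown by the test
n     <- 5         # small sample size
reps  <- 10000     # number of simulated samples

# Compute the t-statistic for each simulated normal sample,
# estimating the standard deviation from the sample itself
t_stats <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

# Rejection rate with t critical values (n - 1 degrees of freedom)
rate_t <- mean(abs(t_stats) > qt(0.975, df = n - 1))

# Rejection rate if normal critical values were (wrongly) used instead
rate_z <- mean(abs(t_stats) > qnorm(0.975))

rate_t             # should be close to the nominal 0.05
rate_z             # should be clearly above 0.05 for such a small n
```

With a small sample, the extra variability from estimating \( \sigma \) by \( s \) makes the statistic heavier-tailed than the standard normal, which is what the t critical values account for.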
Case 3: The population is not normally distributed, but the sample size is large
In large samples, we can still apply the z-test even when the population is not normally distributed and the standard deviation is unknown. Here’s the proof:
Let \( X_1, X_2, \dots, X_n \) be independent and identically distributed (i.i.d.) random variables with mean \( \mu \) and finite variance \( \sigma^2 > 0 \). Under the null hypothesis \( \mu = \mu_0 \), we consider the test statistic:
\( Z_n = \frac{\bar{X}_n - \mu_0}{s_n / \sqrt{n}} \)
According to the Central Limit Theorem, \( \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \). Also, since the sample standard deviation \( s_n \) is a consistent estimator of the population standard deviation \( \sigma \), we have \( s_n \xrightarrow{p} \sigma \). Since convergence in probability implies convergence in distribution, we also have \( s_n \xrightarrow{d} \sigma \).
By Slutsky’s theorem, if \( A_n \xrightarrow{d} A \) and \( B_n \xrightarrow{d} b \neq 0 \), then \( \frac{A_n}{B_n} \xrightarrow{d} \frac{A}{b} \). Applying this to our test statistic:
\( \frac{\bar{X}_n - \mu}{s_n / \sqrt{n}} = \left( \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \right) \left( \frac{\sigma}{s_n} \right) \xrightarrow{d} \mathcal{N}(0,1) \cdot 1 = \mathcal{N}(0,1) \)
Therefore, for large \( n \), the test statistic \( Z_n \) approximately follows the standard normal distribution. This justifies the use of the z-test even when the population is not normally distributed and the standard deviation is unknown, as long as the sample size is sufficiently large.
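The large-sample argument can be illustrated with a simulation. The sketch below assumes an exponential population with rate 1 (so the true mean is 1) as one example of a skewed, non-normal distribution; `reject_rate` is a hypothetical helper name, and the sample sizes are chosen only for illustration.

```r
set.seed(3)        # arbitrary seed, for reproducibility only

# Hypothetical helper: share of samples of size n for which the z-test
# (with s in place of sigma) rejects a true null at the 5% level.
reject_rate <- function(n, reps = 10000) {
  mu0 <- 1         # true mean of an Exp(rate = 1) population
  z_stats <- replicate(reps, {
    x <- rexp(n, rate = 1)
    (mean(x) - mu0) / (sd(x) / sqrt(n))
  })
  mean(abs(z_stats) > qnorm(0.975))
}

rate_small <- reject_rate(10)    # small sample: approximation is poor
rate_large <- reject_rate(200)   # large sample: should be near 0.05

rate_small
rate_large
```

How large \( n \) needs to be depends on how far the population is from normal, so the \( n > 30 \) threshold should be read as a rule of thumb.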
Case 4: The population is not normally distributed and the sample size is small
When the population is not normally distributed and the sample size is small (typically \( n \leq 30 \)), neither the z-test nor the t-test is appropriate: the exact results in Cases 1 and 2 require normality, and the approximation in Case 3 requires a large sample. Instead, we use the Wilcoxon Signed-Rank Test, a non-parametric alternative that does not assume normality. In its one-sample form, it tests whether the population median equals a specified value, using the signed ranks of the differences between the observations and that value. It does assume that these differences are symmetrically distributed around the median. This test is particularly useful when the data contains outliers or has heavy tails.
To perform the Wilcoxon Signed-Rank Test, follow these steps:
- Calculate the difference between each observation and the hypothesized median (e.g., \( X_i - \mu_0 \)).
- Remove any differences equal to zero.
- Rank the absolute values of the remaining differences from smallest to largest, assigning average ranks for ties.
- Assign a positive or negative sign to each rank based on the sign of the original difference.
- Calculate the test statistic \( W \) as the sum of the signed ranks.
- Compare \( W \) to the critical value from the Wilcoxon distribution table or use statistical software to obtain the p-value.
- Reject or fail to reject the null hypothesis based on the p-value or critical value comparison.
This test provides a robust alternative for small, non-normal samples, ensuring valid inference when the assumptions of parametric tests are not met.
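The steps above can be sketched in R as follows, using the same illustrative data as the R examples in this guide. Note that R's built-in `wilcox.test` reports the statistic `V`, the sum of the ranks of the positive differences, instead of the sum of signed ranks \( W \); the two are related by \( W = 2V - m(m+1)/2 \), where \( m \) is the number of non-zero differences.

```r
x   <- c(1.5, 2.1, 1.8, 2.4, 1.9)  # illustrative data
mu0 <- 2                           # hypothesized median

# Step 1: differences from the hypothesized median
d <- x - mu0

# Step 2: drop differences that are exactly zero
d <- d[d != 0]

# Step 3: rank the absolute differences (average ranks for ties)
r <- rank(abs(d))

# Step 4: attach the sign of each original difference to its rank
signed_ranks <- sign(d) * r

# Step 5: test statistic W as the sum of the signed ranks
W <- sum(signed_ranks)

# Equivalent statistic reported by wilcox.test: sum of positive ranks
V <- sum(r[d > 0])

W
V

# Steps 6-7: obtain the p-value from software. With tied ranks,
# wilcox.test warns that it cannot compute an exact p-value and
# falls back to a normal approximation; the warning is suppressed here.
result <- suppressWarnings(wilcox.test(x, mu = mu0))
result$statistic   # should match V
result$p.value
```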
Example Calculations in R
Two-tailed z-test, step-by-step manual calculation:
#Example data:
x <- c(1.5, 2.1, 1.8, 2.4, 1.9)
mu <- 2
sigma <- 0.5
#Step 1: Calculate the sample mean:
x_bar <- mean(x)
#Step 2: Set the sample size:
n <- length(x)
#Step 3: Compute the standard error:
se <- sigma / sqrt(n)
#Step 4: Compute the z-statistic:
z <- (x_bar - mu) / se
#Step 5: Compute two-tailed p-value:
p_value <- 2 * (1 - pnorm(abs(z)))
#Have a look at the output:
print(z)
print(p_value)
Two-tailed z-test using the z.test function from the BSDA Package:
#Install the BSDA package:
install.packages("BSDA")
library(BSDA)
#Perform the z-test using the z.test function:
z.test(x = c(1.5, 2.1, 1.8, 2.4, 1.9), mu = 2, sigma.x = 0.5)
Two-tailed t-test, step-by-step manual calculation:
#Example data:
x <- c(1.5, 2.1, 1.8, 2.4, 1.9)
mu <- 2
#Step 1: Calculate the sample mean:
x_bar <- mean(x)
#Step 2: Set the sample size:
n <- length(x)
#Step 3: Calculate the sample standard deviation:
s <- sd(x)
#Step 4: Compute the standard error:
se <- s / sqrt(n)
#Step 5: Compute the t-statistic:
t_stat <- (x_bar - mu) / se
#Step 6: Compute the degrees of freedom:
df <- n - 1
#Step 7: Compute the two-tailed p-value:
p_value <- 2 * pt(-abs(t_stat), df)
#Have a look at the output:
print(t_stat)
print(df)
print(p_value)
Two-tailed t-test using the t.test function from the stats package (the stats package is automatically loaded when you start R, so there's no need to install or load it separately):
#Perform the two-tailed t-test:
t.test(x = c(1.5, 2.1, 1.8, 2.4, 1.9), mu = 2)
Wilcoxon Signed-Rank Test:
#Perform the Wilcoxon Signed-Rank Test:
wilcox.test(x = c(1.5, 2.1, 1.8, 2.4, 1.9), mu = 2)
