Description
From After Treadmill)
Heart rate measurements for 12 subjects before and after a treadmill session (baseline vs after5). Because both measurements come from the same subject, observations are paired, not independent — the appropriate test is the paired-sample version of the t-test (or its non-parametric analogue).
File: data/health_promo_hr.csv. Columns: baseline, after5.
Paired t-test
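Compute the paired differences $D_i = \text{baseline}_i - \text{after5}_i$; test $H_0: \mu_D = 0$ via $T = \bar{D} / (s_D / \sqrt{n}) \sim t_{n-1}$ under $H_0$.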
R code
hr_df <- read.csv("data/health_promo_hr.csv")
before <- hr_df$baseline
after <- hr_df$after5
t.test(before, after, paired=TRUE)
Python code
import pandas as pd
from scipy import stats
hr_df = pd.read_csv("data/health_promo_hr.csv")
paired_out = stats.ttest_rel(hr_df.baseline, hr_df.after5)
print(f"""
Test statistic: {paired_out.statistic:.3f}.
p-val: {paired_out.pvalue:.3f}.""")
Assumption check caveat: with only 12 subjects, the case for Normality of the differences $D_i$ is weak. For small-sample paired data, a graphical check plus the non-parametric analogue below is the safer route.
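The paired t-test is just a one-sample t-test on the differences. A minimal sketch of that computation by hand, checked against `stats.ttest_rel`. The heart rates below are made up for illustration and are not the values in data/health_promo_hr.csv.

```python
import numpy as np
from scipy import stats

# Hypothetical heart rates (bpm), for illustration only; NOT the course data.
before = np.array([72, 68, 75, 80, 66, 71, 77, 69, 74, 70, 73, 78], dtype=float)
after = np.array([85, 80, 83, 95, 79, 84, 88, 80, 90, 82, 81, 92], dtype=float)

d = before - after                                  # D_i = baseline_i - after_i
n = d.size
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))    # T = Dbar / (s_D / sqrt(n))
p_val = 2 * stats.t.sf(abs(t_stat), df=n - 1)       # two-sided p-value, t with n-1 df

# The library routine should agree up to floating-point error.
ref = stats.ttest_rel(before, after)
print(f"by hand: t = {t_stat:.3f}, p = {p_val:.4g}")
print(f"scipy:   t = {ref.statistic:.3f}, p = {ref.pvalue:.4g}")
```

Swapping in `hr_df.baseline` and `hr_df.after5` for the two arrays should reproduce the output of the Python code above.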
Paired/agreement diagnostic plots:
- Paired-line plot — one line per subject connecting baseline to after. Similar gradients (consistently up or consistently down) support a systematic effect; mixed signs weaken it.
- Agreement plot (after vs. before, with the $y = x$ reference line): if points scatter around $y = x$, there is no mean shift.
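A rough matplotlib sketch of both plots. The data are hypothetical (not the course file), and the layout, colours and axis labels are arbitrary choices rather than anything prescribed by the notes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical heart rates (bpm), for illustration only; NOT the course data.
before = np.array([72, 68, 75, 80, 66, 71, 77, 69, 74, 70, 73, 78], dtype=float)
after = np.array([85, 80, 83, 95, 79, 84, 88, 80, 90, 82, 81, 92], dtype=float)

fig, (ax_lines, ax_agree) = plt.subplots(1, 2, figsize=(9, 4))

# Paired-line plot: one segment per subject, from baseline (x=0) to after (x=1).
for b, a in zip(before, after):
    ax_lines.plot([0, 1], [b, a], marker="o", color="tab:blue", alpha=0.6)
ax_lines.set_xticks([0, 1])
ax_lines.set_xticklabels(["baseline", "after5"])
ax_lines.set_ylabel("heart rate (bpm)")
ax_lines.set_title("Paired-line plot")

# Agreement plot: after vs. before, with the y = x reference line.
ax_agree.scatter(before, after)
lo = min(before.min(), after.min())
hi = max(before.max(), after.max())
ax_agree.plot([lo, hi], [lo, hi], linestyle="--", color="grey")
ax_agree.set_xlabel("baseline")
ax_agree.set_ylabel("after5")
ax_agree.set_title("Agreement plot")

fig.tight_layout()
```

Call `fig.savefig(...)` or `plt.show()` (with an interactive backend) to view the result. With this made-up data every segment slopes upward and every point sits above $y = x$, which is the pattern a systematic increase would produce.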
Non-parametric paired test
Wilcoxon Signed-Rank Test (WST): the paired-sample analogue of the Wilcoxon Rank-Sum test. Tests whether the median of the paired differences $D_i$ is 0. No Normality assumption.
Procedure:
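- Drop any $D_i = 0$.
- Rank the remaining $|D_i|$ from 1 (smallest) to $n$ (largest).
- $W^+$ = sum of ranks where $D_i > 0$.
- Under $H_0$, $E(W^+) = n(n+1)/4$ and $\mathrm{Var}(W^+) = n(n+1)(2n+1)/24$; the continuity-corrected test statistic is approximately $N(0, 1)$ when the number of non-zero $D_i$ is large enough.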
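The procedure above can be sketched step by step in Python. This is an illustrative version only: the function name and the data are hypothetical, the variance carries no tie adjustment (library routines typically apply one), and the continuity correction is taken as shifting $W^+$ by 0.5 toward its null mean.

```python
import numpy as np
from scipy import stats

def signed_rank_normal_approx(x, y):
    """Hypothetical helper: W+ and its continuity-corrected Normal approximation."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    d = d[d != 0]                          # step 1: drop zero differences
    n = d.size                             # number of non-zero differences
    ranks = stats.rankdata(np.abs(d))      # step 2: rank |D_i| (ties get average ranks)
    w_plus = ranks[d > 0].sum()            # step 3: sum of ranks where D_i > 0
    mean = n * (n + 1) / 4                 # E(W+) under H0
    sd = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)   # sd of W+ under H0, no tie adjustment
    # Continuity correction: move W+ by 0.5 toward its null mean before standardising.
    z = (w_plus - mean - 0.5 * np.sign(w_plus - mean)) / sd
    p = 2 * stats.norm.sf(abs(z))          # two-sided p-value
    return n, w_plus, z, p

# Hypothetical data, for illustration only; one zero difference and no tied |D_i|.
before = np.array([72, 68, 75, 80, 66, 71, 77, 69, 74, 70, 73, 78], dtype=float)
after = np.array([85, 79, 84, 95, 76, 71, 83, 67, 90, 77, 81, 82], dtype=float)

n, w_plus, z, p = signed_rank_normal_approx(before, after)
print(f"n = {n}, W+ = {w_plus}, z = {z:.3f}, p = {p:.4f}")
```

Only one subject has a positive difference here, and it has the smallest $|D_i|$, so $W^+ = 1$ out of a possible $n(n+1)/2 = 66$.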
R code
wilcox.test(before, after, paired=TRUE, exact=FALSE)
Python code
wsr_out = stats.wilcoxon(hr_df.baseline, hr_df.after5,
correction=True, method='approx')
print(f"""Test statistic: {wsr_out.statistic:.3f}.
p-val: {wsr_out.pvalue:.3f}.""")
Small-sample caveat: with only 12 subjects here, we do not reach the threshold for the Normal approximation. The ideal is the exact version, but R and Python cannot compute the exact p-value when there are ties. SAS does use the exact version, which accounts for any difference in p-values between SAS and R/Python.
Why test statistics look different across software:
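- R / Python report $W^+$ directly. (SciPy's two-sided statistic is the smaller of $W^+$ and $W^-$, which coincides with $W^+$ when $W^+ = 0$.)
- SAS reports the centred statistic $S = W^+ - n(n+1)/4$.
For $n = 12$:
$S = 0 - \frac{12 \cdot 13}{4} = -39$, which reconciles a reported 0 in R with SAS’s -39.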
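The conversion between the two conventions is a one-liner. A small sketch, with hypothetical function names, taking $n$ to be the number of non-zero differences:

```python
def sas_signed_rank(w_plus: float, n: int) -> float:
    """Centred statistic S = W+ - n(n+1)/4 (hypothetical helper name)."""
    return w_plus - n * (n + 1) / 4

def w_plus_from_sas(s: float, n: int) -> float:
    """Inverse conversion: recover W+ from the centred statistic S."""
    return s + n * (n + 1) / 4

print(sas_signed_rank(0, 12))     # -39.0
print(w_plus_from_sas(-39, 12))   # 0.0
```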
If neither approximation is great and exact-with-ties is unavailable, fall back to the bootstrap or a permutation test (see L10 Simulation).
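One possible form of the permutation fallback is a sign-flip test on the mean difference, sketched below. This is an assumption about what such a test could look like, not necessarily the version used in L10 Simulation. It relies on the $D_i$ being symmetric about 0 under $H_0$, so each sign is equally likely, and with 12 subjects all $2^{12} = 4096$ sign patterns can be enumerated, which keeps the p-value exact even with ties. The function name and data are hypothetical.

```python
import itertools
import numpy as np

def sign_flip_exact(d):
    """Hypothetical helper: exact two-sided sign-flip test on the mean difference."""
    d = np.asarray(d, dtype=float)
    observed = abs(d.mean())
    # Every possible assignment of +/- signs to the differences (2**n rows).
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=d.size)))
    perm_means = np.abs((signs * d).mean(axis=1))
    # Share of sign patterns at least as extreme as the observed mean
    # (small tolerance guards against floating-point round-off).
    return float(np.mean(perm_means >= observed - 1e-12))

# Hypothetical differences (baseline - after), with ties; NOT the course data.
d = np.array([-13, -12, -8, -15, -13, -13, -11, -11, -16, -12, -8, -14], dtype=float)
p_exact = sign_flip_exact(d)
print(f"exact sign-flip p-value: {p_exact:.6f}")
```

Because every made-up difference is negative, only the all-original and all-flipped patterns are as extreme as the data, giving $p = 2/4096$. For larger $n$, full enumeration becomes infeasible and random sign flips would be sampled instead.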
See also: L7 Two-sample Hypothesis Tests · L10 Simulation