Description

From After Treadmill)

Heart rate measurements for 12 subjects before and after a treadmill session (baseline vs after5). Because both measurements come from the same subject, observations are paired, not independent — the appropriate test is the paired-sample version of the t-test (or its non-parametric analogue).

File: data/health_promo_hr.csv. Columns: baseline, after5.

Paired t-test

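Compute the differences $D_i = X_i - Y_i$ (baseline minus after5, matching the argument order in the code below); test $H_0: \mu_D = 0$ via $T = \bar{D} / (s_D / \sqrt{n}) \sim t_{n-1}$.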

R code
hr_df <- read.csv("data/health_promo_hr.csv")
before <- hr_df$baseline
after  <- hr_df$after5
t.test(before, after, paired=TRUE)
Python code
import pandas as pd
from scipy import stats
 
hr_df = pd.read_csv("data/health_promo_hr.csv")
paired_out = stats.ttest_rel(hr_df.baseline, hr_df.after5)
print(f"""
Test statistic: {paired_out.statistic:.3f}.
p-val: {paired_out.pvalue:.3f}.""")
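
The paired t-test is a one-sample t-test on the within-subject differences, so the statistic can be traced by hand. The sketch below uses only the standard library and a small made-up data set (the numbers are hypothetical and are not the values in data/health_promo_hr.csv); the function name `paired_t_statistic` is also an illustration, not part of any library.

```python
import math
import statistics

def paired_t_statistic(x, y):
    """Return (t, df) for the paired t-test of H0: mean(x - y) = 0."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    d = [xi - yi for xi, yi in zip(x, y)]   # within-subject differences
    n = len(d)
    d_bar = statistics.mean(d)              # mean difference
    s_d = statistics.stdev(d)               # sample SD (n - 1 denominator)
    t = d_bar / (s_d / math.sqrt(n))        # T = D-bar / (s_D / sqrt(n))
    return t, n - 1

# Hypothetical toy data, NOT the course data set.
before = [70, 72, 68, 75, 71]
after = [78, 79, 70, 85, 76]
t_stat, df = paired_t_statistic(before, after)
print(f"t = {t_stat:.3f} on {df} df")
```

On the real data, the statistic from this function should agree with the one reported by `t.test(before, after, paired=TRUE)` and `stats.ttest_rel`, up to rounding.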

Assumption check caveat: with only 12 subjects the case for Normality of the paired differences is weak. For small-sample paired data, a graphical check plus the non-parametric analogue below is the safer route.

Paired/agreement diagnostic plots:

  • Paired-line plot — one line per subject connecting baseline to after. Similar gradients (consistently up or consistently down) support a systematic effect; mixed signs weaken it.
  • Agreement plot (after vs. before, with the $y = x$ reference line) — if points scatter around $y = x$, there is no mean shift.

Non-parametric paired test

Wilcoxon Signed-Rank Test (WST) — paired-sample analogue of Wilcoxon Rank-Sum. Tests whether the median of the paired differences $D_i = X_i - Y_i$ (baseline minus after5) is 0. No Normality assumption.

Procedure:

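  1. Drop any $D_i = 0$.
  2. Rank the remaining $|D_i|$ from 1 (smallest) to $n$ (largest), where $n$ is the number of non-zero differences; tied values share their average rank.
  3. $W^+$ = sum of ranks where $D_i > 0$.
  4. Under $H_0$, $E(W^+) = n(n+1)/4$ and $\mathrm{Var}(W^+) = n(n+1)(2n+1)/24$; the continuity-corrected test statistic is approximately $N(0, 1)$ when the number of non-zero $D_i$ is sufficiently large.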
R code
wilcox.test(before, after, paired=TRUE, exact=FALSE)
Python code
wsr_out = stats.wilcoxon(hr_df.baseline, hr_df.after5,
                         correction=True, method='approx')
print(f"""Test statistic: {wsr_out.statistic:.3f}.
p-val: {wsr_out.pvalue:.3f}.""")
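
The signed-rank procedure can also be traced step by step. The sketch below is a standard-library illustration under stated assumptions: ties get average ranks, the variance uses the usual tie correction (subtracting $\sum (t^3 - t)/48$ over tie groups of size $t$), and the continuity correction moves $W^+$ half a unit towards its null mean. The function name and the toy data are hypothetical, and this is not claimed to be the exact code path of R or SciPy.

```python
import math

def signed_rank_normal_approx(x, y):
    """Wilcoxon signed-rank test for paired x, y via the Normal approximation.

    Returns (w_plus, n, z, p_two_sided); n counts the non-zero differences.
    """
    d = [xi - yi for xi, yi in zip(x, y) if xi != yi]   # step 1: drop zeros
    n = len(d)
    if n == 0:
        raise ValueError("all differences are zero")

    # Step 2: rank |d| from 1 to n, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1                       # size of this tie group
        tie_term += t**3 - t
        i = j + 1

    # Step 3: sum the ranks of the positive differences.
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)

    # Step 4: Normal approximation with continuity correction.
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    diff = w_plus - mean
    shift = 0.5 if diff > 0 else -0.5 if diff < 0 else 0.0
    z = (diff - shift) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided Normal tail area
    return w_plus, n, z, p

# Hypothetical toy data, NOT the course data set (one zero, one tie).
before = [70, 72, 68, 75, 71, 80, 74]
after = [78, 79, 70, 85, 76, 80, 72]
w_plus, n, z, p = signed_rank_normal_approx(before, after)
print(f"W+ = {w_plus}, n = {n}, z = {z:.3f}, p = {p:.3f}")
```

With this few observations the Normal-approximation p-value is only indicative, which is the point of the small-sample caveat.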

Small-sample caveat: with only 12 subjects here, we do not hit the threshold for the Normal approximation. The ideal is the exact version, but R/Python can’t run exact when there are ties. SAS does use the exact version — which accounts for any difference in p-values between SAS and R/Python.

Why test statistics look different across software:

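  • R reports the sum of positive ranks $W^+$ directly (as V). SciPy's two-sided default reports $\min(W^+, W^-)$, which equals $W^+$ whenever the positive ranks have the smaller sum.
  • SAS reports the centered statistic $S = W^+ - n(n+1)/4$.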

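For $n = 12$ and $W^+ = 0$, the centered statistic is

$$W^+ - \frac{n(n+1)}{4} = 0 - \frac{(12)(13)}{4} = -39,$$

which reconciles a reported 0 in R with SAS's -39.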
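
A quick arithmetic check of that conversion, assuming SAS centers the sum of positive ranks at its null mean $n(n+1)/4$ and that $n$ counts the non-zero differences. The helper names are hypothetical.

```python
def centered_signed_rank(w_plus, n):
    """Convert W+ (R's V) to the centered form S = W+ - n(n+1)/4."""
    return w_plus - n * (n + 1) / 4

def w_plus_from_centered(s, n):
    """Invert the conversion: recover W+ from the centered statistic S."""
    return s + n * (n + 1) / 4

s = centered_signed_rank(0, 12)
print(s)                            # centered statistic for W+ = 0, n = 12
print(w_plus_from_centered(s, 12))  # round trip back to W+
```

Both forms carry the same information, so the p-value does not depend on which one the software prints.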

If neither approximation is great and exact-with-ties is unavailable, fall back to the bootstrap or a permutation test (see L10 Simulation).
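
One possible form of the permutation fallback for paired data is a sign-flip test: under $H_0$ each difference is equally likely to be $+D_i$ or $-D_i$, so all $2^n$ sign patterns are equally likely. The sketch below is an illustration rather than the method from the simulation lecture; the function name and toy data are hypothetical. It enumerates every pattern, which is feasible for 12 subjects (4096 patterns); for much larger $n$ one would sample random sign patterns instead.

```python
from itertools import product

def sign_flip_pvalue(x, y):
    """Exact two-sided sign-flip permutation p-value for paired data.

    The test statistic is |sum of differences|, which is equivalent to
    using the absolute mean difference.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    observed = abs(sum(d))
    n = len(d)
    extreme = 0
    for signs in product((1, -1), repeat=n):        # all 2**n sign patterns
        flipped = sum(s * di for s, di in zip(signs, d))
        if abs(flipped) >= observed:                # at least as extreme
            extreme += 1
    return extreme / 2**n

# Hypothetical toy data, NOT the course data set.
before = [70, 72, 68, 75, 71]
after = [78, 79, 70, 85, 76]
p_perm = sign_flip_pvalue(before, after)
print(f"sign-flip p-value: {p_perm:.4f}")
```

Because the p-value comes from direct enumeration, it needs neither the Normal approximation nor a tie-free sample.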


See also: L7 Two-sample Hypothesis Tests · L10 Simulation