The Core Question

Are two categorical variables related, or independent?

  • Independent = knowing one variable tells you nothing about the other
  • Related = knowing one helps predict the other

Chi-Square Test (χ²)

Key Idea

Compare what you OBSERVE vs what you’d EXPECT if there were no relationship. Bigger difference = more evidence of a relationship.

Expected Count Formula

Expected = (Row Total × Column Total) / Grand Total
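As a minimal sketch of this formula, the expected counts for a 2×3 table can be worked out by hand. The helper name expected_counts is illustrative only and not part of any library.

```python
def expected_counts(observed):
    """Expected count per cell under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[60, 20, 20],
            [30, 40, 30]]
expected = expected_counts(observed)
print(expected)  # [[45.0, 30.0, 25.0], [45.0, 30.0, 25.0]]
```

Both rows have the same total (100), so both rows get the same expected counts.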

Chi-Square Statistic

χ² = Σ (Observed - Expected)² / Expected

Sum this over every cell in the table.
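A small sketch of the sum, assuming a 2×3 table of observed counts and the expected counts obtained from the row total × column total / grand total formula:

```python
observed = [[60, 20, 20],
            [30, 40, 30]]
expected = [[45.0, 30.0, 25.0],
            [45.0, 30.0, 25.0]]  # row total * column total / grand total

# Add (Observed - Expected)^2 / Expected over every cell
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
print(f"chi2 = {chi2:.2f}")  # chi2 = 18.67
```

The first column contributes the most (15² / 45 = 5 per cell), so that is where observed and expected disagree most.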

Degrees of Freedom

df = (rows - 1) × (columns - 1)
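The degrees of freedom decide which chi-square distribution the statistic is compared against. A hedged sketch, assuming a 2×3 table with a statistic of 18.67, that turns the statistic into a p-value using the upper tail of the distribution:

```python
from scipy.stats import chi2

n_rows, n_cols = 2, 3
df = (n_rows - 1) * (n_cols - 1)

statistic = 18.67  # assumed chi-square statistic for a 2x3 table
# sf is the survival function: P(chi-square with df degrees of freedom >= statistic)
p_value = chi2.sf(statistic, df)
print(f"df = {df}, p = {p_value:.5f}")
```

For this table the p-value comes out well below 0.001.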

Interpretation

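  • Small p-value (< 0.05) → reject independence → evidence that the variables ARE related
  • Large p-value (≥ 0.05) → fail to reject independence → not enough evidence of a relationship (this does not prove the variables are independent)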

⚠️ Key Assumption

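All expected counts should be ≥ 5, because the chi-square p-value is an approximation that becomes unreliable when expected counts are small. If any are < 5 in a 2×2 table, use Fisher’s Exact Test instead.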
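A rough sketch of this check in code. The function names min_expected_count and suggest_test are hypothetical, and the fallback label for larger tables is an assumption, since the notes only cover Fisher’s test for 2×2 tables.

```python
def min_expected_count(observed):
    """Smallest expected count: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return min(r * c / grand_total for r in row_totals for c in col_totals)


def suggest_test(observed, threshold=5):
    """Suggest a test from the rule of thumb on expected counts."""
    is_2x2 = len(observed) == 2 and all(len(row) == 2 for row in observed)
    if min_expected_count(observed) >= threshold:
        return "chi-square"
    # Assumption: Fisher's exact test is only considered for 2x2 tables here
    return "fisher" if is_2x2 else "chi-square (interpret with caution)"


print(suggest_test([[60, 20, 20], [30, 40, 30]]))  # chi-square
print(suggest_test([[7, 3], [2, 8]]))              # fisher
```

In the second table the smallest expected count is 10 × 9 / 20 = 4.5, which falls just below the threshold.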

Python

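from scipy.stats import chi2_contingency

# 2×3 contingency table of observed counts
observed = [[60, 20, 20],
            [30, 40, 30]]

# Returns the statistic, p-value, degrees of freedom, and expected counts.
# Note: for 2×2 tables (df = 1), Yates' continuity correction is applied by default.
chi2, p_value, df, expected = chi2_contingency(observed)
print(f"χ² = {chi2:.2f}, p = {p_value:.4f}, df = {df}")
# Check the assumption that every expected count is at least 5
print(f"Min expected count: {expected.min():.2f}")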

Fisher’s Exact Test

Key Idea

Used for 2×2 tables with small counts. Calculates the exact probability of seeing a table at least this extreme, given fixed row and column totals, if the two variables were independent.

Uses the hypergeometric distribution — no approximation needed.
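To make the hypergeometric idea concrete, here is a minimal by-hand sketch of a two-sided p-value for a 2×2 table. The function name fisher_exact_two_sided is hypothetical. It assumes the common convention that "as extreme or more" means every table with the same margins whose probability is no larger than that of the observed table.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table (illustrative sketch)."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # with all row and column totals held fixed
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum over tables that are at most as likely as the observed one
    # (small tolerance guards against floating-point ties)
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-7)
    )


table = [[7, 3],
         [2, 8]]
p_value = fisher_exact_two_sided(table)
print(f"p = {p_value:.4f}")  # p = 0.0698
```

Here the observed table has probability 5400 / 167960, and the tables counted as equally or more extreme add up to 11720 / 167960.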

When to Use

Situation                          Test
Any size table, all expected ≥ 5   Chi-Square
2×2 table, any expected < 5        Fisher’s Exact
2×2 table, want exact p-value      Fisher’s Exact
Larger than 2×2                    Chi-Square only

Python

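from scipy.stats import fisher_exact

# 2×2 contingency table of observed counts
table = [[7, 3],
         [2, 8]]

# Two-sided test by default; the odds ratio here is (7 × 8) / (3 × 2)
odds_ratio, p_value = fisher_exact(table)
print(f"Odds Ratio = {odds_ratio:.2f}, p = {p_value:.4f}")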

What to Report

Chi-Square:

“A chi-square test showed a significant association between age and ice cream preference (χ² = 18.67, df = 2, p < 0.001).”

Fisher’s Exact:

“Fisher’s exact test did not show a significant association between drug treatment and side effects at the 0.05 level (p = 0.070, OR = 9.33).”