contingency
Purpose
Computes comprehensive statistics and measures of association for contingency tables, including chi-squared tests, Fisher’s exact test, odds ratios, relative risk, and ordinal association measures.
Format
- out = contingency(freqTable[, ctl])
- out = contingency(x1, x2[, ctl])
- out = contingency(data, formula[, ctl])
- out = contingency(filename, formula[, ctl])
- Parameters:
freqTable (matrix) – IxJ matrix of observed cell frequencies.
x1 (Nx1 vector) – first categorical variable (row variable).
x2 (Nx1 vector) – second categorical variable (column variable).
data (dataframe) – dataframe containing variables.
filename (string) – name of dataset.
formula (string) – formula string of the form "rowvar ~ colvar".
ctl (struct) –
Optional argument, instance of a contingencyControl structure containing the following members:
ctl.output
scalar, print results. Default = 1.
- 1:
Print results.
- 0:
Suppress output.
ctl.ordinal
scalar, compute ordinal measures. Default = 1.
- 1:
Compute ordinal association measures (Gamma, Tau-b, Tau-c, Somers’ D).
- 0:
Skip ordinal measures.
- Returns:
out (struct) –
instance of a contingencyOut structure:
out.table
IxJ matrix, observed frequency table.
out.expected
IxJ matrix, expected frequencies under independence.
out.rowLabels
Ix1 string array, row category labels.
out.colLabels
Jx1 string array, column category labels.
out.nObs
scalar, total number of observations.
out.chiSq
scalar, Pearson chi-squared statistic.
out.chiSqDf
scalar, degrees of freedom for chi-squared.
out.chiSqP
scalar, p-value for chi-squared test.
out.lrChiSq
scalar, likelihood ratio (G-squared) statistic.
out.lrChiSqP
scalar, p-value for likelihood ratio test.
out.yatesChiSq
scalar, Yates-corrected chi-squared (2x2 only).
out.yatesP
scalar, p-value for Yates-corrected test.
out.mcnemarChiSq
scalar, McNemar symmetry test statistic (square tables only).
out.mcnemarDf
scalar, degrees of freedom for McNemar test.
out.mcnemarP
scalar, p-value for McNemar test.
out.phi
scalar, phi coefficient.
out.cramersV
scalar, Cramér’s V.
out.contingencyCoef
scalar, Pearson’s contingency coefficient.
out.spearmanRho
scalar, Spearman rank correlation.
out.kappa
scalar, Cohen’s kappa (square tables only).
out.kappaASE
scalar, asymptotic standard error for kappa.
out.yulesQ
scalar, Yule’s Q (2x2 only).
out.yulesQASE
scalar, ASE for Yule’s Q.
out.yulesY
scalar, Yule’s Y (2x2 only).
out.yulesYASE
scalar, ASE for Yule’s Y.
out.oddsRatio
scalar, odds ratio (2x2 only).
out.oddsRatioLo
scalar, 95% CI lower bound for odds ratio.
out.oddsRatioHi
scalar, 95% CI upper bound for odds ratio.
out.relRisk
scalar, relative risk (2x2 only).
out.relRiskLo
scalar, 95% CI lower bound for relative risk.
out.relRiskHi
scalar, 95% CI upper bound for relative risk.
out.fisherP
scalar, Fisher’s exact test p-value (2x2 only).
out.gamma
scalar, Goodman-Kruskal gamma.
out.gammaASE
scalar, ASE for gamma.
out.tauB
scalar, Kendall’s tau-b.
out.tauBASE
scalar, ASE for tau-b.
out.tauC
scalar, Stuart’s tau-c.
out.tauCASE
scalar, ASE for tau-c.
out.somersD
scalar, Somers’ D (column dependent).
out.somersDASE
scalar, ASE for Somers’ D.
out.lambda
scalar, Goodman-Kruskal lambda (column dependent).
out.lambdaASE
scalar, ASE for lambda.
out.uncertainty
scalar, uncertainty coefficient (column dependent).
out.uncertaintyASE
scalar, ASE for uncertainty coefficient.
out.stdResid
IxJ matrix, standardized (Pearson) residuals.
out.adjResid
IxJ matrix, adjusted residuals.
out.diagnostics.minExpected
scalar, minimum expected cell frequency.
out.diagnostics.pctExpectedLt5
scalar, percent of cells with expected frequency < 5.
out.diagnostics.hasZeroCell
scalar, 1 if any observed cell is zero.
out.diagnostics.warnings
string array, warning messages about assumptions.
Examples
Example 1: Frequency table input
// Aspirin and heart attack data (Physicians' Health Study)
// Rows: Placebo, Aspirin
// Cols: MI, No MI
x = { 189 10845,
104 10933 };
out = contingency(x);
This produces:
Contingency Table Analysis
Observations: 22071    Table: 2x2

Tests of Independence
------------------------------------------------------------
Statistic                  Value      df     p-value
Pearson Chi-Squared      25.0139       1      0.0000
Likelihood Ratio         25.1211       1      0.0000

Risk Measures (2x2)
------------------------------------------------------------
Measure                    Value      95% CI
Odds Ratio                1.8321      [1.4400, 2.3311]
Relative Risk             1.8177      [1.4371, 2.2990]
Fisher Exact p-value      0.0000
Example 2: Two categorical vectors
// Smoking status and lung disease
smoking = { 1, 1, 1, 2, 2, 2, 1, 1, 2, 2 }; // 1=smoker, 2=non-smoker
disease = { 1, 1, 2, 2, 2, 2, 1, 2, 2, 2 }; // 1=disease, 2=no disease
out = contingency(smoking, disease);
Example 3: Dataframe with formula
// Load data
data = loadd("survey.csv");
// Test association between education and income level
out = contingency(data, "education ~ income");
Example 4: Suppress output
// Frequency table from Example 1
x = { 189 10845,
      104 10933 };
struct contingencyControl ctl;
ctl = contingencyControlCreate();
ctl.output = 0; // silent mode
out = contingency(x, ctl);
// Access specific statistics
print "Odds ratio: " out.oddsRatio;
print "95% CI: [" out.oddsRatioLo "," out.oddsRatioHi "]";
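The same control structure can also switch off the ordinal measures, which may be useful when both variables are nominal. The following is a minimal sketch that combines the two documented members, ctl.output and ctl.ordinal. This page does not say what the ordinal members of out contain when they are skipped, so the sketch reads only nominal measures.

```gauss
// Frequency table from Example 1
x = { 189 10845,
      104 10933 };

struct contingencyControl ctl;
ctl = contingencyControlCreate();
ctl.output = 0;   // suppress the printed report
ctl.ordinal = 0;  // skip Gamma, Tau-b, Tau-c and Somers' D

out = contingency(x, ctl);

// Nominal measures are read from the output structure as usual
print "Phi:        " out.phi;
print "Cramer's V: " out.cramersV;
```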
Remarks
Chi-squared validity: The chi-squared approximation may be unreliable when more than 20% of expected cell frequencies are less than 5, or any expected frequency is less than 1. Warnings are issued automatically.
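The rule of thumb above can be checked by hand. The following is a minimal sketch, not the internal implementation of contingency, that computes the expected frequencies under independence for the Example 1 table and evaluates both conditions. All variable names are illustrative.

```gauss
// Observed table from Example 1
x = { 189 10845,
      104 10933 };

// Expected frequencies under independence:
// (row total * column total) / grand total
n = sumc(sumc(x));
e = (sumr(x) * sumc(x)') / n;

// Percent of cells with expected frequency below 5,
// and the smallest expected frequency
pctLt5 = 100 * meanc(vecr(e .< 5));
minE = minc(vecr(e));

// Flag the table if either condition of the rule of thumb is violated
unreliable = (pctLt5 > 20) or (minE < 1);

print "Percent of expected cells < 5: " pctLt5;
print "Minimum expected frequency:    " minE;
print "Approximation questionable:    " unreliable;
```

These two quantities are presumably what out.diagnostics.pctExpectedLt5 and out.diagnostics.minExpected report, so in practice reading those members should be enough.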
Fisher’s exact test: Computed for 2x2 tables. Recommended for small samples where chi-squared may be unreliable.
Odds ratio and relative risk: Only computed for 2x2 tables. Confidence intervals use the Woolf (log-transform) method. Missing values are returned if any cell is zero.
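To illustrate the log-transform idea for the odds ratio, the following sketch applies the textbook Woolf formula to the Example 1 table. It is an assumption that contingency uses exactly this standard error and a critical value of 1.96. The variable names are illustrative.

```gauss
// Cell counts from Example 1 (rows: Placebo, Aspirin; columns: MI, No MI)
x = { 189 10845,
      104 10933 };
a = x[1,1];  b = x[1,2];
c = x[2,1];  d = x[2,2];

// Odds ratio and the standard error of its natural logarithm
oddsRatio = (a * d) / (b * c);
seLogOR = sqrt(1/a + 1/b + 1/c + 1/d);

// 95% interval: exponentiate ln(OR) +/- z * SE
z = 1.96;  // assumed normal critical value
orLo = exp(ln(oddsRatio) - z * seLogOR);
orHi = exp(ln(oddsRatio) + z * seLogOR);

print "Odds ratio: " oddsRatio;
print "95% CI:     " orLo orHi;
```

For this table the interval comes out at roughly [1.44, 2.33], in line with the output shown in Example 1. Because the standard error contains the reciprocal of every cell count, the formula is undefined when a cell is zero, which is consistent with missing values being returned in that case.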
Ordinal measures: Gamma, tau-b, tau-c, and Somers’ D assume ordinal (ranked) categories. Set ctl.ordinal = 0 to skip these if variables are purely nominal.
Cohen’s kappa: Only computed for square tables (same number of rows and columns). Measures agreement beyond chance.
Residual analysis: Under independence, adjusted residuals approximately follow a standard normal distribution. An absolute value greater than 2 suggests that the cell departs significantly from independence.
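As a sketch of how the two residual types relate, the code below computes them for the Example 1 table using the standard textbook definitions (see Agresti 2002). It is an assumption that out.stdResid and out.adjResid follow these formulas. The variable names are illustrative.

```gauss
// Observed table from Example 1
x = { 189 10845,
      104 10933 };

n = sumc(sumc(x));
rowTot = sumr(x);     // Ix1 row totals
colTot = sumc(x)';    // 1xJ column totals
e = (rowTot * colTot) / n;

// Pearson residuals: (observed - expected) / sqrt(expected)
stdResid = (x - e) ./ sqrt(e);

// Adjusted residuals also divide by
// sqrt((1 - row proportion) * (1 - column proportion))
adjResid = stdResid ./ sqrt((1 - rowTot / n) .* (1 - colTot / n));

// 1 where the adjusted residual exceeds 2 in absolute value
flagged = abs(adjResid) .> 2;

print "Adjusted residuals:";
print adjResid;
print "Flagged cells:";
print flagged;
```

In a 2x2 table all four adjusted residuals share the same absolute value, the square root of the Pearson chi-squared statistic, so cell-level inspection is mainly informative for larger tables.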
References
Agresti, Alan. 2002. Categorical Data Analysis. 2nd ed. New York: John Wiley and Sons.
Bishop, Yvonne, Stephen Fienberg and Paul Holland. 1975. Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass.: MIT Press.
See also
Functions tabulate(), frequency(), crossprod()