contingency#

Purpose#

Computes comprehensive statistics and measures of association for contingency tables, including chi-squared tests, Fisher’s exact test, odds ratios, relative risk, and ordinal association measures.

Format#

out = contingency(freqTable[, ctl])#
out = contingency(x1, x2[, ctl])
out = contingency(data, formula[, ctl])
out = contingency(filename, formula[, ctl])
Parameters:
  • freqTable (matrix) – IxJ matrix of observed cell frequencies.

  • x1 (Nx1 vector) – first categorical variable (row variable).

  • x2 (Nx1 vector) – second categorical variable (column variable).

  • data (dataframe) – dataframe containing variables.

  • filename (string) – name of dataset.

  • formula (string) – formula string of the form "rowvar ~ colvar".

  • ctl (struct) –

    Optional argument, instance of a contingencyControl structure containing the following members:

    ctl.output

    scalar, print results. Default = 1.

    1:

    Print results.

    0:

    Suppress output.

    ctl.ordinal

    scalar, compute ordinal measures. Default = 1.

    1:

    Compute ordinal association measures (Gamma, Tau-b, Tau-c, Somer’s D).

    0:

    Skip ordinal measures.

Returns:

out (struct) –

instance of contingencyOut structure:

out.table

IxJ matrix, observed frequency table.

out.expected

IxJ matrix, expected frequencies under independence.

out.rowLabels

Ix1 string array, row category labels.

out.colLabels

Jx1 string array, column category labels.

out.nObs

scalar, total number of observations.

out.chiSq

scalar, Pearson chi-squared statistic.

out.chiSqDf

scalar, degrees of freedom for chi-squared.

out.chiSqP

scalar, p-value for chi-squared test.

out.lrChiSq

scalar, likelihood ratio (G-squared) statistic.

out.lrChiSqP

scalar, p-value for likelihood ratio test.

out.yatesChiSq

scalar, Yates-corrected chi-squared (2x2 only).

out.yatesP

scalar, p-value for Yates-corrected test.

out.mcnemarChiSq

scalar, McNemar symmetry test (square tables only).

out.mcnemarDf

scalar, degrees of freedom for McNemar test.

out.mcnemarP

scalar, p-value for McNemar test.

out.phi

scalar, phi coefficient.

out.cramersV

scalar, Cramer’s V.

out.contingencyCoef

scalar, Pearson’s contingency coefficient.

out.spearmanRho

scalar, Spearman rank correlation.

out.kappa

scalar, Cohen’s kappa (square tables only).

out.kappaASE

scalar, asymptotic standard error for kappa.

out.yulesQ

scalar, Yule’s Q (2x2 only).

out.yulesQASE

scalar, ASE for Yule’s Q.

out.yulesY

scalar, Yule’s Y (2x2 only).

out.yulesYASE

scalar, ASE for Yule’s Y.

out.oddsRatio

scalar, odds ratio (2x2 only).

out.oddsRatioLo

scalar, 95% CI lower bound for odds ratio.

out.oddsRatioHi

scalar, 95% CI upper bound for odds ratio.

out.relRisk

scalar, relative risk (2x2 only).

out.relRiskLo

scalar, 95% CI lower bound for relative risk.

out.relRiskHi

scalar, 95% CI upper bound for relative risk.

out.fisherP

scalar, Fisher’s exact test p-value (2x2 only).

out.gamma

scalar, Goodman-Kruskal gamma.

out.gammaASE

scalar, ASE for gamma.

out.tauB

scalar, Kendall’s tau-b.

out.tauBASE

scalar, ASE for tau-b.

out.tauC

scalar, Stuart’s tau-c.

out.tauCASE

scalar, ASE for tau-c.

out.somersD

scalar, Somer’s D (column dependent).

out.somersDASE

scalar, ASE for Somer’s D.

out.lambda

scalar, Goodman-Kruskal lambda (column dependent).

out.lambdaASE

scalar, ASE for lambda.

out.uncertainty

scalar, uncertainty coefficient (column dependent).

out.uncertaintyASE

scalar, ASE for uncertainty coefficient.

out.stdResid

IxJ matrix, standardized (Pearson) residuals.

out.adjResid

IxJ matrix, adjusted residuals.

out.diagnostics.minExpected

scalar, minimum expected cell frequency.

out.diagnostics.pctExpectedLt5

scalar, percent of cells with expected frequency < 5.

out.diagnostics.hasZeroCell

scalar, 1 if any observed cell is zero.

out.diagnostics.warnings

string array, warning messages about assumptions.

Examples#

Example 1: Frequency table input#

// Aspirin and heart attack data (Physicians' Health Study)
// Rows: Placebo, Aspirin
// Cols: MI, No MI
x = { 189 10845,
      104 10933 };

out = contingency(x);

This produces:

Contingency Table Analysis
Observations: 22071    Table: 2x2

Tests of Independence
------------------------------------------------------------
                     Statistic       Value      df     p-value
           Pearson Chi-Squared     25.0139       1      0.0000
              Likelihood Ratio     25.1211       1      0.0000

Risk Measures (2x2)
------------------------------------------------------------
                       Measure       Value                  95% CI
                    Odds Ratio      1.8321            [1.4400, 2.3311]
                 Relative Risk      1.8177            [1.4371, 2.2990]
          Fisher Exact p-value      0.0000

Example 2: Two categorical vectors#

// Smoking status and lung disease
smoking = { 1, 1, 1, 2, 2, 2, 1, 1, 2, 2 };   // 1=smoker, 2=non-smoker
disease = { 1, 1, 2, 2, 2, 2, 1, 2, 2, 2 };   // 1=disease, 2=no disease

out = contingency(smoking, disease);

Example 3: Dataframe with formula#

// Load data
data = loadd("survey.csv");

// Test association between education and income level
out = contingency(data, "education ~ income");

Example 4: Suppress output#

struct contingencyControl ctl;
ctl = contingencyControlCreate();
ctl.output = 0;  // silent mode

out = contingency(x, ctl);

// Access specific statistics
print "Odds ratio: " out.oddsRatio;
print "95% CI: [" out.oddsRatioLo "," out.oddsRatioHi "]";

Remarks#

  • Chi-squared validity: The chi-squared approximation may be unreliable when more than 20% of expected cell frequencies are less than 5, or any expected frequency is less than 1. Warnings are issued automatically.

  • Fisher’s exact test: Computed for 2x2 tables. Recommended for small samples where chi-squared may be unreliable.

  • Odds ratio and relative risk: Only computed for 2x2 tables. Confidence intervals use the Woolf (log-transform) method. Returns missing values if any cell is zero.

  • Ordinal measures: Gamma, tau-b, tau-c, and Somer’s D assume ordinal (ranked) categories. Set ctl.ordinal = 0 to skip these if variables are purely nominal.

  • Cohen’s kappa: Only computed for square tables (same number of rows and columns). Measures agreement beyond chance.

  • Residual analysis: Adjusted residuals follow approximately a standard normal distribution under independence. Values exceeding |2| suggest significant departure from independence for that cell.

References#

Agresti, Alan. 2002. Categorical Data Analysis. 2nd ed. New York: John Wiley and Sons.

Bishop, Yvonne, Stephen Fienberg and Paul Holland. 1975. Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass.: MIT Press.

See also

Functions tabulate(), frequency(), crossprod()