mvnTest

Purpose

Tests multivariate normality using one or more methods: Henze-Zirkler (default), Mardia’s skewness/kurtosis, Doornik-Hansen, or Royston.

Format

out = mvnTest(X[, ctl])

out = mvnTest(data, formula[, ctl])

out = mvnTest(filename, formula[, ctl])

Parameters:

X (NxK matrix) – data matrix with K >= 2 variables.
data (dataframe) – dataframe containing variables.
filename (string) – name of dataset.
formula (string) – formula string specifying variables, e.g., "x1 + x2 + x3".

ctl (struct) –

Optional argument, instance of an mvnTestControl structure containing the following members:

ctl.output

scalar, print results. Default = 1.

1: Print results.
0: Suppress output.

ctl.miss

scalar, missing value handling. Default = 0.

0: Error if missing values are present.
1: Listwise deletion of rows with missing values.

ctl.method

string, test method to use. Default = "hz".

"hz": Henze-Zirkler test (recommended omnibus test).
"mardia": Mardia skewness and kurtosis tests.
"dh": Doornik-Hansen test.
"royston": Royston test (based on Shapiro-Wilk).
"all": Run all four methods.
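The control members above can be combined in a single call. The following is a minimal sketch, not output from a verified run: it assumes that `error(0)` is an acceptable way to plant a missing value in the test matrix, and it uses only the members documented above.

```gauss
rndseed 42;
X = rndn(100, 3);

// Mark one element as missing; error(0) returns the GAUSS missing value code
X[5, 2] = error(0);

// Start from the defaults, then override individual members
struct mvnTestControl ctl;
ctl = mvnTestControlCreate();
ctl.miss = 1;            // listwise deletion instead of an error
ctl.output = 0;          // suppress the printed report
ctl.method = "royston";  // run only the Royston test

out = mvnTest(X, ctl);

// out.n is documented as the sample size after listwise deletion,
// so one dropped row should leave 99 observations
print "Rows used: " out.n;
```

With `ctl.miss` left at its default of 0, the same call would be expected to stop with an error because of the missing value.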

Returns:

out (struct) –

instance of mvnTestOut structure:

out.n	scalar, sample size after any listwise deletion.
out.k	scalar, number of variables.
out.skewStat	scalar, Mardia normalized skewness statistic (approx N(0,1)).
out.skewP	scalar, p-value for skewness test.
out.kurtStat	scalar, Mardia normalized kurtosis statistic (approx N(0,1)).
out.kurtP	scalar, p-value for kurtosis test.
out.combStat	scalar, Mardia combined chi-squared statistic.
out.combP	scalar, p-value for combined test (chi-sq df=2).
out.hzStat	scalar, Henze-Zirkler test statistic.
out.hzP	scalar, Henze-Zirkler p-value (lognormal approximation).
out.hzBeta	scalar, Henze-Zirkler smoothing parameter.
out.dhStat	scalar, Doornik-Hansen chi-squared statistic.
out.dhP	scalar, Doornik-Hansen p-value.
out.dhDf	scalar, Doornik-Hansen degrees of freedom (2*k).
out.royStat	scalar, Royston H statistic.
out.royP	scalar, Royston p-value.
out.royDf	scalar, Royston equivalent degrees of freedom.

Examples

Example 1: Basic usage with matrix input

rndseed 42;

// Generate multivariate normal data
X = rndn(100, 3);

// Test normality using default Henze-Zirkler method
out = mvnTest(X);

Multivariate Normality Test

Observations: 100    Variables: 3

Henze-Zirkler Test (Henze & Zirkler 1990)

                Test     Statistic       p-value
                  HZ        0.6210    7.4283e-01
                Beta        1.4788

The large p-value (0.74) indicates a failure to reject normality, as expected for data generated from a multivariate normal distribution.
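The dataframe and filename forms listed under Format take a formula string in place of a prepared matrix. The sketch below is illustrative only: it assumes that the `auto2.dta` example file is present in your GAUSS installation and that it contains variables named `price`, `mpg`, and `weight`. Substitute your own file and variable names as needed.

```gauss
// Assumed example file; replace with the path to your own dataset
fname = getGAUSSHome() $+ "examples/auto2.dta";

// Dataframe form: load the data first, then select variables by formula
data = loadd(fname);
out = mvnTest(data, "price + mpg + weight");

// Filename form: pass the dataset name and the same formula directly
out = mvnTest(fname, "price + mpg + weight");
```

If the selected variables contain missing values, the default `ctl.miss = 0` produces an error, so a control structure with `ctl.miss = 1` may be needed.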

Example 2: Run all tests

rndseed 42;
X = rndn(100, 3);

// Create control structure
struct mvnTestControl ctl;
ctl = mvnTestControlCreate();
ctl.method = "all";

// Run all four tests
out = mvnTest(X, ctl);

Multivariate Normality Tests

Observations: 100    Variables: 3

Mardia's Test (Mardia & Foster 1983)

           Component     Statistic       p-value
            Skewness       -0.6600    7.4538e-01
            Kurtosis       -0.7006    7.5822e-01
            Combined        0.7283    6.9480e-01

Henze-Zirkler Test (Henze & Zirkler 1990)

                Test     Statistic       p-value
                  HZ        0.6210    7.4283e-01
                Beta        1.4788

Doornik-Hansen Test (Doornik & Hansen 2008)

                Test     Statistic      df       p-value
                  DH        3.3778       6    7.6015e-01

Royston Test (Royston 1992)

                Test     Statistic   eq.df       p-value
                   H        1.4209    3.00    7.0081e-01

All four tests fail to reject the null hypothesis of multivariate normality.
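The p-values printed above can be cross-checked against the reported statistics. The sketch below copies the statistics from the output and evaluates upper-tail probabilities with `cdfChic` and `cdfNc`. Treating the Mardia skewness p-value as a one-sided upper-tail normal probability is an inference from the printed numbers, not a documented property of `mvnTest`.

```gauss
// Statistics copied from the printed output above
skew_stat = -0.6600;
comb_stat = 0.7283;
dh_stat = 3.3778;
k = 3;

// Upper-tail standard normal probability for the Mardia skewness statistic
skew_p = cdfNc(skew_stat);

// Upper-tail chi-squared probabilities: df = 2 for the Mardia combined test,
// df = 2*k for Doornik-Hansen, as documented for out.combP and out.dhDf
comb_p = cdfChic(comb_stat, 2);
dh_p = cdfChic(dh_stat, 2 * k);

print "Mardia skewness p-value: " skew_p;
print "Mardia combined p-value: " comb_p;
print "Doornik-Hansen p-value:  " dh_p;
```

Up to rounding of the copied statistics, the results should be close to the printed values of 0.745, 0.695, and 0.760.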

Example 3: Checking individual results

// Check individual results programmatically.
// Assumes `out` holds the result of an earlier mvnTest call, such as the one in Example 2.
if out.hzP < 0.05;
    print "Henze-Zirkler rejects normality";
else;
    print "Henze-Zirkler: fail to reject normality (p = " out.hzP ")";
endif;
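Because fields for methods that were not run are set to missing, it can help to test for a missing value before comparing a p-value against a threshold. This is a minimal sketch that relies on the documented default method ("hz") and on the standard `ismiss` function. The 0.05 threshold is an arbitrary choice for illustration.

```gauss
rndseed 42;
X = rndn(100, 3);

// The default method is "hz", so the Royston fields are expected to be missing
out = mvnTest(X);

if ismiss(out.royP);
    print "Royston test was not run";
elseif out.royP < 0.05;
    print "Royston rejects normality";
else;
    print "Royston: fail to reject normality";
endif;
```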

Remarks

The Henze-Zirkler test is recommended as the default because it has good power against a wide range of alternatives and is affine invariant.
The Henze-Zirkler test accepts at most 5000 observations because its memory use grows as O(N^2).
The Royston test requires 4 <= N <= 2000 observations.
The Doornik-Hansen test requires N >= 8 observations.
All tests require at least 2 variables (K >= 2).
Fields in the output structure are set to missing (.) for methods not run.
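The size limits listed above can be checked before calling `mvnTest`. The helper below, `mvnSizeOk`, is a hypothetical name introduced for illustration and is not part of the library. It only encodes the limits stated in these remarks, and it assumes that "all" must satisfy every individual limit.

```gauss
// Hypothetical helper: returns 1 if X meets the documented limits for `method`
proc (1) = mvnSizeOk(X, method);
    local n, k, hz_ok, roy_ok, dh_ok;

    n = rows(X);
    k = cols(X);

    hz_ok = n <= 5000;               // Henze-Zirkler: N <= 5000
    roy_ok = n >= 4 and n <= 2000;   // Royston: 4 <= N <= 2000
    dh_ok = n >= 8;                  // Doornik-Hansen: N >= 8

    if k < 2;
        // All tests need at least 2 variables
        retp(0);
    elseif method $== "hz";
        retp(hz_ok);
    elseif method $== "royston";
        retp(roy_ok);
    elseif method $== "dh";
        retp(dh_ok);
    elseif method $== "all";
        retp(hz_ok and roy_ok and dh_ok);
    endif;

    // "mardia" has no sample-size limit listed in the remarks
    retp(1);
endp;

X = rndn(100, 3);

print mvnSizeOk(X, "all");          // 1: every limit is met for N = 100, K = 3
print mvnSizeOk(X[1:5, .], "dh");   // 0: only 5 rows, below the N >= 8 limit
```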

References

Henze, N. & Zirkler, B. (1990). “A Class of Invariant Consistent Tests for Multivariate Normality.” Communications in Statistics - Theory and Methods, 19(10), 3595-3617.

Mardia, K.V. & Foster, K. (1983). “Omnibus Tests of Multinormality Based on Skewness and Kurtosis.” Communications in Statistics - Theory and Methods, 12(2), 207-221.

Doornik, J.A. & Hansen, H. (2008). “An Omnibus Test for Univariate and Multivariate Normality.” Oxford Bulletin of Economics and Statistics, 70, 927-939.

Royston, J.P. (1992). “Approximating the Shapiro-Wilk W-Test for Non-Normality.” Statistics and Computing, 2, 117-119.