pingouin.box_m

pingouin.box_m(data, dvs, group, alpha=0.001)

Test the equality of covariance matrices using Box’s M test.

Parameters:
data : pandas.DataFrame

Long-format dataframe.

dvs : list

Dependent variables.

group : str

Grouping variable.

alpha : float

Significance level. Default is 0.001 as recommended in [2]. A non-significant p-value (higher than alpha) indicates that the covariance matrices are homogeneous (= equal).

Returns:
stats : pandas.DataFrame
  • 'Chi2': Test statistic

  • 'pval': p-value

  • 'df': Degrees of freedom of the chi-square statistic

  • 'equal_cov': True if the p-value is higher than alpha, i.e. the covariance matrices are considered equal

Notes

Warning

Box’s M test is susceptible to errors if the data do not meet the assumption of multivariate normality or if the sample size is too large or too small [3].

Pingouin uses pandas.DataFrameGroupBy.cov() to calculate the variance-covariance matrix of each group. Missing values are automatically excluded from the calculation by pandas.

Mathematical expressions can be found in [1].

This function has been tested against the boxM function of the biotools R package [4].
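
For readers who want to see how the statistic is assembled, the following is a minimal sketch of the chi-square approximation to Box’s M as it is commonly presented in multivariate textbooks. It is not Pingouin’s source code: the name box_m_sketch is hypothetical, and the sketch assumes listwise deletion of missing values, which may differ from how pandas excludes them inside cov().

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2


def box_m_sketch(data, dvs, group, alpha=0.001):
    """Illustrative chi-square approximation of Box's M (hypothetical helper)."""
    # Assumption: drop any row with a missing value (listwise deletion)
    data = data[dvs + [group]].dropna()
    grouped = data.groupby(group)[dvs]
    sizes = grouped.size().to_numpy()   # n_i, number of rows per group
    v = sizes - 1                       # v_i = n_i - 1
    k, p = len(sizes), len(dvs)         # number of groups and of variables

    # Per-group covariance matrices S_i and the pooled matrix S_pl
    covs = [g[dvs].cov().to_numpy() for _, g in grouped]
    pooled = sum(vi * s for vi, s in zip(v, covs)) / v.sum()

    # ln M = 0.5 * sum(v_i * ln|S_i|) - 0.5 * sum(v_i) * ln|S_pl|
    logdets = np.array([np.linalg.slogdet(s)[1] for s in covs])
    ln_m = 0.5 * (v * logdets).sum() - 0.5 * v.sum() * np.linalg.slogdet(pooled)[1]

    # Scaling constant c1 and the resulting chi-square statistic
    c1 = ((1 / v).sum() - 1 / v.sum()) * (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))
    stat = -2 * (1 - c1) * ln_m
    df = 0.5 * p * (p + 1) * (k - 1)
    pval = chi2.sf(stat, df)

    return pd.DataFrame(
        {"Chi2": stat, "df": df, "pval": pval, "equal_cov": pval > alpha},
        index=["box"],
    )
```

The degrees of freedom depend only on the number of groups k and the number of dependent variables p, so 4 groups with 3 variables give 0.5 * 3 * 4 * 3 = 18. The log-determinants are computed with numpy.linalg.slogdet to avoid overflow or underflow when the determinants are very large or very small.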

References

[1]

Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.

[2]

Hahs-Vaughn, D. (2016). Applied Multivariate Statistical Concepts. Taylor & Francis.

Examples

  1. Box M test with 3 dependent variables of 4 groups (equal sample size)

>>> import pandas as pd
>>> import pingouin as pg
>>> from scipy.stats import multivariate_normal as mvn
>>> data = pd.DataFrame(mvn.rvs(size=(100, 3), random_state=42),
...                     columns=['A', 'B', 'C'])
>>> data['group'] = [1] * 25 + [2] * 25 + [3] * 25 + [4] * 25
>>> data.head()
          A         B         C  group
0  0.496714 -0.138264  0.647689      1
1  1.523030 -0.234153 -0.234137      1
2  1.579213  0.767435 -0.469474      1
3  0.542560 -0.463418 -0.465730      1
4  0.241962 -1.913280 -1.724918      1
>>> pg.box_m(data, dvs=['A', 'B', 'C'], group='group')
          Chi2    df      pval  equal_cov
box  11.634185  18.0  0.865537       True

  2. Box M test with 2 dependent variables of 2 groups (unequal sample size)

>>> data = pd.DataFrame(mvn.rvs(size=(30, 2), random_state=42),
...                     columns=['A', 'B'])
>>> data['group'] = [1] * 20 + [2] * 10
>>> pg.box_m(data, dvs=['A', 'B'], group='group')
         Chi2   df      pval  equal_cov
box  0.706709  3.0  0.871625       True