pingouin.homoscedasticity

pingouin.homoscedasticity(data, dv=None, group=None, method='levene', alpha=0.05, **kwargs)

Test equality of variance.

Parameters:
data : pandas.DataFrame, list or dict

Iterable. Can be either a list or dictionary of iterables, or a wide- or long-format pandas DataFrame.

dv : str

Dependent variable (only when data is a long-format dataframe).

group : str

Grouping variable (only when data is a long-format dataframe).

method : str

Statistical test. ‘levene’ (default) performs the Levene test using scipy.stats.levene(), and ‘bartlett’ performs the Bartlett test using scipy.stats.bartlett(). The Levene test is more robust to departures from normality.

alpha : float

Significance level.

**kwargs : optional

Optional argument(s) passed to the lower-level scipy.stats.levene() function.

Returns:
stats : pandas.DataFrame
  • 'W/T': Test statistic (‘W’ for Levene, ‘T’ for Bartlett)

  • 'pval': p-value

  • 'equal_var': True if the p-value is greater than alpha, i.e. the null hypothesis of equal variances is not rejected

See also

normality

Univariate normality test.

sphericity

Mauchly’s test for sphericity.

Notes

The Bartlett \(T\) statistic [1] is defined as:

\[T = \frac{(N-k) \ln{s^{2}_{p}} - \sum_{i=1}^{k}(N_{i} - 1) \ln{s^{2}_{i}}}{1 + \frac{1}{3(k-1)}\left(\sum_{i=1}^{k}\frac{1}{N_{i} - 1} - \frac{1}{N-k}\right)}\]

where \(s_i^2\) is the variance of the \(i^{th}\) group, \(N\) is the total sample size, \(N_i\) is the sample size of the \(i^{th}\) group, \(k\) is the number of groups, and \(s_p^2\) is the pooled variance.

The pooled variance is a weighted average of the group variances and is defined as:

\[s^{2}_{p} = \frac{\sum_{i=1}^{k}(N_{i} - 1)s^{2}_{i}}{N-k}\]

The p-value is then computed using a chi-square distribution:

\[T \sim \chi^2(k-1)\]
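The Bartlett formulas above can be checked numerically. The following is a minimal sketch, not pingouin's implementation: the helper name bartlett_statistic is hypothetical, and it assumes unbiased (ddof=1) group variances. It uses the two samples from the Bartlett example in the Examples section and compares the result with scipy.stats.bartlett().

```python
import numpy as np
from scipy import stats


def bartlett_statistic(groups):
    """Compute Bartlett's T and its p-value directly from the formulas.

    Hypothetical helper for illustration only.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                    # number of groups
    n_i = np.array([g.size for g in groups])           # group sample sizes
    n_total = n_i.sum()                                # total sample size N
    var_i = np.array([g.var(ddof=1) for g in groups])  # unbiased group variances

    # Pooled variance: weighted average of the group variances
    pooled = np.sum((n_i - 1) * var_i) / (n_total - k)

    numer = (n_total - k) * np.log(pooled) - np.sum((n_i - 1) * np.log(var_i))
    denom = 1 + (np.sum(1 / (n_i - 1)) - 1 / (n_total - k)) / (3 * (k - 1))
    t_stat = numer / denom

    # Upper tail of a chi-square distribution with k - 1 degrees of freedom
    pval = stats.chi2.sf(t_stat, k - 1)
    return t_stat, pval


groups = [[4, 8, 9, 20, 14], [5, 8, 15, 45, 12]]
t_stat, pval = bartlett_statistic(groups)
ref = stats.bartlett(*groups)
print(t_stat, pval)               # manual computation
print(ref.statistic, ref.pvalue)  # scipy reference
```

Both computations should agree up to floating-point error, with T close to the 2.8736 reported in the Bartlett example.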

The Levene \(W\) statistic [2] is defined as:

\[W = \frac{(N-k)} {(k-1)} \frac{\sum_{i=1}^{k}N_{i}(\overline{Z}_{i.}-\overline{Z})^{2} } {\sum_{i=1}^{k}\sum_{j=1}^{N_i}(Z_{ij}-\overline{Z}_{i.})^{2} }\]

where \(Z_{ij} = |Y_{ij} - \text{median}({Y}_{i.})|\), \(\overline{Z}_{i.}\) are the group means of \(Z_{ij}\) and \(\overline{Z}\) is the grand mean of \(Z_{ij}\).

The p-value is then computed using an F-distribution:

\[W \sim F(k-1, N-k)\]
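The Levene statistic can be reproduced step by step in the same way. This is a minimal sketch with a hypothetical helper name (levene_statistic), using the median-centered definition of Z given above. It is checked against scipy.stats.levene() with center="median".

```python
import numpy as np
from scipy import stats


def levene_statistic(groups):
    """Compute the median-centered Levene W and its p-value.

    Hypothetical helper for illustration only.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_i = np.array([g.size for g in groups])
    n_total = n_i.sum()

    # Z_ij: absolute deviation of each observation from its group median
    z = [np.abs(g - np.median(g)) for g in groups]
    z_bar_i = np.array([zi.mean() for zi in z])  # group means of Z
    z_bar = np.concatenate(z).mean()             # grand mean of Z

    between = np.sum(n_i * (z_bar_i - z_bar) ** 2)
    within = sum(np.sum((zi - m) ** 2) for zi, m in zip(z, z_bar_i))
    w_stat = (n_total - k) / (k - 1) * between / within

    # Upper tail of an F-distribution with (k - 1, N - k) degrees of freedom
    pval = stats.f.sf(w_stat, k - 1, n_total - k)
    return w_stat, pval


groups = [[4, 8, 9, 20, 14], [5, 8, 15, 45, 12]]
w_stat, pval = levene_statistic(groups)
ref = stats.levene(*groups, center="median")
print(w_stat, pval)               # manual computation
print(ref.statistic, ref.pvalue)  # scipy reference
```

Replacing np.median with np.mean in the definition of Z gives the mean-centered variant, which corresponds to center="mean" in scipy.stats.levene().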

Warning

Missing values are not supported by this function. Remove them beforehand, for example with pandas.DataFrame.dropna() or pingouin.remove_na().
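As a rough illustration of that cleaning step, the sketch below drops missing values from a small long-format dataframe before running the underlying scipy.stats.levene() routine. The data and the column names ("value", "variable") are made up for the example.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Made-up long-format data with two missing observations
df = pd.DataFrame({
    "value": [4.0, 8.0, np.nan, 20.0, 14.0, 5.0, 8.0, 15.0, np.nan, 12.0],
    "variable": ["a"] * 5 + ["b"] * 5,
})

# Drop rows whose dependent variable is missing
clean = df.dropna(subset=["value"])

# One array per group, then the Levene test on the cleaned data
groups = [g["value"].to_numpy() for _, g in clean.groupby("variable")]
w_stat, pval = stats.levene(*groups)
print(len(df), len(clean))
print(w_stat, pval)
```

The cleaned dataframe can then be passed to homoscedasticity with dv="value" and group="variable".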

References

[1]

Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proc. R. Soc. Lond. A, 160(901), 268-282.

[2]

Brown, M. B., & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364-367.

Examples

  1. Levene test on a wide-format dataframe

>>> import numpy as np
>>> import pingouin as pg
>>> data = pg.read_dataset('mediation')
>>> pg.homoscedasticity(data[['X', 'Y', 'M']])
               W      pval  equal_var
levene  1.173518  0.310707       True
  2. Same data, using a long-format dataframe

>>> data_long = data[['X', 'Y', 'M']].melt()
>>> pg.homoscedasticity(data_long, dv="value", group="variable")
               W      pval  equal_var
levene  1.173518  0.310707       True
  3. Same test, but centering on the group mean instead of the median (center="mean" is passed through to scipy.stats.levene())

>>> pg.homoscedasticity(data_long, dv="value", group="variable", center="mean")
               W      pval  equal_var
levene  1.572239  0.209303       True
  4. Bartlett test using a list of iterables

>>> data = [[4, 8, 9, 20, 14], np.array([5, 8, 15, 45, 12])]
>>> pg.homoscedasticity(data, method="bartlett", alpha=.05)
                 T      pval  equal_var
bartlett  2.873569  0.090045       True