pingouin.homoscedasticity
- pingouin.homoscedasticity(data, dv=None, group=None, method='levene', alpha=0.05, **kwargs)
Test equality of variance.
- Parameters:
  - data : pandas.DataFrame, list or dict
    Iterable. Can be either a list / dictionary of iterables or a wide- or long-format pandas dataframe.
  - dv : str
    Dependent variable (only when data is a long-format dataframe).
  - group : str
    Grouping variable (only when data is a long-format dataframe).
  - method : str
    Statistical test. 'levene' (default) performs the Levene test using scipy.stats.levene(), and 'bartlett' performs the Bartlett test using scipy.stats.bartlett(). The former is more robust to departures from normality.
  - alpha : float
    Significance level.
  - **kwargs : optional
    Optional argument(s) passed to the lower-level scipy.stats.levene() function.
- Returns:
  - stats : pandas.DataFrame
    - 'W/T': Test statistic ('W' for Levene, 'T' for Bartlett)
    - 'pval': p-value
    - 'equal_var': True if the null hypothesis of equal variances is not rejected at the alpha level
See also
normality
Univariate normality test.
sphericity
Mauchly’s test for sphericity.
Notes
The Bartlett \(T\) statistic [1] is defined as:
\[T = \frac{(N-k) \ln{s^{2}_{p}} - \sum_{i=1}^{k}(N_{i} - 1) \ln{s^{2}_{i}}}{1 + \frac{1}{3(k-1)}\left(\sum_{i=1}^{k}\frac{1}{N_{i} - 1} - \frac{1}{N-k}\right)}\]
where \(s_i^2\) is the variance of the \(i^{th}\) group, \(N\) is the total sample size, \(N_i\) is the sample size of the \(i^{th}\) group, \(k\) is the number of groups, and \(s_p^2\) is the pooled variance.
The pooled variance is a weighted average of the group variances and is defined as:
\[s^{2}_{p} = \sum_{i=1}^{k}(N_{i} - 1)s^{2}_{i}/(N-k)\]
The p-value is then computed using a chi-square distribution:
\[T \sim \chi^2(k-1)\]
The Levene \(W\) statistic [2] is defined as:
\[W = \frac{(N-k)}{(k-1)} \frac{\sum_{i=1}^{k}N_{i}(\overline{Z}_{i.}-\overline{Z})^{2}}{\sum_{i=1}^{k}\sum_{j=1}^{N_i}(Z_{ij}-\overline{Z}_{i.})^{2}}\]
where \(Z_{ij} = |Y_{ij} - \text{median}({Y}_{i.})|\), \(\overline{Z}_{i.}\) are the group means of \(Z_{ij}\) and \(\overline{Z}\) is the grand mean of \(Z_{ij}\).
The p-value is then computed using an F-distribution:
\[W \sim F(k-1, N-k)\]
Warning
Missing values are not supported for this function. Remove them beforehand with pandas.DataFrame.dropna() or pingouin.remove_na().
References
[1] Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proc. R. Soc. Lond. A, 160(901), 268-282.
[2] Brown, M. B., & Forsythe, A. B. (1974). Robust tests for the equality of variances. Journal of the American Statistical Association, 69(346), 364-367.
Examples
Levene test on a wide-format dataframe
>>> import numpy as np
>>> import pingouin as pg
>>> data = pg.read_dataset('mediation')
>>> pg.homoscedasticity(data[['X', 'Y', 'M']])
               W      pval  equal_var
levene  1.173518  0.310707       True
Same data but using a long-format dataframe
>>> data_long = data[['X', 'Y', 'M']].melt()
>>> pg.homoscedasticity(data_long, dv="value", group="variable")
               W      pval  equal_var
levene  1.173518  0.310707       True
Same but using a mean center
>>> pg.homoscedasticity(data_long, dv="value", group="variable", center="mean")
               W      pval  equal_var
levene  1.572239  0.209303       True
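Levene statistic computed by hand

The following is a minimal sketch of the \(W\) formula from the Notes, written with NumPy and SciPy only. The helper name `levene_w`, its `center` argument, and the synthetic groups are assumptions made for illustration; they are not part of pingouin. The result is compared against scipy.stats.levene(), which is the function this test relies on.

```python
import numpy as np
from scipy import stats


def levene_w(groups, center=np.median):
    """Hypothetical helper: Levene W statistic and p-value from the Notes formula."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                            # number of groups
    n_i = np.array([g.size for g in groups])   # per-group sample sizes
    n = n_i.sum()                              # total sample size
    # Z_ij: absolute deviation of each observation from its group center
    z = [np.abs(g - center(g)) for g in groups]
    z_bar_i = np.array([zi.mean() for zi in z])  # group means of Z
    z_bar = np.concatenate(z).mean()             # grand mean of Z
    between = np.sum(n_i * (z_bar_i - z_bar) ** 2)
    within = sum(np.sum((zi - zbi) ** 2) for zi, zbi in zip(z, z_bar_i))
    w = (n - k) / (k - 1) * between / within
    # Upper tail of F(k - 1, N - k)
    pval = stats.f.sf(w, k - 1, n - k)
    return w, pval


# Synthetic data (assumption): three groups with increasing spread
rng = np.random.default_rng(0)
groups = [rng.normal(0, s, size=30) for s in (1.0, 1.5, 2.0)]

w_median, p_median = levene_w(groups)                # median center
w_mean, p_mean = levene_w(groups, center=np.mean)    # mean center

ref_median = stats.levene(*groups, center="median")
ref_mean = stats.levene(*groups, center="mean")
print(np.isclose(w_median, ref_median.statistic), np.isclose(w_mean, ref_mean.statistic))
```

Switching `center` from the median to the mean changes only how \(Z_{ij}\) is built, which is why the two calls can give different statistics on the same data.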
Bartlett test using a list of iterables
>>> data = [[4, 8, 9, 20, 14], np.array([5, 8, 15, 45, 12])]
>>> pg.homoscedasticity(data, method="bartlett", alpha=.05)
                 T      pval  equal_var
bartlett  2.873569  0.090045       True
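Bartlett statistic computed by hand

As a rough check of the \(T\) formula in the Notes, the sketch below recomputes the statistic for the two groups used in the Bartlett example, using NumPy and SciPy only. The helper name `bartlett_t` is hypothetical and not part of pingouin; the values it returns should agree with the T and pval shown in that example and with scipy.stats.bartlett().

```python
import numpy as np
from scipy import stats


def bartlett_t(groups):
    """Hypothetical helper: Bartlett T statistic and p-value from the Notes formula."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)                                    # number of groups
    n_i = np.array([g.size for g in groups])           # per-group sample sizes
    n = n_i.sum()                                      # total sample size
    var_i = np.array([g.var(ddof=1) for g in groups])  # unbiased group variances
    # Pooled variance: weighted average of the group variances
    pooled = np.sum((n_i - 1) * var_i) / (n - k)
    numer = (n - k) * np.log(pooled) - np.sum((n_i - 1) * np.log(var_i))
    denom = 1 + (np.sum(1 / (n_i - 1)) - 1 / (n - k)) / (3 * (k - 1))
    t = numer / denom
    # Upper tail of the chi-square distribution with k - 1 degrees of freedom
    pval = stats.chi2.sf(t, k - 1)
    return t, pval


data = [[4, 8, 9, 20, 14], [5, 8, 15, 45, 12]]
t, pval = bartlett_t(data)
ref = stats.bartlett(*data)
print(round(t, 6), round(pval, 6))
print(np.isclose(t, ref.statistic), np.isclose(pval, ref.pvalue))
```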