pingouin.welch_anova

pingouin.welch_anova(data=None, dv=None, between=None)

One-way Welch ANOVA.

Parameters:
data : pandas.DataFrame

Input DataFrame. This function can also be called directly as a pandas DataFrame method, in which case this argument is not needed.

dv : string

Name of column containing the dependent variable.

between : string

Name of column containing the between factor.

Returns:
aov : pandas.DataFrame

ANOVA summary:

  • 'Source': Factor names

  • 'ddof1': Numerator degrees of freedom

  • 'ddof2': Denominator degrees of freedom

  • 'F': F-values

  • 'p-unc': uncorrected p-values

  • 'np2': Partial eta-squared

See also

anova

One-way and N-way ANOVA

rm_anova

One-way and two-way repeated measures ANOVA

mixed_anova

Two-way mixed ANOVA

kruskal

Non-parametric one-way ANOVA

Notes

From Wikipedia, on Welch's t-test (the two-sample case that Welch ANOVA extends to more than two groups):

It is named for its creator, Bernard Lewis Welch, and is an adaptation of Student’s t-test, and is more reliable when the two samples have unequal variances and/or unequal sample sizes.

The classic ANOVA is very powerful when the groups are normally distributed and have equal variances. However, when the groups have unequal variances, it is best to use the Welch ANOVA, which better controls the type I error rate (Liu 2015). The homogeneity of variances can be tested with the pingouin.homoscedasticity() function. The two other assumptions, normality and independence, still apply.

The main idea of Welch ANOVA is to use a weight \(w_i\) to reduce the effect of unequal variances. This weight is calculated using the sample size \(n_i\) and variance \(s_i^2\) of each group \(i=1,...,r\):

\[w_i = \frac{n_i}{s_i^2}\]

Using these weights, the adjusted grand mean of the data is:

\[\overline{Y}_{\text{welch}} = \frac{\sum_{i=1}^r w_i\overline{Y}_i}{\sum_{j=1}^r w_j}\]

where \(\overline{Y}_i\) is the mean of the \(i\)-th group.

The effect sum of squares is defined as:

\[SS_{\text{effect}} = \sum_{i=1}^r w_i (\overline{Y}_i - \overline{Y}_{\text{welch}})^2\]

We then compute a correction term \(\Lambda\):

\[\Lambda = \frac{3\sum_{i=1}^r \frac{1}{n_i-1} \left(1 - \frac{w_i}{\sum_{j=1}^r w_j}\right)^2}{r^2 - 1}\]

from which the F-value can be calculated:

\[F_{\text{welch}} = \frac{SS_{\text{effect}} / (r-1)} {1 + \frac{2\Lambda(r-2)}{3}}\]

and the p-value is approximated using an F-distribution with \((r-1, 1 / \Lambda)\) degrees of freedom.
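The steps above can be sketched in plain Python. This is an illustrative sketch only, not pingouin's implementation: the function name welch_anova_sketch is hypothetical, and it assumes every group has at least two observations and a non-zero variance.

```python
from statistics import mean, variance


def welch_anova_sketch(groups):
    """Return (F, df1, df2) for a one-way Welch ANOVA.

    `groups` is a sequence of numeric sequences, one per group.
    Hypothetical helper for illustration; assumes each group has
    at least two observations and a non-zero sample variance.
    """
    r = len(groups)
    n = [len(g) for g in groups]
    means = [mean(g) for g in groups]

    # Weights w_i = n_i / s_i^2, with s_i^2 the unbiased sample variance
    w = [n_i / variance(g) for n_i, g in zip(n, groups)]
    w_sum = sum(w)

    # Weighted (Welch) grand mean and effect sum of squares
    grand_mean = sum(w_i * m_i for w_i, m_i in zip(w, means)) / w_sum
    ss_effect = sum(w_i * (m_i - grand_mean) ** 2 for w_i, m_i in zip(w, means))

    # Correction term Lambda
    lam = 3 * sum(
        (1 - w_i / w_sum) ** 2 / (n_i - 1) for w_i, n_i in zip(w, n)
    ) / (r ** 2 - 1)

    # Welch F-value, numerator df (r - 1) and denominator df (1 / Lambda)
    f_value = (ss_effect / (r - 1)) / (1 + 2 * lam * (r - 2) / 3)
    return f_value, r - 1, 1 / lam


# Three toy groups with equal sizes and equal variances
f_value, df1, df2 = welch_anova_sketch([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

For this toy input the sketch gives F = 18/7 (about 2.57) with (2, 4) degrees of freedom. If SciPy is available, the p-value could then be obtained from the survival function of the F-distribution, for example scipy.stats.f.sf(f_value, df1, df2).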

When the groups are balanced and have equal variances, the optimal post-hoc test is the Tukey-HSD test (pingouin.pairwise_tukey()). If the groups have unequal variances, the Games-Howell test is more appropriate (pingouin.pairwise_gameshowell()).

Results have been tested against R.

References

[1]

Liu, Hangcheng. “Comparing Welch’s ANOVA, a Kruskal-Wallis test and traditional ANOVA in case of Heterogeneity of Variance.” (2015).

[2]

Welch, Bernard Lewis. “On the comparison of several mean values: an alternative approach.” Biometrika 38.3/4 (1951): 330-336.

Examples

  1. One-way Welch ANOVA on the pain threshold dataset.

>>> from pingouin import welch_anova, read_dataset
>>> df = read_dataset('anova')
>>> aov = welch_anova(dv='Pain threshold', between='Hair color', data=df)
>>> aov
       Source  ddof1     ddof2         F     p-unc       np2
0  Hair color      3  8.329841  5.890115  0.018813  0.575962