pingouin.pairwise_tukey
- pingouin.pairwise_tukey(data=None, dv=None, between=None, effsize='hedges')
Pairwise Tukey-HSD post-hoc test.
- Parameters:
- data: pandas.DataFrame
DataFrame. Note that this function can also directly be used as a Pandas method, in which case this argument is no longer needed.
- dv: string
Name of column containing the dependent variable.
- between: string
Name of column containing the between factor.
- effsize: string or None
Effect size type. Available methods are:
'none': no effect size
'cohen': Unbiased Cohen d
'hedges': Hedges g
'r': Pearson correlation coefficient
'eta-square': Eta-square
'odds-ratio': Odds ratio
'AUC': Area Under the Curve
'CLES': Common Language Effect Size
- Returns:
- stats: pandas.DataFrame
'A': Name of first measurement
'B': Name of second measurement
'mean(A)': Mean of first measurement
'mean(B)': Mean of second measurement
'diff': Mean difference (= mean(A) - mean(B))
'se': Standard error
'T': T-values
'p-tukey': Tukey-HSD corrected p-values
'hedges': Hedges effect size (or any effect size defined in effsize)
Notes
Tukey HSD post-hoc [1] is best for balanced one-way ANOVA.
It has been proven to be conservative for one-way ANOVA with unequal sample sizes. However, it is not robust if the groups have unequal variances, in which case the Games-Howell test is more appropriate. Tukey HSD is not valid for repeated measures ANOVA. Only one-way ANOVA designs are supported.
The T-values are defined as:
\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{2 \cdot \text{MS}_w / n}}\]
where \(\overline{x}_i\) and \(\overline{x}_j\) are the means of the first and second group, respectively, \(\text{MS}_w\) is the mean square of the error (computed using ANOVA) and \(n\) is the sample size of each group.
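As a minimal sketch of the balanced-case formula, the snippet below computes \(\text{MS}_w\) and the T-values for three small groups using only the Python standard library. The data and the helper names (`mean_square_within`, `tukey_t_balanced`) are made up for illustration and are not part of pingouin.

```python
from math import sqrt
from statistics import mean


def mean_square_within(groups):
    """Pooled within-group mean square (error term of a one-way ANOVA)."""
    ss_within = 0.0
    for g in groups:
        g_mean = mean(g)
        ss_within += sum((x - g_mean) ** 2 for x in g)
    df_within = sum(len(g) for g in groups) - len(groups)  # N - r
    return ss_within / df_within


def tukey_t_balanced(group_i, group_j, ms_w):
    """T-value for two groups that share the same sample size n."""
    n = len(group_i)
    if len(group_j) != n:
        raise ValueError("balanced formula requires equal group sizes")
    return (mean(group_i) - mean(group_j)) / sqrt(2 * ms_w / n)


# Hypothetical balanced data: three groups of three observations each.
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 3.0, 4.0],
    "c": [5.0, 6.0, 7.0],
}
ms_w = mean_square_within(list(groups.values()))
t_ab = tukey_t_balanced(groups["a"], groups["b"], ms_w)
t_ac = tukey_t_balanced(groups["a"], groups["c"], ms_w)
print(ms_w, t_ab, t_ac)
```

Note that \(\text{MS}_w\) is pooled over all groups, so every pairwise comparison shares the same error term.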
If the sample sizes are unequal, the Tukey-Kramer procedure is automatically used:
\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{\frac{\text{MS}_w}{n_i} + \frac{\text{MS}_w}{n_j}}}\]
where \(n_i\) and \(n_j\) are the sample sizes of the first and second group, respectively.
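The Tukey-Kramer formula can be sketched in a few lines of plain Python. The function name `tukey_kramer_t` and all numbers below are illustrative assumptions, not pingouin's API or real data. The last lines check that the expression reduces to the balanced formula when \(n_i = n_j\).

```python
from math import sqrt


def tukey_kramer_t(mean_i, mean_j, n_i, n_j, ms_w):
    """T-value using the Tukey-Kramer standard error for unequal group sizes."""
    se = sqrt(ms_w / n_i + ms_w / n_j)
    return (mean_i - mean_j) / se


# Hypothetical summary statistics.
ms_w = 4.0
t_unequal = tukey_kramer_t(10.0, 12.0, n_i=8, n_j=4, ms_w=ms_w)

# With equal group sizes the result matches the balanced formula.
t_equal = tukey_kramer_t(10.0, 12.0, n_i=5, n_j=5, ms_w=ms_w)
t_balanced = (10.0 - 12.0) / sqrt(2 * ms_w / 5)
print(t_unequal, t_equal, t_balanced)
```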
The p-values are then approximated using the Studentized range distribution \(Q(\sqrt2|t_i|, r, N - r)\) where \(r\) is the total number of groups and \(N\) is the total sample size.
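For illustration, the p-value step can be sketched with `scipy.stats.studentized_range` (available in SciPy 1.7 and later). This is an assumption about one way to evaluate the distribution, not necessarily how pingouin computes it internally. The helper name `tukey_pvalue` is hypothetical, the T-values are the rounded ones from the Penguins example in this page, and the total sample size of 342 is an assumed count of penguins with a recorded body mass.

```python
from math import sqrt

from scipy.stats import studentized_range  # requires SciPy >= 1.7


def tukey_pvalue(t, n_groups, n_total):
    """Upper-tail probability of the Studentized range at q = sqrt(2) * |t|."""
    q = sqrt(2) * abs(t)
    df_error = n_total - n_groups  # N - r
    return float(studentized_range.sf(q, n_groups, df_error))


# Rounded T-values from the Penguins example; n_total=342 is an assumption.
p_small_t = tukey_pvalue(-0.480, n_groups=3, n_total=342)
p_large_t = tukey_pvalue(-24.495, n_groups=3, n_total=342)
print(round(p_small_t, 3), round(p_large_t, 3))
```

The \(\sqrt{2}\) factor converts the T-value into the Studentized range statistic, which is why the same error degrees of freedom \(N - r\) apply to every pair.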
References
[1] Tukey, John W. “Comparing individual means in the analysis of variance.” Biometrics (1949): 99-114.
[2] Gleason, John R. “An accurate, non-iterative approximation for studentized range quantiles.” Computational Statistics & Data Analysis 31.2 (1999): 147-158.
Examples
Pairwise Tukey post-hocs on the Penguins dataset.
>>> import pingouin as pg
>>> df = pg.read_dataset('penguins')
>>> df.pairwise_tukey(dv='body_mass_g', between='species').round(3)
           A          B   mean(A)   mean(B)      diff      se       T  p-tukey  hedges
0     Adelie  Chinstrap  3700.662  3733.088   -32.426  67.512  -0.480    0.881  -0.074
1     Adelie     Gentoo  3700.662  5076.016 -1375.354  56.148 -24.495    0.000  -2.860
2  Chinstrap     Gentoo  3733.088  5076.016 -1342.928  69.857 -19.224    0.000  -2.875