pingouin.pairwise_tukey
- pingouin.pairwise_tukey(data=None, dv=None, between=None, effsize='hedges')
Pairwise Tukey-HSD post-hoc test.
- Parameters:
- data: pandas.DataFrame
DataFrame. Note that this function can also be used directly as a pandas method, in which case this argument is not needed.
- dv: string
Name of column containing the dependent variable.
- between: string
Name of column containing the between factor.
- effsize: string or None
Effect size type. Available methods are:
  - 'none': no effect size
  - 'cohen': Unbiased Cohen d
  - 'hedges': Hedges g
  - 'r': Pearson correlation coefficient
  - 'eta-square': Eta-square
  - 'odds-ratio': Odds ratio
  - 'AUC': Area Under the Curve
  - 'CLES': Common Language Effect Size
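To make the default 'hedges' option concrete, here is a minimal sketch of Hedges g for two independent samples. It assumes the textbook definition: Cohen's d computed with a pooled standard deviation, multiplied by the common small-sample correction factor J = 1 - 3 / (4(n_x + n_y) - 9). The helper name `hedges_g` is hypothetical, and pingouin's internal computation may differ in detail.

```python
import numpy as np


def hedges_g(x, y):
    """Hypothetical helper: bias-corrected standardized mean difference (Hedges g)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    nx, ny = x.size, y.size
    # Pooled standard deviation from the unbiased (ddof=1) group variances
    pooled_sd = np.sqrt(
        ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    )
    d = (x.mean() - y.mean()) / pooled_sd  # Cohen's d
    # Small-sample correction factor (common approximation, assumed here)
    correction = 1 - 3 / (4 * (nx + ny) - 9)
    return d * correction


print(round(hedges_g([1, 2, 3], [2, 3, 4]), 3))  # -> -0.8
```

In the usage line both groups have variance 1, so d = -1 and the correction factor is 1 - 3/15 = 0.8.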
- Returns:
- stats: pandas.DataFrame
  - 'A': Name of first measurement
  - 'B': Name of second measurement
  - 'mean(A)': Mean of first measurement
  - 'mean(B)': Mean of second measurement
  - 'diff': Mean difference (= mean(A) - mean(B))
  - 'se': Standard error
  - 'T': T-values
  - 'p-tukey': Tukey-HSD corrected p-values
  - 'hedges': Hedges effect size (or any effect size defined in effsize)
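The output columns are tied together in a simple way: `diff` is `mean(A) - mean(B)`, and the rounded values in the Examples section are consistent with `T` being `diff / se`. The sketch below rebuilds that example table with pandas (values copied from this page), checks both relations, and keeps only the pairs with `p-tukey < 0.05`. The 0.05 threshold is an illustrative choice, not a pingouin default.

```python
import numpy as np
import pandas as pd

# Rounded values copied from the penguins example on this page
stats = pd.DataFrame(
    {
        "A": ["Adelie", "Adelie", "Chinstrap"],
        "B": ["Chinstrap", "Gentoo", "Gentoo"],
        "mean(A)": [3700.662, 3700.662, 3733.088],
        "mean(B)": [3733.088, 5076.016, 5076.016],
        "diff": [-32.426, -1375.354, -1342.928],
        "se": [67.512, 56.148, 69.857],
        "T": [-0.480, -24.495, -19.224],
        "p-tukey": [0.881, 0.000, 0.000],
        "hedges": [-0.074, -2.860, -2.875],
    }
)

# diff is the difference of the two group means
diff_ok = np.allclose(stats["diff"], stats["mean(A)"] - stats["mean(B)"], atol=1e-3)
# T is the mean difference divided by its standard error (up to rounding)
t_ok = np.allclose(stats["T"], stats["diff"] / stats["se"], atol=1e-3)

# Keep only the pairs that differ at the (illustrative) 5% level
significant = stats.loc[stats["p-tukey"] < 0.05, ["A", "B", "diff", "p-tukey"]]
print(diff_ok, t_ok)
print(significant)
```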
Notes
The Tukey HSD post-hoc test [1] is best suited to balanced one-way ANOVA designs.
It has been proven to be conservative for one-way ANOVA with unequal sample sizes. However, it is not robust if the groups have unequal variances, in which case the Games-Howell test is more appropriate. Tukey HSD is not valid for repeated measures ANOVA. Only one-way ANOVA designs are supported.
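Because the choice between Tukey HSD and Games-Howell hinges on the equal-variance assumption, one common heuristic is to run Levene's test first. The sketch below uses `scipy.stats.levene`, whose default `center='median'` gives the Brown-Forsythe variant. The helper `suggest_posthoc` and the 0.05 cut-off are hypothetical choices for illustration, not part of pingouin.

```python
from scipy import stats


def suggest_posthoc(*groups, alpha=0.05):
    """Hypothetical helper: pick a post-hoc test from a variance-homogeneity check."""
    # Levene's test (median-centred by default); H0: all group variances are equal
    _, pval = stats.levene(*groups)
    # If equal variances are rejected, Games-Howell is the safer choice
    return "games-howell" if pval < alpha else "tukey"


equal_spread = ([10.1, 9.8, 10.3, 10.0, 9.9], [11.0, 11.2, 10.9, 11.1, 10.8])
unequal_spread = (
    [10.0, 10.1, 9.9, 10.0, 10.1, 9.9, 10.0, 10.1],
    [2.0, 18.0, 5.0, 15.0, 1.0, 19.0, 8.0, 12.0],
)
print(suggest_posthoc(*equal_spread))    # -> tukey
print(suggest_posthoc(*unequal_spread))  # -> games-howell
```

A significance test is only a rough guide here; with large samples it flags trivial variance differences, so looking at the group standard deviations directly is also worthwhile.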
The T-values are defined as:
\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{2 \cdot \text{MS}_w / n}}\]
where \(\overline{x}_i\) and \(\overline{x}_j\) are the means of the first and second group, respectively, \(\text{MS}_w\) is the mean square of the error (within-group) term computed by the ANOVA, and \(n\) is the common sample size of each group.
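The balanced-design formula translates almost directly into NumPy. This minimal sketch assumes every group has the same size n, computes MS_w as the pooled within-group sum of squares divided by its N - r degrees of freedom, and returns the T-value for one pair. The helper name `tukey_t_balanced` is hypothetical.

```python
import numpy as np


def tukey_t_balanced(groups, i, j):
    """Hypothetical helper: Tukey T-value for groups i and j in a balanced design."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    n = groups[0].size
    if any(g.size != n for g in groups):
        raise ValueError("balanced design required (all groups the same size)")
    r = len(groups)  # number of groups
    N = r * n        # total sample size
    # Mean square within (error): pooled sum of squares over N - r degrees of freedom
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_w = ss_within / (N - r)
    return (groups[i].mean() - groups[j].mean()) / np.sqrt(2 * ms_w / n)


groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(tukey_t_balanced(groups, 0, 1))  # approximately -1.2247, i.e. -sqrt(1.5)
```

Here each group contributes a sum of squares of 2, so MS_w = 6 / 6 = 1 and t = -1 / sqrt(2/3).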
If the sample sizes are unequal, the Tukey-Kramer procedure is automatically used:
\[t = \frac{\overline{x}_i - \overline{x}_j}{\sqrt{\frac{\text{MS}_w}{n_i} + \frac{\text{MS}_w}{n_j}}}\]
where \(n_i\) and \(n_j\) are the sample sizes of the first and second group, respectively.
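With unequal group sizes only the denominator changes. Below is a sketch of the Tukey-Kramer T-value, again with a hypothetical helper name (`tukey_kramer_t`). When all groups share the same n, sqrt(MS_w/n + MS_w/n) equals sqrt(2 MS_w / n), so it reduces to the balanced formula, which the second usage line illustrates.

```python
import numpy as np


def tukey_kramer_t(groups, i, j):
    """Hypothetical helper: Tukey-Kramer T-value for groups i and j (any group sizes)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    r = len(groups)                  # number of groups
    N = sum(g.size for g in groups)  # total sample size
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_w = ss_within / (N - r)
    n_i, n_j = groups[i].size, groups[j].size
    # Standard error of the mean difference for possibly unequal sizes
    se = np.sqrt(ms_w / n_i + ms_w / n_j)
    return (groups[i].mean() - groups[j].mean()) / se


unequal = [[1.0, 2.0, 3.0, 2.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 6.0, 6.0]]
print(tukey_kramer_t(unequal, 0, 2))  # -4 / sqrt(0.3), about -7.303

# With equal sizes the result matches the balanced formula
balanced = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(tukey_kramer_t(balanced, 0, 1))  # -sqrt(1.5), about -1.2247
```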
The p-values are then approximated using the studentized range distribution \(Q(\sqrt{2}\,|t|, r, N - r)\), where \(r\) is the total number of groups and \(N\) is the total sample size.
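The last step can be sketched with SciPy's `scipy.stats.studentized_range` distribution (SciPy 1.7 or later): its survival function at q = sqrt(2)|t| with r groups and N - r degrees of freedom gives the Tukey p-value. This is one way to reproduce the step, not necessarily how pingouin evaluates Q internally; the helper `tukey_pvalue` is hypothetical. The usage lines cross-check the result against `scipy.stats.tukey_hsd` (SciPy 1.8 or later), which also applies the Tukey-Kramer procedure to unequal group sizes.

```python
import numpy as np
from scipy import stats


def tukey_pvalue(groups, i, j):
    """Hypothetical helper: Tukey(-Kramer) p-value for groups i and j."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    r = len(groups)                  # number of groups
    N = sum(g.size for g in groups)  # total sample size
    ms_w = sum(((g - g.mean()) ** 2).sum() for g in groups) / (N - r)
    se = np.sqrt(ms_w / groups[i].size + ms_w / groups[j].size)
    t = (groups[i].mean() - groups[j].mean()) / se
    # Upper tail of the studentized range distribution at q = sqrt(2) * |t|
    return stats.studentized_range.sf(np.sqrt(2) * abs(t), r, N - r)


groups = [[1.0, 2.0, 3.0, 2.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 6.0, 6.0]]
p_01 = tukey_pvalue(groups, 0, 1)
# Cross-check against SciPy's own Tukey HSD implementation
reference = stats.tukey_hsd(*groups).pvalue[0, 1]
print(p_01, reference)
```

The factor sqrt(2) converts the T-value defined above into the studentized range statistic, whose denominator uses sqrt(MS_w / n) rather than sqrt(2 MS_w / n).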
References
[1] Tukey, John W. “Comparing individual means in the analysis of variance.” Biometrics (1949): 99-114.
[2] Gleason, John R. “An accurate, non-iterative approximation for studentized range quantiles.” Computational Statistics & Data Analysis 31.2 (1999): 147-158.
Examples
Pairwise Tukey post-hocs on the Penguins dataset.
>>> import pingouin as pg
>>> df = pg.read_dataset('penguins')
>>> df.pairwise_tukey(dv='body_mass_g', between='species').round(3)
           A          B   mean(A)   mean(B)      diff      se       T  p-tukey  hedges
0     Adelie  Chinstrap  3700.662  3733.088   -32.426  67.512  -0.480    0.881  -0.074
1     Adelie     Gentoo  3700.662  5076.016 -1375.354  56.148 -24.495    0.000  -2.860
2  Chinstrap     Gentoo  3733.088  5076.016 -1342.928  69.857 -19.224    0.000  -2.875