pingouin.mwu
- pingouin.mwu(x, y, alternative='two-sided', **kwargs)
Mann-Whitney U Test (= Wilcoxon rank-sum test). It is the non-parametric version of the independent T-test.
- Parameters:
  - x, y : array_like
    First and second set of observations. x and y must be independent.
  - alternative : string
    Defines the alternative hypothesis, or tail of the test. Must be one of “two-sided” (default), “greater” or “less”. See scipy.stats.mannwhitneyu() for more details.
  - **kwargs : dict
    Additional keyword arguments that are passed to scipy.stats.mannwhitneyu().
- Returns:
  - stats : pandas.DataFrame
    - 'U-val': U-value corresponding with sample x
    - 'alternative': tail of the test
    - 'p-val': p-value
    - 'RBC': rank-biserial correlation
    - 'CLES': common language effect size
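For example (an illustrative sketch only, not part of the official documentation), individual values can be pulled out of the returned DataFrame with standard pandas indexing, using the 'MWU' row label and the column names listed above:

>>> import numpy as np
>>> import pingouin as pg
>>> np.random.seed(123)
>>> x = np.random.uniform(low=0, high=1, size=20)
>>> y = np.random.uniform(low=0.2, high=1.2, size=20)
>>> stats = pg.mwu(x, y)
>>> print(round(float(stats.at['MWU', 'p-val']), 5))
0.00556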
Notes
The Mann–Whitney U test [1] (also called the Wilcoxon rank-sum test) is a non-parametric test of the null hypothesis that it is equally likely that a randomly selected value from one sample will be less than or greater than a randomly selected value from a second sample. The test assumes that the two samples are independent. This test corrects for ties and by default uses a continuity correction (see scipy.stats.mannwhitneyu() for details).

The rank-biserial correlation [2] is the difference between the proportion of favorable evidence and the proportion of unfavorable evidence. Values range from -1 to 1, with negative values indicating that y > x, and positive values indicating that x > y.
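As an illustration of the simple difference formula (a sketch only, not Pingouin's internal code; the variable names diffs, favorable and unfavorable are ours), the rank-biserial correlation can be computed by brute force over all pairwise comparisons, using the same simulated data as in the Examples below. The result matches the RBC reported there:

>>> import numpy as np
>>> np.random.seed(123)
>>> x = np.random.uniform(low=0, high=1, size=20)
>>> y = np.random.uniform(low=0.2, high=1.2, size=20)
>>> diffs = x[:, None] - y[None, :]    # all 20 * 20 pairwise differences
>>> favorable = (diffs > 0).mean()     # proportion of pairs where x > y
>>> unfavorable = (diffs < 0).mean()   # proportion of pairs where x < y
>>> print(round(float(favorable - unfavorable), 3))
-0.515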
The common language effect size is the proportion of pairs where x is higher than y. It was first introduced by McGraw and Wong (1992) [3]. Pingouin uses a brute-force version of the formula given by Vargha and Delaney (2000) [4]:

\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]

The advantages of this method are twofold. First, the brute-force approach pairs each observation of x to its y counterpart, and therefore does not require normally distributed data. Second, the formula takes ties into account and therefore works with ordinal data.

When alternative is 'less', the CLES is instead set to \(1 - \text{CL}\), which gives the proportion of pairs where x is lower than y.
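Continuing the illustrative pairwise-difference snippet above (again a sketch rather than Pingouin's internal code, reusing the diffs array defined there), the brute-force formula can be spelled out directly; the results match the CLES values reported in the Examples below for the two-sided and 'less' alternatives:

>>> cles = (diffs > 0).mean() + 0.5 * (diffs == 0).mean()  # P(X > Y) + .5 * P(X = Y)
>>> print(round(float(cles), 4))
0.2425
>>> print(round(float(1 - cles), 4))  # value reported when alternative='less'
0.7575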
References
[1] Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50–60.
[2] Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11.IT.3.1.
[3] McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361.
[4] Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the “CL” common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101–132. https://doi.org/10.2307/1165329
Examples
>>> import numpy as np
>>> import pingouin as pg
>>> np.random.seed(123)
>>> x = np.random.uniform(low=0, high=1, size=20)
>>> y = np.random.uniform(low=0.2, high=1.2, size=20)
>>> pg.mwu(x, y, alternative='two-sided')
     U-val alternative    p-val    RBC    CLES
MWU   97.0   two-sided  0.00556 -0.515  0.2425
Compare with SciPy
>>> import scipy
>>> scipy.stats.mannwhitneyu(x, y, use_continuity=True, alternative='two-sided')
MannwhitneyuResult(statistic=97.0, pvalue=0.0055604599321374135)
One-sided test
>>> pg.mwu(x, y, alternative='greater')
     U-val alternative     p-val    RBC    CLES
MWU   97.0     greater  0.997442 -0.515  0.2425
>>> pg.mwu(x, y, alternative='less')
     U-val alternative    p-val    RBC    CLES
MWU   97.0        less  0.00278 -0.515  0.7575
Passing keyword arguments to scipy.stats.mannwhitneyu():

>>> pg.mwu(x, y, alternative='two-sided', method='exact')
     U-val alternative     p-val    RBC    CLES
MWU   97.0   two-sided  0.004681 -0.515  0.2425
Reversing the order of x and y.
>>> pg.mwu(y, x)
     U-val alternative    p-val    RBC    CLES
MWU  303.0   two-sided  0.00556  0.515  0.7575