pingouin.multicomp
pingouin.multicomp(pvals, alpha=0.05, method='holm')
P-values correction for multiple comparisons.
Parameters:
pvals : array_like
    Uncorrected p-values.
alpha : float
    Significance level.
method : string
    Method used for testing and adjustment of p-values. Can be either the
    full name or initial letters. Available methods are:

    'bonf' : one-step Bonferroni correction
    'sidak' : one-step Sidak correction
    'holm' : step-down method using Bonferroni adjustments
    'fdr_bh' : Benjamini/Hochberg FDR correction
    'fdr_by' : Benjamini/Yekutieli FDR correction
    'none' : pass-through option (no correction applied)
Returns:
reject : array, boolean
    True for hypotheses that can be rejected at the given alpha.
pvals_corrected : array
    P-values corrected for multiple testing.
Notes
This function is similar to the R function p.adjust.
The correction methods include the Bonferroni correction ('bonf'), in which the p-values are multiplied by the number of comparisons. Less conservative methods are also included, such as Sidak (1967) ('sidak'), Holm (1979) ('holm'), Benjamini & Hochberg (1995) ('fdr_bh'), and Benjamini & Yekutieli (2001) ('fdr_by').

The first three methods are designed to give strong control of the family-wise error rate. Note that Holm's method is usually preferred. The 'fdr_bh' and 'fdr_by' methods control the false discovery rate, i.e. the expected proportion of false discoveries among the rejected hypotheses. The false discovery rate is a less stringent condition than the family-wise error rate, so these methods are more powerful than the others.

The Bonferroni [1] adjusted p-values are defined as:
\[\widetilde{p}_{(i)} = n \cdot p_{(i)}\]
where \(n\) is the number of finite p-values (i.e. excluding NaN).
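As a concrete illustration, below is a minimal NumPy sketch of this one-step adjustment. The helper name bonferroni_adjust is hypothetical, the code is not Pingouin's implementation, and it assumes no NaN values; as an extra assumption it caps the adjusted p-values at 1 so they remain valid probabilities.

import numpy as np

def bonferroni_adjust(pvals, alpha=0.05):
    # Hypothetical helper, not part of Pingouin.
    # One-step Bonferroni: multiply each p-value by the number of tests.
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size  # number of comparisons (assumes no NaN)
    pvals_corrected = np.clip(pvals * n, None, 1)  # cap at 1 (assumption)
    reject = pvals_corrected < alpha
    return reject, pvals_corrected

# bonferroni_adjust([.50, .003, .32, .054, .0003]) gives
# reject = [False, True, False, False, True] and
# pvals_corrected = [1.0, 0.015, 1.0, 0.27, 0.0015].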
The Sidak [2] adjusted p-values are defined as:
\[\widetilde{p}_{(i)} = 1 - (1 - p_{(i)})^{n}\]
The Holm [3] adjusted p-values are the running maximum of the sorted p-values, each multiplied by its step-down factor \((n - j + 1)\) and capped at 1:
\[\widetilde{p}_{(i)} = \min\Big(1,\; \max_{j \leq i} \big\{(n - j + 1)\, p_{(j)}\big\}\Big)\]
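Both formulas translate directly into NumPy. The sketch below (hypothetical helpers, again assuming no NaN values, and not Pingouin's own code) returns the adjusted p-values in the original input order.

import numpy as np

def sidak_adjust(pvals):
    # Hypothetical helper: one-step Sidak, p_adj = 1 - (1 - p)**n.
    pvals = np.asarray(pvals, dtype=float)
    return 1 - (1 - pvals) ** pvals.size

def holm_adjust(pvals):
    # Hypothetical helper: step-down Holm, running maximum of
    # (n - j + 1) * p_(j) over the sorted p-values, capped at 1.
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)              # indices that sort the p-values
    factors = n - np.arange(n)             # n, n-1, ..., 1
    adjusted = np.maximum.accumulate(factors * pvals[order])
    adjusted = np.clip(adjusted, None, 1)  # cap at 1
    out = np.empty(n)
    out[order] = adjusted                  # restore the original order
    return out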
The Benjamini–Hochberg procedure (BH step-up procedure, [4]) controls the false discovery rate (FDR) at level \(\alpha\). It works as follows:
1. For a given \(\alpha\), find the largest \(k\) such that \(P_{(k)} \leq \frac{k}{n}\alpha\).
2. Reject the null hypothesis for all \(H_{(i)}\) for \(i = 1, \ldots, k\).
The BH procedure is valid when the \(n\) tests are independent, and also in various scenarios of dependence, but is not universally valid.
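To make the step-up rule concrete, here is a minimal NumPy sketch of the BH decision step (a hypothetical helper, illustrative only; it returns the reject mask rather than adjusted p-values):

import numpy as np

def bh_reject(pvals, alpha=0.05):
    # Hypothetical helper, not Pingouin's implementation.
    # Reject H_(1), ..., H_(k) for the largest k with P_(k) <= (k / n) * alpha.
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    thresholds = np.arange(1, n + 1) / n * alpha   # k/n * alpha for k = 1..n
    below = np.nonzero(pvals[order] <= thresholds)[0]
    reject = np.zeros(n, dtype=bool)
    if below.size > 0:
        k = below[-1]                  # largest rank meeting the condition
        reject[order[:k + 1]] = True   # reject all hypotheses up to rank k
    return reject

With the p-values used in the Examples below, bh_reject([.50, .003, .32, .054, .0003]) rejects exactly the two smallest p-values, matching the 'fdr_bh' output of multicomp.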
The Benjamini–Yekutieli procedure (BY, [5]) controls the FDR under arbitrary dependence assumptions. This refinement modifies the threshold and finds the largest \(k\) such that:
\[P_{(k)} \leq \frac{k}{n \cdot c(n)}\, \alpha\]
where \(c(n) = \sum_{i=1}^{n} \frac{1}{i}\) is the harmonic number.
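Relative to the BH sketch above, the only change is scaling \(\alpha\) down by \(c(n)\); a minimal sketch under the same assumptions (hypothetical helper):

import numpy as np

def by_reject(pvals, alpha=0.05):
    # Hypothetical helper: BH step-up with alpha scaled down by c(n).
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    c_n = np.sum(1.0 / np.arange(1, n + 1))   # c(n) = 1 + 1/2 + ... + 1/n
    order = np.argsort(pvals)
    thresholds = np.arange(1, n + 1) / (n * c_n) * alpha
    below = np.nonzero(pvals[order] <= thresholds)[0]
    reject = np.zeros(n, dtype=bool)
    if below.size > 0:
        reject[order[:below[-1] + 1]] = True
    return reject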
References
[1] Bonferroni, C. E. (1935). Il calcolo delle assicurazioni su gruppi di teste. Studi in Onore del Professore Salvatore Ortu Carboni, 13–60.
[2] Šidák, Z. K. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318), 626–633.
[3] Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.
[4] Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289–300.
[5] Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.
Examples
FDR correction of an array of p-values
>>> import pingouin as pg
>>> pvals = [.50, .003, .32, .054, .0003]
>>> reject, pvals_corr = pg.multicomp(pvals, method='fdr_bh')
>>> print(reject, pvals_corr)
[False True False False True] [0.5 0.0075 0.4 0.09 0.0015]
Holm correction with missing values
>>> import numpy as np
>>> pvals[2] = np.nan
>>> reject, pvals_corr = pg.multicomp(pvals, method='holm')
>>> print(reject, pvals_corr)
[False True False False True] [0.5 0.009 nan 0.108 0.0012]