pingouin.multicomp

pingouin.multicomp(pvals, alpha=0.05, method='holm')

P-values correction for multiple comparisons.

Parameters:
pvals : array_like

Uncorrected p-values.

alpha : float

Significance level.

method : string

Method used for testing and adjustment of p-values. Can be either the full name or initial letters. Available methods are:

  • 'bonf': one-step Bonferroni correction

  • 'sidak': one-step Sidak correction

  • 'holm': step-down method using Bonferroni adjustments

  • 'fdr_bh': Benjamini/Hochberg FDR correction

  • 'fdr_by': Benjamini/Yekutieli FDR correction

  • 'none': pass-through option (no correction applied)

Returns:
reject : array, boolean

True for each hypothesis that can be rejected at the given alpha.

pvals_corrected : array

P-values corrected for multiple testing.

Notes

This function is similar to the p.adjust R function.

The correction methods include the Bonferroni correction ('bonf'), in which the p-values are multiplied by the number of comparisons. Less conservative methods are also included: Sidak (1967) ('sidak'), Holm (1979) ('holm'), Benjamini & Hochberg (1995) ('fdr_bh'), and Benjamini & Yekutieli (2001) ('fdr_by').

The first three methods are designed to give strong control of the family-wise error rate; of these, Holm's method is usually preferred. The 'fdr_bh' and 'fdr_by' methods control the false discovery rate, i.e. the expected proportion of false discoveries among the rejected hypotheses. Controlling the false discovery rate is a less stringent condition than controlling the family-wise error rate, so these two methods are more powerful than the first three.

The Bonferroni [1] adjusted p-values are defined as:

\[\widetilde{p}_{(i)} = n \cdot p_{(i)}\]

where \(n\) is the number of finite p-values (i.e. excluding NaN).
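As a rough illustration of the Bonferroni formula above, here is a minimal pure-Python sketch. The helper name `bonferroni_adjust` is hypothetical and is not part of pingouin; capping the result at 1 is an assumption added so that the output remains a valid probability.

```python
import math

def bonferroni_adjust(pvals):
    """Hypothetical helper: multiply each finite p-value by n, capped at 1."""
    # n counts only the finite p-values, so NaN entries are excluded.
    n = sum(1 for p in pvals if not math.isnan(p))
    # NaN entries are passed through unchanged.
    return [p if math.isnan(p) else min(p * n, 1.0) for p in pvals]

# Four finite p-values and one missing value, so n = 4.
adjusted = bonferroni_adjust([0.50, 0.003, float("nan"), 0.054, 0.0003])
print(adjusted)
```

With four finite p-values, each one is multiplied by 4, and the largest (0.50) is capped at 1.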

The Sidak [2] adjusted p-values are defined as:

\[\widetilde{p}_{(i)} = 1 - (1 - p_{(i)})^{n}\]
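The Sidak formula can be sketched in a few lines of pure Python. The function name `sidak_adjust` is hypothetical, and the sketch assumes that \(n\) counts only the finite p-values, as in the Bonferroni definition.

```python
import math

def sidak_adjust(pvals):
    """Hypothetical helper: apply 1 - (1 - p)^n to each finite p-value."""
    # Assumption: n is the number of finite (non-NaN) p-values.
    n = sum(1 for p in pvals if not math.isnan(p))
    # NaN entries are passed through explicitly rather than computed.
    return [p if math.isnan(p) else 1.0 - (1.0 - p) ** n for p in pvals]

sidak_pvals = sidak_adjust([0.01, 0.04, 0.03])
print(sidak_pvals)
```

For small p-values the result is slightly below the Bonferroni value \(n \cdot p\), which is why Sidak is described as less conservative.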

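The Holm [3] adjusted p-values are the running maximum of the sorted p-values, each multiplied by the decreasing factor \(n - j + 1\) (equivalently, each compared against the increasing level \(\alpha / (n - j + 1)\)):

\[\widetilde{p}_{(i)} = \max_{j \leq i} \left\{ (n - j + 1)\, p_{(j)} \right\}_{1}\]

where \(\{x\}_{1} \equiv \min(x, 1)\).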
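The step-down computation can be sketched as follows. This is a minimal pure-Python illustration, not pingouin's implementation: the name `holm_adjust` is hypothetical, NaN entries are assumed to be excluded from \(n\) and passed through, and the result is capped at 1.

```python
import math

def holm_adjust(pvals):
    """Hypothetical helper: Holm step-down adjusted p-values."""
    # Keep (p-value, original index) pairs for the finite entries only.
    finite = [(p, i) for i, p in enumerate(pvals) if not math.isnan(p)]
    n = len(finite)
    adjusted = [float("nan")] * len(pvals)
    running_max = 0.0
    # Walk the p-values in ascending order; rank j runs from 1 to n.
    for j, (p, i) in enumerate(sorted(finite), start=1):
        running_max = max(running_max, (n - j + 1) * p)
        adjusted[i] = min(running_max, 1.0)  # cap at 1
    return adjusted

holm_pvals = holm_adjust([0.50, 0.003, float("nan"), 0.054, 0.0003])
print(holm_pvals)
```

The running maximum keeps the adjusted p-values monotone in the sorted order, so a hypothesis is never rejected when one with a smaller p-value is retained.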

The Benjamini–Hochberg procedure (BH step-up procedure, [4]) controls the false discovery rate (FDR) at level \(\alpha\). It works as follows:

1. For a given \(\alpha\), find the largest \(k\) such that \(P_{(k)}\leq \frac {k}{n}\alpha.\)

2. Reject the null hypothesis for all \(H_{(i)}\) for \(i = 1, \ldots, k\).

The BH procedure is valid when the \(n\) tests are independent, and also in various scenarios of dependence, but is not universally valid.
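The two steps of the BH procedure can be sketched directly in pure Python. The function name `bh_reject` is hypothetical, and the sketch assumes the input contains no NaN values.

```python
def bh_reject(pvals, alpha=0.05):
    """Hypothetical helper: Benjamini-Hochberg step-up rejection decisions."""
    n = len(pvals)
    # Indices of the p-values in ascending order.
    order = sorted(range(n), key=lambda i: pvals[i])
    # Step 1: find the largest rank k with P_(k) <= (k / n) * alpha.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / n * alpha:
            k = rank
    # Step 2: reject the hypotheses with the k smallest p-values.
    reject = [False] * n
    for i in order[:k]:
        reject[i] = True
    return reject

bh_decisions = bh_reject([0.50, 0.003, 0.32, 0.054, 0.0003])
print(bh_decisions)  # [False, True, False, False, True]
```

Because the search keeps the largest qualifying rank, a p-value that misses its own threshold is still rejected if a larger p-value meets its threshold. This is what distinguishes a step-up procedure from a step-down one.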

The Benjamini–Yekutieli procedure (BY, [5]) controls the FDR under arbitrary dependence assumptions. This refinement modifies the threshold and finds the largest \(k\) such that:

\[P_{(k)} \leq \frac{k}{n \cdot c(n)} \alpha\]

where \(c(n) = \sum_{i=1}^{n} \frac{1}{i}\) under arbitrary dependence.
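A minimal pure-Python sketch of the BY threshold follows. It assumes \(c(n)\) is the harmonic sum \(1 + 1/2 + \ldots + 1/n\), the constant Benjamini and Yekutieli use for arbitrary dependence. The function name `by_reject` is hypothetical, and the input is assumed to contain no NaN values.

```python
def by_reject(pvals, alpha=0.05):
    """Hypothetical helper: Benjamini-Yekutieli rejection decisions."""
    n = len(pvals)
    # Assumption: c(n) is the n-th harmonic number.
    c_n = sum(1.0 / i for i in range(1, n + 1))
    # Indices of the p-values in ascending order.
    order = sorted(range(n), key=lambda i: pvals[i])
    # Find the largest rank k with P_(k) <= k / (n * c(n)) * alpha.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / (n * c_n) * alpha:
            k = rank
    # Reject the hypotheses with the k smallest p-values.
    reject = [False] * n
    for i in order[:k]:
        reject[i] = True
    return reject

by_decisions = by_reject([0.50, 0.003, 0.32, 0.054, 0.0003])
print(by_decisions)  # [False, True, False, False, True]
```

Since \(c(n) \geq 1\), each BY threshold is no larger than the corresponding BH threshold, so BY never rejects more hypotheses than BH at the same \(\alpha\).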

References

[1]

Bonferroni, C. E. (1935). Il calcolo delle assicurazioni su gruppi di teste. Studi in onore del professore Salvatore Ortu Carboni, 13–60.

[2]

Šidák, Z. K. (1967). Rectangular confidence regions for the means of multivariate normal distributions. Journal of the American Statistical Association, 62(318), 626–633.

[3]

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6, 65–70.

[4]

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57, 289–300.

[5]

Benjamini, Y., and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Annals of Statistics, 29, 1165–1188.

Examples

FDR correction of an array of p-values

>>> import pingouin as pg
>>> pvals = [.50, .003, .32, .054, .0003]
>>> reject, pvals_corr = pg.multicomp(pvals, method='fdr_bh')
>>> print(reject, pvals_corr)
[False  True False False  True] [0.5    0.0075 0.4    0.09   0.0015]

Holm correction with missing values

>>> import numpy as np
>>> pvals[2] = np.nan
>>> reject, pvals_corr = pg.multicomp(pvals, method='holm')
>>> print(reject, pvals_corr)
[False  True False False  True] [0.5    0.009     nan 0.108  0.0012]