pingouin.compute_bootci

pingouin.compute_bootci(x, y=None, func=None, method='cper', paired=False, confidence=0.95, n_boot=2000, decimals=2, seed=None, return_dist=False)

Bootstrapped confidence intervals of univariate and bivariate functions.

Parameters:
x : 1D-array or list

First sample. Required for both bivariate and univariate functions.

y : 1D-array, list, or None

Second sample. Required only for bivariate functions.

func : str or custom function

Function to compute the bootstrapped statistic. Accepted string values are:

  • 'pearson': Pearson correlation (bivariate, paired x and y)

  • 'spearman': Spearman correlation (bivariate, paired x and y)

  • 'cohen': Cohen d effect size (bivariate, paired or unpaired x and y)

  • 'hedges': Hedges g effect size (bivariate, paired or unpaired x and y)

  • 'mean': Mean (univariate = only x)

  • 'std': Standard deviation (univariate)

  • 'var': Variance (univariate)

method : str

Method to compute the confidence intervals (see Notes):

  • 'cper': Bias-corrected percentile method (default)

  • 'norm': Normal approximation with bootstrapped bias and standard error

  • 'per': Simple percentile

paired : boolean

Indicates whether x and y are paired. For example, correlation functions and the paired t-test assume that x and y are paired. Pingouin resamples the pairs (x_i, y_i) when paired=True, and resamples x and y separately when paired=False. If paired=True, x and y must have the same number of elements.

confidence : float

Confidence level (0.95 = 95%)

n_boot : int

Number of bootstrap iterations. More iterations give more stable intervals at the cost of longer computation time.

decimals : int

Number of decimal places to which the confidence interval bounds are rounded.

seed : int or None

Random seed for generating bootstrap samples.

return_dist : boolean

If True, return the confidence intervals and the bootstrapped distribution (e.g. for plotting purposes).

Returns:
ci : array

Bootstrapped confidence intervals.

Notes

Results have been tested against the bootci Matlab function.

Since version 1.7, SciPy also includes a built-in bootstrap function scipy.stats.bootstrap(). The SciPy implementation has two advantages over Pingouin: it is faster when using vectorized=True, and it supports the bias-corrected and accelerated (BCa) confidence intervals for univariate functions. However, unlike Pingouin, it does not return the bootstrap distribution.

The percentile bootstrap method (per) is defined as the \(100 \times \frac{\alpha}{2}\) and \(100 \times \left(1 - \frac{\alpha}{2}\right)\) percentiles of the distribution of \(\theta\) estimates obtained from resampling, where \(\alpha\) is the level of significance (1 - confidence, default = 0.05 for 95% CIs).

The bias-corrected percentile method (cper) corrects for bias of the bootstrap distribution. This method is different from the BCa method (the default in Matlab and SciPy), which corrects for both bias and skewness of the bootstrap distribution using jackknife resampling.

The normal approximation method (norm) calculates the confidence intervals with the standard normal distribution using bootstrapped bias and standard error.
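The three methods above can be sketched for a univariate statistic as follows. This is a minimal illustration under stated assumptions, not Pingouin's actual implementation: the helper name `sketch_bootci` and its arguments are hypothetical, and the bias correction shown (shifting the quantile levels by twice the normal quantile of the fraction of bootstrap estimates below the observed statistic) is the textbook form of the bias-corrected percentile interval.

```python
from statistics import NormalDist

import numpy as np


def sketch_bootci(x, stat_func=np.mean, method="cper", confidence=0.95,
                  n_boot=2000, seed=None):
    """Illustrative bootstrap CI of a univariate statistic (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    nd = NormalDist()  # standard normal distribution
    alpha = 1 - confidence

    # Statistic on the original sample, then on n_boot resamples with replacement
    reference = stat_func(x)
    idx = rng.integers(0, x.size, size=(n_boot, x.size))
    boot = np.array([stat_func(x[row]) for row in idx])

    # Standard normal quantiles at alpha/2 and 1 - alpha/2
    z_lo, z_hi = nd.inv_cdf(alpha / 2), nd.inv_cdf(1 - alpha / 2)

    if method == "per":
        # Simple percentile: quantiles of the bootstrap distribution
        return np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    if method == "cper":
        # Bias correction z0: normal quantile of the share of bootstrap
        # estimates below the observed statistic (clipped to avoid 0 or 1)
        p0 = float(np.clip(np.mean(boot < reference), 1e-6, 1 - 1e-6))
        z0 = nd.inv_cdf(p0)
        probs = [nd.cdf(2 * z0 + z_lo), nd.cdf(2 * z0 + z_hi)]
        return np.quantile(boot, probs)
    if method == "norm":
        # Normal approximation using bootstrapped bias and standard error
        bias = boot.mean() - reference
        se = boot.std(ddof=1)
        center = reference - bias
        return np.array([center + z_lo * se, center + z_hi * se])
    raise ValueError(f"unknown method: {method!r}")


# Usage: compare the three interval types on the same simulated sample
x = np.random.default_rng(42).normal(loc=4, scale=2, size=100)
for m in ("per", "cper", "norm"):
    print(m, np.round(sketch_bootci(x, np.std, method=m, seed=123), 2))
```

When the bootstrap distribution is centered on the observed statistic, z0 is close to zero and the cper interval in this sketch reduces to the simple percentile interval.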

References

  • DiCiccio, T. J., & Efron, B. (1996). Bootstrap confidence intervals. Statistical Science, 189-212.

  • Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application (Vol. 1). Cambridge University Press.

  • Jung, Lee, Gupta, & Cho (2019). Comparison of bootstrap confidence interval methods for GSCA using a Monte Carlo simulation. Frontiers in Psychology, 10, 2215.

Examples

  1. Bootstrapped 95% confidence interval of a Pearson correlation

>>> import pingouin as pg
>>> import numpy as np
>>> rng = np.random.default_rng(42)
>>> x = rng.normal(loc=4, scale=2, size=100)
>>> y = rng.normal(loc=3, scale=1, size=100)
>>> stat = np.corrcoef(x, y)[0][1]
>>> ci = pg.compute_bootci(x, y, func='pearson', paired=True, seed=42, decimals=4)
>>> print(round(stat, 4), ci)
0.0945 [-0.098   0.2738]

Let’s compare to SciPy’s built-in bootstrap function:

>>> from scipy.stats import bootstrap
>>> bt_scipy = bootstrap(
...       data=(x, y), statistic=lambda x, y: np.corrcoef(x, y)[0][1],
...       method="basic", vectorized=False, n_resamples=2000, paired=True, random_state=42)
>>> np.round(bt_scipy.confidence_interval, 4)
array([-0.0952,  0.2883])

  2. Bootstrapped 95% confidence interval of a Cohen d

>>> stat = pg.compute_effsize(x, y, eftype='cohen')
>>> ci = pg.compute_bootci(x, y, func='cohen', seed=42, decimals=3)
>>> print(round(stat, 4), ci)
0.7009 [0.403 1.009]

  3. Bootstrapped confidence interval of a standard deviation (univariate)

>>> import numpy as np
>>> stat = np.std(x, ddof=1)
>>> ci = pg.compute_bootci(x, func='std', seed=123)
>>> print(round(stat, 4), ci)
1.5534 [1.38 1.8 ]

Compare to SciPy’s built-in bootstrap function, which returns the bias-corrected and accelerated CIs (see Notes).

>>> def std(x, axis):
...     return np.std(x, ddof=1, axis=axis)
>>> bt_scipy = bootstrap(data=(x, ), statistic=std, n_resamples=2000, random_state=123)
>>> np.round(bt_scipy.confidence_interval, 2)
array([1.39, 1.81])

Changing the type of confidence interval in Pingouin:

>>> pg.compute_bootci(x, func='std', seed=123, method="norm")
array([1.37, 1.76])
>>> pg.compute_bootci(x, func='std', seed=123, method="percentile")
array([1.35, 1.75])

  4. Bootstrapped confidence interval using a custom univariate function

>>> from scipy.stats import skew
>>> round(skew(x), 4), pg.compute_bootci(x, func=skew, n_boot=10000, seed=123)
(-0.137, array([-0.55,  0.32]))

  5. Bootstrapped confidence interval using a custom bivariate function. Here, x and y are not paired and can therefore have different sizes.

>>> def mean_diff(x, y):
...     return np.mean(x) - np.mean(y)
>>> y2 = rng.normal(loc=3, scale=1, size=200)  # y2 has 200 samples, x has 100
>>> ci = pg.compute_bootci(x, y2, func=mean_diff, n_boot=10000, seed=123)
>>> print(round(mean_diff(x, y2), 2), ci)
0.88 [0.54 1.21]

We can also get the bootstrapped distribution:

>>> ci, bt = pg.compute_bootci(x, y2, func=mean_diff, n_boot=10000, return_dist=True, seed=9)
>>> print(f"The bootstrap distribution has {bt.size} samples. The mean and standard "
...       f"deviation are {bt.mean():.4f} ± {bt.std():.4f}")
The bootstrap distribution has 10000 samples. The mean and standard deviation are 0.8807 ± 0.1704
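
To make the difference between paired=True and paired=False more concrete, the sketch below draws a single bootstrap resample in each mode. It is an illustration only and does not reproduce Pingouin's internal code; the helper `resample_once` is a hypothetical name introduced for this example.

```python
import numpy as np


def resample_once(x, y, paired, rng):
    """Draw one bootstrap resample of (x, y) (hypothetical helper)."""
    x, y = np.asarray(x), np.asarray(y)
    if paired:
        if x.size != y.size:
            raise ValueError("paired=True requires x and y of equal length")
        # One shared set of indices keeps each pair (x_i, y_i) together
        idx = rng.integers(0, x.size, size=x.size)
        return x[idx], y[idx]
    # Independent indices for each sample, so the sizes may differ
    idx_x = rng.integers(0, x.size, size=x.size)
    idx_y = rng.integers(0, y.size, size=y.size)
    return x[idx_x], y[idx_y]


rng = np.random.default_rng(0)
x = np.arange(5)
y = x * 10  # y_i is always 10 * x_i, which makes pairing easy to check

# Paired: every resampled y value still belongs to its original x value
xb, yb = resample_once(x, y, paired=True, rng=rng)
print(np.array_equal(yb, xb * 10))

# Unpaired: x and y are resampled separately and can have different sizes
xu, yu = resample_once(x, np.arange(8) * 10, paired=False, rng=rng)
print(xu.size, yu.size)
```

Sharing one index vector is what preserves the dependence between x and y, which is why statistics such as correlations need paired resampling.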