pingouin.pairwise_corr

pingouin.pairwise_corr(data, columns=None, covar=None, alternative='two-sided', method='pearson', padjust='none', nan_policy='pairwise')

Pairwise (partial) correlations between columns of a pandas dataframe.

Parameters:

data : pandas.DataFrame

Input DataFrame. This function can also be called directly as a pandas DataFrame method, in which case this argument is not needed.

columns : list or str

Column names in data:

["a", "b", "c"]: combination between columns a, b, and c.
["a"]: product between a and all the other numeric columns.
[["a"], ["b", "c"]]: product between ["a"] and ["b", "c"].
[["a", "d"], ["b", "c"]]: product between ["a", "d"] and ["b", "c"].
[["a", "d"], None]: product between ["a", "d"] and all other numeric columns in the dataframe.

If columns is None, the function returns the pairwise correlations between all combinations of the numeric columns in data. See the examples section for more details.

covar : None, string or list

Covariate(s) for partial correlation. Must be one or more columns in data. Use a list if there is more than one covariate. If covar is not None, partial correlations are computed using the pingouin.partial_corr() function.

Important

Only method='pearson' and method='spearman' are currently supported in partial correlation.

alternative : string

Defines the alternative hypothesis, or tail of the correlation. Must be one of "two-sided" (default), "greater" or "less". Both "greater" and "less" return a one-sided p-value. "greater" tests against the alternative hypothesis that the correlation is positive (greater than zero); "less" tests against the alternative hypothesis that the correlation is negative (less than zero).

method : string

Correlation type:

'pearson': Pearson \(r\) product-moment correlation
'spearman': Spearman \(\rho\) rank-order correlation
'kendall': Kendall’s \(\tau_B\) correlation (for ordinal data)
'bicor': Biweight midcorrelation (robust)
'percbend': Percentage bend correlation (robust)
'shepherd': Shepherd’s pi correlation (robust)
'skipped': Skipped correlation (robust)

padjust : string

Method used for testing and adjustment of p-values:

'none': no correction
'bonf': one-step Bonferroni correction
'sidak': one-step Sidak correction
'holm': step-down method using Bonferroni adjustments
'fdr_bh': Benjamini/Hochberg FDR correction
'fdr_by': Benjamini/Yekutieli FDR correction

nan_policy : string

Can be 'listwise' for listwise deletion of missing values (= complete-case analysis) or 'pairwise' (default) for the more liberal pairwise deletion (= available-case analysis).

Added in version 0.2.9.

Returns:

stats : pandas.DataFrame

'X': Name(s) of first columns.
'Y': Name(s) of second columns.
'method': Correlation type.
'covar': List of specified covariate(s), only when covariates are passed.
'alternative': Tail of the test.
'n': Sample size (after removal of missing values).
'r': Correlation coefficients.
'CI95%': 95% parametric confidence intervals.
'p-unc': Uncorrected p-values.
'p-corr': Corrected p-values.
'p-adjust': P-values correction method.
'BF10': Bayes Factor of the alternative hypothesis (only for Pearson correlation).
'power': Achieved power of the test (= 1 - type II error).

Notes

Please refer to the pingouin.corr() function for a description of the different methods. By default, missing values are automatically removed from the data using pairwise deletion; see the nan_policy parameter to use listwise deletion instead.
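
The difference between the two deletion strategies is easiest to see on a toy table. The following pure-Python sketch only counts the rows each policy would keep for a given pair of columns; the names rows, is_complete and sample_size are illustrative and are not part of pingouin, and "listwise" is taken here to mean complete cases across every column of the toy table.

```python
import math

# Toy data: one missing value in "b" and one in "c".
rows = [
    {"a": 1.0, "b": 2.0, "c": 3.0},
    {"a": 2.0, "b": math.nan, "c": 1.0},
    {"a": 3.0, "b": 1.0, "c": math.nan},
    {"a": 4.0, "b": 5.0, "c": 2.0},
]


def is_complete(row, cols):
    """True if the row has no missing value in any of the given columns."""
    return not any(math.isnan(row[c]) for c in cols)


def sample_size(rows, x, y, nan_policy="pairwise"):
    """Number of rows kept for the (x, y) pair under the given policy."""
    if nan_policy == "listwise":
        cols = list(rows[0])  # a row must be complete on every column
    else:
        cols = [x, y]         # a row only needs valid values for x and y
    return sum(is_complete(row, cols) for row in rows)


print(sample_size(rows, "a", "b", "pairwise"))  # 3
print(sample_size(rows, "a", "b", "listwise"))  # 2
```

Pairwise deletion keeps more observations per pair, at the cost of each correlation possibly being computed on a different subset of rows.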

This function is more flexible and gives a much more detailed output than the pandas.DataFrame.corr() method (i.e. p-values, confidence intervals, Bayes Factor…). This comes, however, at an increased computational cost. While this should not be discernible for a dataframe with fewer than 10,000 rows and/or fewer than 20 columns, this function can be slow for very large datasets.

A faster alternative to get the r-values and p-values in a matrix format is to use the pingouin.rcorr() function, which works directly as a pandas.DataFrame method (see example below).

This function also works with two-dimensional multi-index columns. In this case, columns must be list(s) of tuple(s). Please refer to this example Jupyter notebook for more details.

If and only if covar is specified, this function will compute the pairwise partial correlation between the variables. If you are only interested in computing the partial correlation matrix (i.e. the raw pairwise partial correlation coefficient matrix, without the p-values, sample sizes, etc.), a better alternative is to use the pingouin.pcorr() function (see the pingouin.pcorr() example below).

Examples

One-sided Spearman correlation corrected for multiple comparisons

>>> import pandas as pd
>>> import pingouin as pg
>>> pd.set_option('display.expand_frame_repr', False)
>>> pd.set_option('display.max_columns', 20)
>>> data = pg.read_dataset('pairwise_corr').iloc[:, 1:]
>>> pg.pairwise_corr(data, method='spearman', alternative='greater', padjust='bonf').round(3)
               X                  Y    method alternative    n      r         CI95%  p-unc  p-corr p-adjust  power
0    Neuroticism       Extraversion  spearman     greater  500 -0.325  [-0.39, 1.0]  1.000   1.000     bonf  0.000
1    Neuroticism           Openness  spearman     greater  500 -0.028   [-0.1, 1.0]  0.735   1.000     bonf  0.012
2    Neuroticism      Agreeableness  spearman     greater  500 -0.151  [-0.22, 1.0]  1.000   1.000     bonf  0.000
3    Neuroticism  Conscientiousness  spearman     greater  500 -0.356  [-0.42, 1.0]  1.000   1.000     bonf  0.000
4   Extraversion           Openness  spearman     greater  500  0.243   [0.17, 1.0]  0.000   0.000     bonf  1.000
5   Extraversion      Agreeableness  spearman     greater  500  0.062  [-0.01, 1.0]  0.083   0.832     bonf  0.398
6   Extraversion  Conscientiousness  spearman     greater  500  0.056  [-0.02, 1.0]  0.106   1.000     bonf  0.345
7       Openness      Agreeableness  spearman     greater  500  0.170    [0.1, 1.0]  0.000   0.001     bonf  0.985
8       Openness  Conscientiousness  spearman     greater  500 -0.007  [-0.08, 1.0]  0.560   1.000     bonf  0.036
9  Agreeableness  Conscientiousness  spearman     greater  500  0.161   [0.09, 1.0]  0.000   0.002     bonf  0.976
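
To make the 'p-corr' column above concrete, here is a minimal pure-Python sketch of three of the padjust options ('bonf', 'sidak' and 'holm') as they are usually defined. The function names are illustrative and are not part of the pingouin API, and pingouin's own implementation may differ in details such as the handling of missing p-values.

```python
def bonferroni(pvals):
    """One-step Bonferroni: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(p * m, 1.0) for p in pvals]


def sidak(pvals):
    """One-step Sidak: 1 - (1 - p) ** m."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]


def holm(pvals):
    """Step-down Holm: Bonferroni factors that shrink with the p-value rank."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        value = min((m - rank) * pvals[idx], 1.0)
        running_max = max(running_max, value)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


pvals = [0.01, 0.04, 0.03]
print([round(p, 4) for p in bonferroni(pvals)])  # [0.03, 0.12, 0.09]
print([round(p, 4) for p in holm(pvals)])        # [0.03, 0.06, 0.06]
```

With ten comparisons, as in the example above, Bonferroni multiplies each uncorrected p-value by 10 and caps the result at 1, which is why most of the 'p-corr' values in that table equal 1.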

Robust two-sided biweight midcorrelation with uncorrected p-values

>>> pcor = pg.pairwise_corr(data, columns=['Openness', 'Extraversion',
...                                        'Neuroticism'], method='bicor')
>>> pcor.round(3)
              X             Y method alternative    n      r           CI95%  p-unc  power
0      Openness  Extraversion  bicor   two-sided  500  0.247    [0.16, 0.33]  0.000  1.000
1      Openness   Neuroticism  bicor   two-sided  500 -0.028   [-0.12, 0.06]  0.535  0.095
2  Extraversion   Neuroticism  bicor   two-sided  500 -0.343  [-0.42, -0.26]  0.000  1.000
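
The 'CI95%' values in the two tables above are consistent with the standard Fisher z-transform interval for a correlation coefficient. The sketch below assumes that construction, with a standard error of 1 / sqrt(n - 3) and a one-sided interval that is left open at +1 or -1; fisher_ci is a hypothetical helper, not a pingouin function, and pingouin's internal computation may differ.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def fisher_ci(r, n, alternative="two-sided", confidence=0.95):
    """Fisher z-transform confidence interval for a correlation r with n samples."""
    z = atanh(r)            # Fisher z-transform of r
    se = 1.0 / sqrt(n - 3)  # approximate standard error of z
    if alternative == "two-sided":
        crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
        return tanh(z - crit * se), tanh(z + crit * se)
    crit = NormalDist().inv_cdf(confidence)  # one-sided critical value
    if alternative == "greater":
        return tanh(z - crit * se), 1.0
    if alternative == "less":
        return -1.0, tanh(z + crit * se)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


low, high = fisher_ci(0.247, 500)
print(round(low, 2), round(high, 2))

low, high = fisher_ci(-0.325, 500, alternative="greater")
print(round(low, 2), high)
```

The first call uses r = 0.247 and n = 500 from the Openness/Extraversion bicor row and gives roughly [0.16, 0.33]; the second uses the Neuroticism/Extraversion Spearman row with alternative='greater' and gives roughly [-0.39, 1.0], in line with the tables.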

One-versus-all pairwise correlations

>>> pg.pairwise_corr(data, columns=['Neuroticism']).round(3)
             X                  Y   method alternative    n      r           CI95%  p-unc       BF10  power
0  Neuroticism       Extraversion  pearson   two-sided  500 -0.350  [-0.42, -0.27]  0.000  6.765e+12  1.000
1  Neuroticism           Openness  pearson   two-sided  500 -0.010    [-0.1, 0.08]  0.817      0.058  0.056
2  Neuroticism      Agreeableness  pearson   two-sided  500 -0.134  [-0.22, -0.05]  0.003      5.122  0.854
3  Neuroticism  Conscientiousness  pearson   two-sided  500 -0.368  [-0.44, -0.29]  0.000  2.644e+14  1.000

Pairwise correlations between two lists of columns (cartesian product)

>>> columns = [['Neuroticism', 'Extraversion'], ['Openness']]
>>> pg.pairwise_corr(data, columns).round(3)
              X         Y   method alternative    n      r         CI95%  p-unc       BF10  power
0   Neuroticism  Openness  pearson   two-sided  500 -0.010  [-0.1, 0.08]  0.817      0.058  0.056
1  Extraversion  Openness  pearson   two-sided  500  0.267  [0.18, 0.35]  0.000  5.277e+06  1.000
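
The different forms accepted by columns can be summarized by the (X, Y) pairs they produce. The following sketch mirrors the rules listed in the Parameters section using itertools; expand_columns is a hypothetical helper written for illustration, not pingouin's internal code, and the exclusion of self-pairs is an assumption.

```python
from itertools import combinations, product


def expand_columns(columns, numeric_columns):
    """List the (X, Y) pairs implied by a `columns` specification."""
    if columns is None:
        # All combinations of the numeric columns.
        return list(combinations(numeric_columns, 2))
    if isinstance(columns, str):
        columns = [columns]
    if isinstance(columns[0], list):
        # Nested form: [[...], [...]] or [[...], None].
        left, right = columns
        if right is None:
            right = [c for c in numeric_columns if c not in left]
        return [(x, y) for x, y in product(left, right) if x != y]
    if len(columns) == 1:
        # One-versus-all.
        others = [c for c in numeric_columns if c != columns[0]]
        return list(product(columns, others))
    # Flat list with several names: all combinations within the list.
    return list(combinations(columns, 2))


numeric = ["Neuroticism", "Extraversion", "Openness",
           "Agreeableness", "Conscientiousness"]
print(len(expand_columns(None, numeric)))  # 10
print(expand_columns([["Neuroticism", "Extraversion"], ["Openness"]], numeric))
```

The pair counts agree with the examples above: 10 rows when columns is None, 4 rows for ['Neuroticism'], and 2 rows for the two-list form.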

As a Pandas method

>>> pcor = data.pairwise_corr(covar='Neuroticism', method='spearman')

Pairwise partial correlation

>>> pg.pairwise_corr(data, covar=['Neuroticism', 'Openness'])
               X                  Y   method                        covar alternative    n         r          CI95%     p-unc
0   Extraversion      Agreeableness  pearson  ['Neuroticism', 'Openness']   two-sided  500 -0.038737  [-0.13, 0.05]  0.388361
1   Extraversion  Conscientiousness  pearson  ['Neuroticism', 'Openness']   two-sided  500 -0.071427  [-0.16, 0.02]  0.111389
2  Agreeableness  Conscientiousness  pearson  ['Neuroticism', 'Openness']   two-sided  500  0.123108   [0.04, 0.21]  0.005944

Pairwise partial correlation matrix using pingouin.pcorr()

>>> data[['Neuroticism', 'Openness', 'Extraversion']].pcorr().round(3)
              Neuroticism  Openness  Extraversion
Neuroticism         1.000     0.092        -0.360
Openness            0.092     1.000         0.281
Extraversion       -0.360     0.281         1.000

Correlation matrix with p-values using pingouin.rcorr()

>>> data[['Neuroticism', 'Openness', 'Extraversion']].rcorr()
             Neuroticism Openness Extraversion
Neuroticism            -                   ***
Openness           -0.01        -          ***
Extraversion       -0.35    0.267            -
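
The pcorr() and rcorr() outputs above are linked by the textbook first-order partial correlation formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The sketch below applies it to the rounded coefficients printed by rcorr(); partial_corr_first_order is a hypothetical helper, and because the inputs are rounded the results only match pcorr() to about three decimals.

```python
from math import sqrt


def partial_corr_first_order(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for a single variable z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Rounded Pearson coefficients taken from the rcorr() output above.
r_no = -0.010  # Neuroticism vs. Openness
r_ne = -0.350  # Neuroticism vs. Extraversion
r_oe = 0.267   # Openness vs. Extraversion

# Neuroticism vs. Openness, controlling for Extraversion
print(round(partial_corr_first_order(r_no, r_ne, r_oe), 3))
# Neuroticism vs. Extraversion, controlling for Openness
print(round(partial_corr_first_order(r_ne, r_no, r_oe), 3))
# Openness vs. Extraversion, controlling for Neuroticism
print(round(partial_corr_first_order(r_oe, r_no, r_ne), 3))
```

The three results are approximately 0.092, -0.360 and 0.281, the off-diagonal entries of the pcorr() matrix.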