pingouin.rcorr

pingouin.rcorr(self, method='pearson', upper='pval', decimals=3, padjust=None, stars=True, pval_stars={0.001: '***', 0.01: '**', 0.05: '*'})

Correlation matrix of a dataframe, with the correlation coefficients on the lower triangle and either the p-values or the sample sizes on the upper triangle (pandas.DataFrame method).

This method is a faster, but less exhaustive, matrix version of the pingouin.pairwise_corr() function and is built on the pandas.DataFrame.corr() method. Missing values are removed pairwise: each correlation uses only the rows in which both columns are non-missing, so the sample size can differ from one pair to another (see upper='n').
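The pairwise removal of missing values can be illustrated with a minimal plain-Python sketch. The helpers `pairwise_complete` and `pearson_r` are hypothetical (they are not part of pingouin or pandas), `None` stands in for NaN, and the columns are made up; the point is only that each pair of columns keeps its own subset of rows.

```python
import math


def pairwise_complete(x, y):
    """Keep only the rows in which both x and y are present (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists with no missing values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up columns with missing values in different rows
x = [1.0, 2.0, 3.0, None, 5.0, 4.0]
y = [2.1, 3.9, None, 8.0, 10.2, 7.8]

xs, ys = pairwise_complete(x, y)
print(len(xs))  # 4: rows 2 and 3 (0-based) are dropped for this pair only
print(round(pearson_r(xs, ys), 3))
```

Another pair of columns with missing values in other rows would keep a different subset, which is what the upper='n' option reports.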

Parameters:

self : pandas.DataFrame

Input dataframe.

method : str

Correlation method. Can be either ‘pearson’ or ‘spearman’.
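For reference, Spearman's rho is Pearson's r computed on ranks, with tied values given the average of their ranks. The sketch below implements that textbook definition in plain Python; `average_ranks`, `pearson_r` and `spearman_r` are hypothetical helpers, not pingouin functions, and the pandas implementation used by this method may differ in details.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_r(x, y):
    """Spearman's rho: Pearson's r computed on the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 100.0]  # monotonic but not linear
print(spearman_r(x, y))  # 1.0, because the relationship is perfectly monotonic
print(pearson_r(x, y))   # below 1, because it is not linear
```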

upper : str

If ‘pval’, the upper triangle of the output correlation matrix shows the p-values. If ‘n’, the upper triangle is the sample size used in each pairwise correlation.
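For method='pearson', the p-value is presumably the usual two-sided test of zero correlation, as reported by scipy.stats.pearsonr (an assumption here, since the underlying call is not documented on this page). Under bivariate normality, it depends only on the coefficient r and the pairwise sample size n:

```latex
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}, \qquad
p = 2 \, \Pr\left( T_{n-2} \ge |t| \right)
```

where T_{n-2} follows a Student t distribution with n - 2 degrees of freedom. Since n varies between pairs when values are missing, two pairs with the same r can have different p-values.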

decimals : int

Number of decimals to which the correlation coefficients are rounded in the output matrix (also applied to the p-values when stars=False).

padjust : string or None

Method used to adjust the p-values for multiple comparisons. Default is None (no correction). Available methods:

'none': no correction
'bonf': one-step Bonferroni correction
'sidak': one-step Sidak correction
'holm': step-down method using Bonferroni adjustments
'fdr_bh': Benjamini/Hochberg FDR correction
'fdr_by': Benjamini/Yekutieli FDR correction
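The corrections listed above can be sketched with their textbook formulas. `p_adjust` below is a hypothetical plain-Python helper; pingouin has its own implementation (pingouin.multicomp), which may handle edge cases such as missing values differently. In rcorr, the family of tests would presumably be the k(k - 1)/2 p-values of the upper triangle for k columns.

```python
def p_adjust(pvals, method="holm"):
    """Textbook multiple-comparison adjustments (hypothetical helper).

    Returns the adjusted p-values in the same order as ``pvals``.
    """
    m = len(pvals)
    if method == "none":
        return list(pvals)
    if method == "bonf":
        return [min(1.0, p * m) for p in pvals]
    if method == "sidak":
        return [1.0 - (1.0 - p) ** m for p in pvals]

    # The remaining methods work on the p-values sorted in ascending order
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m

    if method == "holm":
        # Step-down: the p-value at (0-based) sorted position k is multiplied
        # by (m - k); a running maximum keeps the result monotonic
        running = 0.0
        for k, i in enumerate(order):
            running = max(running, (m - k) * pvals[i])
            adjusted[i] = min(1.0, running)
        return adjusted

    if method in ("fdr_bh", "fdr_by"):
        # Benjamini/Yekutieli adds the factor c(m) = 1 + 1/2 + ... + 1/m
        c = sum(1.0 / j for j in range(1, m + 1)) if method == "fdr_by" else 1.0
        # Step-up: the p-value at (0-based) sorted position k is scaled by
        # m * c / (k + 1); a running minimum from the largest keeps it monotonic
        running = 1.0
        for k in range(m - 1, -1, -1):
            i = order[k]
            running = min(running, pvals[i] * m * c / (k + 1))
            adjusted[i] = running
        return adjusted

    raise ValueError(f"Unknown method: {method!r}")


pvals = [0.01, 0.04, 0.03]
for method in ["bonf", "sidak", "holm", "fdr_bh", "fdr_by"]:
    print(method, p_adjust(pvals, method))
```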

stars : boolean

If True, p-values are replaced by stars according to the thresholds in pval_stars, and non-significant cells are left blank. If False, the p-values themselves are displayed, rounded to decimals.

pval_stars : dict

Significance thresholds. Default is 3 stars for p-values < 0.001, 2 stars for p-values < 0.01 and 1 star for p-values < 0.05.
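A minimal sketch of how a p-value can be mapped to stars with these thresholds. `to_stars` is a hypothetical helper; it uses a strict inequality, matching the "< 0.001" wording above, and sorts the thresholds itself so the most stringent one is tried first whatever the dictionary order. A p-value that passes no threshold maps to an empty string, which is why non-significant cells are blank in the examples.

```python
def to_stars(p, pval_stars=None):
    """Map a p-value to a significance string (hypothetical helper)."""
    if pval_stars is None:
        pval_stars = {0.001: "***", 0.01: "**", 0.05: "*"}
    # Try the smallest threshold first so the most stringent one wins
    for threshold in sorted(pval_stars):
        if p < threshold:
            return pval_stars[threshold]
    return ""  # not significant at any threshold


for p in [0.0004, 0.003, 0.02, 0.05, 0.3]:
    print(p, repr(to_stars(p)))

# Custom thresholds, e.g. a "+" for marginal results
print(repr(to_stars(0.08, {0.1: "+", 0.05: "*"})))
```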

Returns:

rcorr : pandas.DataFrame

Correlation matrix, of type str.
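The layout of the returned matrix can be sketched in plain Python: '-' on the diagonal, correlation coefficients on the lower triangle, and p-values (as stars or numbers) or sample sizes on the upper triangle. `format_rcorr` is a hypothetical helper that starts from already-computed matrices and returns every cell as a string; the inputs below are the rounded values printed in the `stars=False` example, and pingouin's exact formatting may differ.

```python
def format_rcorr(names, r, upper_values, upper="pval", decimals=3,
                 stars=True, pval_stars=None):
    """Assemble an rcorr-style table of strings (hypothetical helper).

    r and upper_values are square nested lists indexed like ``names``; only
    the lower triangle of r and the upper triangle of upper_values are used.
    """
    if pval_stars is None:
        pval_stars = {0.001: "***", 0.01: "**", 0.05: "*"}

    def upper_cell(value):
        if upper == "n":
            return str(int(value))          # pairwise sample size
        if stars:
            for threshold in sorted(pval_stars):
                if value < threshold:
                    return pval_stars[threshold]
            return ""                       # not significant: blank cell
        return f"{value:.{decimals}f}"      # numeric p-value

    k = len(names)
    rows = []
    for i in range(k):
        row = []
        for j in range(k):
            if i == j:
                row.append("-")                             # diagonal
            elif i > j:
                row.append(str(round(r[i][j], decimals)))   # lower: r
            else:
                row.append(upper_cell(upper_values[i][j]))  # upper: p or n
        rows.append(row)
    return rows


# Rounded values printed in the stars=False example (diagonals are unused)
names = ["Neuroticism", "Extraversion", "Agreeableness"]
r = [[1.0, -0.3501, -0.134],
     [-0.3501, 1.0, 0.0539],
     [-0.134, 0.0539, 1.0]]
p = [[0.0, 0.0, 0.0028],
     [0.0, 0.0, 0.2305],
     [0.0028, 0.2305, 0.0]]

for name, row in zip(names, format_rcorr(names, r, p)):
    print(f"{name:<14}", *row)
```

With the defaults, the rows come out as `- *** **`, `-0.35 - ` (blank last cell) and `-0.134 0.054 -`, matching the corresponding cells of the first example. Converting a rounded float with str() drops trailing zeros, which is consistent with -0.35 appearing there with only two decimals.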

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import pingouin as pg
>>> # Load an example dataset of personality dimensions
>>> df = pg.read_dataset('pairwise_corr').iloc[:, 1:]
>>> # Add some missing values
>>> df.iloc[[2, 5, 20], 2] = np.nan
>>> df.iloc[[1, 4, 10], 3] = np.nan
>>> df.head().round(2)
   Neuroticism  Extraversion  Openness  Agreeableness  Conscientiousness
0         2.48          4.21      3.94           3.96               3.46
1         2.60          3.19      3.96            NaN               3.23
2         2.81          2.90       NaN           2.75               3.50
3         2.90          3.56      3.52           3.17               2.79
4         3.02          3.33      4.02            NaN               2.85

>>> # Correlation matrix of the first four columns
>>> df.iloc[:, 0:4].rcorr()
              Neuroticism Extraversion Openness Agreeableness
Neuroticism             -          ***                     **
Extraversion        -0.35            -      ***
Openness            -0.01        0.265        -           ***
Agreeableness      -0.134        0.054    0.161             -

>>> # Spearman correlation and Holm adjustment for multiple comparisons
>>> df.iloc[:, 0:4].rcorr(method='spearman', padjust='holm')
              Neuroticism Extraversion Openness Agreeableness
Neuroticism             -          ***                     **
Extraversion       -0.325            -      ***
Openness           -0.027         0.24        -           ***
Agreeableness       -0.15         0.06    0.173             -

>>> # Compare with the pg.pairwise_corr function
>>> pairwise = df.iloc[:, 0:4].pairwise_corr(method='spearman',
...                                          padjust='holm')
>>> pairwise[['X', 'Y', 'r', 'p-corr']].round(3)  # Do not show all columns
              X              Y      r  p-corr
0   Neuroticism   Extraversion -0.325   0.000
1   Neuroticism       Openness -0.027   0.543
2   Neuroticism  Agreeableness -0.150   0.002
3  Extraversion       Openness  0.240   0.000
4  Extraversion  Agreeableness  0.060   0.358
5      Openness  Agreeableness  0.173   0.000
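As a cross-check, mapping the Holm-adjusted p-values from the p-corr column above onto the default pval_stars thresholds reproduces the stars in the upper triangle of the Spearman/Holm matrix. The values are copied from the printed output (rounded to three decimals, which does not change any star here); `stars_for` is a hypothetical helper.

```python
# Holm-adjusted p-values copied from the pairwise_corr output above
p_corr = {
    ("Neuroticism", "Extraversion"): 0.000,
    ("Neuroticism", "Openness"): 0.543,
    ("Neuroticism", "Agreeableness"): 0.002,
    ("Extraversion", "Openness"): 0.000,
    ("Extraversion", "Agreeableness"): 0.358,
    ("Openness", "Agreeableness"): 0.000,
}

# Default rcorr thresholds
pval_stars = {0.001: "***", 0.01: "**", 0.05: "*"}


def stars_for(p):
    """Return the stars of the most stringent threshold that p passes."""
    for threshold in sorted(pval_stars):
        if p < threshold:
            return pval_stars[threshold]
    return ""


for (x, y), p in p_corr.items():
    print(f"{x:>13} / {y:<13} {stars_for(p)!r}")
```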

>>> # Display the raw p-values with four decimals
>>> df.iloc[:, [0, 1, 3]].rcorr(stars=False, decimals=4)
              Neuroticism Extraversion Agreeableness
Neuroticism             -       0.0000        0.0028
Extraversion      -0.3501            -        0.2305
Agreeableness      -0.134       0.0539             -

>>> # With the sample size on the upper triangle instead of the p-values
>>> df.iloc[:, [0, 1, 2]].rcorr(upper='n')
             Neuroticism Extraversion Openness
Neuroticism            -          500      497
Extraversion       -0.35            -      497
Openness           -0.01        0.265        -