pingouin.rcorr
- pingouin.rcorr(self, method='pearson', upper='pval', decimals=3, padjust=None, stars=True, pval_stars={0.001: '***', 0.01: '**', 0.05: '*'})
Correlation matrix of a dataframe with either p-values or sample sizes on the upper triangle (pandas.DataFrame method).
This method is a faster, but less exhaustive, matrix version of the pingouin.pairwise_corr() function. It is based on the pandas.DataFrame.corr() method. Missing values are removed pairwise: each correlation uses all rows in which both columns are non-missing.
- Parameters:
- self : pandas.DataFrame
Input dataframe.
- method : str
Correlation method. Can be either ‘pearson’ or ‘spearman’.
- upper : str
If ‘pval’, the upper triangle of the output correlation matrix shows the p-values. If ‘n’, the upper triangle shows the sample size used in each pairwise correlation.
- decimals : int
Number of decimals to display in the output correlation matrix.
- padjust : str or None
Method used to correct the p-values for multiple comparisons. None (default) applies no correction; otherwise one of:
'none': no correction
'bonf': one-step Bonferroni correction
'sidak': one-step Sidak correction
'holm': step-down method using Bonferroni adjustments
'fdr_bh': Benjamini/Hochberg FDR correction
'fdr_by': Benjamini/Yekutieli FDR correction
- stars : bool
If True, only significant p-values are displayed, as stars, using the thresholds defined in pval_stars; non-significant cells are left empty. If False, all the raw p-values are displayed.
- pval_stars : dict
Significance thresholds, mapping each p-value cutoff to its label. The default shows 3 stars for p-values < 0.001, 2 stars for p-values < 0.01 and 1 star for p-values < 0.05.
- Returns:
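- rcorr : pandas.DataFrame
Correlation matrix in which every entry is a string: coefficients below the diagonal, p-values, stars or sample sizes above it, and '-' on the diagonal.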
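The layout described above (coefficients below the diagonal, p-values above it, missing values dropped pairwise) can be sketched with plain pandas and SciPy. This is a simplified, hypothetical re-creation for illustration only, not pingouin's source: the name `rcorr_sketch` is made up, and it leaves out `padjust`, `stars` and `upper='n'`. It relies on `pandas.DataFrame.corr` accepting a callable, which pandas applies to the pairwise-complete values of each column pair.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr, spearmanr


def rcorr_sketch(df, method="pearson", decimals=3):
    """Hypothetical helper: r below the diagonal, raw p-values above it."""
    # Both SciPy tests return (statistic, pvalue) as their first two items
    test = pearsonr if method == "pearson" else spearmanr
    # DataFrame.corr drops missing values pairwise, for built-in and callable methods
    r = df.corr(method=method)
    p = df.corr(method=lambda x, y: test(x, y)[1])
    cols = r.columns
    out = pd.DataFrame("", index=cols, columns=cols)
    for i in range(len(cols)):
        for j in range(len(cols)):
            if i == j:
                out.iloc[i, j] = "-"  # diagonal placeholder
            elif i > j:
                out.iloc[i, j] = str(round(float(r.iloc[i, j]), decimals))
            else:
                out.iloc[i, j] = str(round(float(p.iloc[i, j]), decimals))
    return out


df = pd.DataFrame({
    "a": [1.0, 2, 3, 4, 5, 6],
    "b": [2.0, 1, 4, 3, 6, 5],
    "c": [6.0, 4, 5, 2, 3, np.nan],  # one missing value
})
print(rcorr_sketch(df))
```

Because the pair (a, c) only uses the five complete rows, its coefficient differs from what a listwise `dropna()` on the whole frame would give only when other columns have missing values in different rows; here both approaches agree.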
Examples
>>> import numpy as np
>>> import pandas as pd
>>> import pingouin as pg
>>> # Load an example dataset of personality dimensions
>>> df = pg.read_dataset('pairwise_corr').iloc[:, 1:]
>>> # Add some missing values
>>> df.iloc[[2, 5, 20], 2] = np.nan
>>> df.iloc[[1, 4, 10], 3] = np.nan
>>> df.head().round(2)
   Neuroticism  Extraversion  Openness  Agreeableness  Conscientiousness
0         2.48          4.21      3.94           3.96               3.46
1         2.60          3.19      3.96            NaN               3.23
2         2.81          2.90       NaN           2.75               3.50
3         2.90          3.56      3.52           3.17               2.79
4         3.02          3.33      4.02            NaN               2.85
>>> # Correlation matrix on the first four columns
>>> df.iloc[:, 0:4].rcorr()
              Neuroticism Extraversion Openness Agreeableness
Neuroticism             -          ***                     **
Extraversion        -0.35            -      ***
Openness            -0.01        0.265        -           ***
Agreeableness      -0.134        0.054    0.161             -
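The stars in the upper triangle come from comparing each p-value with the `pval_stars` thresholds. A minimal sketch of that mapping, assuming the strict "p < threshold" rule stated for the default thresholds; `p_to_stars` is a hypothetical helper, not part of pingouin:

```python
def p_to_stars(pval, pval_stars=None):
    """Hypothetical helper: label a p-value using {threshold: label} pairs."""
    if pval_stars is None:
        # Same defaults as the pval_stars argument of rcorr
        pval_stars = {0.001: "***", 0.01: "**", 0.05: "*"}
    # Check the strictest threshold first so p = 0.0005 gets "***", not "*"
    for threshold in sorted(pval_stars):
        if pval < threshold:
            return pval_stars[threshold]
    return ""  # not significant: the cell is left empty


print([p_to_stars(p) for p in (0.0005, 0.002, 0.03, 0.543)])  # ['***', '**', '*', '']
```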
>>> # Spearman correlation and Holm adjustment for multiple comparisons
>>> df.iloc[:, 0:4].rcorr(method='spearman', padjust='holm')
              Neuroticism Extraversion Openness Agreeableness
Neuroticism             -          ***                     **
Extraversion       -0.325            -      ***
Openness           -0.027         0.24        -           ***
Agreeableness       -0.15         0.06    0.173             -
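With `padjust='holm'`, the p-values of all pairs are adjusted together with Holm's step-down procedure. The sketch below is an independent NumPy implementation of the standard Holm-Bonferroni adjustment, for illustration only; it is not pingouin's code, and `holm_adjust` is a hypothetical name.

```python
import numpy as np


def holm_adjust(pvals):
    """Hypothetical helper: Holm-Bonferroni adjusted p-values, in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)  # indices from smallest to largest p-value
    # The k-th smallest p-value (k = 0, 1, ...) is multiplied by (m - k)
    scaled = (m - np.arange(m)) * p[order]
    # Step-down: adjusted values never decrease, and are capped at 1
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted  # restore the original order
    return adjusted


print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```

The running maximum is what distinguishes Holm from a plain Bonferroni correction: the largest p-value (0.04) is only multiplied by 1, but its adjusted value is raised to match the smaller p-value ranked before it.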
>>> # Compare with the pg.pairwise_corr function
>>> pairwise = df.iloc[:, 0:4].pairwise_corr(method='spearman',
...                                          padjust='holm')
>>> pairwise[['X', 'Y', 'r', 'p-corr']].round(3)  # Do not show all columns
              X              Y      r  p-corr
0   Neuroticism   Extraversion -0.325   0.000
1   Neuroticism       Openness -0.027   0.543
2   Neuroticism  Agreeableness -0.150   0.002
3  Extraversion       Openness  0.240   0.000
4  Extraversion  Agreeableness  0.060   0.358
5      Openness  Agreeableness  0.173   0.000
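The rcorr matrix and the pairwise_corr table hold the same coefficients in two layouts. As an illustration (the helper name `matrix_to_pairs` is hypothetical), the upper triangle of a `pandas.DataFrame.corr` matrix can be reshaped into one row per column pair, in the same X/Y order as the table above:

```python
from itertools import combinations

import pandas as pd


def matrix_to_pairs(corr):
    """Hypothetical helper: one row per unordered column pair of a correlation matrix."""
    # combinations() yields (col0, col1), (col0, col2), ..., (col1, col2), ...
    rows = [(x, y, corr.loc[x, y]) for x, y in combinations(corr.columns, 2)]
    return pd.DataFrame(rows, columns=["X", "Y", "r"])


df = pd.DataFrame({
    "a": [1.0, 2, 3, 4, 5],
    "b": [2.0, 1, 4, 3, 5],
    "c": [5.0, 3, 4, 1, 2],
})
print(matrix_to_pairs(df.corr()).round(3))
```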
>>> # Display the raw p-values with four decimals
>>> df.iloc[:, [0, 1, 3]].rcorr(stars=False, decimals=4)
              Neuroticism Extraversion Agreeableness
Neuroticism             -       0.0000        0.0028
Extraversion      -0.3501            -        0.2305
Agreeableness      -0.134       0.0539             -
>>> # With the sample size on the upper triangle instead of the p-values
>>> df.iloc[:, [0, 1, 2]].rcorr(upper='n')
             Neuroticism Extraversion Openness
Neuroticism            -          500      497
Extraversion       -0.35            -      497
Openness           -0.01        0.265        -
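With `upper='n'`, each upper-triangle cell counts the rows in which both columns are non-missing; in the example, Openness has three missing values, hence 497 of the 500 rows. One way to compute such counts with pandas is a matrix product of missingness indicators. `pairwise_n` is a hypothetical helper shown for illustration, not pingouin's implementation:

```python
import numpy as np
import pandas as pd


def pairwise_n(df):
    """Hypothetical helper: number of rows where both columns are non-missing."""
    present = df.notna().astype(int)  # 1 = observed, 0 = missing
    # Entry (i, j) of present.T @ present counts rows observed in both i and j
    return present.T @ present


df = pd.DataFrame({
    "a": [1.0, np.nan, 3, 4, 5],
    "b": [2.0, 1, np.nan, np.nan, 5],
    "c": [5.0, 3, 4, 1, 2],
})
print(pairwise_n(df))
```

The diagonal gives the number of non-missing values in each column, and the matrix is symmetric, so only one triangle is needed for display.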