pingouin.partial_corr
- pingouin.partial_corr(data=None, x=None, y=None, covar=None, x_covar=None, y_covar=None, alternative='two-sided', method='pearson')
Partial and semi-partial correlation.
- Parameters:
  - data : pandas.DataFrame
    Input DataFrame. Note that this function can also be used directly as a pandas.DataFrame method, in which case this argument is not needed.
  - x, y : string
    Names of the x and y columns in data.
  - covar : string or list
    Covariate(s). Must be names of columns in data. Use a list if there are two or more covariates.
  - x_covar : string or list
    Covariate(s) for the x variable. This is used to compute a semi-partial correlation (i.e. the effect of x_covar is removed from x but not from y). Only one of covar, x_covar and y_covar can be specified.
  - y_covar : string or list
    Covariate(s) for the y variable. This is used to compute a semi-partial correlation (i.e. the effect of y_covar is removed from y but not from x). Only one of covar, x_covar and y_covar can be specified.
  - alternative : string
    Defines the alternative hypothesis, or tail of the partial correlation. Must be one of “two-sided” (default), “greater” or “less”. Both “greater” and “less” return a one-sided p-value. “greater” tests against the alternative hypothesis that the partial correlation is positive (greater than zero); “less” tests against the alternative hypothesis that the partial correlation is negative.
  - method : string
    Correlation type:
    - 'pearson': Pearson \(r\) product-moment correlation
    - 'spearman': Spearman \(\rho\) rank-order correlation
- Returns:
  - stats : pandas.DataFrame
    - 'n': Sample size (after removal of missing values)
    - 'r': Partial correlation coefficient
    - 'CI95%': 95% parametric confidence interval around \(r\)
    - 'p-val': p-value
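The 'p-val' and 'CI95%' columns can be approximated with textbook formulas for a correlation computed with k controlling variables: a Student t test with n - k - 2 degrees of freedom for the p-value, and a Fisher z interval that uses n - k as the effective sample size. This is a sketch under that assumption, not Pingouin's source code; the helpers `partial_corr_pvalue` and `partial_corr_ci` are hypothetical. For the examples below (e.g. n = 30, one covariate, r = 0.568) these formulas appear to give values that round to the printed p-values and intervals, including the one-sided ones.

```python
import numpy as np
from scipy import stats


def partial_corr_pvalue(r, n, k, alternative="two-sided"):
    """t-test p-value for a (semi)partial correlation r with k covariates (hypothetical helper)."""
    dof = n - k - 2
    tval = r * np.sqrt(dof / (1 - r**2))
    if alternative == "two-sided":
        return 2 * stats.t.sf(abs(tval), dof)
    if alternative == "greater":
        return stats.t.sf(tval, dof)  # H1: correlation > 0
    if alternative == "less":
        return stats.t.cdf(tval, dof)  # H1: correlation < 0
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


def partial_corr_ci(r, n, k, alternative="two-sided", confidence=0.95):
    """Fisher z interval, assuming n - k as the effective sample size (hypothetical helper)."""
    z = np.arctanh(r)
    se = 1 / np.sqrt(n - k - 3)
    if alternative == "two-sided":
        crit = stats.norm.ppf(1 - (1 - confidence) / 2)
        return np.tanh(z - crit * se), np.tanh(z + crit * se)
    crit = stats.norm.ppf(confidence)  # one-sided critical value
    if alternative == "greater":
        return np.tanh(z - crit * se), 1.0
    if alternative == "less":
        return -1.0, np.tanh(z + crit * se)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# First example below: n = 30, one covariate (k = 1), r = 0.568
print(round(partial_corr_pvalue(0.568, n=30, k=1), 3))
print(np.round(partial_corr_ci(0.568, n=30, k=1), 2))
```

Because r is rounded to three decimals in the printed output, recomputing from it only reproduces the reported values up to rounding.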
Notes
Partial correlation [1] measures the degree of association between x and y after removing the effect of one or more controlling variables (covar, or \(Z\)). In practice, this is achieved by calculating the correlation coefficient between the residuals of two linear regressions:
\[x \sim Z, \quad y \sim Z\]
Like the correlation coefficient, the partial correlation coefficient takes on a value in the range from –1 to 1, where 1 indicates a perfect positive association and –1 a perfect negative association.
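The residual-based definition can be sketched in a few lines of NumPy: regress x and y on \(Z\) (with an intercept) by ordinary least squares, then take the Pearson correlation of the two residual vectors. This illustrates the definition only; the helper names `residualize` and `partial_corr_residuals` and the synthetic data are hypothetical.

```python
import numpy as np


def residualize(v, Z):
    """Residuals of an ordinary least-squares regression of v on Z (plus an intercept)."""
    design = np.column_stack([np.ones(len(v)), Z])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


def partial_corr_residuals(x, y, Z):
    """Pearson correlation between the residuals of x ~ Z and y ~ Z."""
    return np.corrcoef(residualize(x, Z), residualize(y, Z))[0, 1]


# Synthetic data: x and y are only related through a common cause z
rng = np.random.default_rng(42)
z = rng.normal(size=500)
x = z + rng.normal(size=500)
y = z + rng.normal(size=500)

raw = np.corrcoef(x, y)[0, 1]            # inflated by the shared dependence on z
partial = partial_corr_residuals(x, y, z)  # close to zero once z is removed
print(raw, partial)
```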
The semipartial correlation is similar to the partial correlation, with the exception that the set of controlling variables is only removed for either x or y, but not both.

Pingouin uses the method described in [2] to calculate the (semi)partial correlation coefficients and associated p-values. This method is based on the inverse covariance matrix and is significantly faster than the traditional regression-based method. Results have been tested against the ppcor R package.
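An inverse-covariance approach typically relies on a standard identity: if \(P\) is the inverse of the covariance matrix of all variables, the partial correlation of variables \(i\) and \(j\) given all the others is \(-P_{ij}/\sqrt{P_{ii}P_{jj}}\). The NumPy sketch below is not Pingouin's implementation; it checks that identity against the residual definition and computes a semi-partial correlation by removing the covariates from x only, as described above. The helpers `pcorr_matrix` and `residualize` and the synthetic data are hypothetical.

```python
import numpy as np


def pcorr_matrix(data):
    """Partial correlation matrix from the inverse covariance (precision) matrix.

    Entry (i, j) is the correlation of columns i and j controlling for all other columns.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def residualize(v, Z):
    """OLS residuals of v regressed on Z plus an intercept."""
    design = np.column_stack([np.ones(len(v)), Z])
    beta, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ beta


rng = np.random.default_rng(0)
n = 300
Z = rng.normal(size=(n, 2))  # two covariates
x = Z @ np.array([0.8, -0.5]) + rng.normal(size=n)
y = 0.6 * x + Z @ np.array([0.3, 0.4]) + rng.normal(size=n)
data = np.column_stack([x, y, Z])  # columns: x, y, cv1, cv2

# Partial correlation of x and y given both covariates, computed two ways
r_precision = pcorr_matrix(data)[0, 1]
r_residual = np.corrcoef(residualize(x, Z), residualize(y, Z))[0, 1]
print(r_precision, r_residual)  # equal up to floating-point error

# Semi-partial correlation: covariates removed from x only (the x_covar case)
r_semi = np.corrcoef(residualize(x, Z), y)[0, 1]
print(r_semi)
```

Both estimates agree exactly in finite samples because the covariance matrix is computed on centered data, which plays the role of the intercept in the regressions. The semi-partial value is never larger in absolute value than the partial one, since y keeps the variance explained by the covariates.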
Important
Rows with missing values are automatically removed from data.
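Removing rows with missing values amounts to listwise (complete-case) deletion: a row is dropped if any of its values is missing, and the reported n counts the remaining rows. The sketch below uses made-up data and assumes only the columns involved in the computation (x, y and the covariates) are checked; that detail is an assumption, not taken from Pingouin's source.

```python
import numpy as np

# Hypothetical toy data: columns x, y, cv1, with some missing entries (NaN)
data = np.array([
    [1.0, 2.0, 0.5],
    [2.0, np.nan, 0.1],
    [3.0, 3.5, np.nan],
    [4.0, 4.1, 0.9],
    [5.0, 6.2, 1.3],
])

# Listwise deletion: keep only the rows where every column is observed
complete = ~np.isnan(data).any(axis=1)
clean = data[complete]
print(clean.shape[0])  # 3: the sample size n that would be reported
```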
References
Examples
Partial correlation with one covariate
>>> import pingouin as pg
>>> df = pg.read_dataset('partial_corr')
>>> pg.partial_corr(data=df, x='x', y='y', covar='cv1').round(3)
          n      r         CI95%  p-val
pearson  30  0.568  [0.25, 0.77]  0.001
Spearman partial correlation with several covariates
>>> # Partial correlation of x and y controlling for cv1, cv2 and cv3
>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.521  [0.18, 0.75]  0.005
Same but one-sided test
>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 alternative="greater", method='spearman').round(3)
           n      r        CI95%  p-val
spearman  30  0.521  [0.24, 1.0]  0.003

>>> pg.partial_corr(data=df, x='x', y='y', covar=['cv1', 'cv2', 'cv3'],
...                 alternative="less", method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.521  [-1.0, 0.72]  0.997
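A Spearman partial correlation is commonly defined as a Pearson partial correlation computed on the ranks of every variable. The sketch below assumes that definition (Pingouin's internals are not shown in this page), ranks each variable with `scipy.stats.rankdata` and then applies the residual approach from the Notes; `spearman_partial_corr` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata


def spearman_partial_corr(x, y, Z):
    """Pearson partial correlation computed on ranks (ties get average ranks)."""
    Z = np.asarray(Z).reshape(len(x), -1)  # ensure a 2-D covariate matrix
    rx, ry = rankdata(x), rankdata(y)
    rZ = np.column_stack([rankdata(col) for col in Z.T])
    design = np.column_stack([np.ones(len(x)), rZ])
    # Residuals of rank(x) ~ rank(Z) and rank(y) ~ rank(Z)
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


rng = np.random.default_rng(1)
z = rng.normal(size=100)
x = np.exp(z + rng.normal(scale=0.5, size=100))  # monotone but non-linear in z
y = z + rng.normal(scale=0.5, size=100)
print(spearman_partial_corr(x, y, z))
```

Since only ranks enter the computation, any strictly increasing transformation of a variable (for example taking the log of x) leaves the result unchanged.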
As a pandas method
>>> df.partial_corr(x='x', y='y', covar=['cv1'], method='spearman').round(3)
           n      r         CI95%  p-val
spearman  30  0.578  [0.27, 0.78]  0.001
Partial correlation matrix (returns only the correlation coefficients)
>>> df.pcorr().round(3)
         x      y    cv1    cv2    cv3
x    1.000  0.493 -0.095  0.130 -0.385
y    0.493  1.000 -0.007  0.104 -0.002
cv1 -0.095 -0.007  1.000 -0.241 -0.470
cv2  0.130  0.104 -0.241  1.000 -0.118
cv3 -0.385 -0.002 -0.470 -0.118  1.000
Semi-partial correlation on x
>>> pg.partial_corr(data=df, x='x', y='y', x_covar=['cv1', 'cv2', 'cv3']).round(3)
          n      r        CI95%  p-val
pearson  30  0.463  [0.1, 0.72]  0.015