pingouin.ptests

pingouin.ptests(self, paired=False, decimals=3, padjust=None, stars=True, pval_stars={0.001: '***', 0.01: '**', 0.05: '*'}, **kwargs)

Pairwise T-test between columns of a dataframe.

T-values are reported on the lower triangle of the output pairwise matrix and p-values on the upper triangle. This method is a faster, but less exhaustive, matrix version of the pingouin.pairwise_tests() function. Missing values are automatically removed from each pairwise T-test.

Added in version 0.5.3.

Parameters:

self : pandas.DataFrame

Input dataframe.

paired : boolean

Specify whether the two observations are related (i.e. repeated measures) or independent.

decimals : int

Number of decimals to display in the output matrix.

padjust : string or None

P-value adjustment for multiple comparisons:

'none': no correction
'bonf': one-step Bonferroni correction
'sidak': one-step Sidak correction
'holm': step-down method using Bonferroni adjustments
'fdr_bh': Benjamini/Hochberg FDR correction
'fdr_by': Benjamini/Yekutieli FDR correction

stars : boolean

If True, only significant p-values are displayed as stars using the pre-defined thresholds of pval_stars. If False, all the raw p-values are displayed.

pval_stars : dict

Significance thresholds. Default is 3 stars for p-values <0.001, 2 stars for p-values <0.01 and 1 star for p-values <0.05.

**kwargs : optional

Optional argument(s) passed to the lower-level scipy functions, i.e. scipy.stats.ttest_ind() for independent T-test and scipy.stats.ttest_rel() for paired T-test.

Returns:

mat : pandas.DataFrame

Pairwise T-test matrix, of dtype str, with T-values on the lower triangle and p-values on the upper triangle.

Examples

>>> import numpy as np
>>> import pandas as pd
>>> import pingouin as pg
>>> # Load an example dataset of personality dimensions
>>> df = pg.read_dataset('pairwise_corr').iloc[:30, 1:]
>>> df.columns = ["N", "E", "O", 'A', "C"]
>>> # Add some missing values
>>> df.iloc[[2, 5, 20], 2] = np.nan
>>> df.iloc[[1, 4, 10], 3] = np.nan
>>> df.head().round(2)
      N     E     O     A     C
0  2.48  4.21  3.94  3.96  3.46
1  2.60  3.19  3.96   NaN  3.23
2  2.81  2.90   NaN  2.75  3.50
3  2.90  3.56  3.52  3.17  2.79
4  3.02  3.33  4.02   NaN  2.85

Independent pairwise T-tests

>>> df.ptests()
        N       E      O      A    C
N       -     ***    ***    ***  ***
E  -8.397       -                ***
O  -8.332  -0.596      -         ***
A  -8.804    0.12   0.72      -  ***
C  -4.759   3.753  4.074  3.787    -
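The stars in the upper triangle follow the thresholds in pval_stars. The sketch below shows one way such a mapping could work. The helper name pval_to_stars is hypothetical, and the strict "<" comparison is an assumption taken from the parameter description, so pingouin's internal handling of boundary values may differ.

```python
def pval_to_stars(pval, pval_stars=None):
    """Map a p-value to a star string (hypothetical helper, not pingouin API)."""
    if pval_stars is None:
        # Same defaults as the pval_stars parameter documented above
        pval_stars = {0.001: "***", 0.01: "**", 0.05: "*"}
    # Check the strictest (smallest) threshold first
    for threshold in sorted(pval_stars):
        if pval < threshold:  # assumption: strict inequality
            return pval_stars[threshold]
    # Non-significant p-values are shown as an empty cell
    return ""


for p in (0.0004, 0.004, 0.03, 0.2):
    print(p, repr(pval_to_stars(p)))
```

Passing a custom dict, for example {0.01: "!!", 0.05: "!"}, changes both the thresholds and the symbols in this sketch.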

Let’s compare with SciPy

>>> from scipy.stats import ttest_ind
>>> np.round(ttest_ind(df["N"], df["E"]), 3)
array([-8.397,  0.   ])
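The comparison above uses two columns without missing values. For columns that contain NaN, a plain SciPy call returns NaN unless missing values are discarded first. The sketch below uses toy data (not the pingouin dataset) and assumes that the per-pair removal described for ptests corresponds to SciPy's nan_policy="omit"; that correspondence is an assumption, not something this page states.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

# Toy data (illustrative only), one missing value per sample
a = np.array([2.1, 2.5, np.nan, 3.0, 2.8])
b = np.array([3.9, 4.2, 4.0, np.nan, 4.4])

# Independent samples: each sample drops its own NaN values
t_ind, p_ind = ttest_ind(a, b, nan_policy="omit")
t_ind_manual, _ = ttest_ind(a[~np.isnan(a)], b[~np.isnan(b)])

# Paired samples: a pair is dropped if either member is NaN
keep = ~(np.isnan(a) | np.isnan(b))
t_rel, p_rel = ttest_rel(a, b, nan_policy="omit")
t_rel_manual, _ = ttest_rel(a[keep], b[keep])

# Both checks compare nan_policy="omit" against manual NaN removal
print(np.isclose(t_ind, t_ind_manual), np.isclose(t_rel, t_rel_manual))
```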

Passing custom parameters to the lower-level scipy.stats.ttest_ind() function

>>> df.ptests(alternative="greater", equal_var=True)
        N       E      O      A    C
N       -
E  -8.397       -                ***
O  -8.332  -0.596      -         ***
A  -8.804    0.12   0.72      -  ***
C  -4.759   3.753  4.074  3.787    -

Paired T-test, showing the actual p-values instead of stars

>>> df.ptests(paired=True, stars=False, decimals=4)
        N        E       O       A       C
N        -   0.0000  0.0000  0.0000  0.0002
E  -7.0773        -  0.8776  0.7522  0.0012
O  -8.0568  -0.1555       -  0.8137  0.0008
A  -8.3994   0.3191  0.2383       -  0.0009
C  -4.2511   3.5953  3.7849  3.7652       -

Adjusting for multiple comparisons using the Holm-Bonferroni method

>>> df.ptests(paired=True, stars=False, padjust="holm")
        N       E      O      A      C
N       -   0.000  0.000  0.000  0.001
E  -7.077       -     1.     1.  0.005
O  -8.057  -0.155      -     1.  0.005
A  -8.399   0.319  0.238      -  0.005
C  -4.251   3.595  3.785  3.765      -
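To illustrate how the 'holm' option differs from 'bonf', here is a minimal pure-Python sketch of the two adjustments. The function names holm_adjust and bonferroni_adjust are hypothetical, and this is the textbook step-down procedure rather than pingouin's own implementation.

```python
def bonferroni_adjust(pvals):
    """One-step Bonferroni: multiply every p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm_adjust(pvals):
    """Step-down Holm: smaller p-values get larger multipliers."""
    m = len(pvals)
    # Indices of the p-values, from smallest to largest
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank k-1) is multiplied by m - k + 1
        candidate = min(1.0, (m - rank) * pvals[idx])
        # Enforce monotonicity: adjusted values never decrease with rank
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]
print([round(p, 3) for p in bonferroni_adjust(raw)])
print([round(p, 3) for p in holm_adjust(raw)])
```

In this sketch the Holm-adjusted values are never larger than the Bonferroni-adjusted ones, which is why Holm is generally the less conservative of the two.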