pingouin.sphericity

pingouin.sphericity(data, dv=None, within=None, subject=None, method='mauchly', alpha=0.05)

Mauchly and JNS tests for sphericity.

Parameters:

data : pandas.DataFrame

DataFrame containing the repeated measurements. Both wide- and long-format dataframes are supported. To test for an interaction term between two repeated-measures factors with a wide-format dataframe, data must have two-level pandas.MultiIndex columns.

dv : string

Name of column containing the dependent variable (only required if data is in long format).

within : string or list of strings

Name of column containing the within factor (only required if data is in long format). If within is a list of two strings, this function tests sphericity for the interaction between the two within-subject factors.

subject : string

Name of column containing the subject identifier (only required if data is in long format).

method : string

Test used to assess sphericity:

‘jns’: John, Nagao and Sugiura test.
‘mauchly’: Mauchly test (default).

alpha : float

Significance level.

Returns:

spher (boolean): True if data have the sphericity property.
W (float): Test statistic.
chi2 (float): Chi-square statistic.
dof (int): Degrees of freedom.
pval (float): P-value.

Raises:

ValueError: When testing for an interaction, if both within-subject factors have more than 2 levels (not yet supported in Pingouin).

See also

epsilon: Epsilon adjustment factor for repeated measures.
homoscedasticity: Test equality of variance.
normality: Univariate normality test.

Notes

The Mauchly \(W\) statistic [1] is defined by:

\[W = \frac{\prod \lambda_j}{(\frac{1}{k-1} \sum \lambda_j)^{k-1}}\]

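where \(\lambda_j\) (\(j = 1, \dots, k-1\)) are the eigenvalues of the double-centered sample covariance matrix (used as the estimate of the population covariance matrix), excluding the single zero eigenvalue that double-centering always introduces, and \(k\) is the number of conditions.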
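As a rough illustration of this definition, the NumPy sketch below computes \(W\) from a complete wide-format array (one row per subject, one column per condition). The function name `mauchly_w` is hypothetical and not part of Pingouin's API; the sketch assumes there are no missing values and relies on the zero eigenvalue being the smallest one, which holds because the double-centered covariance matrix is positive semi-definite.

```python
import numpy as np


def mauchly_w(x):
    """Hypothetical helper: Mauchly's W for an (n subjects, k conditions) array."""
    x = np.asarray(x, dtype=float)
    k = x.shape[1]
    s = np.cov(x, rowvar=False)  # k x k sample covariance matrix (ddof=1)
    # Double-center: subtract column and row means, add back the grand mean
    s_dc = s - s.mean(axis=0, keepdims=True) - s.mean(axis=1, keepdims=True) + s.mean()
    # eigvalsh sorts ascending; drop the smallest, which is zero up to rounding
    eig = np.linalg.eigvalsh(s_dc)[1:]
    eig = np.clip(eig, 0, None)  # remove tiny negative values caused by rounding
    return np.prod(eig) / (eig.sum() / (k - 1)) ** (k - 1)


# Same values as columns A, B and C of the wide-format example
data = np.array([[2.2, 1.1, 8.2],
                 [3.1, 2.5, 4.5],
                 [4.3, 4.1, 3.4],
                 [4.1, 5.2, 6.2],
                 [7.2, 6.4, 7.2]])
print(round(mauchly_w(data), 3))  # 0.21
```

On the wide-format example data this sketch gives \(W \approx 0.21\), in line with the value printed in the Examples section.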

The \(W\) statistic is then transformed into a chi-square statistic using the number of observations per condition \(n\):

\[f = \frac{2(k-1)^2+k+1}{6(k-1)(n-1)}\]

\[\chi_w^2 = (f-1)(n-1) \log(W)\]

The p-value is then approximated using a chi-square distribution:

\[\chi_w^2 \sim \chi^2(\frac{k(k-1)}{2}-1)\]
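The chi-square approximation can be sketched in a few lines, taking \(W\), \(n\) and \(k\) as inputs. `mauchly_chi2` is a hypothetical helper, and the rule `spher = pval > alpha` is our assumption about how the `alpha` parameter is used, not something stated on this page; `scipy.stats.chi2.sf` returns the upper-tail probability.

```python
import numpy as np
from scipy.stats import chi2


def mauchly_chi2(w, n, k, alpha=0.05):
    """Hypothetical helper: chi-square approximation for Mauchly's W.

    w: Mauchly's W, n: observations per condition, k: number of conditions.
    """
    f = (2 * (k - 1) ** 2 + k + 1) / (6 * (k - 1) * (n - 1))
    chi_sq = (f - 1) * (n - 1) * np.log(w)  # natural logarithm of W
    dof = k * (k - 1) // 2 - 1
    pval = chi2.sf(chi_sq, dof)  # upper-tail chi-square probability
    spher = bool(pval > alpha)  # assumed decision rule
    return spher, chi_sq, dof, pval


spher, chi_sq, dof, pval = mauchly_chi2(0.210372, n=5, k=3)
print(spher, round(chi_sq, 3), dof, round(pval, 3))  # True 4.677 2 0.096
```

With \(W \approx 0.210372\), \(n = 5\) and \(k = 3\) (the wide-format example), this reproduces the chi-square value, degrees of freedom and p-value printed in the Examples section.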

The JNS \(V\) statistic ([2], [3], [4]) is defined by:

\[V = \frac{(\sum_{j=1}^{k-1} \lambda_j)^2}{\sum_{j=1}^{k-1} \lambda_j^2}\]

\[\chi_v^2 = \frac{n}{2} (k-1)^2 (V - \frac{1}{k-1})\]

and the p-value is approximated using a chi-square distribution:

\[\chi_v^2 \sim \chi^2(\frac{k(k-1)}{2}-1)\]
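A corresponding sketch of the JNS test, for a complete wide-format array and using the eigenvalues of the double-centered sample covariance matrix with the zero eigenvalue removed. `jns_test` is a hypothetical name, not part of Pingouin's API.

```python
import numpy as np
from scipy.stats import chi2


def jns_test(x):
    """Hypothetical helper: JNS sphericity test for an (n, k) array without NaNs."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    s = np.cov(x, rowvar=False)  # k x k sample covariance matrix
    s_dc = s - s.mean(axis=0, keepdims=True) - s.mean(axis=1, keepdims=True) + s.mean()
    # Keep the k - 1 largest eigenvalues (the smallest is zero by construction)
    eig = np.clip(np.linalg.eigvalsh(s_dc)[1:], 0, None)
    v = eig.sum() ** 2 / np.sum(eig ** 2)
    chi_sq = n / 2 * (k - 1) ** 2 * (v - 1 / (k - 1))
    dof = k * (k - 1) // 2 - 1
    return v, chi_sq, dof, chi2.sf(chi_sq, dof)


data = np.array([[2.2, 1.1, 8.2],
                 [3.1, 2.5, 4.5],
                 [4.3, 4.1, 3.4],
                 [4.1, 5.2, 6.2],
                 [7.2, 6.4, 7.2]])
v, chi_sq, dof, pval = jns_test(data)
print(round(pval, 3))  # 0.046
```

On the three-column example data the p-value rounds to 0.046, matching the JNS example in the Examples section.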

Missing values are automatically removed from data (listwise deletion).
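In a wide-format dataframe, listwise deletion means that a subject with a missing value in any condition is dropped entirely, not just the missing cell. The small pandas sketch below illustrates the idea with made-up numbers; it is not Pingouin's internal code.

```python
import numpy as np
import pandas as pd

# Made-up wide-format data: one row per subject, one column per condition
df = pd.DataFrame({'A': [2.2, 3.1, np.nan, 4.1],
                   'B': [1.1, 2.5, 4.1, 5.2],
                   'C': [8.2, np.nan, 3.4, 6.2]})
# Listwise deletion: drop every subject (row) with at least one missing value
complete = df.dropna(axis=0, how='any')
print(complete.index.tolist())  # [0, 3]
```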

References

[1]

Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. The Annals of Mathematical Statistics, 11(2), 204-209.

[2]

Nagao, H. (1973). On some test criteria for covariance matrix. The Annals of Statistics, 700-709.

[3]

Sugiura, N. (1972). Locally best invariant test for sphericity and the limiting distributions. The Annals of Mathematical Statistics, 1312-1316.

[4]

John, S. (1972). The distribution of a statistic used for testing sphericity of normal distributions. Biometrika, 59(1), 169-173.

Examples

Mauchly test for sphericity using a wide-format dataframe

>>> import pandas as pd
>>> import pingouin as pg
>>> data = pd.DataFrame({'A': [2.2, 3.1, 4.3, 4.1, 7.2],
...                      'B': [1.1, 2.5, 4.1, 5.2, 6.4],
...                      'C': [8.2, 4.5, 3.4, 6.2, 7.2]})
>>> spher, W, chisq, dof, pval = pg.sphericity(data)
>>> print(spher, round(W, 3), round(chisq, 3), dof, round(pval, 3))
True 0.21 4.677 2 0.096

John, Nagao and Sugiura (JNS) test

>>> round(pg.sphericity(data, method='jns')[-1], 3)  # P-value only
0.046

Now using a long-format dataframe

>>> data = pg.read_dataset('rm_anova2')
>>> data.head()
   Subject Time   Metric  Performance
0        1  Pre  Product           13
1        2  Pre  Product           12
2        3  Pre  Product           17
3        4  Pre  Product           12
4        5  Pre  Product           19

Let’s first test sphericity for the Time within-subject factor

>>> pg.sphericity(data, dv='Performance', subject='Subject',
...               within='Time')
(True, nan, nan, 1, 1.0)

Since Time has only two levels (Pre and Post), the sphericity assumption is necessarily met.
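The reason is visible in the eigenvalues: with \(k = 2\) the double-centered covariance matrix has a single non-trivial eigenvalue, equal to half the variance of the difference scores, so \(W = \lambda / \lambda = 1\) whatever the data. The NumPy check below illustrates this with arbitrary example scores; it is a sketch of the reasoning, not Pingouin's implementation.

```python
import numpy as np

# Arbitrary scores: one row per subject, two conditions (e.g. Pre and Post)
x = np.array([[13., 18.], [12., 6.], [17., 21.], [12., 18.], [19., 18.]])
s = np.cov(x, rowvar=False)
s_dc = s - s.mean(axis=0, keepdims=True) - s.mean(axis=1, keepdims=True) + s.mean()
eig = np.linalg.eigvalsh(s_dc)  # ascending; eig[0] is zero up to rounding
lam = eig[1]  # the only non-trivial eigenvalue
var_diff = np.var(x[:, 1] - x[:, 0], ddof=1)  # variance of the difference scores
print(round(lam, 2), round(var_diff / 2, 2))  # 12.65 12.65
```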

The Metric factor, however, has three levels:

>>> round(pg.sphericity(data, dv='Performance', subject='Subject',
...                     within=['Metric'])[-1], 3)
0.878

The p-value is very large (0.878), so the test gives no evidence that sphericity is violated.

Now, let’s test sphericity for the interaction between the two repeated-measures factors. The current implementation in Pingouin only works if at least one of the two within-subject factors has no more than two levels.

>>> spher, _, chisq, dof, pval = pg.sphericity(data, dv='Performance',
...                                            subject='Subject',
...                                            within=['Time', 'Metric'])
>>> print(spher, round(chisq, 3), dof, round(pval, 3))
True 3.763 2 0.152

Here again, Mauchly’s test gives no evidence of a violation of sphericity.

Alternatively, we could use a wide-format dataframe with two column levels:

>>> # Pivot from long-format to wide-format
>>> piv = data.pivot(index='Subject', columns=['Time', 'Metric'], values='Performance')
>>> piv.head()
Time        Pre                  Post
Metric  Product Client Action Product Client Action
Subject
1            13     12     17      18     30     34
2            12     19     18       6     18     30
3            17     19     24      21     31     32
4            12     25     25      18     39     40
5            19     27     19      18     28     27

>>> spher, _, chisq, dof, pval = pg.sphericity(piv)
>>> print(spher, round(chisq, 3), dof, round(pval, 3))
True 3.763 2 0.152

which gives the same output as the long-format dataframe.
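The restriction to designs where one factor has two levels can be motivated as follows: in a 2 × k design, the interaction compares the difference between the two levels of the first factor across the k levels of the second, so it can be assessed through the k difference scores. The pandas sketch below performs this reduction on the first five subjects of the pivoted table above; `piv_head` and `diff` are hypothetical names.

```python
import pandas as pd

# First five subjects of the pivoted 2 x 3 table shown above
cols = pd.MultiIndex.from_product([['Pre', 'Post'], ['Product', 'Client', 'Action']],
                                  names=['Time', 'Metric'])
piv_head = pd.DataFrame([[13, 12, 17, 18, 30, 34],
                         [12, 19, 18, 6, 18, 30],
                         [17, 19, 24, 21, 31, 32],
                         [12, 25, 25, 18, 39, 40],
                         [19, 27, 19, 18, 28, 27]],
                        index=pd.Index([1, 2, 3, 4, 5], name='Subject'),
                        columns=cols)

# Post minus Pre, computed separately for each Metric level (k = 3 columns remain)
diff = (piv_head.xs('Post', axis=1, level='Time')
        - piv_head.xs('Pre', axis=1, level='Time'))
print(diff.loc[1].tolist())  # [5, 18, 17]
```

The resulting frame has one column per Metric level. We assume, without having checked Pingouin's source, that testing these difference scores with the one-way procedure is equivalent to the interaction test shown above; this reduction is only available when one factor has two levels, consistent with the ValueError raised otherwise.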