pingouin.sphericity
- pingouin.sphericity(data, dv=None, within=None, subject=None, method='mauchly', alpha=0.05)
Mauchly and JNS test for sphericity.
- Parameters:
- data : pandas.DataFrame
  DataFrame containing the repeated measurements. Both wide- and long-format dataframes are supported. To test for an interaction term between two repeated measures factors with a wide-format dataframe, data must have two-level pandas.MultiIndex columns.
- dv : string
  Name of column containing the dependent variable (only required if data is in long format).
- within : string or list
  Name of column containing the within factor (only required if data is in long format). If within is a list with two strings, this function tests sphericity for the interaction between the two within-subject factors.
- subject : string
  Name of column containing the subject identifier (only required if data is in long format).
- method : str
  Method to compute sphericity:
  'jns': John, Nagao and Sugiura test.
  'mauchly': Mauchly test (default).
- alpha : float
  Significance level.
- Returns:
- spher : boolean
  True if data have the sphericity property (i.e. the test does not reject sphericity at level alpha).
- W : float
  Test statistic.
- chi2 : float
  Chi-square statistic.
- dof : int
  Degrees of freedom.
- pval : float
  P-value.
- Raises:
- ValueError
When testing for an interaction, if both within-subject factors have more than 2 levels (not yet supported in Pingouin).
See also
epsilon
Epsilon adjustment factor for repeated measures.
homoscedasticity
Test equality of variance.
normality
Univariate normality test.
Notes
The Mauchly \(W\) statistic [1] is defined by:
\[W = \frac{\prod \lambda_j}{\left(\frac{1}{k-1} \sum \lambda_j\right)^{k-1}}\]
where \(\lambda_j\) are the \(k-1\) nonzero eigenvalues of the population covariance matrix (i.e. the double-centered sample covariance matrix) and \(k\) is the number of conditions.
The \(W\) statistic is then transformed into a chi-square statistic using the number of observations per condition \(n\):
\[f = \frac{2(k-1)^2+k+1}{6(k-1)(n-1)}\]
\[\chi_w^2 = (f-1)(n-1) \log(W)\]
The p-value is then approximated using a chi-square distribution:
\[\chi_w^2 \sim \chi^2\left(\frac{k(k-1)}{2}-1\right)\]
The JNS \(V\) statistic ([2], [3], [4]) is defined by:
\[V = \frac{\left(\sum_{j=1}^{k-1} \lambda_j\right)^2}{\sum_{j=1}^{k-1} \lambda_j^2}\]
\[\chi_v^2 = \frac{n}{2} (k-1)^2 \left(V - \frac{1}{k-1}\right)\]
and the p-value is approximated using a chi-square distribution:
\[\chi_v^2 \sim \chi^2\left(\frac{k(k-1)}{2}-1\right)\]
Missing values are automatically removed from data (listwise deletion).
References
[1] Mauchly, J. W. (1940). Significance test for sphericity of a normal n-variate distribution. The Annals of Mathematical Statistics, 11(2), 204-209.
[2] Nagao, H. (1973). On some test criteria for covariance matrix. The Annals of Statistics, 700-709.
[3] Sugiura, N. (1972). Locally best invariant test for sphericity and the limiting distributions. The Annals of Mathematical Statistics, 1312-1316.
[4] John, S. (1972). The distribution of a statistic used for testing sphericity of normal distributions. Biometrika, 59(1), 169-173.
See also http://www.real-statistics.com/anova-repeated-measures/sphericity/
Examples
Mauchly test for sphericity using a wide-format dataframe
>>> import pandas as pd
>>> import pingouin as pg
>>> data = pd.DataFrame({'A': [2.2, 3.1, 4.3, 4.1, 7.2],
...                      'B': [1.1, 2.5, 4.1, 5.2, 6.4],
...                      'C': [8.2, 4.5, 3.4, 6.2, 7.2]})
>>> spher, W, chisq, dof, pval = pg.sphericity(data)
>>> print(spher, round(W, 3), round(chisq, 3), dof, round(pval, 3))
True 0.21 4.677 2 0.096
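To connect the numbers above to the formulas in the Notes, here is a minimal NumPy/SciPy sketch of Mauchly's test. It is an illustration of the published formulas, not Pingouin's source code; the helper name mauchly_sketch is hypothetical. It assumes a plain (subjects × conditions) array with at least as many subjects as conditions, so that the double-centered covariance matrix has exactly one zero eigenvalue and k-1 positive ones.

```python
import numpy as np
from scipy import stats


def mauchly_sketch(wide, alpha=0.05):
    """Hypothetical helper: Mauchly's test on an (n subjects x k conditions) array."""
    x = np.asarray(wide, dtype=float)
    x = x[~np.isnan(x).any(axis=1)]  # listwise deletion of incomplete rows
    n, k = x.shape
    S = np.cov(x, rowvar=False)  # sample covariance matrix (ddof=1)
    # Double-centre S: subtract row and column means, add back the grand mean
    S_pop = S - S.mean(axis=0)[None, :] - S.mean(axis=1)[:, None] + S.mean()
    # S_pop has one structural zero eigenvalue; eigvalsh sorts ascending, so drop it
    eig = np.linalg.eigvalsh(S_pop)[1:]
    W = np.prod(eig) / (eig.sum() / (k - 1)) ** (k - 1)
    f = (2 * (k - 1) ** 2 + k + 1) / (6 * (k - 1) * (n - 1))
    chi2 = (f - 1) * (n - 1) * np.log(W)
    dof = k * (k - 1) // 2 - 1
    pval = stats.chi2.sf(chi2, dof)
    return pval > alpha, W, chi2, dof, pval


# Same values as the wide-format example: one row per subject, columns A, B, C
wide = np.array([[2.2, 1.1, 8.2],
                 [3.1, 2.5, 4.5],
                 [4.3, 4.1, 3.4],
                 [4.1, 5.2, 6.2],
                 [7.2, 6.4, 7.2]])
spher, W, chi2, dof, pval = mauchly_sketch(wide)
print(spher, round(W, 3), round(chi2, 3), dof, round(pval, 3))
# True 0.21 4.677 2 0.096
```

With n = 5 and k = 3 the correction factor is f = 12/48 = 0.25, so the chi-square statistic is simply -3 log(W).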
John, Nagao and Sugiura (JNS) test
>>> round(pg.sphericity(data, method='jns')[-1], 3)  # P-value only
0.046
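The JNS p-value can likewise be reproduced from the \(V\) formula in the Notes. The sketch below is self-contained and uses the same A, B, C values as the wide-format example; jns_sketch is a hypothetical helper written for illustration, not part of Pingouin, and it assumes at least as many subjects as conditions.

```python
import numpy as np
from scipy import stats


def jns_sketch(wide):
    """Hypothetical helper: John, Nagao and Sugiura V statistic on an (n x k) array."""
    x = np.asarray(wide, dtype=float)
    x = x[~np.isnan(x).any(axis=1)]  # listwise deletion
    n, k = x.shape
    S = np.cov(x, rowvar=False)
    # Double-centred covariance; its k-1 nonzero eigenvalues are the lambda_j
    S_pop = S - S.mean(axis=0)[None, :] - S.mean(axis=1)[:, None] + S.mean()
    eig = np.linalg.eigvalsh(S_pop)[1:]  # drop the structural zero eigenvalue
    V = eig.sum() ** 2 / np.sum(eig ** 2)
    chi2 = n / 2 * (k - 1) ** 2 * (V - 1 / (k - 1))
    dof = k * (k - 1) // 2 - 1
    return V, chi2, dof, stats.chi2.sf(chi2, dof)


wide = np.array([[2.2, 1.1, 8.2],
                 [3.1, 2.5, 4.5],
                 [4.3, 4.1, 3.4],
                 [4.1, 5.2, 6.2],
                 [7.2, 6.4, 7.2]])
V, chi2, dof, pval = jns_sketch(wide)
print(round(pval, 3))  # 0.046
```

Both tests start from the same eigenvalues; they differ only in how the spread of those eigenvalues is summarised (a ratio of geometric to arithmetic mean for Mauchly, a ratio of squared sum to sum of squares for JNS), which is why they can disagree at the 0.05 level on the same data.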
Now using a long-format dataframe
>>> data = pg.read_dataset('rm_anova2')
>>> data.head()
   Subject Time   Metric  Performance
0        1  Pre  Product           13
1        2  Pre  Product           12
2        3  Pre  Product           17
3        4  Pre  Product           12
4        5  Pre  Product           19
Let’s first test sphericity for the Time within-subject factor
>>> pg.sphericity(data, dv='Performance', subject='Subject',
...               within='Time')
(True, nan, nan, 1, 1.0)
Since Time has only two levels (Pre and Post), the sphericity assumption is necessarily met.
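The reason is visible in the Notes formulas: with \(k = 2\) the double-centered covariance matrix has rank one, so there is a single nonzero eigenvalue and the ratio defining \(W\) reduces to \(\lambda/\lambda = 1\); there is no covariance structure left to test. The sketch below makes this concrete using columns A and B of the wide-format example (chosen only for illustration). Pingouin itself reports nan for \(W\) and the chi-square statistic in this case, as in the output above.

```python
import numpy as np

# Columns A and B of the wide-format example: k = 2 conditions
x = np.array([[2.2, 1.1],
              [3.1, 2.5],
              [4.3, 4.1],
              [4.1, 5.2],
              [7.2, 6.4]])
k = x.shape[1]
S = np.cov(x, rowvar=False)
# Double-centred covariance matrix
S_pop = S - S.mean(axis=0)[None, :] - S.mean(axis=1)[:, None] + S.mean()
eig = np.linalg.eigvalsh(S_pop)  # ascending: [~0, lambda]

lam = eig[1:]  # the single nonzero eigenvalue
W = np.prod(lam) / (lam.sum() / (k - 1)) ** (k - 1)
print(abs(eig[0]) < 1e-12, len(lam), W)  # True 1 1.0
# The nonzero eigenvalue equals half the variance of the A - B differences
print(np.isclose(lam[0], np.var(x[:, 0] - x[:, 1], ddof=1) / 2))  # True
```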
The Metric factor, however, has three levels:
>>> round(pg.sphericity(data, dv='Performance', subject='Subject',
...                     within=['Metric'])[-1], 3)
0.878
The p-value is very large, so the test indicates no violation of sphericity.
Now, let’s test sphericity for the interaction between the two repeated measures factors. The current implementation in Pingouin only works if at least one of the two within-subject factors has no more than two levels.
>>> spher, _, chisq, dof, pval = pg.sphericity(data, dv='Performance',
...                                            subject='Subject',
...                                            within=['Time', 'Metric'])
>>> print(spher, round(chisq, 3), dof, round(pval, 3))
True 3.763 2 0.152
Here again, there is no violation of sphericity according to Mauchly’s test.
Alternatively, we could use a wide-format dataframe with two column levels:
>>> # Pivot from long-format to wide-format
>>> piv = data.pivot(index='Subject', columns=['Time', 'Metric'], values='Performance')
>>> piv.head()
Time        Pre                  Post
Metric  Product Client Action Product Client Action
Subject
1            13     12     17      18     30     34
2            12     19     18       6     18     30
3            17     19     24      21     31     32
4            12     25     25      18     39     40
5            19     27     19      18     28     27
>>> spher, _, chisq, dof, pval = pg.sphericity(piv)
>>> print(spher, round(chisq, 3), dof, round(pval, 3))
True 3.763 2 0.152
This gives the same output as the long-format dataframe.
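Why one two-level factor is enough: a 2 × k interaction can be reduced to a one-factor problem by taking the Post - Pre difference at each of the k levels of the other factor and testing sphericity of those k difference columns. This is a standard reduction and it is consistent with the 2 degrees of freedom reported above (k = 3 Metric levels), but the sketch below is an illustration under that assumption, not Pingouin's source code. It uses only the five subjects printed by piv.head(), so its numbers will not match the full-dataset output (3.763, 0.152).

```python
import numpy as np
from scipy import stats

# Subjects 1-5 as printed by piv.head(); columns are Product, Client, Action
pre = np.array([[13, 12, 17], [12, 19, 18], [17, 19, 24],
                [12, 25, 25], [19, 27, 19]], dtype=float)
post = np.array([[18, 30, 34], [6, 18, 30], [21, 31, 32],
                 [18, 39, 40], [18, 28, 27]], dtype=float)

# Collapse the two-level Time factor into one Post - Pre column per Metric level
diff = post - pre
n, k = diff.shape

# Mauchly's test on the k difference columns, using the formulas in the Notes
S = np.cov(diff, rowvar=False)
S_pop = S - S.mean(axis=0)[None, :] - S.mean(axis=1)[:, None] + S.mean()
eig = np.linalg.eigvalsh(S_pop)[1:]  # drop the structural zero eigenvalue
W = np.prod(eig) / (eig.sum() / (k - 1)) ** (k - 1)
f = (2 * (k - 1) ** 2 + k + 1) / (6 * (k - 1) * (n - 1))
chi2 = (f - 1) * (n - 1) * np.log(W)
dof = k * (k - 1) // 2 - 1
pval = stats.chi2.sf(chi2, dof)
print(round(W, 3), round(chi2, 3), dof, round(pval, 3))
```

The sign of the difference (Post - Pre or Pre - Post) does not matter, since it leaves the covariance matrix unchanged. When both factors have more than two levels, no single difference column per level exists, which matches the ValueError listed under Raises.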