pingouin.epsilon

pingouin.epsilon(data, dv=None, within=None, subject=None, correction='gg')

Epsilon adjustment factor for repeated measures.

Parameters:

data : pandas.DataFrame

DataFrame containing the repeated measurements. Both wide- and long-format dataframes are supported. To compute the epsilon of the interaction between two repeated measures factors with a wide-format dataframe, data must have two-level pandas.MultiIndex columns.

dv : string

Name of column containing the dependent variable (only required if data is in long format).

within : string or list

Name of column containing the within factor (only required if data is in long format). If within is a list with two strings, this function computes the epsilon factor for the interaction between the two within-subject factors.

subject : string

Name of column containing the subject identifier (only required if data is in long format).

correction : string

Specify the epsilon version:

'gg': Greenhouse-Geisser
'hf': Huynh-Feldt
'lb': Lower bound

Returns:

eps : float

Epsilon adjustment factor.

See also

sphericity: Mauchly and JNS test for sphericity.
homoscedasticity: Test equality of variance.

Notes

The lower bound epsilon is:

\[lb = \frac{1}{\text{dof}},\]

where the degrees of freedom \(\text{dof}\) equal the number of levels \(k\) of the within-subject factor minus 1 for a one-way design, and \((k_1 - 1)(k_2 - 1)\) for a two-way design (interaction).
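A minimal Python sketch of the lower-bound formula follows. The helper name `lower_bound_epsilon` is hypothetical (it is not part of pingouin); it takes the number of levels of one within-subject factor, or of two factors for an interaction.

```python
def lower_bound_epsilon(*levels):
    """Lower-bound epsilon, 1 / dof, for a one-way design (one argument)
    or a two-way interaction (two arguments). Hypothetical helper."""
    if len(levels) == 1:
        dof = levels[0] - 1
    elif len(levels) == 2:
        dof = (levels[0] - 1) * (levels[1] - 1)
    else:
        raise ValueError("Only one-way and two-way designs are handled here.")
    return 1 / dof


print(lower_bound_epsilon(3))     # one factor, 3 levels: dof = 2 -> 0.5
print(lower_bound_epsilon(2, 3))  # 2 x 3 interaction: dof = 2 -> 0.5
print(lower_bound_epsilon(3, 4))  # 3 x 4 interaction: dof = 6 -> 0.1666...
```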

The Greenhouse-Geisser epsilon is given by:

\[\epsilon_{GG} = \frac{k^2(\overline{\text{diag}(S)} - \overline{S})^2}{(k-1)(\sum_{i=1}^{k}\sum_{j=1}^{k}s_{ij}^2 - 2k\sum_{i=1}^{k}\overline{s_i}^2 + k^2\overline{S}^2)}\]

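where \(S\) is the \(k \times k\) covariance matrix of the repeated measurements, \(s_{ij}\) its element in row \(i\) and column \(j\), \(\overline{s_i}\) the mean of row \(i\), \(\overline{S}\) the grand mean of \(S\), and \(\overline{\text{diag}(S)}\) the mean of the elements on the diagonal of \(S\) (i.e. the mean of the variances).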
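To check the Greenhouse-Geisser formula term by term, the sketch below evaluates it with NumPy on the wide-format data from the Examples section (columns A, B and C). The function name `gg_epsilon` is hypothetical and this is not pingouin's internal implementation; it assumes a complete array with one row per subject and one column per level, and uses the unbiased (n - 1) sample covariance.

```python
import numpy as np


def gg_epsilon(X):
    """Greenhouse-Geisser epsilon from an (n_subjects, k) array,
    following the formula above term by term. Hypothetical helper."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    S = np.cov(X, rowvar=False)        # k x k sample covariance matrix
    mean_var = np.diag(S).mean()       # mean of the diagonal (variances)
    grand_mean = S.mean()              # grand mean of S
    row_means = S.mean(axis=1)         # mean of each row i of S
    num = k**2 * (mean_var - grand_mean) ** 2
    den = (k - 1) * ((S**2).sum()
                     - 2 * k * (row_means**2).sum()
                     + k**2 * grand_mean**2)
    return num / den


# Columns A, B and C of the wide-format example
X = np.column_stack([
    [2.2, 3.1, 4.3, 4.1, 7.2],
    [1.1, 2.5, 4.1, 5.2, 6.4],
    [8.2, 4.5, 3.4, 6.2, 7.2],
])
print(round(gg_epsilon(X), 2))  # 0.56
```

The result agrees with the 0.56 reported by pg.epsilon(data, correction='gg') in the wide-format example.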

The Huynh-Feldt epsilon is given by:

\[\epsilon_{HF} = \frac{n(k-1)\epsilon_{GG}-2}{(k-1) (n-1-(k-1)\epsilon_{GG})}\]

where \(n\) is the number of subjects (i.e. the number of rows of the wide-format data).
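The Huynh-Feldt formula is a direct function of the Greenhouse-Geisser epsilon, \(n\) and \(k\), as the short sketch below shows for a one-way design. The name `hf_epsilon` is hypothetical, and the GG value of roughly 0.5588 is the one obtained for the 5-subject, 3-level wide-format example. The raw Huynh-Feldt estimate can exceed 1 when the data are close to spherical; it is commonly truncated to 1, and this sketch does not apply that truncation.

```python
def hf_epsilon(eps_gg, n, k):
    """Huynh-Feldt epsilon from the Greenhouse-Geisser epsilon,
    n subjects and k levels (one-way design). Hypothetical helper."""
    num = n * (k - 1) * eps_gg - 2
    den = (k - 1) * (n - 1 - (k - 1) * eps_gg)
    return num / den


# GG epsilon of the wide-format example: 5 subjects, 3 levels
eps_hf = hf_epsilon(0.5588, n=5, k=3)
print(round(eps_hf, 2))  # 0.62
```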

Missing values are automatically removed from data (listwise deletion).
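Listwise deletion means that a subject with a missing value in any condition is dropped entirely, not just the missing cell. A minimal pandas sketch of that behaviour follows; the data and the column names Subject, Time and Score are hypothetical, and the code is not pingouin's internal implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: subject 2 is missing its "Post" score
long_df = pd.DataFrame({
    "Subject": [1, 1, 2, 2, 3, 3],
    "Time": ["Pre", "Post"] * 3,
    "Score": [5.0, 6.0, 4.0, np.nan, 7.0, 8.0],
})

# Pivot to wide format: one row per subject, one column per level
wide = long_df.pivot(index="Subject", columns="Time", values="Score")

# Listwise deletion: drop every subject with at least one missing value
complete = wide.dropna()
print(complete.index.tolist())  # [1, 3]
```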

Examples

Using a wide-format dataframe

>>> import pandas as pd
>>> import pingouin as pg
>>> data = pd.DataFrame({'A': [2.2, 3.1, 4.3, 4.1, 7.2],
...                      'B': [1.1, 2.5, 4.1, 5.2, 6.4],
...                      'C': [8.2, 4.5, 3.4, 6.2, 7.2]})
>>> gg = pg.epsilon(data, correction='gg')
>>> hf = pg.epsilon(data, correction='hf')
>>> lb = pg.epsilon(data, correction='lb')
>>> print("%.2f %.2f %.2f" % (lb, gg, hf))
0.50 0.56 0.62

Now using a long-format dataframe

>>> data = pg.read_dataset('rm_anova2')
>>> data.head()
   Subject Time   Metric  Performance
0        1  Pre  Product           13
1        2  Pre  Product           12
2        3  Pre  Product           17
3        4  Pre  Product           12
4        5  Pre  Product           19

Let’s first calculate the epsilon of the Time within-subject factor:

>>> pg.epsilon(data, dv='Performance', subject='Subject',
...            within='Time')
1.0

Since Time has only two levels (Pre and Post), the sphericity assumption is necessarily met, and therefore the epsilon adjustment factor is 1.
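The same conclusion follows from the range of the Greenhouse-Geisser epsilon, which is bounded below by the lower-bound epsilon and above by 1. With \(k = 2\) levels the two bounds coincide:

\[\frac{1}{k-1} \le \epsilon_{GG} \le 1 \quad\Longrightarrow\quad \frac{1}{2-1} = 1 \le \epsilon_{GG} \le 1 \quad\Longrightarrow\quad \epsilon_{GG} = 1.\]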

The Metric factor, however, has three levels:

>>> round(pg.epsilon(data, dv='Performance', subject='Subject',
...                  within=['Metric']), 3)
0.969

The epsilon value is very close to 1, meaning that there is no major violation of sphericity.
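For context, epsilon is typically used to scale both degrees of freedom of the repeated-measures F test (the Greenhouse-Geisser and Huynh-Feldt corrections), so a value close to 1 leaves them nearly unchanged. The sketch below assumes a one-way design and a hypothetical sample size of 10 subjects; `corrected_dof` is an illustrative name, not a pingouin function.

```python
def corrected_dof(eps, n, k):
    """Epsilon-corrected numerator and denominator degrees of freedom
    of a one-way repeated-measures F test (n subjects, k levels)."""
    df1 = k - 1
    df2 = (k - 1) * (n - 1)
    return eps * df1, eps * df2


# Hypothetical: 10 subjects, k = 3 levels (as for Metric), epsilon = 0.969
df1, df2 = corrected_dof(0.969, n=10, k=3)
print(round(df1, 3), round(df2, 3))  # 1.938 17.442
```

Compared with the uncorrected (2, 18) degrees of freedom, the change is small, which is consistent with the absence of a major violation of sphericity.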

Now, let’s calculate the epsilon for the interaction between the two repeated measures factors:

>>> round(pg.epsilon(data, dv='Performance', subject='Subject',
...                  within=['Time', 'Metric']), 3)
0.727

Alternatively, we could use a wide-format dataframe with two column levels:

>>> # Pivot from long-format to wide-format
>>> piv = data.pivot(index='Subject', columns=['Time', 'Metric'], values='Performance')
>>> piv.head()
Time        Pre                  Post
Metric  Product Client Action Product Client Action
Subject
1            13     12     17      18     30     34
2            12     19     18       6     18     30
3            17     19     24      21     31     32
4            12     25     25      18     39     40
5            19     27     19      18     28     27

>>> round(pg.epsilon(piv), 3)
0.727

which gives the same epsilon value as the long-format dataframe.