pingouin.epsilon
- pingouin.epsilon(data, dv=None, within=None, subject=None, correction='gg')
Epsilon adjustment factor for repeated measures.
- Parameters:
  - data : pandas.DataFrame
    DataFrame containing the repeated measurements. Both wide-format and long-format dataframes are supported. To test for an interaction term between two repeated measures factors with a wide-format dataframe, data must have two-level pandas.MultiIndex columns.
  - dv : string
    Name of the column containing the dependent variable (only required if data is in long format).
  - within : string
    Name of the column containing the within factor (only required if data is in long format). If within is a list with two strings, this function computes the epsilon factor for the interaction between the two within-subject factors.
  - subject : string
    Name of the column containing the subject identifier (only required if data is in long format).
  - correction : string
    Specify the epsilon version:
    - 'gg': Greenhouse-Geisser
    - 'hf': Huynh-Feldt
    - 'lb': Lower bound
- Returns:
  - eps : float
    Epsilon adjustment factor.
See also
sphericity
Mauchly and JNS test for sphericity.
homoscedasticity
Test equality of variance.
Notes
The lower bound epsilon is:
\[lb = \frac{1}{\text{dof}},\]
where the degrees of freedom \(\text{dof}\) is the number of groups \(k\) minus 1 for a one-way design and \((k_1 - 1)(k_2 - 1)\) for a two-way design.
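The lower bound can be sketched in a few lines of Python. The helper name `lower_bound_epsilon` and its arguments are hypothetical and are not part of the pingouin API; the function only restates the formula above.

```python
def lower_bound_epsilon(k1, k2=None):
    """Lower-bound epsilon, 1 / dof (illustrative helper, not a pingouin function).

    k1 : number of levels of the (first) within-subject factor
    k2 : number of levels of the second factor, for a two-way interaction
    """
    if k2 is None:
        dof = k1 - 1                # one-way design
    else:
        dof = (k1 - 1) * (k2 - 1)   # two-way design (interaction)
    return 1 / dof


print(lower_bound_epsilon(3))     # 0.5
print(lower_bound_epsilon(2, 3))  # 0.5
```

With three levels, the result matches the 0.50 lower bound shown in the wide-format example of this page.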
The Greenhouse-Geisser epsilon is given by:
\[\epsilon_{GG} = \frac{k^2(\overline{\text{diag}(S)} - \overline{S})^2}{(k-1)(\sum_{i=1}^{k}\sum_{j=1}^{k}s_{ij}^2 - 2k\sum_{i=1}^{k}\overline{s_i}^2 + k^2\overline{S}^2)}\]
where \(S\) is the covariance matrix, \(\overline{S}\) the grand mean of \(S\), \(\overline{s_i}\) the mean of the \(i\)-th row of \(S\), and \(\overline{\text{diag}(S)}\) the mean of all the elements on the diagonal of \(S\) (i.e. the mean of the variances).
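As a rough illustration, the Greenhouse-Geisser formula can be evaluated directly with NumPy for a one-way design. This is a minimal sketch, not pingouin's implementation: the function name `greenhouse_geisser` is hypothetical, the row-mean term is read as the mean of each row of \(S\), and the input is assumed to be a complete wide-format array (one row per subject, one column per level).

```python
import numpy as np


def greenhouse_geisser(wide):
    """Greenhouse-Geisser epsilon for a one-way design (illustrative sketch)."""
    wide = np.asarray(wide, dtype=float)
    k = wide.shape[1]                 # number of repeated-measures levels
    S = np.cov(wide, rowvar=False)    # k x k covariance matrix of the levels
    mean_diag = np.trace(S) / k       # mean of the variances
    grand_mean = S.mean()             # mean of all elements of S
    row_means = S.mean(axis=1)        # mean of each row of S
    num = k**2 * (mean_diag - grand_mean) ** 2
    den = (k - 1) * (
        (S**2).sum() - 2 * k * (row_means**2).sum() + k**2 * grand_mean**2
    )
    return num / den


# Same values as the wide-format example on this page (columns A, B, C).
wide = np.array([
    [2.2, 1.1, 8.2],
    [3.1, 2.5, 4.5],
    [4.3, 4.1, 3.4],
    [4.1, 5.2, 6.2],
    [7.2, 6.4, 7.2],
])
eps_gg = greenhouse_geisser(wide)
print(round(eps_gg, 2))  # 0.56
```

The value agrees with the 0.56 shown for `correction='gg'` in the wide-format example.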
The Huynh-Feldt epsilon is given by:
\[\epsilon_{HF} = \frac{n(k-1)\epsilon_{GG}-2}{(k-1) (n-1-(k-1)\epsilon_{GG})}\]
where \(n\) is the number of observations (i.e. the number of subjects, one row per subject in wide format).
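The Huynh-Feldt formula is a simple function of the Greenhouse-Geisser epsilon, as the sketch below shows. The helper `huynh_feldt` is hypothetical and not a pingouin function. Capping the result at 1 is an assumption added here, following the usual convention that an epsilon above 1 is truncated; it is not stated in the formula above.

```python
def huynh_feldt(eps_gg, n, k):
    """Huynh-Feldt epsilon from a Greenhouse-Geisser epsilon (illustrative sketch).

    eps_gg : Greenhouse-Geisser epsilon
    n      : number of subjects
    k      : number of repeated-measures levels
    """
    num = n * (k - 1) * eps_gg - 2
    den = (k - 1) * (n - 1 - (k - 1) * eps_gg)
    # Assumption: truncate at 1, since the raw Huynh-Feldt value can exceed 1.
    return min(num / den, 1.0)


# n = 5 subjects and k = 3 levels, with a Greenhouse-Geisser epsilon of about 0.5588
eps_hf = huynh_feldt(0.5588, n=5, k=3)
print(round(eps_hf, 2))  # 0.62
```

These inputs correspond to the wide-format example on this page, where the Huynh-Feldt epsilon is reported as 0.62.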
Missing values are automatically removed from data (listwise deletion).
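To make the effect of listwise deletion concrete, the sketch below drops every subject (row) that has at least one missing value in a wide-format dataframe. It uses plain pandas and is only meant to illustrate the concept; it is not taken from pingouin's internals, and the small dataframe is made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format data: one row per subject, one column per level.
wide = pd.DataFrame({
    'A': [2.2, np.nan, 4.3],
    'B': [1.1, 2.5, 4.1],
    'C': [8.2, 4.5, 3.4],
})

# Listwise deletion: a subject with any missing measurement is removed entirely.
complete = wide.dropna(axis=0, how='any')
print(complete.shape)  # (2, 3)
```

The second subject is removed from all three columns, even though only the value in column A is missing.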
Examples
Using a wide-format dataframe
>>> import pandas as pd
>>> import pingouin as pg
>>> data = pd.DataFrame({'A': [2.2, 3.1, 4.3, 4.1, 7.2],
...                      'B': [1.1, 2.5, 4.1, 5.2, 6.4],
...                      'C': [8.2, 4.5, 3.4, 6.2, 7.2]})
>>> gg = pg.epsilon(data, correction='gg')
>>> hf = pg.epsilon(data, correction='hf')
>>> lb = pg.epsilon(data, correction='lb')
>>> print("%.2f %.2f %.2f" % (lb, gg, hf))
0.50 0.56 0.62
Now using a long-format dataframe
>>> data = pg.read_dataset('rm_anova2')
>>> data.head()
   Subject Time   Metric  Performance
0        1  Pre  Product           13
1        2  Pre  Product           12
2        3  Pre  Product           17
3        4  Pre  Product           12
4        5  Pre  Product           19
Let’s first calculate the epsilon of the Time within-subject factor:
>>> pg.epsilon(data, dv='Performance', subject='Subject',
...            within='Time')
1.0
Since Time has only two levels (Pre and Post), the sphericity assumption is necessarily met, and therefore the epsilon adjustment factor is 1.
The Metric factor, however, has three levels:
>>> round(pg.epsilon(data, dv='Performance', subject='Subject',
...                  within=['Metric']), 3)
0.969
The epsilon value is very close to 1, meaning that there is no major violation of sphericity.
Now, let’s calculate the epsilon for the interaction between the two repeated measures factors:
>>> round(pg.epsilon(data, dv='Performance', subject='Subject',
...                  within=['Time', 'Metric']), 3)
0.727
Alternatively, we could use a wide-format dataframe with two column levels:
>>> # Pivot from long-format to wide-format
>>> piv = data.pivot(index='Subject', columns=['Time', 'Metric'], values='Performance')
>>> piv.head()
Time        Pre                  Post
Metric  Product Client Action Product Client Action
Subject
1            13     12     17      18     30     34
2            12     19     18       6     18     30
3            17     19     24      21     31     32
4            12     25     25      18     39     40
5            19     27     19      18     28     27

>>> round(pg.epsilon(piv), 3)
0.727
which gives the same epsilon value as the long-format dataframe.