What’s new#
v0.5.5 (September 2024)#
This is a minor release with several bugfixes and major updates to the internal structure and the Sphinx documentation.
See the full changelog for 0.5.5.
v0.5.4 (January 2024)#
This is a minor release with several bugfixes and no new features. The new version is tested for Python 3.8-3.11 (but should also work with Python 3.12). See the full changelog for 0.5.4.
This release requires pandas≥1.5. We recommend scipy≥1.11.0.
v0.5.3 (December 2022)#
Bugfixes
Fixed a bug where the boolean value returned by pingouin.anderson() was inverted. It returned True when the data was NOT coming from the tested distribution, and vice versa. PR 308.
Fixed misleading documentation and input_type in the pingouin.convert_effsize() function. When converting from a Cohen’s d effect size to a correlation coefficient, the resulting correlation is not a Pearson correlation but instead a point-biserial correlation. To avoid any confusion, input_type='r' has been deprecated and replaced with input_type='pointbiserialr'. For more details, see issue 302.
New function
We have added the pingouin.ptests() function to calculate a T-test (T- and p-values) between all pairs of columns in a given dataframe. This is the T-test equivalent of pingouin.rcorr(). It can only be used as a pandas.DataFrame method, not as a standalone function. The output is a square dataframe with the T-values on the lower triangle and the p-values on the upper triangle.
>>> import pingouin as pg
>>> df = pg.read_dataset('pairwise_corr').iloc[:30, 1:]
>>> df.columns = ["N", "E", "O", "A", "C"]
>>> df.ptests()
        N       E      O      A    C
N       -     ***    ***    ***  ***
E  -8.397       -                ***
O  -8.585  -0.483      -         ***
A  -9.026   0.278  0.786      -  ***
C  -4.759   3.753  4.128  3.802    -
Improvements
Effect sizes are now calculated using an exact method instead of an approximation based on T-values in pingouin.pairwise_tukey() and pingouin.pairwise_gameshowell(). PR 328.
pingouin.normality() does not raise an AssertionError anymore if one of the groups in group has ≤ 3 samples. PR 324.
Added customization options to pingouin.plot_rm_corr(), which now takes optional keyword arguments to pass through to seaborn.regplot() and seaborn.scatterplot(). PR 312.
Changed some plotting functions to increase compatibility with seaborn.FacetGrid. As explained in issue 306, the major change is to generate matplotlib.axes using default parameters instead of accepting fig and dpi keyword arguments. This change applies to pingouin.plot_blandaltman(), pingouin.plot_paired(), pingouin.plot_circmean(), and pingouin.qqplot(). To use custom figure settings with these functions, open a matplotlib.axes and pass it through using the ax parameter. Other minor changes include the addition of the square keyword argument to pingouin.plot_circmean() and pingouin.qqplot() to ensure equal aspect ratios, and the removal of scatter_kws as a keyword argument in pingouin.plot_blandaltman() (the scatter parameters are now altered using general **kwargs). PR 314.
v0.5.2 (June 2022)#
Bugfixes
The eta-squared (n2) effect size was not properly calculated in one-way and two-way repeated measures ANOVAs. Specifically, Pingouin followed the same behavior as JASP, i.e. the eta-squared was the same as the partial eta-squared. However, as explained in issue 251, this behavior is not valid. In a one-way ANOVA design, the eta-squared should be equal to the generalized eta-squared. Note that, as of March 2022, this bug is also present in JASP. We have therefore updated the unit tests to use JAMOVI instead.
Warning
Please double check any effect sizes previously obtained with the pingouin.rm_anova() function.
Fixed invalid resampling behavior for bivariate functions in pingouin.compute_bootci() when x and y were not paired. PR 281.
Fixed bug where confidence (previously ci) was ignored when calculating the bootstrapped confidence intervals in pingouin.plot_shift(). PR 282.
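As a rough illustration of the eta-squared bugfix described above, the following sketch computes the three effect sizes from the sums of squares of a one-way repeated measures design. The data are made up, and the code uses the textbook sums-of-squares definitions in plain NumPy; it is not Pingouin's implementation.

```python
import numpy as np

# Hypothetical wide-format data: 4 subjects (rows) x 3 conditions (columns)
scores = np.array([
    [5.0, 6.0, 9.0],
    [4.0, 7.0, 8.0],
    [6.0, 6.0, 10.0],
    [3.0, 5.0, 7.0],
])
n_subj, n_cond = scores.shape
grand_mean = scores.mean()

# Sums of squares of a one-way repeated measures ANOVA
ss_total = ((scores - grand_mean) ** 2).sum()
ss_effect = n_subj * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
ss_subject = n_cond * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
ss_error = ss_total - ss_effect - ss_subject

eta_sq = ss_effect / ss_total
partial_eta_sq = ss_effect / (ss_effect + ss_error)
generalized_eta_sq = ss_effect / (ss_effect + ss_subject + ss_error)

print(round(eta_sq, 3), round(partial_eta_sq, 3), round(generalized_eta_sq, 3))
```

In this one-way design, the eta-squared and generalized eta-squared share the same denominator (the total sum of squares), whereas the partial eta-squared excludes the between-subject variance and is therefore larger.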
Enhancements
The pingouin.pairwise_ttests() function has been renamed to pingouin.pairwise_tests(). Non-parametric tests are also supported in this function with the parametric=False argument, and thus the name “ttests” was misleading (see issue 209).
Allow pingouin.bayesfactor_binom() to take a Beta alternative model. PR 252.
Allow keyword arguments for logistic regression in pingouin.mediation_analysis(). PR 245.
Speed improvements for the Holm and FDR correction in pingouin.multicomp(). PR 271.
Speed improvements for univariate functions in pingouin.compute_bootci() (e.g. func="mean" is now vectorized).
Rename eta to eta_squared in pingouin.power_anova() and pingouin.power_rm_anova() to avoid any confusion. PR 280.
Use black code formatting.
Add support for DataMatrix objects. PR 286.
Dependencies
Force scikit-learn<1.1.0 to avoid bug in pingouin.logistic_regression(). PR 272.
v0.5.1 (February 2022)#
This is a minor release, with several bugfixes and improvements. This release is compatible with SciPy 1.8 and Pandas 1.4.
Bugfixes
Added support for SciPy 1.8 and Pandas 1.4. PR 234.
Fixed bug where pingouin.rm_anova() and pingouin.mixed_anova() changed the dtypes of categorical columns in-place (issue 224).
Enhancements
Faster implementation of pingouin.gzscore(), adding all options available in zscore: axis, ddof and nan_policy. Warning: this function is deprecated and will be removed in pingouin 0.7.0 (use scipy.stats.gzscore() instead). PR 210.
Replace use of statsmodels’ studentized range distribution functions with SciPy’s more accurate scipy.stats.studentized_range(). PR 229.
Add support for optional keyword arguments in the pingouin.homoscedasticity() function (issue 218).
Add support for the Jarque-Bera test in pingouin.normality() (issue 216).
Lastly, we have also deprecated the Gitter forum in favor of GitHub Discussions. Please use Discussions to ask questions, share ideas / tips and engage with the Pingouin community!
v0.5.0 (October 2021)#
This is a MAJOR RELEASE with several important bugfixes. We recommend that all users upgrade to this new version.
BUGFIX - Repeated measurements
This release fixes several critical issues related to how Pingouin handles missing values in repeated measurements. The following functions have been corrected:
pingouin.pairwise_ttests(), only for mixed design or two-way repeated measures design.
A full description of the issue, with code and example, can be found at: raphaelvallat/pingouin#206. In short, in Pingouin <0.5.0, listwise deletion of subjects (or rows) with missing values was not strictly enforced in repeated measures or mixed ANOVA, depending on the input data format (if missing values were explicit or implicit). Pingouin 0.5.0 now uses a stricter complete-case analysis regardless of the input data format, which is the same behavior as JASP.
Furthermore, the pingouin.remove_rm_na() function has been deprecated. Instead, listwise deletion of rows with missing values in repeated measurements is now performed using:
>>> data_piv = data.pivot_table(index=subject, columns=within, values=dv)
>>> data_piv = data_piv.dropna() # Listwise deletion
>>> data = data_piv.melt(ignore_index=False, value_name=dv).reset_index()
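To make the snippet above concrete, here is a minimal sketch applying the same three steps to a made-up long-format dataframe. The column names and values are hypothetical; subject 2 has an explicit missing value (a NaN score) and subject 4 an implicit one (a missing row).

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data with explicit (NaN) and implicit (absent row) missing values
data = pd.DataFrame({
    "Subject": [1, 1, 2, 2, 3, 3, 4],
    "Time": ["pre", "post", "pre", "post", "pre", "post", "pre"],
    "Score": [4.0, 5.5, np.nan, 6.0, 3.5, 4.5, 5.0],
})
subject, within, dv = "Subject", "Time", "Score"

# Pivot to wide format: both kinds of missing values become NaN cells
data_piv = data.pivot_table(index=subject, columns=within, values=dv)
data_piv = data_piv.dropna()  # Listwise deletion
clean = data_piv.melt(ignore_index=False, value_name=dv).reset_index()

print(clean)
```

Only subjects 1 and 3, who have complete data at both time points, remain in the output, regardless of how the missing values were encoded in the input.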
BUGFIX - Strict listwise deletion in pairwise_ttests when repeated measures are present
This is related to the previous issue. In mixed design, listwise deletion (complete-case analysis) was not strictly enforced in pingouin.pairwise_ttests() for the between-subject and interaction T-tests. In other words, the between-subject and interaction T-tests were calculated using a pairwise-deletion approach, even with nan_policy="listwise".
The same issue occurred in two-way repeated measures design, in which no strict listwise deletion was performed prior to calculating the T-tests, even with nan_policy="listwise".
This has now been fixed such that Pingouin will always perform a strict listwise deletion whenever repeated measurements are present when nan_policy="listwise" (default). This complete-case analysis behavior can be disabled with nan_policy="pairwise", in which case missing values will be removed separately for each contrast. This may not be appropriate for post-hoc analysis following a repeated measures or mixed ANOVA, which is always conducted on complete-case data.
BUGFIX - Homoscedasticity
The pingouin.homoscedasticity() function gave WRONG results for wide-format dataframes because the test was incorrectly calculated on the transposed data. See issue 204.
Enhancements
Partial correlation functions (pingouin.pcorr() and pingouin.partial_corr()) now use numpy.linalg.pinv() with hermitian=True, which improves numerical stability. See issue 198.
Added support for integer column names in most functions. Previously, Pingouin raised an error if the column names were integers. See issue 201.
pingouin.pairwise_corr() now works when the column names of the dataframe are integers, and better supports numpy.arrays in the columns argument.
Added support for wide-format dataframes in pingouin.friedman() and pingouin.cochran().
v0.4.0 (August 2021)#
Major upgrade of the dependencies. This release requires Python 3.7+, SciPy 1.7+, NumPy 1.19+ and Pandas 1.0+. Pingouin uses the alternative argument that has been added to several statistical functions of SciPy 1.7+ (see below). However, SciPy 1.7+ requires Python 3.7+. We recommend that all users upgrade to the latest version of Pingouin.
Major enhancements#
Directional testing
The tail argument has been renamed to alternative in all Pingouin functions to be consistent with SciPy and R (#185). Furthermore, alternative='one-sided' has now been deprecated. Instead, alternative must be one of “two-sided” (default), “greater” or “less”. Again, this is the same behavior as SciPy and R.
Added support for directional testing with alternative='greater' and alternative='less' in pingouin.corr() (#176). As a result, the p-value, confidence intervals and power of the correlation will change depending on the directionality of the test. Support for directional testing has also been added to pingouin.power_corr() and pingouin.compute_esci().
Finally, the tail argument has been removed from pingouin.rm_corr(), pingouin.circ_corrcc() and pingouin.circ_corrcl() to be consistent with the original R / Matlab implementations.
Partial correlation
Major refactoring of pingouin.partial_corr(), which now uses the same method as the R ppcor package, i.e. based on the inverse covariance matrix rather than the residuals of a linear regression. This new approach is faster and works better in some cases (such as Spearman partial correlation with binary variables, see issue 147).
One caveat is that only the Pearson and Spearman correlation methods are now supported in partial/semi-partial correlation.
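The two approaches mentioned above can be compared with a short NumPy sketch. It uses simulated data and is only meant to illustrate the idea for a Pearson partial correlation with a single covariate; it is not the code used inside pingouin.partial_corr().

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
z = rng.normal(size=n)      # covariate
x = z + rng.normal(size=n)  # x and y are correlated only through z
y = z + rng.normal(size=n)

# Approach 1: inverse covariance (precision) matrix
cov = np.cov(np.column_stack([x, y, z]), rowvar=False)
precision = np.linalg.pinv(cov, hermitian=True)
r_precision = -precision[0, 1] / np.sqrt(precision[0, 0] * precision[1, 1])

# Approach 2: correlate the residuals of x ~ z and y ~ z
design = np.column_stack([np.ones(n), z])
res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
r_residuals = np.corrcoef(res_x, res_y)[0, 1]

print(r_precision, r_residuals)
```

For the Pearson case both routes give the same coefficient up to floating-point error; the inverse-covariance route simply avoids fitting one regression per variable.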
Box M test
Added the pingouin.box_m() function to calculate Box’s M test for equality of covariance matrices (#175).
Minor enhancements#
pingouin.wilcoxon() now supports a pre-computed array of differences, similar to scipy.stats.wilcoxon() (issue 186).
pingouin.mwu() and pingouin.wilcoxon() now support keyword arguments that are passed to the lower-level scipy functions.
Added warning in pingouin.partial_corr() with method="skipped": the MCD algorithm does not give the same output in Python (scikit-learn) as in the original Matlab library (LIBRA), and this can lead to skipped correlations that are different in Pingouin than in the Matlab robust correlation toolbox (see issue 164).
pingouin.ancova() always uses statsmodels, regardless of the number of covariates. This fixes LinAlg errors in pingouin.ancova() and pingouin.rm_corr() (see issue 184).
Avoid RuntimeWarning when calculating CI and power of a perfect correlation in pingouin.corr() (see issue 183).
Use scipy.linalg.lstsq() instead of numpy.linalg.lstsq() whenever possible to better check for NaN and Inf in input (see issue 184).
The flake8 requirement for max line length has been changed from 80 to 100 characters.
v0.3.12 (May 2021)#
Bugfixes
This release fixes a critical error in pingouin.partial_corr(): the number of covariates was not taken into account when calculating the degrees of freedom of the partial correlation, thus leading to incorrect results (except for the correlation coefficient which remained unaffected). For more details, please see issue 171.
In addition to fixing the p-values and 95% confidence intervals, the statistical power and Bayes Factor have been removed from the output of pingouin.partial_corr(), at least temporarily, until we can make sure that these give exact results.
We have also fixed a minor bug in the robust skipped and shepherd correlation (see pingouin.corr()), for which the calculation of the confidence intervals and statistical power did not take into account the number of outliers. These are now calculated only on the cleaned data.
Warning
We therefore strongly recommend that all users UPDATE Pingouin (pip install -U pingouin) and CHECK ANY RESULTS obtained with the pingouin.partial_corr() function.
Enhancements
Major refactoring of pingouin.plot_blandaltman(), which now has many additional parameters. It also uses a T distribution instead of a normal distribution to estimate the 95% confidence intervals of the mean difference and agreement limits. See issue 167.
For clarity, the z, r2 and adj_r2 have been removed from the output of pingouin.corr() and pingouin.pairwise_corr(), as these can be readily calculated from the correlation coefficient.
Better testing against R for pingouin.partial_corr() and pingouin.corr().
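For users who relied on the removed z, r2 and adj_r2 columns, the sketch below shows one way to recompute them from a correlation coefficient. The values of r and n are made up, and the adjusted R² uses the textbook formula with a single predictor, which is an assumption and may not match what earlier Pingouin versions reported.

```python
import math

r = 0.5  # hypothetical correlation coefficient
n = 30   # hypothetical sample size
k = 1    # number of predictors (assumption: simple bivariate correlation)

z = math.atanh(r)                                # Fisher z-transform of r
r2 = r ** 2                                      # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # textbook adjusted R^2

print(round(z, 4), r2, round(adj_r2, 4))
```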
v0.3.11 (April 2021)#
Bugfixes
Fix invalid computation of the robust skipped correlation in pingouin.corr() (see issue 164).
Passing a wrong tail argument to pingouin.corr() now always raises an error (see PR 160). In previous versions of pingouin, using any method other than "pearson" and a wrong tail argument such as "two-tailed" or "both" (instead of the correct "two-sided") may have resulted in silently returning a one-sided p-value.
Reverted changes made in pingouin.pairwise_corr() which led to Pingouin calculating the correlations between the DV columns and the covariates, thus artificially increasing the number of pairwise comparisons (see issue 162).
v0.3.10 (February 2021)#
Bugfix
This release fixes an error in the calculation of the p-values in the pingouin.pairwise_tukey() and pingouin.pairwise_gameshowell() functions (see PR156). Old versions of Pingouin used an incorrect algorithm for the studentized range approximation, which resulted in (slightly) incorrect p-values. In most cases, the error did not seem to affect the significance of the p-values. The new version of Pingouin now uses the statsmodels internal implementation of the Gleason (1999) algorithm to estimate the p-values.
Please note that the Pingouin p-values may be slightly different from those of R (and JASP), because Pingouin uses a different algorithm. However, this does not seem to affect the significance levels of the p-values (i.e. a p-value below 0.05 in JASP is likely to be below 0.05 in Pingouin, and vice versa).
We therefore recommend that all users UPDATE Pingouin (pip install -U pingouin) and CHECK ANY RESULTS obtained with the pingouin.pairwise_tukey() and pingouin.pairwise_gameshowell() functions.
v0.3.9 (January 2021)#
Bugfix
This release fixes a CRITICAL ERROR in the pingouin.pairwise_ttests() function (see issue 151). The bug concerns one-way and two-way repeated measures pairwise T-tests. Until now, Pingouin implicitly assumed that the dataframe was sorted such that the ordering of the subjects was the same across all repeated measurements (e.g. the third values in the repeated measurements always belonged to the same subject).
This led to incorrect results when the dataframe was not sorted in such a way.
We therefore strongly recommend that all users UPDATE Pingouin (pip install -U pingouin) and CHECK ANY RESULTS obtained with the pingouin.pairwise_ttests() function. Note that the bug does not concern non-repeated measures pairwise T-tests, since the ordering of the values does not matter in this case.
Furthermore, and to prevent a similar issue, we have now disabled marginal=False in two-way repeated measures design. As of this release, marginal=False will therefore only have an impact on the between-factor T-test(s) of a mixed design.
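The ordering problem described in this bugfix can be reproduced with a few lines of pandas. The dataframe below is made up, and the code is only a sketch of the general pitfall (pairing values by position versus by subject), not the code path used in pingouin.pairwise_ttests().

```python
import pandas as pd

# Hypothetical long-format data in which subjects are NOT in the same order
# within each level of the repeated factor
data = pd.DataFrame({
    "Subject": ["s1", "s2", "s3", "s3", "s1", "s2"],
    "Time": ["pre", "pre", "pre", "post", "post", "post"],
    "Score": [10.0, 20.0, 30.0, 33.0, 11.0, 22.0],
})

# Pairing by position silently mixes up subjects
pre = data.loc[data["Time"] == "pre", "Score"].to_numpy()
post = data.loc[data["Time"] == "post", "Score"].to_numpy()
naive_diff = post - pre

# Pairing by subject: pivot so that each row is one subject
wide = data.pivot(index="Subject", columns="Time", values="Score")
paired_diff = (wide["post"] - wide["pre"]).to_numpy()

print(naive_diff)   # differences between mismatched subjects
print(paired_diff)  # within-subject differences
```

Pivoting (or sorting) by the subject identifier before computing paired differences removes any dependence on the row order of the input.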
Deprecation
a. Removed the Glass delta effect size. Until now, Pingouin invalidly assumed that the control group was always the one with the lowest standard deviation. Since this cannot be verified, and to avoid any confusion, the Glass delta effect size has been completely removed from Pingouin. See issue 139.
Enhancements
pingouin.plot_paired() now supports an arbitrary number of within-levels as well as horizontal plotting. See PR 133.
pingouin.linear_regression() now handles a rank deficient design matrix X by producing a warning and trying to calculate the sum of squared residuals without relying on np.linalg.lstsq(). See issue 130.
pingouin.friedman() now has an option to choose between Chi square test or F test method.
Several minor improvements to the documentation and GitHub Actions. See PR150.
Added support for kwargs in pingouin.corr() (see issue 138).
Added confidence argument in pingouin.ttest() to allow for custom CI (see issue 152).
v0.3.8 (September 2020)#
Bugfixes
Fix a bug in pingouin.ttest() in which the confidence intervals for one-sample T-test with y != 0 were invalid (e.g. pg.ttest(x=[4, 6, 7, 4], y=4)). See issue 119.
New features
Added a pingouin.options module which can be used to set default options. For example, one can set the default decimal rounding of the output dataframe, either for the entire dataframe, per column, per row, or per cell. See PR120. For more details, please refer to notebooks/06_others.ipynb.
import pingouin as pg
pg.options['round'] = None  # Default: no rounding
pg.options['round'] = 4
pg.options['round.column.CI95%'] = 2
pg.options['round.row.T-test'] = 2
pg.options['round.cell.[T-test]x[CI95%]'] = 2
Enhancements
pingouin.linear_regression() now returns the processed X and y variables (Xw and yw for WLS) and the predicted values if as_dataframe=False. See issue 112.
The Common Language Effect Size (CLES) in pingouin.mwu() is now calculated using the formula given by Vargha and Delaney 2000, which works better when ties are present in data. This is consistent with the pingouin.wilcoxon() and pingouin.compute_effsize() functions. See issue 114.
Better handling of kwargs arguments in pingouin.plot_paired() (see PR 116).
Added boxplot_in_front argument to the pingouin.plot_paired() function. When set to True, the boxplot is displayed in front of the lines with a slight transparency. This can make the overall plot more readable when plotting data from a large number of subjects (see PR 117).
Better handling of Categorical columns in several functions (e.g. ANOVA). See issue 122.
multivariate_normality() now also returns the test statistic. This function also comes with better unit testing against the MVN R package.
pingouin.pairwise_corr() can now control for all covariates by excluding each specific set of column-combinations from the covariates to use for this combination, similar to pingouin.pcorr(). See PR 124.
Bayes factor formatting is now handled via the options module. The default behaviour is unchanged (return as formatted string), but can easily be disabled by setting pingouin.options["round.column.BF10"] = None. See PR 126.
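To illustrate why the Vargha and Delaney formulation of the CLES behaves well with ties, here is a small brute-force sketch over all pairs of observations, in which ties count for one half. The function name cles and the data are hypothetical; this is not the implementation used in pingouin.mwu().

```python
from itertools import product


def cles(x, y):
    """Brute-force P(X > Y) + 0.5 * P(X = Y) over all (x, y) pairs."""
    pairs = list(product(x, y))
    greater = sum(xi > yi for xi, yi in pairs)
    ties = sum(xi == yi for xi, yi in pairs)
    return (greater + 0.5 * ties) / len(pairs)


# Hypothetical samples containing ties
x = [1, 2, 3, 4]
y = [2, 2, 3, 5]

print(cles(x, y))  # 0.40625
print(cles(y, x))  # 0.59375
```

Because each tie is split evenly between the two directions, the two values always sum to one, which is not the case when ties are simply ignored.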
v0.3.7 (July 2020)#
Bugfixes
This hotfix release brings important changes to the pingouin.pairwise_tukey() and pingouin.pairwise_gameshowell() functions. These two functions had been implemented soon after Pingouin’s first release and were not as tested as more recent and widely-used functions. These two functions are now validated against JASP.
We strongly recommend that all users upgrade their version of Pingouin (pip install -U pingouin).
Fixed a bug in pingouin.pairwise_tukey() and pingouin.pairwise_gameshowell() in which the group labels (columns A and B) were incorrect when the between column was encoded as a pandas.Categorical with non-alphabetical categories order. This was caused by a discrepancy in how Numpy and Pandas sorted the categories in the between column. For more details, please refer to issue 111.
Fixed a bug in pingouin.pairwise_gameshowell() in which the reported standard errors were slightly incorrect because of a typo in the code. However, the T-values and p-values were fortunately calculated using the correct standard errors, so this bug only impacted the values in the se column.
Removed the tail and alpha arguments from the pingouin.pairwise_tukey() and pingouin.pairwise_gameshowell() functions to be consistent with JASP. Note that the alpha parameter did not have any impact. One-sided p-values were obtained by halving the two-sided p-values.
Error
Please check all previous code and results that called the pingouin.pairwise_tukey() or pingouin.pairwise_gameshowell() functions, especially if the between column was encoded as a pandas.Categorical.
Deprecation
We have now removed the pingouin.plot_skipped_corr() function, as we felt that it may not be useful or relevant to many users (see issue 105).
v0.3.6 (July 2020)#
Bugfixes
Changed the default scikit-learn solver in pingouin.logistic_regression() from ‘lbfgs’ to ‘newton-cg’ in order to get results that are always consistent with R or statsmodels. Previous versions of Pingouin were based on the ‘lbfgs’ solver which internally applied a regularization of the intercept that may have led to different coefficients and p-values for the predictors of interest based on the scaling of these predictors (e.g. very small or very large values). The new ‘newton-cg’ solver is scaling-independent, i.e. no regularization is applied to the intercept and p-values are therefore unchanged with different scaling of the data. If you prefer to keep the old behavior, just use: pingouin.logistic_regression(..., solver='lbfgs').
Fixed invalid results in pingouin.logistic_regression() when fit_intercept=False was passed as a keyword argument to scikit-learn. The standard errors and p-values were still calculated by taking into account an intercept in the model.
Warning
We highly recommend double-checking all previous code and results that called the pingouin.logistic_regression() function, especially if it involved non-standardized predictors and/or custom keyword arguments passed to scikit-learn.
Enhancements
Added within_first boolean argument to pingouin.pairwise_ttests(). This is useful in mixed design when one wants to change the order of the interaction. The default behavior of Pingouin is to return the within * between pairwise tests for the interaction. Using within_first=False, one can now return the between * within pairwise tests. For more details, see issue 102 on GitHub.
pingouin.list_dataset() now returns a dataframe instead of simply printing the output.
Added the Palmer Station LTER Penguin dataset, which describes the flipper length and body mass for different species of penguins. It can be loaded with pingouin.read_dataset('penguins').
Added the Tips dataset. It can be loaded with pingouin.read_dataset('tips').
v0.3.5 (June 2020)#
Enhancements
Added support for weighted linear regression in pingouin.linear_regression(). Users can now pass sample weights using the weights argument (similar to lm(..., weights) in R and LinearRegression.fit(X, y, sample_weight) in scikit-learn).
The \(R^2\) in pingouin.linear_regression() is now calculated in a similar manner as statsmodels and R, which give different results than sklearn.metrics.r2_score() when, and only when, no constant term (= intercept) is present in the predictor matrix. In that case, scikit-learn (and previous versions of Pingouin) uses the standard \(R^2\) formula, which assumes a reference model that only includes an intercept:
\[R^2 = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i (y_i - \bar y)^2}\]
However, statsmodels, R, and newer versions of Pingouin use a modified formula, which uses a reference model corresponding to noise only (i.e. no intercept, as explained in this post):
\[R_0^2 = 1 - \frac{\sum_i (y_i - \hat y_i)^2}{\sum_i y_i^2}\]
Note that this only affects the (rare) cases when no intercept is present in the predictor matrix. Remember that Pingouin automatically adds a constant term in pingouin.linear_regression(), a behavior that can be disabled using add_intercept=False.
Added support for robust biweight midcorrelation ('bicor') in pingouin.corr() and pingouin.pairwise_corr().
The Common Language Effect Size (CLES) is now calculated using the formula given by Vargha and Delaney 2000, which works better when ties are present in data.
\[\text{CL} = P(X > Y) + .5 \times P(X = Y)\]
This applies to the pingouin.wilcoxon() and pingouin.compute_effsize() functions. Furthermore, the CLES is now tail-sensitive in the former, but not in the latter since tail is not a valid argument. In pingouin.compute_effsize(), the CLES thus always corresponds to the proportion of pairs where x is higher than y. For more details, please refer to PR #94.
Confidence intervals around a Cohen d effect size are now calculated using a central T distribution instead of a standard normal distribution in the pingouin.compute_esci() function. This is consistent with the effsize R package.
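The difference between the two \(R^2\) formulas given above can be checked numerically. The sketch below fits a regression without intercept on made-up data using plain NumPy and evaluates both definitions; it is an illustration of the formulas, not the code inside pingouin.linear_regression().

```python
import numpy as np

# Hypothetical data, roughly y = 2 * x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit WITHOUT an intercept (single predictor column)
X = x[:, None]
beta = np.linalg.lstsq(X, y, rcond=None)[0]
ss_res = ((y - X @ beta) ** 2).sum()

# Standard formula: the reference model contains only an intercept
r2_centered = 1 - ss_res / ((y - y.mean()) ** 2).sum()
# Modified formula: the reference model is noise only (no intercept)
r2_uncentered = 1 - ss_res / (y ** 2).sum()

print(r2_centered, r2_uncentered)
```

Since the uncentered total sum of squares is never smaller than the centered one, the modified formula gives a value at least as large as the standard one for the same fit.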
Code
Added support for unsigned integers in dtypes safety checks (see issue #93).
v0.3.4 (May 2020)#
Bugfixes
The Cohen \(d_{avg}\) for paired samples was previously calculated using eq. 10 in Lakens 2013. However, this equation was slightly different from the original proposed by Cumming 2012, and Lakens has since updated the equation in his effect size conversion spreadsheet. Pingouin now uses the correct formula, which is \(d_{avg} = \frac{\overline{X} - \overline{Y}}{\sqrt{\frac{(\sigma_1^2 + \sigma_2^2)}{2}}}\).
Fixed minor bug in internal function pingouin.utils._flatten_list that could lead to TypeError in pingouin.pairwise_ttests() with within/between factors encoded as integers (see issue #91).
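As a worked example of the corrected \(d_{avg}\) formula given above, the sketch below evaluates it on two made-up paired samples using only the standard library. It assumes that \(\sigma_1^2\) and \(\sigma_2^2\) are the unbiased sample variances, which is an assumption made for this illustration.

```python
import math
import statistics

# Hypothetical paired samples
x = [2.0, 4.0, 6.0]
y = [1.0, 2.0, 3.0]

var_x = statistics.variance(x)  # unbiased sample variance (assumption)
var_y = statistics.variance(y)

# d_avg: mean difference divided by the root of the average variance
d_avg = (statistics.mean(x) - statistics.mean(y)) / math.sqrt((var_x + var_y) / 2)

print(round(d_avg, 4))  # 1.2649
```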
New functions
Added pingouin.convert_angles() function to convert circular data in arbitrary units to radians (\([-\pi, \pi)\) range).
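The kind of conversion described above can be sketched in a few lines: rescale the values to a full circle, then wrap them into the \([-\pi, \pi)\) range. The helper to_radians and its low and high parameters are hypothetical names chosen for this illustration and do not reproduce the signature or internals of pingouin.convert_angles().

```python
import math


def to_radians(values, low=0.0, high=360.0):
    """Map values from [low, high) to radians in [-pi, pi). Illustrative only."""
    span = high - low
    out = []
    for v in values:
        rad = (v - low) / span * 2 * math.pi               # rescale to [0, 2*pi)
        rad = (rad + math.pi) % (2 * math.pi) - math.pi    # wrap to [-pi, pi)
        out.append(rad)
    return out


degrees = to_radians([0, 90, 180, 270])     # degrees on a 360-unit circle
hours = to_radians([6, 18], low=0, high=24)  # clock hours on a 24-unit circle
print(degrees)
print(hours)
```

With this convention 180 degrees maps to \(-\pi\) rather than \(\pi\), because the upper bound of the range is excluded.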
Enhancements
Better documentation and testing for descriptive circular statistics functions.
Added safety checks that angles is expressed in radians in circular statistics functions.
pingouin.circ_mean() and pingouin.circ_r() now perform calculations omitting missing values.
Pingouin no longer changes the default matplotlib style to a Seaborn-default (see issue #85).
Disabled rounding of float in most Pingouin functions in order to reduce numerical imprecision. For more details, please refer to issue #87. Users can still round the output using the pandas.DataFrame.round() method, or changing the default precision of Pandas DataFrame with pandas.set_option.
Disabled filling of missing values by '-' in some ANOVAs functions, which may have led to dtypes issues.
Added partial eta-squared (np2 column) to the output of pingouin.ancova() and pingouin.welch_anova().
Added the effsize option to pingouin.anova() and pingouin.ancova() to return different effect sizes. Must be one of 'np2' (partial eta-squared, default) or 'n2' (eta-squared).
Added the effsize option to pingouin.rm_anova() and pingouin.mixed_anova() to return different effect sizes. Must be one of 'np2' (partial eta-squared, default), 'n2' (eta-squared) or 'ng2' (generalized eta-squared).
Code and dependencies
Compatibility with Python 3.9 (see PR by tirkarthi).
To avoid any confusion, the alpha argument has been renamed to angles in all circular statistics functions.
Updated flake8 guidelines and added continuous integration for Python 3.8.
Added the tabulate package as dependency. The tabulate package is used by the pingouin.print_table() function as well as the pandas.DataFrame.to_markdown() function.
v0.3.3 (February 2020)#
Bugfixes
Fixed a bug in pingouin.pairwise_corr() caused by the deprecation of pandas.core.index in the new version of Pandas (1.0). For now, both Pandas 0.25 and Pandas 1.0 are supported.
The standard deviation in pingouin.pairwise_ttests() when using return_desc=True is now calculated with np.nanstd(ddof=1) to be consistent with Pingouin/Pandas default unbiased standard deviation.
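The effect of the ddof argument mentioned above can be seen on a small made-up array. NumPy defaults to the biased estimator (ddof=0), whereas pandas defaults to the unbiased one (ddof=1) and skips missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical values with one missing observation
values = np.array([2.0, 4.0, 4.0, np.nan, 6.0])

sd_biased = np.nanstd(values)            # NumPy default, ddof=0
sd_unbiased = np.nanstd(values, ddof=1)  # unbiased estimator
sd_pandas = pd.Series(values).std()      # pandas default, ddof=1, NaN skipped

print(sd_biased, sd_unbiased, sd_pandas)
```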
New functions
Added pingouin.plot_circmean() function to plot the circular mean and circular vector length of a set of angles (in radians) on the unit circle.
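For readers unfamiliar with the two quantities shown by this plot, the following sketch computes the circular mean and the mean resultant vector length of a set of angles using the standard definitions. The helper name circ_mean_and_r is hypothetical, and the code is independent of Pingouin.

```python
import math


def circ_mean_and_r(angles):
    """Return (circular mean, resultant vector length) for angles in radians."""
    c = sum(math.cos(a) for a in angles) / len(angles)  # mean cosine
    s = sum(math.sin(a) for a in angles) / len(angles)  # mean sine
    return math.atan2(s, c), math.hypot(c, s)


# Two hypothetical angles: 0 and 90 degrees
mean_angle, r = circ_mean_and_r([0.0, math.pi / 2])
print(mean_angle, r)
```

The vector length lies between 0 (angles spread evenly around the circle) and 1 (all angles identical).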
v0.3.2 (January 2020)#
Hotfix release to fix a critical issue with pingouin.pairwise_ttests() (see below). We strongly recommend that you update to the newest version of Pingouin and double-check your previous results if you’ve ever used the pairwise T-tests with more than one factor (e.g. mixed, factorial or 2-way repeated measures design).
Bugfixes
MAJOR: Fixed a bug in pingouin.pairwise_ttests() when using mixed or two-way repeated measures design. Specifically, the T-tests were performed without averaging over repeated measurements first (i.e. without calculating the marginal means). Note that for mixed design, this only impacts the between-subject T-test(s). Practically speaking, this led to higher degrees of freedom (because they were conflated with the number of repeated measurements) and ultimately incorrect T and p-values because the assumption of independence was violated. Pingouin now averages over repeated measurements in mixed and two-way repeated measures design, which is the same behavior as JASP or JAMOVI. As a consequence, and when the data has only two groups, the between-subject p-value of the pairwise T-test should be (almost) equal to the p-value of the same factor in the pingouin.mixed_anova() function. The old behavior of Pingouin can still be obtained using the marginal=False argument.
Minor: Added a check in pingouin.mixed_anova() to ensure that the subject variable has a unique set of values for each between-subject group defined in the between variable. For instance, the subject IDs for group1 are [1, 2, 3, 4, 5] and for group2 [6, 7, 8, 9, 10]. The function will throw an error if there are one or more overlapping subject IDs between groups (e.g. the subject IDs for group1 AND group2 are both [1, 2, 3, 4, 5]).
Minor: Fixed a bug which caused the pingouin.plot_rm_corr() and pingouin.ancova() (with >1 covariates) to throw an error if any of the input variables started with a number (because of statsmodels / Patsy formula formatting).
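The degrees-of-freedom inflation described in the MAJOR bugfix above can be illustrated with a made-up mixed design. The sketch below averages over the repeated measurements with pandas before a between-subject comparison; the column names are hypothetical and the code is not taken from pingouin.pairwise_ttests().

```python
import pandas as pd

# Hypothetical mixed design: 2 groups x 2 subjects per group x 2 time points
data = pd.DataFrame({
    "Subject": [1, 1, 2, 2, 3, 3, 4, 4],
    "Group": ["ctrl"] * 4 + ["treat"] * 4,
    "Time": ["pre", "post"] * 4,
    "Score": [4.0, 6.0, 5.0, 7.0, 8.0, 10.0, 9.0, 13.0],
})

# Marginal means: one value per subject, averaged over the repeated measurements
marginal = data.groupby(["Group", "Subject"], as_index=False)["Score"].mean()

# Degrees of freedom of an independent (Student) T-test between the two groups
dof_raw = len(data) - 2           # treats every row as an independent observation
dof_marginal = len(marginal) - 2  # one observation per subject

print(marginal)
print(dof_raw, dof_marginal)
```

Using the raw rows triples the degrees of freedom in this toy example even though only four independent subjects were measured.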
Enhancements
Upon loading, Pingouin will now use the outdated package to check and warn the user if a newer stable version is available.
Globally removed the export_filename parameter, which allowed exporting the output table to a .csv file. This helps simplify the API and testing. As an alternative, one can simply use pandas.DataFrame.to_csv() to export the output dataframe generated by Pingouin.
Added the correction argument to pingouin.pairwise_ttests() to enable or disable Welch’s correction for independent T-tests.
v0.3.1 (December 2019)#
Bugfixes
Fixed a bug in which missing values were removed from all columns in the dataframe in pingouin.kruskal(), even columns that were unrelated. See raphaelvallat/pingouin#74.
The pingouin.power_corr() function now throws a warning and returns a np.nan when the sample size is too low (and not an error like in previous versions). This is to improve compatibility with the pingouin.pairwise_corr() function.
Fixed quantile direction in the pingouin.plot_shift() function. In v0.3.0, the quantile subplot was incorrectly labelled as Y - X, but it was in fact calculating X - Y. See raphaelvallat/pingouin#73
v0.3.0 (November 2019)#
New functions
Added pingouin.plot_rm_corr() to plot a repeated measures correlation
Enhancements
Added the relimp argument to pingouin.linear_regression() to return the relative importance (= contribution) of each individual predictor to the \(R^2\) of the full model.
Complete refactoring of pingouin.intraclass_corr() to closely match the R implementation in the psych package. Pingouin now returns the 6 types of ICC, together with F values, p-values, degrees of freedom and confidence intervals.
The pingouin.plot_shift() function now 1) uses the Harrell-Davis robust quantile estimator in conjunction with bias-corrected bootstrap confidence intervals, and 2) supports paired samples.
Added the axis argument to pingouin.harrelldavis() to support 2D arrays.
Older versions#
v0.2.9 (September 2019)
Bugfixes
Disabled default l2 regularization of coefficients in pingouin.logistic_regression(). As pointed out by Eshin Jolly in PR54, scikit-learn automatically applies a penalization of coefficients, which in turn makes the estimation of standard errors and p-values not totally correct/interpretable. This regularization behavior is now disabled, resulting in the same behavior as R glm(..., family=binomial).
Code and dependencies
Pandas methods are now internally defined using the pandas_flavor package.
Internal code refactoring of pingouin.pairwise_ttests() (to slightly speed up computation and improve memory usage).
The first argument of the pingouin.anova(), pingouin.ancova(), pingouin.welch_anova(), pingouin.pairwise_ttests(), pingouin.pairwise_tukey(), pingouin.pairwise_gameshowell(), pingouin.kruskal(), pingouin.friedman(), pingouin.cochran(), and pingouin.remove_rm_na() functions is now data instead of dv (to be consistent with other Pingouin functions). This will cause an error if the user runs previous Pingouin code with positional arguments. As a general rule, you should always pass keyword arguments.
For clarity, pingouin.fdr(), pingouin.bonf(), and pingouin.holm() have been deprecated from the API and must be called via pingouin.multicomp().
pingouin.pairwise_ttests() output does not include the CLES column by default anymore. Users must explicitly pass effsize='CLES'.
The remove_na argument of pingouin.cronbach_alpha() has been replaced with nan_policy (‘pairwise’, or ‘listwise’).
Disabled Travis / AppVeyor testing for Python 3.5. While most functions should work just fine, please note that only Python >3.6 is supported now.
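The following sketch shows why changing the first argument from dv to data breaks positional calls but not keyword calls. The two functions are hypothetical stand-ins that only echo their arguments, and the argument lists are simplified assumptions, not the real Pingouin signatures.

```python
def anova_old(dv=None, between=None, data=None):
    """Hypothetical stand-in for the previous argument order (dv first)."""
    return {"dv": dv, "between": between, "data": data}


def anova_new(data=None, dv=None, between=None):
    """Hypothetical stand-in for the new argument order (data first)."""
    return {"dv": dv, "between": between, "data": data}


# Placeholder for a dataframe (a plain dict keeps the sketch dependency-free)
df = {"Scores": [1, 2, 3], "Group": ["a", "a", "b"]}

# Positional call written for the old order: arguments bind to the wrong names.
broken = anova_new("Scores", "Group", df)
print(broken["data"])  # the column name ends up where the data should be

# Keyword call: the result is the same with both argument orders.
assert anova_old(dv="Scores", between="Group", data=df) == anova_new(
    dv="Scores", between="Group", data=df
)
```

In this toy version the misplaced arguments pass silently; in a real function they would typically surface as an error as soon as the string is used like a dataframe.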
New functions
Added pingouin.harrelldavis(), a robust quantile estimation method (to be used in a future version of the pingouin.plot_shift() function). See PR63 by Nicolas Legrand.
pingouin.ancova() can now directly be used as a Pandas method, e.g. data.ancova(...).
pingouin.pairwise_tukey() can now directly be used as a Pandas method, e.g. data.pairwise_tukey(...).
Added Sidak one-step correction to pingouin.multicomp() (method='sidak').
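As a reminder of what a one-step Sidak correction computes, here is a minimal standard-library sketch based on the textbook formula p_adj = 1 - (1 - p)^m for m tests. The function name sidak_onestep and the example p-values are hypothetical, and this is not Pingouin's code.

```python
def sidak_onestep(pvals, alpha=0.05):
    """One-step Sidak adjustment: p_adj = 1 - (1 - p) ** m.

    Returns (reject, adjusted), where reject[i] is True when the adjusted
    p-value is below alpha. Hypothetical helper, for illustration only.
    """
    m = len(pvals)
    adjusted = [1 - (1 - p) ** m for p in pvals]
    reject = [p_adj < alpha for p_adj in adjusted]
    return reject, adjusted


reject, adjusted = sidak_onestep([0.01, 0.04, 0.03])
print(reject)
print(adjusted)
```

The Sidak adjustment is slightly less conservative than Bonferroni (which would use min(1, m * p)), since 1 - (1 - p)^m is never larger than m * p.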
Enhancements
Added support for pairwise deletion in pingouin.pairwise_ttests() (default is listwise deletion), using the nan_policy argument.
Added support for listwise deletion in pingouin.pairwise_corr() (default is pairwise deletion), using the nan_policy argument.
Added the interaction boolean argument to pingouin.pairwise_ttests(), useful if one is only interested in the main effects.
Added the correction_uniform boolean argument to pingouin.circ_corrcc(). See PR64 by Dominik Straub.
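The difference between listwise and pairwise deletion can be sketched with plain Python lists. The column names, values and the complete_rows helper below are made up for illustration; the sketch only shows which rows each policy keeps, not how Pingouin implements nan_policy.

```python
import math
from itertools import combinations

# Toy dataset with one missing value per column (hypothetical values)
data = {
    "A": [1.0, 2.0, float("nan"), 4.0, 5.0],
    "B": [2.0, float("nan"), 3.0, 5.0, 6.0],
    "C": [1.5, 2.5, 3.5, 4.5, float("nan")],
}


def complete_rows(columns):
    """Indices of rows with no missing value in any of the given columns."""
    n = len(columns[0])
    return [i for i in range(n) if not any(math.isnan(col[i]) for col in columns)]


# Listwise: a row is dropped if ANY column is missing, for every comparison.
listwise = complete_rows(list(data.values()))

# Pairwise: for each pair of columns, only rows missing in that pair are dropped.
pairwise = {(a, b): complete_rows([data[a], data[b]]) for a, b in combinations(data, 2)}

print(listwise)
print(pairwise)
```

Pairwise deletion keeps more observations per comparison, at the cost of each comparison being based on a different subset of rows.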
Contributors
Nicolas Legrand
Dominik Straub
v0.2.8 (July 2019)
Dependencies
Pingouin now requires SciPy >= 1.3.0 (better handling of tails in the pingouin.wilcoxon() function) and Pandas >= 0.24 (fixes a minor bug with 2-way within factor interaction in pingouin.epsilon() with previous versions).
New functions
Added the pingouin.rcorr() Pandas method to calculate a correlation matrix with r-values on the lower triangle and p-values (or sample size) on the upper triangle.
Added the pingouin.tost() function to calculate the two one-sided test (TOST) for equivalence. See PR51 by Antoine Weill–Duflos.
Enhancements
pingouin.anova() now works with three or more between factors (requiring statsmodels). One-way ANOVA and balanced two-way ANOVA are computed in pure Pingouin (Python + Pandas) style, while ANOVA with three or more factors, or unbalanced two-way ANOVA are computed using statsmodels.
pingouin.anova() now accepts different sums of squares calculation methods for unbalanced N-way designs (type 1, 2, or 3).
pingouin.linear_regression() now includes several safety checks to remove duplicate predictors, predictors with only zeros, and predictors with only one unique value (excluding the intercept). This comes at the cost, however, of longer computation time, which is evident when using the pingouin.mediation_analysis() function.
pingouin.mad() now automatically removes missing values and can calculate the mad over the entire array using axis=None if the array is multidimensional.
Better handling of alternative hypotheses in pingouin.wilcoxon().
Better handling of alternative hypotheses in pingouin.bayesfactor_ttest() (support for ‘greater’ and ‘less’).
Better handling of alternative hypotheses in pingouin.ttest() (support for ‘greater’ and ‘less’). This is also taken into account when calculating the Bayes Factor and power of the test.
Better handling of alternative hypotheses in pingouin.power_ttest() and pingouin.power_ttest2n() (support for ‘greater’ and ‘less’, and removed ‘one-sided’).
Implemented a new method to calculate the matched pair rank biserial correlation effect size for pingouin.wilcoxon(), which gives results almost identical to JASP.
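For readers unfamiliar with the matched-pairs rank-biserial correlation, the sketch below implements one common definition, often called the simple difference formula: the difference between the sums of ranks of positive and negative paired differences, divided by the total rank sum. This is an assumption about the general technique, not a copy of Pingouin's method; the function name and data are hypothetical.

```python
def rank_biserial_paired(x, y):
    """Matched-pairs rank-biserial correlation (simple difference formula).

    Zero differences are discarded and tied absolute differences share
    their average rank. Hypothetical helper, for illustration only.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    if not d:
        raise ValueError("All paired differences are zero.")
    abs_sorted = sorted(abs(v) for v in d)

    def avg_rank(value):
        first = abs_sorted.index(value) + 1  # 1-based rank of first occurrence
        count = abs_sorted.count(value)      # number of tied values
        return first + (count - 1) / 2

    ranks = [avg_rank(abs(v)) for v in d]
    r_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    r_minus = sum(r for r, v in zip(ranks, d) if v < 0)
    return (r_plus - r_minus) / (r_plus + r_minus)


x = [10, 12, 9, 14, 11]
y = [8, 13, 9, 10, 9]
print(rank_biserial_paired(x, y))
```

The value ranges from -1 (all non-zero differences negative) to +1 (all non-zero differences positive).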
v0.2.7 (June 2019)
Dependencies
Pingouin now requires statsmodels>=0.10.0 (latest release June 2019) and is compatible with SciPy 1.3.0.
Enhancements
Added support for long-format dataframe in pingouin.sphericity() and pingouin.epsilon().
Added support for two within-factors interaction in pingouin.sphericity() and pingouin.epsilon() (for the former, granted that at least one of them has no more than two levels).
New functions
Added the pingouin.power_rm_anova() function.
v0.2.6 (June 2019)
Bugfixes
Fixed a major error in the two-sided p-value for the Wilcoxon test (pingouin.wilcoxon()): the p-values were accidentally squared, and therefore smaller. Make sure to always use the latest release of Pingouin.
pingouin.wilcoxon() now uses the continuity correction by default (the documentation said that the correction was applied, but it was not applied in the code).
The show_median argument of the pingouin.plot_shift() function was not working properly when the percentiles were different from the default parameters.
Dependencies
The current release of statsmodels (0.9.0) is not compatible with the newest release of SciPy (1.3.0). In order to avoid compatibility issues in the pingouin.ancova() and pingouin.anova() functions (which rely on statsmodels for certain cases), Pingouin will require SciPy < 1.3.0 until a new stable version of statsmodels is released.
New functions
Added pingouin.chi2_independence() tests.
Added pingouin.chi2_mcnemar() tests.
Added the pingouin.power_chi2() function.
Added the pingouin.bayesfactor_binom() function.
Enhancements
pingouin.linear_regression() now returns the residuals.
Completely rewrote the pingouin.normality() function, which now supports pandas DataFrame (wide & long format), multiple normality tests (scipy.stats.shapiro(), scipy.stats.normaltest()), and an automatic casewise removal of missing values.
Completely rewrote the pingouin.homoscedasticity() function, which now supports pandas DataFrame (wide & long format).
Faster and more accurate algorithm in pingouin.bayesfactor_pearson() (same algorithm as JASP).
Support for one-sided Bayes Factors in pingouin.bayesfactor_pearson().
Better handling of required parameters in pingouin.qqplot().
The epsilon values for the interaction term in pingouin.rm_anova() are now computed using the Greenhouse-Geisser method instead of the lower bound. A warning message has been added to the documentation to alert the user that the value might slightly differ from R or JASP.
Note that the two pingouin.bayesfactor_pearson() changes above (items d. and e.) also affect the behavior of the pingouin.corr() and pingouin.pairwise_corr() functions.
Contributors
v0.2.5 (May 2019)
MAJOR BUG FIXES
Fixed error in p-values for one-sample one-sided T-test (pingouin.ttest()): the two-sided p-value was divided by 4 and not by 2, resulting in inaccurate (smaller) one-sided p-values.
Fixed global error for unbalanced two-way ANOVA (pingouin.anova()): the sums of squares were wrong, and as a consequence so were the F and p-values. In case of unbalanced design, Pingouin now computes a type II sums of squares via a call to the statsmodels package.
The epsilon factor for the interaction term in two-way repeated measures ANOVA (pingouin.rm_anova()) is now computed using the lower bound approach. This is more conservative than the Greenhouse-Geisser approach and therefore gives (slightly) higher p-values. The reason for choosing this is that the Greenhouse-Geisser values for the interaction term differ from the ones returned by R and JASP. This will hopefully be fixed in future releases.
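The first fix above concerns the conversion from a two-sided to a one-sided p-value. For a test statistic with a symmetric null distribution (such as the T distribution), the usual conversion is sketched below. The function name is hypothetical, and this is a generic illustration rather than Pingouin's code.

```python
def one_sided_pvalue(p_two_sided, statistic, alternative="greater"):
    """Convert a two-sided p-value to a one-sided one (symmetric null only).

    The two-sided p-value is halved when the statistic points in the
    direction of the alternative; otherwise the result is 1 - p / 2.
    Hypothetical helper, for illustration only.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    in_direction = statistic > 0 if alternative == "greater" else statistic < 0
    half = p_two_sided / 2
    return half if in_direction else 1 - half


print(one_sided_pvalue(0.04, statistic=2.1, alternative="greater"))
print(one_sided_pvalue(0.04, statistic=2.1, alternative="less"))
```

Dividing by 4 instead of 2, as in the bug described above, would report a one-sided p-value half as large as it should be.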
New functions
Added the pingouin.multivariate_ttest() (Hotelling T-squared) test.
Added the pingouin.cronbach_alpha() function.
Added the pingouin.plot_shift() function.
Several functions of Pingouin can now be directly used as pandas.DataFrame methods.
Added the pingouin.pcorr() method to compute the partial Pearson correlation matrix of a pandas.DataFrame (similar to the pcor function in the ppcor package).
pingouin.partial_corr() now supports semi-partial correlation.
Enhancements
The pingouin.rm_corr() function now returns a pandas.DataFrame with the r-value, degrees of freedom, p-value, confidence intervals and power.
pingouin.compute_esci() now works for paired and one-sample Cohen d.
pingouin.bayesfactor_ttest() and pingouin.bayesfactor_pearson() now return a formatted str and not a float.
pingouin.pairwise_ttests() now returns the degrees of freedom (dof).
Better rounding of float in pingouin.pairwise_ttests().
Support for wide-format data in pingouin.rm_anova().
pingouin.ttest() now returns the confidence intervals around the difference in means.
Missing values
pingouin.remove_na() and pingouin.remove_rm_na() are now external functions documented in the API.
pingouin.remove_rm_na() now works with multiple within-factors.
pingouin.remove_na() now works with 2D arrays.
Removed the remove_na argument in pingouin.rm_anova() and pingouin.mixed_anova(); an automatic listwise deletion of missing values is now applied (same behavior as JASP). Note that this was also the default behavior of Pingouin, but the user could also specify not to remove the missing values, which most likely returned inaccurate results.
The pingouin.ancova() function now applies an automatic listwise deletion of missing values.
Added the remove_na argument (default = False) in the pingouin.linear_regression() and pingouin.logistic_regression() functions.
Missing values are automatically removed in the pingouin.anova() function.
Contributors
Raphael Vallat
Nicolas Legrand
v0.2.4 (April 2019)
Correlation
Added the pingouin.distance_corr() (distance correlation) function.
pingouin.rm_corr() now requires at least 3 unique subjects (same behavior as the original R package).
pingouin.pairwise_corr() is faster and returns the number of outliers if a robust correlation is used.
Added support for 2D level in pingouin.pairwise_corr(). See Jupyter notebooks for examples.
Added support for partial correlation in the pingouin.pairwise_corr() function.
Greatly improved execution speed of the pingouin.correlation.skipped() function.
Added default random state to compute the Min Covariance Determinant in the pingouin.correlation.skipped() function.
The default number of bootstrap samples for the pingouin.correlation.shepherd() function is now set to 200 (previously 2000) to increase computation speed.
pingouin.partial_corr() now automatically drops rows with missing values.
Datasets
pingouin.read_dataset() and pingouin.list_dataset() can now be called directly from the top-level namespace (previously, one needed to call these functions via pingouin.datasets).
Pairwise T-tests and multi-comparisons
Added support for non-parametric pairwise tests in the pingouin.pairwise_ttests() function.
Common language effect size (CLES) is now reported by default in the pingouin.pairwise_ttests() function.
CLES is now implemented in the pingouin.compute_effsize() function.
Better code, doc and testing for the functions in multicomp.py.
P-value adjustment methods now ignore NaN values (same behavior as the R function p.adjust).
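To make the NaN handling concrete, here is a minimal standard-library sketch of a Holm adjustment in which missing p-values are left in place and do not count toward the number of tests. The function name and example values are hypothetical; this only illustrates the behavior described above, not the code in multicomp.py.

```python
import math


def holm_ignore_nan(pvals):
    """Holm step-down adjusted p-values, ignoring NaN entries.

    NaN positions stay NaN in the output and are excluded from the number
    of tests m. Hypothetical helper, for illustration only.
    """
    valid = [(p, i) for i, p in enumerate(pvals) if not math.isnan(p)]
    m = len(valid)
    adjusted = [float("nan")] * len(pvals)
    running_max = 0.0
    for rank, (p, i) in enumerate(sorted(valid)):
        candidate = min(1.0, (m - rank) * p)
        # Enforce monotonicity: adjusted p-values never decrease with rank
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


print(holm_ignore_nan([0.01, float("nan"), 0.04, 0.03]))
```

Here the smallest p-value is multiplied by 3 (the number of non-missing values), not by 4.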
Plotting
Added the pingouin.plot_paired() function.
Regression
NaN are now automatically removed in pingouin.mediation_analysis().
pingouin.linear_regression() and pingouin.logistic_regression() now fail if NaN / Inf are present in the target or predictor variables. The user must remove them before running these functions.
Added support for multiple parallel mediators in pingouin.mediation_analysis().
Added support for covariates in pingouin.mediation_analysis().
Added the seed argument to pingouin.mediation_analysis() for reproducible results.
pingouin.mediation_analysis() now returns two-sided p-values computed with a permutation test.
Added pingouin.utils._perm_pval() to compute p-value from a permutation test.
Bugs and tests
Travis and AppVeyor test for Python 3.5, 3.6 and 3.7.
Better doctest & improved examples for many functions.
Fixed bug with pingouin.mad() when axis was not 0.
v0.2.3 (February 2019)
Correlation
shepherd now also returns the outlier vector (same behavior as skipped).
The corr function returns the number of outliers for shepherd and skipped.
Removed mahal function.
Licensing
Pingouin is now released under the GNU General Public Licence 3.
Added licenses files of external modules (qsturng and tabulate).
Plotting
NaN are automatically removed in qqplot function
v0.2.2 (December 2018)
Plotting
Started working on Pingouin’s plotting module
Added Seaborn and Matplotlib to dependencies
Added plot_skipped_corr function (PR from Nicolas Legrand)
Added qqplot function (Quantile-Quantile plot)
Added plot_blandaltman function (Bland-Altman plot)
Power
Added power_corr, based on the R pwr package.
Renamed anova_power and ttest_power to power_anova and power_ttest.
Added power column to corr() and pairwise_corr()
power_ttest function can now solve for sample size, alpha and d
power_ttest2n for two-sample T-test with unequal n.
power_anova can now solve for sample size, number of groups, alpha and eta
v0.2.1 (November 2018)
Effect size
Separated compute_esci and compute_bootci
Added corrected percentile method and normal approximation to bootstrap
Fixed bootstrapping method
v0.2.0 (November 2018)
ANOVA
Added Welch ANOVA
Added Games-Howell post-hoc test for one-way ANOVA with unequal variances
Pairwise T-tests now accept two within or two between factors
Fixed error in padjust correction in the pairwise_ttests function: correction was applied on all p-values at the same time.
Correlation/Regression
Added linear_regression function.
Added logistic_regression function.
Added mediation_analysis function.
Support for advanced indexing (product / combination) in pairwise_corr function.
Documentation
Added Guidelines section with flow charts
Renamed API section to Functions
Major improvements to the documentation of several functions
Added Gitter channel
v0.1.10 (October 2018)
Bug
Fixed dataset names in MANIFEST.in (.csv files were not copy-pasted with pip)
Circular
Added circ_vtest function
Distribution
Added multivariate_normality function (Henze-Zirkler’s Multivariate Normality Test)
Renamed functions test_normality, test_sphericity and test_homoscedasticity to normality, sphericity and homoscedasticity to avoid bugs with pytest.
Moved distribution tests from parametric.py to distribution.py
v0.1.9 (October 2018)
Correlation
Added partial_corr function (partial correlation)
Doc
Minor improvements in docs and binder notebooks
v0.1.8 (October 2018)
ANOVA
Added support for multiple covariates in ANCOVA function (requires statsmodels).
Documentation
Major re-organization in API category
Added equations and references for effect sizes and Bayesian functions.
Non-parametric
Added cochran function (Cochran Q test)
v0.1.7 (September 2018)
ANOVA
Added rm_anova2 function (two-way repeated measures ANOVA).
Added ancova function (Analysis of covariance)
Correlations
Added intraclass_corr function (intraclass correlation).
The rm_corr function uses the new ancova function instead of statsmodels.
Datasets
Added ancova and icc datasets
Effect size
Fixed bug in Cohen d: now use unbiased standard deviation (np.std(ddof=1)) for paired and one-sample Cohen d. Please make sure to use pingouin >= 0.1.7 to avoid any mistakes on the paired effect sizes.
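The effect of the ddof=1 fix can be seen with a one-sample Cohen d computed from the standard library. This is a minimal sketch with a hypothetical helper name and made-up data, not Pingouin's implementation.

```python
from statistics import mean, pstdev, stdev


def cohen_d_one_sample(x, mu=0.0, unbiased=True):
    """One-sample Cohen d: (mean(x) - mu) / standard deviation.

    unbiased=True divides by n - 1 (ddof=1), unbiased=False by n (ddof=0).
    Hypothetical helper, for illustration only.
    """
    sd = stdev(x) if unbiased else pstdev(x)
    return (mean(x) - mu) / sd


x = [2, 4, 4, 4, 5, 5, 7, 9]
print(cohen_d_one_sample(x, mu=3, unbiased=True))   # ddof=1
print(cohen_d_one_sample(x, mu=3, unbiased=False))  # ddof=0
```

Because the ddof=0 standard deviation is smaller, it inflates the effect size, and the inflation is larger for small samples.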
v0.1.6 (September 2018)
ANOVA
Added JNS method to compute sphericity.
Bug
Added .csv datasets files to python site-packages folder
Fixed error in test_sphericity when ddof == 0.
v0.1.5 (August 2018)
ANOVA
rm_anova, friedman and mixed_anova now require a subject identifier. This avoids improper collapsing when multiple repeated measures factors are present in the dataset.
rm_anova, friedman and mixed_anova now support the presence of other repeated measures factors in the dataset.
Fixed error in test_sphericity
Better output of ANOVA summary
Added epsilon function
Code
Added AppVeyor CI (Windows)
Cleaned some old functions
Correlation
Added repeated measures correlation (Bakdash and Marusich 2017).
Added robust skipped correlation (Rousselet and Pernet 2012).
The pairwise_corr function now automatically deletes non-numeric columns.
Dataset
Added pingouin.datasets module (read_dataset & list_dataset functions)
Added datasets: bland1995, berens2009, dolan2009, mcclave1991
Doc
Examples are now Jupyter Notebooks.
Binder integration
Misc
Added median absolute deviation (mad)
Added mad median rule (Wilcox 2012)
Added mahal function (equivalent of Matlab mahal function)
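The median absolute deviation and the MAD-median rule listed above can be sketched with the standard library. The sketch assumes the usual form of the rule, in which a value is flagged when its absolute deviation from the median, divided by the normalized MAD (MAD times about 1.4826), exceeds about 2.24. The function names and the data are hypothetical, and the exact constants used by Pingouin may differ.

```python
from statistics import median


def mad(x, normalize=True):
    """Median absolute deviation; scaled by 1.4826 when normalize=True."""
    med = median(x)
    raw = median([abs(v - med) for v in x])
    return raw * 1.4826 if normalize else raw


def mad_median_rule(x, threshold=2.24):
    """Flag values whose scaled distance to the median exceeds the threshold.

    Assumes the normalized MAD is non-zero. Hypothetical helper.
    """
    med = median(x)
    scale = mad(x, normalize=True)
    return [abs(v - med) / scale > threshold for v in x]


x = [1, 2, 3, 4, 5, 6, 100]
print(mad(x, normalize=False))
print(mad_median_rule(x))
```

Unlike a rule based on the mean and standard deviation, both the center and the scale here are barely affected by the extreme value itself.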
Parametric
Added two-way ANOVA.
Added pairwise_tukey function
v0.1.4 (July 2018)
Installation
Fix bug with pip install caused by pingouin.external
Circular statistics
Added circ_corrcc, circ_corrcl, circ_r, circ_rayleigh
v0.1.3 (June 2018)
Documentation
Added several tutorials
Improved doc of several functions
Bayesian
T-test now reports the Bayes factor of the alternative hypothesis (BF10)
Pearson correlation now reports the Bayes factor of the alternative hypothesis (BF10)
Non-parametric
Kruskal-Wallis test
Friedman test
Correlations
Added Shepherd’s pi correlation (Schwarzkopf et al. 2012)
Fixed bug in confidence intervals of correlation coefficients
Parametric 95% CI are returned by default when calling corr
v0.1.2 (June 2018)
Correlation
Pearson
Spearman
Kendall
Percentage bend (robust)
Pairwise correlations between all columns of a pandas dataframe
Non-parametric
Mann-Whitney U
Wilcoxon signed-rank
Rank-biserial correlation effect size
Common language effect size
v0.1.1 (April 2018)
ANOVA
One-way
One-way repeated measures
Two-way split-plot (one between factor and one within factor)
Miscellaneous statistical functions
T-tests
Power of T-tests and one-way ANOVA
v0.1.0 (April 2018)
Initial release.
Pairwise comparisons
FDR correction (BH / BY)
Bonferroni
Holm
Effect sizes:
Cohen’s d (independent and repeated measures)
Hedges g
Glass delta
Eta-square
Odds-ratio
Area Under the Curve
Miscellaneous statistical functions
Geometric Z-score
Normality, sphericity, homoscedasticity and distribution tests
Code
PEP8 and Flake8
Tests and code coverage