pingouin.gzscore

pingouin.gzscore(x, *, axis=0, ddof=1, nan_policy='propagate')

Geometric standard (Z) score.

Parameters:
x : array_like

Array of raw values.

axis : int or None, optional

Axis along which to operate. Default is 0. If None, compute over the whole array x.

ddof : int, optional

Degrees of freedom correction in the calculation of the standard deviation. Default is 1.

nan_policy : {‘propagate’, ‘raise’, ‘omit’}, optional

Defines how to handle input that contains NaN. ‘propagate’ returns NaN, ‘raise’ raises an error, and ‘omit’ performs the calculations ignoring NaN values. Default is ‘propagate’. Note that with ‘omit’, NaNs in the input still propagate to the output, but they do not affect the geometric z-scores computed for the non-NaN values.

Returns:
gzscore : array_like

Array of geometric z-scores (same shape as x).

Notes

Geometric z-scores are better measures of dispersion than arithmetic z-scores when the sample data come from a log-normally distributed population [1].

Given the raw scores \(x\), the geometric mean \(\mu_g\) and the geometric standard deviation \(\sigma_g\), the standard score is given by the formula:

\[z = \frac{\log(x) - \log(\mu_g)}{\log(\sigma_g)}\]
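Because \(\log(\mu_g)\) is the arithmetic mean of \(\log(x)\) and \(\log(\sigma_g)\) is its standard deviation, the formula amounts to an ordinary z-score on the log-transformed data. The following is a minimal NumPy sketch of that idea for a 1-D input. The helper name `geometric_zscore` is hypothetical and this is not necessarily how pingouin implements the function.

```python
import numpy as np


def geometric_zscore(x, ddof=1):
    """Illustrative helper (hypothetical name), not pingouin's own code."""
    log_x = np.log(np.asarray(x, dtype=float))
    log_mu_g = log_x.mean()               # log of the geometric mean
    log_sigma_g = log_x.std(ddof=ddof)    # log of the geometric standard deviation
    return (log_x - log_mu_g) / log_sigma_g


# Values evenly spaced on a log scale map to evenly spaced z-scores.
raw = np.array([1.0, 10.0, 100.0])
z = geometric_zscore(raw)
print(z)
```

With values evenly spaced on a log scale and `ddof=1`, the result should be symmetric around zero.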

References

Examples

Standardize a lognormal-distributed vector:

>>> import numpy as np
>>> from pingouin import gzscore
>>> np.random.seed(123)
>>> raw = np.random.lognormal(size=100)
>>> z = gzscore(raw)
>>> print(round(z.mean(), 3), round(z.std(), 3))
-0.0 0.995
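The ‘omit’ behaviour described for nan_policy can be illustrated with NumPy alone. This is a sketch under the assumption that NaNs are excluded from the mean and standard deviation but kept in the output. The helper name `geometric_zscore_omit` is hypothetical and is not part of pingouin.

```python
import numpy as np


def geometric_zscore_omit(x, axis=0, ddof=1):
    """Illustrative helper (hypothetical name) mimicking nan_policy='omit'."""
    log_x = np.log(np.asarray(x, dtype=float))
    # NaN-aware statistics: missing values do not affect the mean or the SD.
    mean = np.nanmean(log_x, axis=axis, keepdims=True)
    sd = np.nanstd(log_x, axis=axis, ddof=ddof, keepdims=True)
    # NaN inputs stay NaN in the output because NaN - mean is NaN.
    return (log_x - mean) / sd


raw_nan = np.array([1.0, np.nan, 10.0, 100.0])
z_nan = geometric_zscore_omit(raw_nan)
print(z_nan)
```

The NaN stays in its original position, while the remaining scores match those obtained from the three valid values alone.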