pingouin.qqplot

pingouin.qqplot(x, dist='norm', sparams=(), confidence=0.95, square=True, ax=None, **kwargs)

Quantile-Quantile plot.

Parameters:
x : array_like

Sample data.

dist : str or stats.distributions instance, optional

Distribution or distribution function name. The default is ‘norm’ for a normal probability plot.

sparams : tuple, optional

Distribution-specific shape parameters (shape parameters, location, and scale). See scipy.stats.probplot() for more details.

confidence : float

Confidence level (.95 = 95%) for point-wise confidence envelope. Can be disabled by passing False.

square : bool

If True (default), ensure equal aspect ratio between X and Y axes.

ax : matplotlib axes

Axis on which to draw the plot.

**kwargs : optional

Optional argument(s) passed to matplotlib.pyplot.scatter().

Returns:
ax : Matplotlib Axes instance

Returns the Axes object with the plot for further tweaking.

Raises:
ValueError

If sparams does not contain the required parameters for dist (e.g. scipy.stats.t has a mandatory degrees-of-freedom parameter df).
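One way to see in advance which distributions need entries in sparams is to inspect the SciPy distribution object. The sketch below uses SciPy's `numargs` and `shapes` attributes; whether pingouin relies on these attributes internally is an assumption, and the value df=10 in the commented call is purely illustrative.

```python
from scipy import stats

# Number of mandatory shape parameters for a few SciPy distributions.
required = {
    name: getattr(stats, name).numargs
    for name in ("norm", "expon", "t", "gamma")
}
print(required)

# Names of the shape parameters (None when the distribution has none).
print(stats.t.shapes)
print(stats.norm.shapes)

# With pingouin installed, a t distribution would then need df in sparams:
# ax = pg.qqplot(x, dist='t', sparams=(10,))  # df=10 is a hypothetical value
```

Distributions that report zero shape parameters, such as 'norm' and 'expon', can be used with the default sparams=().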

Notes

This function returns a scatter plot of the quantiles of the sample data x against the theoretical quantiles of the distribution given in dist (default = ‘norm’).

The points plotted in a Q–Q plot are always non-decreasing when viewed from left to right. If the two distributions being compared are identical, the Q–Q plot follows the 45° line y = x. If the two distributions agree after linearly transforming the values in one of the distributions, then the Q–Q plot follows some line, but not necessarily the line y = x. If the general trend of the Q–Q plot is flatter than the line y = x, the distribution plotted on the horizontal axis is more dispersed than the distribution plotted on the vertical axis. Conversely, if the general trend of the Q–Q plot is steeper than the line y = x, the distribution plotted on the vertical axis is more dispersed than the distribution plotted on the horizontal axis. Q–Q plots are often arced, or “S” shaped, indicating that one of the distributions is more skewed than the other, or that one of the distributions has heavier tails than the other.
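Two of the properties described above (monotone points, and a straight line other than y = x after a linear transformation) can be checked numerically. This is a minimal sketch using only NumPy and SciPy; the plotting positions (i - 0.5) / n are one common convention chosen here for illustration and are not necessarily the ones pingouin uses.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)
x = rng.normal(size=200)
y = 3.0 * x + 1.0  # same distribution after a linear transformation

# Theoretical normal quantiles at assumed plotting positions (i - 0.5) / n.
n = x.size
probs = (np.arange(1, n + 1) - 0.5) / n
theor = stats.norm.ppf(probs)

# Sample quantiles are simply the sorted observations.
obs_x = np.sort(x)
obs_y = np.sort(y)

# Q-Q points are non-decreasing from left to right.
is_monotone = bool(np.all(np.diff(theor) > 0) and np.all(np.diff(obs_x) >= 0))

# Quantiles of y against quantiles of x fall on a line, but not on y = x.
slope_y, intercept_y = np.polyfit(obs_x, obs_y, 1)
print(is_monotone, slope_y, intercept_y)
```

The fitted slope and intercept recover the scale and shift of the transformation applied to x, which is why a Q-Q plot of two linearly related distributions is straight but generally not the 45° line.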

The function also plots a best-fit line (linear regression) through the points and annotates the plot with the coefficient of determination \(R^2\). The intercept and slope of the linear regression between the quantiles give a measure of the relative location and relative scale of the samples.
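To illustrate how the regression line relates to location and scale, the sketch below uses scipy.stats.probplot() directly as a stand-in. It is not pingouin's internal computation, and the values loc=5 and scale=2 are arbitrary choices for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)
x = rng.normal(loc=5.0, scale=2.0, size=500)  # illustrative location and scale

# probplot returns the Q-Q points and a least-squares fit through them.
(theor, observed), (slope, intercept, r) = stats.probplot(x, dist="norm")

r_squared = r ** 2  # coefficient of determination of the fitted line
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

Against a standard normal reference, the slope approximates the sample's scale and the intercept its location, while an \(R^2\) close to 1 indicates that the points lie near a straight line.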

Warning

Be extra careful when using more complex distributions with several parameters. Always double-check your results with another software package.

References

  • cran/car

  • Fox, J. (2008), Applied Regression Analysis and Generalized Linear Models, 2nd Ed., Sage Publications, Inc.

Examples

Q-Q plot using a normal theoretical distribution:

>>> import numpy as np
>>> import pingouin as pg
>>> np.random.seed(123)
>>> x = np.random.normal(size=50)
>>> ax = pg.qqplot(x, dist='norm')

Two Q-Q plots using two separate axes:

>>> import numpy as np
>>> import pingouin as pg
>>> import matplotlib.pyplot as plt
>>> np.random.seed(123)
>>> x = np.random.normal(size=50)
>>> x_exp = np.random.exponential(size=50)
>>> fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
>>> ax1 = pg.qqplot(x, dist='norm', ax=ax1, confidence=False)
>>> ax2 = pg.qqplot(x_exp, dist='expon', ax=ax2)

Using custom location / scale parameters as well as another Seaborn style:

>>> import numpy as np
>>> import seaborn as sns
>>> import pingouin as pg
>>> import matplotlib.pyplot as plt
>>> np.random.seed(123)
>>> x = np.random.normal(size=50)
>>> mean, std = 0, 0.8
>>> sns.set_style('darkgrid')
>>> ax = pg.qqplot(x, dist='norm', sparams=(mean, std))