Add statistics.median_absolute_deviation #152227

Open
AnandSundar wants to merge 5 commits into python:main from AnandSundar:feat/median-absolute-deviation

@AnandSundar

Summary

Add statistics.median_absolute_deviation(data, *, scale=1.4826) to the CPython standard library. The median absolute deviation (MAD) is the textbook robust measure of statistical dispersion: the median of the absolute deviations from the median.

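For normally distributed data, the default scale=1.4826 (the consistency constant, approximately 1/Φ⁻¹(0.75), where Φ⁻¹ is the inverse CDF of the standard normal distribution) makes the result a consistent estimator of the population standard deviation, so it is directly comparable to statistics.stdev. Pass scale=1.0 for the raw, unscaled MAD.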
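The definition above can be sketched in a few lines using the existing statistics.median. This is only an illustration under stated assumptions, not the implementation in this PR: the name mad_sketch is hypothetical, and the sketch skips the type handling and NaN checks described under "What changes".

```python
from statistics import NormalDist, median


def mad_sketch(data, *, scale=1.4826):
    """Illustrative stand-in for the proposed function (hypothetical name)."""
    values = list(data)                  # materialize so one-shot iterables work
    center = median(values)              # raises StatisticsError on empty input
    deviations = [abs(x - center) for x in values]
    return scale * median(deviations)    # median of absolute deviations, scaled


data = [1, 2, 3, 4, 100]
raw = mad_sketch(data, scale=1.0)        # 1.0: deviations are [2, 1, 0, 1, 97]
scaled = mad_sketch(data)                # 1.4826 with the default scale

# The default scale is the reciprocal of the 75th percentile of the
# standard normal distribution, rounded to four decimal places.
consistency_constant = 1 / NormalDist().inv_cdf(0.75)
```

Here consistency_constant agrees with 1.4826 to four decimal places, which is where the default value of scale comes from.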

Why

The statistics module currently provides traditional spread measures (variance, stdev) but lacks a robust counterpart. Standard deviation is well known to be sensitive to outliers; MAD is the textbook alternative. It is the standard robust scale estimate in R (mad()) and is available in SciPy (scipy.stats.median_abs_deviation). For users who don't want a third-party dependency for a single statistic (educators, stdlib-only scripts, Python-first learners), a stdlib MAD closes an obvious gap.
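As a rough illustration of the outlier sensitivity mentioned above, the following sketch compares statistics.stdev with an unscaled MAD on a small sample before and after adding one gross outlier. The helper raw_mad and the sample values are made up for this example and are not part of the PR.

```python
from statistics import median, stdev


def raw_mad(values):
    """Unscaled MAD, written out by hand for illustration (hypothetical helper)."""
    values = list(values)
    center = median(values)
    return median(abs(x - center) for x in values)


clean = [9.8, 10.1, 10.0, 9.9, 10.2]
dirty = clean + [1000.0]          # the same sample plus one gross outlier

# stdev grows by orders of magnitude once the outlier is present ...
stdev_ratio = stdev(dirty) / stdev(clean)

# ... while the MAD stays within a small factor of its original value.
mad_ratio = raw_mad(dirty) / raw_mad(clean)
```

With this data, stdev_ratio is in the thousands while mad_ratio stays below 2, which is the sense in which MAD is called robust.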

What changes

  • Lib/statistics.py — new median_absolute_deviation(data, *, scale=1.4826) function. Result type follows input data (int/float → float; Decimal → Decimal; Fraction → Fraction). scale must be int or float (Decimal/Fraction raise TypeError). All-NaN input raises StatisticsError; partial NaN propagates. Top docstring autosummary table and "Measures of spread" table widened to fit the 25-character name. __all__ updated.
  • Lib/test/test_statistics.py — new TestMedianAbsoluteDeviation class with 28 test methods covering happy path, edge cases, error paths, type acceptance, and scale type guard. Follows TestMedian/TestStdev patterns.
  • Doc/library/statistics.rst — autosummary table entry + full function directive with int/Decimal/Fraction doctest examples + .. versionadded:: 3.16.
  • Doc/whatsnew/3.16.rst — new "statistics" section under "Improved modules".
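The type rules listed for Lib/statistics.py can be sketched as follows. This is a guess at one way to satisfy the stated behavior, not the code in this PR: the name mad_typed_sketch is hypothetical, and converting scale through str() for Decimal and Fraction data is an assumption made only so the multiplication is defined for those types. NaN handling is omitted.

```python
from decimal import Decimal
from fractions import Fraction
from statistics import median


def mad_typed_sketch(data, *, scale=1.4826):
    """Hypothetical sketch of 'result type follows input data'."""
    # Stated rule: scale must be int or float; Decimal/Fraction raise TypeError.
    if not isinstance(scale, (int, float)):
        raise TypeError("scale must be an int or float")
    values = list(data)
    center = median(values)
    raw = median([abs(x - center) for x in values])
    if isinstance(raw, Decimal):
        # Assumption: convert via str() so 1.4826 becomes Decimal('1.4826').
        return raw * Decimal(str(scale))
    if isinstance(raw, Fraction):
        # Assumption: same conversion for Fraction data.
        return raw * Fraction(str(scale))
    return float(raw) * scale            # int/float data yields a float


dec = mad_typed_sketch([Decimal("1"), Decimal("2"), Decimal("4")])
frac = mad_typed_sketch([Fraction(1, 2), Fraction(3, 2), Fraction(7, 2)], scale=1)
flt = mad_typed_sketch([1, 2, 4])
```

In this sketch dec is a Decimal, frac is a Fraction, and flt is a float; passing scale=Decimal("1.0") raises TypeError, matching the guard described above.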

Out of scope (deliberately)

  • No Lib/statistics.pyi: that file does not exist in CPython main. Type stubs for stdlib modules live in the separate typeshed repo and will be filed there as a follow-up. (Lib/ contains no top-level .pyi files; the glob Lib/*.pyi matches nothing.)
  • No C-level optimization: pure Python, matching the surrounding spread-measure functions.
  • No NormalDist integration: could be added as NormalDist.from_mad() in a follow-up PR if requested.

TODO before merge

  1. Replace [REPLACE WITH CONTRIBUTOR NAME] and [REPLACE WITH PR NUMBER] placeholders in the Doc/whatsnew/3.16.rst entry.
  2. Update the gh-XXXXXX placeholders in the four commit messages with the actual issue/PR number (via git rebase -i origin/main and reword).
  3. File the corresponding PR against typeshed to add a type stub for median_absolute_deviation.

Add the median absolute deviation (MAD) function to the statistics module.
MAD is a robust measure of statistical dispersion: the median of the absolute
deviations from the median, optionally scaled by a consistency constant.

The default scale=1.4826 (the consistency constant for the normal distribution)
makes the result a consistent estimator of the population standard deviation
for normally distributed data, comparable to statistics.stdev. Pass scale=1.0
for the raw value.

* data: a sequence or iterable of real-valued numbers
* scale: int or float (Decimal/Fraction raise TypeError)

Result type follows the data type (int/float input yields float; Decimal
input yields Decimal; Fraction input yields Fraction). NaN propagates
when at least one non-NaN value is present; all-NaN input raises
StatisticsError (matching statistics.median()).

Includes module docstring updates and __all__ entry.

Add TestMedianAbsoluteDeviation class in Lib/test/test_statistics.py,
following the TestMedian / TestStdev pattern.

Coverage:
- Happy path: known answers for ints and floats, default / scale=1.0 /
  scale=2.0 / scale=3.0 / negative scale
- Edge cases: empty (StatisticsError), single value, two-value symmetric,
  all-same, even-count averaging, generator input, tuple input
- Error paths: non-numeric data (TypeError), all-NaN (StatisticsError),
  partial NaN (propagates)
- Type acceptance: Decimal input -> Decimal result; Fraction input ->
  Fraction result (preserving precision); int input -> float result;
  mixed int+float -> float result
- Scale type guard: Decimal / Fraction / str / list / None scale all
  raise TypeError
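A few of the coverage categories above can be sketched as a small unittest class. This is not the TestMedianAbsoluteDeviation class from the PR: it tests a local stand-in (mad_stand_in, a hypothetical name) so that it runs on current Python versions, and it uses plain unittest assertions rather than NumericTestCase.

```python
import unittest
from statistics import StatisticsError, median


def mad_stand_in(data, *, scale=1.4826):
    """Local stand-in for the proposed function (hypothetical, float-only)."""
    values = list(data)
    center = median(values)              # StatisticsError on empty input
    return float(scale * median(abs(x - center) for x in values))


class MadSketchTests(unittest.TestCase):
    def test_known_answer_raw(self):
        # Deviations from the median 3 are [2, 1, 0, 1, 97]; their median is 1.
        self.assertEqual(mad_stand_in([1, 2, 3, 4, 100], scale=1.0), 1.0)

    def test_default_scale(self):
        self.assertAlmostEqual(mad_stand_in([1, 2, 3, 4, 100]), 1.4826)

    def test_empty_raises(self):
        with self.assertRaises(StatisticsError):
            mad_stand_in([])

    def test_single_value(self):
        self.assertEqual(mad_stand_in([5]), 0.0)

    def test_all_same(self):
        self.assertEqual(mad_stand_in([7, 7, 7, 7]), 0.0)

    def test_generator_input(self):
        self.assertEqual(mad_stand_in((x for x in [1, 2, 3]), scale=1.0), 1.0)

    def test_non_numeric_data(self):
        with self.assertRaises(TypeError):
            mad_stand_in(["a", "b", "c"])
```

Saved to a file, the class can be run with python -m unittest.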

Reuses NumericTestCase.assertApproxEqual for floating-point comparisons.

Document the new median_absolute_deviation function in Doc/library/statistics.rst:

- Add :func:`median_absolute_deviation` to the 'Measures of spread'
  autosummary table
- Add a new .. function:: directive with full description, doctest examples
  (int, Decimal, Fraction), and a .. versionadded:: 3.16 annotation

Add a new 'statistics' section under 'Improved modules' in
Doc/whatsnew/3.16.rst, between shlex and tkinter (alphabetical order).

Bullet describes the new statistics.median_absolute_deviation function,
its default scale=1.4826 consistency constant, and the alternative
scale=1 for the raw value.

TODO: replace [REPLACE WITH CONTRIBUTOR NAME] and [REPLACE WITH PR NUMBER]
placeholders with the actual contributor attribution when the PR is filed.
@AnandSundar AnandSundar requested a review from AA-Turner as a code owner June 25, 2026 18:20
@bedevere-app

bedevere-app Bot commented Jun 25, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@python-cla-bot

python-cla-bot Bot commented Jun 25, 2026

All commit authors signed the Contributor License Agreement.

Replace [REPLACE WITH CONTRIBUTOR NAME] and [REPLACE WITH PR NUMBER]
placeholders with Anand Sundar and gh-152227 now that the PR is open.
AnandSundar added a commit to AnandSundar/cpython that referenced this pull request Jun 25, 2026
@AnandSundar AnandSundar force-pushed the feat/median-absolute-deviation branch from 0fd689c to e2d8233 Compare June 25, 2026 18:24