Sughosh Dixit
Sughosh P Dixit
2025-11-056 min read

Day 5 — Robust Location and Scale: Median & MAD (Simple Guide + Worked Example)

Article Header Image

TL;DR

Quick summary

The mean and standard deviation (SD) can be swayed by outliers like reeds in the wind — a single extreme value can pull them off course. The median and MAD (Median Absolute Deviation), on the other hand, are sturdy rocks in the statistical stream. They resist distortion and give reliable 'center' and 'spread' estimates, even when your data are skewed or heavy-tailed.

Key takeaways
  • Day 5 — Robust Location and Scale: Median & MAD (Simple Guide + Worked Example)
Preview

Day 5 — Robust Location and Scale: Median & MAD (Simple Guide + Worked Example)

The mean and standard deviation (SD) can be swayed by outliers like reeds in the wind — a single extreme value can pull them off course. The median and MAD (Median Absolute Deviation), on the other hand, are sturdy rocks in the statistical stream. They resist distortion and give reliable 'center' and 'spread' estimates, even when your data are skewed or heavy-tailed.

Day 5 — Robust Location and Scale: Median & MAD (Simple Guide + Worked Example)

Robust statistics that resist outliers! 💪

💡 Note: This article uses technical terms and abbreviations. For definitions, check out the Key Terms & Glossary page.


🎯 Introduction

When data contains outliers, traditional measures like mean and SD can be misleading. Robust statistics like the median and MAD provide stable estimates that resist distortion from extreme values.

TL;DR:

The mean and standard deviation (SD) can be swayed by outliers like reeds in the wind 🌾 — a single extreme value can pull them off course.

The median and MAD (Median Absolute Deviation), on the other hand, are sturdy rocks in the statistical stream. 💪

They resist distortion and give reliable "center" and "spread" estimates, even when your data are skewed or heavy-tailed.

Use them to compute robust z-scores, which catch anomalies without being fooled by outliers. 🚨

Median & MAD Concept


💡 Why Robust Statistics?

⚠️ Mean/SD are fragile: a single large value can shift both.

🧱 Median/MAD are robust: they focus on the central tendency and typical deviation.

🎯 Robust z-scores highlight genuine outliers even when your data contain a few extremes.

Why Robust Statistics


📘 Key Definitions

Median: the "middle" value after sorting your data (half below, half above).

MAD (Median Absolute Deviation):

  • Compute the median m.
  • Take absolute deviations |xᵢ − m|.
  • Take the median of those deviations → that's MAD.

Robust z-score:

zᵣ = 0.6745 × (x − median) / MAD

Why 0.6745? Because for a Normal distribution, MAD ≈ 0.6745 × SD.

That scaling ensures robust z-scores align roughly with classical z-scores.

Key Definitions Visualization


🧮 Worked Example — Step by Step

Data with one big outlier:

[10, 12, 13, 13, 14, 15, 100]

Step 1️⃣ — Median

n = 7 → 4th value = 13

Step 2️⃣ — Absolute deviations from the median

x |x − 13|
10 3
12 1
13 0
13 0
14 1
15 2
100 87

Sorted deviations: [0, 0, 1, 1, 2, 3, 87]

MAD = 1 (the 4th value)

Step 3️⃣ — Robust z-scores

zᵣ = 0.6745 × (x − 13) / 1

x z
10 −2.02
12 −0.67
13 0.00
14 +0.67
15 +1.35
100 +58.68 🚨

Interpretation:

  • Most points sit around ±2 — normal variation.
  • The outlier (100) explodes to +58.7 — unmistakably extreme.

Worked Example


🔍 Compare with Mean & SD

Let's see how classical stats behave on the same data:

  • Mean ≈ 25.29
  • SD ≈ 30.54

Classical z-score for x = 100:

(100 − 25.29) / 30.54 ≈ +2.45

🧩 Observation:

  • The classical z-score barely flags 100 as unusual (+2.45).
  • The robust z-score screams "outlier!" (+58.68).

That's the power of robust measures: they don't let one big number distort the story.

Mean vs Median Comparison


🧭 When to Use Median/MAD

Use when:

  • Your data have outliers or long tails.
  • You need stable estimates of center/spread.
  • You're building anomaly detectors or control charts that must resist distortion.

🚫 Avoid when:

  • Data are clean, symmetric, and close to Normal — mean/SD are slightly more efficient there.

When to Use Robust Statistics


🍳 Quick Recipe (Ready to Copy)

1️⃣ Sort your data → compute median.

2️⃣ Compute absolute deviations → find MAD.

3️⃣ Compute for each x:

z_robust = 0.6745 × (x − median) / MAD

4️⃣ Flag outliers if |zᵣₒbᵤₛₜ| > threshold (common thresholds: 3.5 or 4.5).

Simple. Explainable. Powerful.


📈 Visual Idea (Optional Plot)

Create a scatterplot of classical z vs robust z.

On skewed data, classical z flattens the extremes, while robust z exposes them.

A picture that says a thousand outliers. 😉

Classical vs Robust Z-Scores


📋 Tiny Recap Table

x Classical z Robust z
10 −0.50 −2.02
12 −0.43 −0.67
13 −0.40 0.00
14 −0.37 +0.67
15 −0.34 +1.35
100 +2.45 +58.68 🚨

Robust z tells the truth — and the truth is loud. 📣

Recap Table Visualization


🌟 Takeaway

  • Median + MAD = the sturdier cousins of mean/SD.
  • They stay centered when outliers appear.
  • Robust z-scores reveal what classical z-scores often hide.
  • Use them when your data aren't "nice and Normal."

They'll never overreact — or underreact — to the wild ones. 🔍💪


📚 References

  1. Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., & Stahel, W. A. (2011). Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons.

  2. Huber, P. J., & Ronchetti, E. M. (2009). Robust Statistics (2nd ed.). John Wiley & Sons.

  3. Rousseeuw, P. J., & Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88(424), 1273-1283.

  4. Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4), 764-766.

  5. Mosteller, F., & Tukey, J. W. (1977). Data Analysis and Regression: A Second Course in Statistics. Addison-Wesley.

  6. Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (Eds.). (1983). Understanding Robust and Exploratory Data Analysis. John Wiley & Sons.

  7. Wilcox, R. R. (2012). Introduction to Robust Estimation and Hypothesis Testing (3rd ed.). Academic Press.

  8. Maronna, R. A., Martin, R. D., & Yohai, V. J. (2019). Robust Statistics: Theory and Methods (2nd ed.). John Wiley & Sons.

  9. Iglewicz, B., & Hoaglin, D. C. (1993). How to detect and handle outliers. ASQ Basic References in Quality Control: Statistical Techniques, 16, 87-88.

  10. Rousseeuw, P. J., & Leroy, A. M. (2005). Robust Regression and Outlier Detection. John Wiley & Sons.


Day 5 Complete! 🎉

This is Day 5 of my 30-day challenge documenting my Data Science journey at Oracle! Stay tuned for more insights and mathematical foundations of data science. 🚀

Next: Day 6 - Coming Tomorrow!
Sughosh P Dixit
Sughosh P Dixit
Data Scientist & Tech Writer
6 min read
Previous Post

Day 4 — Percentile Rank and Stratifications

Percentile ranks turn any numeric feature into a simple score in [0,1] that says 'what fraction of the data is at or below this value.' Learn how to combine ranks with min/max and create strata for sampling, prioritization, or analysis.

Next Post

Day 6 — Distribution Shape: Skewness and Kurtosis (Simple Guide + Visuals)

Skewness tells you if data lean left or right (asymmetry). Kurtosis tells you how heavy the tails are (how many extremes you see). Two datasets can share the same mean and variance but look completely different — shape features reveal the hidden story.