Day 9 — Z-Scores vs Robust Z-Scores ⚔️📊
When one wild point can topple the mean, it is time to switch to statistics that fight back.
Classical z-scores rely on the mean and standard deviation, while robust z-scores lean on the median and MAD.
💡 Note: This article uses technical terms and abbreviations. For definitions, check out the Key Terms & Glossary page.
🎯 Introduction
The classical z-score normalizes data by subtracting the mean and dividing by the standard deviation. It shines when data are clean and approximately normal. Yet a single wild observation can warp both statistics, hiding true outliers the moment you need detection most.
TL;DR:
- Classical z-scores depend on the mean and standard deviation, both of which have a 0% breakdown point and an unbounded influence function.
- Robust z-scores swap in the median and MAD (Median Absolute Deviation) so half the data must be corrupted before things collapse.
- Use classical z-scores for clean, well-behaved measurements; prefer robust z-scores as your default for messy, real-world datasets.

💕➡️💔 The Classic Z-Score: Powerful Until It Isn't
The z-score answers a timeless question: “How many standard deviations away from the mean is this point?”
z = (x - μ) / σ
Where x is the data value, μ is the sample mean, and σ is the sample standard deviation. Analysts flag |z| > 3 (sometimes 2.5) as suspicious.
With clean values like [10, 12, 11, 13, 12, 14, 11, 13, 12, 11], it works brilliantly:
μ = 11.9
σ = 1.2
z(20) = (20 - 11.9) / 1.2 = 6.75 ✅ obvious outlier
Add a single rogue point and everything crumbles:
Contaminated data: [10, 12, 11, 13, 12, 14, 11, 13, 12, 1000]
μ = 110.8
σ = 295.7
z(20) = (20 - 110.8) / 295.7 = -0.31 ❌ hidden
z(1000) = (1000 - 110.8) / 295.7 = 3.01 😬 barely flagged
The outlier throws off the very yardsticks being used to detect it.
💣 Breakdown Points & Influence: Why Collapse Happens
The mean and standard deviation topple instantly, while the median and MAD stand firm until half the data fall.
The breakdown point measures how much contamination a statistic can tolerate. The mean and standard deviation have a 0% breakdown point—one extreme value can drive them to infinity. Their influence functions are unbounded, meaning the further a point is from the center, the more it drags the statistic along.

By contrast, the median and MAD boast a 50% breakdown point and bounded influence. Half the sample must be corrupted before they crumble.
🛡️ Median & MAD: The Robust Duo
The median is the middle value of sorted data. It takes a concerted attack on at least half the sample to move it.
The MAD (Median Absolute Deviation) measures spread without trusting the mean:
MAD = median(|xᵢ - median(x)|)
MAD* = 1.4826 × MAD
The constant 1.4826 rescales MAD so that MAD* ≈ σ for normally distributed samples. Together, the median and MAD supply a resilient center and scale for the same z-score formula.

🧮 Classical vs Robust Formulas
| Step | Classical z-score | Robust z-score | |------|-------------------|----------------| | Center | μ = (1/n) Σ xᵢ | median(x) | | Spread | σ = √[(1/n) Σ (xᵢ − μ)²] | MAD* = 1.4826 × median(|xᵢ − median(x)|) | | Score | zᵢ = (xᵢ − μ) / σ | zᵢᵣ = (xᵢ − median) / MAD* | | Typical threshold | |zᵢ| > 3 | |zᵢᵣ| > 3.5 |
Classical statistics are optimal for pristine Gaussian data; robust statistics remain useful when normality is a pipe dream.
🔬 Walkthrough: Student Scores Gone Wrong
Dataset with a fat-fingered entry:
[78, 82, 85, 79, 91, 88, 76, 84, 89, 87,
83, 80, 86, 92, 81, 85, 999, 88, 84, 90]
Classical z-score:
μ = 92.05
σ ≈ 203.8
z(76) ≈ -0.08 → “perfectly normal”
z(999) ≈ 4.45 → barely past the 3σ rule
Robust z-score:
median = 85
MAD = 4
MAD* = 5.93
zᵣ(76) = (76 - 85) / 5.93 ≈ -1.52
zᵣ(999) = (999 - 85) / 5.93 ≈ 154.1 🚨
The robust approach instantly isolates the malformed record while keeping authentic scores unflagged.

📈 Visual Diagnostics Matter
Quantile-Quantile (QQ) plots reveal how a single outlier bends the distribution away from normality. Before winsorizing or trimming, the top-right point rockets off the diagonal. After robust cleaning, the plot settles back onto the line, signaling that classical techniques may again be appropriate.

🧭 Choosing the Right Tool
Start with robust methods, then switch to classical once the data behave.
Favour classical z-scores when:
- Instrumentation enforces quality and the dataset is demonstrably clean.
- You are confirming findings with large, well-behaved samples.
- Industry regulations require the traditional formula.
Favour robust z-scores when:
- Data provenance is unknown or messy (the default in the wild).
- Sample sizes are modest and every point counts.
- Skewed, heavy-tailed, or multi-modal distributions appear in diagnostics.
- Missing a real anomaly would be catastrophic.

🛠️ Code Snippets
Show code (15 lines)
import numpy as np
def zscore_outliers(data, threshold=3.0):
mean = np.mean(data)
std = np.std(data)
scores = (data - mean) / std
return data[np.abs(scores) > threshold]
def robust_zscore_outliers(data, threshold=3.5):
median = np.median(data)
mad = np.median(np.abs(data - median))
mad_star = 1.4826 * mad
scores = (data - median) / mad_star
return data[np.abs(scores) > threshold]
Run both; large disagreements are an immediate red flag that classical assumptions failed.
🌟 Takeaway
- Classical z-scores crumble in the face of even one contaminated point.
- Robust z-scores powered by the median and MAD withstand up to 50% corruption.
- Use diagnostics, iterate, and document which yardstick you chose (and why).
🔭 Coming Up Next
Day 10 dives into Local Outlier Factor (LOF) to catch anomalies hiding in low-density pockets of high-dimensional space. 🔍
📚 References
- Rousseeuw, P.J., & Croux, C. (1993). Alternatives to the Median Absolute Deviation. Journal of the American Statistical Association.
- Hampel, F.R. (1974). The Influence Curve and Its Role in Robust Estimation. Journal of the American Statistical Association.
- Wilcox, R.R. (2017). Introduction to Robust Estimation and Hypothesis Testing.
✨ Thanks for geeking out with me on robust statistics—see you tomorrow! ✨



