Sughosh Dixit
Sughosh P Dixit
2025-11-085 min read

Day 8 — Adjusted Boxplots & Medcouple

Article Header Image

TL;DR

Quick summary

Adjusted boxplots combine Tukey fences with the medcouple skewness measure so long tails do not trigger false outliers.

Key takeaways
  • Day 8 — Adjusted Boxplots & Medcouple
Preview

Day 8 — Adjusted Boxplots & Medcouple

Adjusted boxplots combine Tukey fences with the medcouple skewness measure so long tails do not trigger false outliers.

Day 8 — Adjusted Boxplot & Medcouple 📊✨

Taming skewed distributions without crying wolf on legitimate extremes.

Adjusted boxplots act like smart guards that know which tail naturally stretches.

💡 Note: This article uses technical terms and abbreviations. For definitions, check out the Key Terms & Glossary page.


🎯 Introduction

Regular boxplots assume symmetric data, so a long tail looks suspicious. Real-world datasets—salaries, housing prices, reaction times—are often skewed. The adjusted boxplot fixes this by combining Tukey-style fences with the medcouple, a robust skewness statistic.

TL;DR:

  • Medcouple measures skewness on a scale from −1 → +1.
  • Adjusted fences use exponential factors so the long tail gets extra space.
  • Positively skewed data widens the upper fence and tightens the lower fence (and vice versa).
  • Use adjusted boxplots when histograms look lopsided or domain knowledge says “long tail is normal”.

Adjusted Boxplot Overview


🤔 The Problem with Regular Boxplots

Imagine company salaries: most lie between ₹40k–₹80k, yet one executive earns ₹1600k. A traditional boxplot calls that an outlier because it places equal weight on both tails. The result: normal tail behavior is mislabelled as noise.

Why symmetric fences fail

  • Traditional fences: Lower = Q₁ − 1.5 × [IQR](/key) and Upper = Q₃ + 1.5 × IQR
  • Works brilliantly when the distribution is balanced.
  • Breaks when a tail is naturally long—think incomes, clicks, insurance claims.

Regular boxplots think “extreme on either side” is equally likely; skewed data disagrees.


🎯 Enter the Adjusted Boxplot

The adjusted boxplot tweaks Tukey’s fences with an exponential factor driven by the medcouple. More skew means more asymmetry in the allowable range.

Adjusted fences stretch toward the natural tail and tighten the quiet side.

Smart guard analogy

  • Regular guard: “Tall or short? Either way you look suspicious.”
  • Adjusted guard: “Most folks are tall today; short people stand out more than tall ones.”

📐 Meet the Medcouple (MC)

The medcouple is a robust skewness statistic that compares symmetric pairs around the median. It ignores extreme values and captures how one tail spreads relative to the other.

  • MC ≈ 0 → roughly symmetric.
  • MC > 0 → positively skewed (long right tail).
  • MC < 0 → negatively skewed (long left tail).
MC = median of h(xᵢ, xⱼ)

where h(xᵢ, xⱼ) = ((xⱼ - median) - (median - xᵢ)) / (xⱼ - xᵢ)

Medcouple Concept

Robust means the statistic resists the influence of single extreme values—perfect for skewed data.


🚧 How the Adjusted Fences Work

Traditional fences use a fixed multiplier. Adjusted fences multiply 1.5 × IQR by an exponential function of the medcouple:

Lower fence = Q₁ − 1.5 × exp(−3.5 × MC) × [IQR](/key)
Upper fence = Q₃ + 1.5 × exp(+4.0 × MC) × [IQR](/key)
  • Positive MC (right skew): exp(+4.0 × MC) explodes, stretching the upper fence; exp(−3.5 × MC) shrinks, tightening the lower fence.
  • Negative MC (left skew): the behavior flips—lower fence loosens, upper fence tightens.

Skew-Based Fence Adjustments


🏡 Worked Example — House Prices

Dataset (₹ in thousands): [150, 180, 200, 220, 250, 280, 320, 400, 650, 1200]

[Q₁](/key) = 200,  [Q₂](/key) = 265,  [Q₃](/key) = 400
[IQR](/key) = 200
Medcouple ≈ 0.35 (right skew)

Traditional fences

Lower = 200 − 1.5 × 200 = −100
Upper = 400 + 1.5 × 200 = 700 → flags 1200 as an outlier ❌

Adjusted fences

Lower = 200 − 1.5 × exp(−3.5 × 0.35) × 200 ≈ 112
Upper = 400 + 1.5 × exp(4.0 × 0.35) × 200 ≈ 1618

No outliers detected ✅ — the long right tail is normal for property values.

House Price Analysis


🔧 Pseudocode Implementation

Show code (10 lines)
def adjusted_boxplot_outliers(data):
    Q1, Q3 = compute_quartiles(data)
    IQR = Q3 - Q1
    MC = compute_medcouple(data)

    lower = Q1 - 1.5 * math.exp(-3.5 * MC) * IQR
    upper = Q3 + 1.5 * math.exp(4.0 * MC) * IQR

    return [x for x in data if x < lower or x > upper]

Tools like adjusted_boxplot_outliers in our toolkit automate the math while you focus on interpretation.


🎯 When to Switch Boxplots

Use adjusted boxplots when:

  • Histograms or density plots reveal skewness.
  • Domain knowledge screams “long tail is normal” (salaries, prices, insurance claims).
  • Traditional boxplots call too many valid values outliers.

Stick with regular boxplots when:

  • Data is roughly symmetric or sample size is tiny (<20).
  • You need a quick-and-dirty check for any extreme value.

🌟 Takeaway

  • The medcouple captures skewness without being tricked by outliers.
  • Adjusted boxplots expand and contract fences intelligently.
  • Long tails stop masquerading as anomalies; true anomalies still pop.

Balanced fences keep watch where it matters, so real anomalies stand out.


🔭 Coming Up Next

Day 9 — Local Outlier Factor (LOF) explores density-based detection so you can catch anomalies hiding in neighborhoods.


📚 References

  • Annick Brys, Mia Hubert, and Peter Rousseeuw (2004). “A Robust Measure of Skewness.” Journal of Computational and Graphical Statistics.
  • Rousseeuw, P. J., & Hubert, M. (2011). “Robust Statistics for Outlier Detection.” Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery.
  • Hubert, M., Vandervieren, E. (2008). “An Adjusted Boxplot for Skewed Distributions.” Computational Statistics & Data Analysis.

🎉 You Made It — Day 8

Thanks for sticking with the journey! Every day adds another robust tool to your kit.


✨ In one line: Adjusted boxplots plus the medcouple let you respect skewed data while still catching the truly weird points.

Sughosh P Dixit
Sughosh P Dixit
Data Scientist & Tech Writer
5 min read
Previous Post

Day 7 — Boxplots, IQR, and Tukey Fences

Boxplots are the simplest visual way to spot outliers. They rely on the IQR (Interquartile Range) — the middle 50% of your data — and build 'fences' around it. Points outside these fences are suspected outliers. It's simple, robust, and doesn't assume your data are Normal.

Next Post

Day 9 — Z-Scores vs Robust Z-Scores

Compare classical z-scores built on mean and standard deviation with robust z-scores powered by the median and MAD to see why robustness matters when data gets messy.