Sughosh Dixit
Sughosh P Dixit
2025-11-2411 min read

Day 24: Risk Segmentation - Priority Tiers as Priors and Costs

Article Header Image

TL;DR

Quick summary

Learn to interpret priority tiers as prior beliefs or cost weights. Understand cost-sensitive thresholding, Bayes optimal decision rules, and how per-tier thresholds change labeling geometry through iso-cost analysis.

Key takeaways
  • Day 24: Risk Segmentation - Priority Tiers as Priors and Costs
Preview

Day 24: Risk Segmentation - Priority Tiers as Priors and Costs

Learn to interpret priority tiers as prior beliefs or cost weights. Understand cost-sensitive thresholding, Bayes optimal decision rules, and how per-tier thresholds change labeling geometry through iso-cost analysis.

Day 24: Risk Segmentation - Priority Tiers as Priors and Costs ⚖️📊

Interpret priority tiers as prior beliefs or cost weights. Learn cost-sensitive thresholding and Bayes optimal decisions.

Risk segmentation assigns different decision thresholds based on prior beliefs and misclassification costs, optimizing overall decision quality.

When different types of errors have different costs, a single threshold isn't optimal. Risk segmentation lets us tune decisions per segment, balancing precision and recall based on business needs.

💡 Note: This article uses technical terms and abbreviations. For definitions, check out the Key Terms & Glossary page.


The Problem: One Threshold Doesn't Fit All 🎯

Scenario: You have an anomaly detection system with four priority tiers:

  • Critical: High-priority entities requiring immediate attention
  • High: Elevated concern, needs investigation
  • Standard: Normal entities, low anomaly rate
  • Baseline: Verified trusted entities, minimal intervention

Question: Should you use the same decision threshold for all tiers?

Answer: No! Different tiers have different:

  • Prior probabilities (base rates of anomalies)
  • Misclassification costs (cost of false positives vs false negatives)

The insight: Priority tiers encode our prior beliefs and cost preferences! 💡


Priority Tiers as Prior Beliefs 🎲

What are Priors?

Prior probability (π) is our belief about class proportions before seeing the score.

Notation:

  • π₁ = P(Anomaly) = Prior probability of positive class
  • π₀ = P(Normal) = Prior probability of negative class
  • π₀ + π₁ = 1

Different Priors by Priority Tier

Example:

Tier       π₁ (Anomaly Rate)    π₀ (Normal)
─────────────────────────────────────────────
Critical   0.30 (30%)           0.70 (70%)
High       0.10 (10%)           0.90 (90%)
Standard   0.02 (2%)            0.98 (98%)
Baseline   0.001 (0.1%)         0.999 (99.9%)

Interpretation:

  • Critical: 1 in 3 events is an anomaly (high prior)
  • High: 1 in 10 events is an anomaly
  • Standard: 1 in 50 events is an anomaly
  • Baseline: 1 in 1000 events is an anomaly (very low prior)

Visual Example:

Prior Probabilities by Priority Tier

Prior probabilities reflect our baseline expectations for each priority tier before examining individual scores.


Priority Tiers as Cost Weights 💰

Misclassification Costs

Not all errors are equal! We define:

  • C₀₁ (False Positive Cost): Cost of flagging a normal entity
    • Wasted resources, unnecessary investigation, user friction
  • C₁₀ (False Negative Cost): Cost of missing an anomaly
    • Missed opportunity, undetected issue, downstream impact

Cost Matrix

                    Predicted
                 Negative  Positive
Actual  Negative    0        C₀₁
        Positive   C₁₀        0

Different Costs by Priority Tier

Example:

Tier       C₁₀ (Miss Anomaly)    C₀₁ (False Alarm)    Ratio C₁₀/C₀₁
───────────────────────────────────────────────────────────────────
Critical   $10,000               $50                  200:1
High       $5,000                $100                 50:1
Standard   $1,000                $200                 5:1
Baseline   $500                  $500                 1:1

Interpretation:

  • Critical: Missing anomaly is 200× worse than false alarm → Be aggressive!
  • High: Missing anomaly is 50× worse → Still favor catching anomalies
  • Standard: Missing anomaly is 5× worse → More balanced
  • Baseline: Costs are equal → Standard threshold

Visual Example:

Cost Weights by Priority Tier

Cost-Sensitive Thresholding ⚖️

The Standard Threshold

Without cost considerations, we typically use τ = 0.5:

  • If P(Anomaly|x) ≥ 0.5, predict Positive
  • If P(Anomaly|x) < 0.5, predict Negative

The Bayes Optimal Threshold

With unequal costs and priors, the optimal threshold shifts!

Bayes Decision Rule:

Predict Positive (Anomaly) if:

P(Anomaly|x) · C₁₀ > P(Normal|x) · C₀₁

Rearranging:

P(Anomaly|x) / P(Normal|x) > C₀₁ / C₁₀

Using odds notation:

Odds(Anomaly|x) > C₀₁ / C₁₀

Deriving the Threshold

Optimal Threshold Formula:

τ* = C₀₁ / (C₀₁ + C₁₀)

Or equivalently, incorporating priors:

τ* = 1 / (1 + (C₁₀/C₀₁) · (π₁/π₀))

Alternative form (likelihood ratio threshold):

τ_LR = (C₀₁/C₁₀) · (π₀/π₁)

Visual Example:

Bayes Optimal Threshold

The Bayes optimal threshold balances the costs of different types of errors, shifting based on cost ratios and prior probabilities.


Per-Tier Thresholds: Changing Labeling Geometry 📐

Computing Tier-Specific Thresholds

Using the formula τ* = C₀₁ / (C₀₁ + C₁₀):

Tier       C₁₀      C₀₁     τ* = C₀₁/(C₀₁+C₁₀)
─────────────────────────────────────────────────
Critical   $10,000  $50     50/(50+10000) = 0.005
High       $5,000   $100    100/(100+5000) = 0.020
Standard   $1,000   $200    200/(200+1000) = 0.167
Baseline   $500     $500    500/(500+500) = 0.500

Interpretation:

  • Critical (τ = 0.005):* Flag anything with >0.5% anomaly probability
  • High (τ = 0.020):* Flag anything with >2% anomaly probability
  • Standard (τ = 0.167):* Flag anything with >16.7% anomaly probability
  • Baseline (τ = 0.500):* Standard 50% threshold

Impact on Labeling

Lower threshold → More events flagged (higher recall, lower precision) Higher threshold → Fewer events flagged (higher precision, lower recall)

Visual Example:

Per-Tier Thresholds

Visualizing: PR Curves at Different Costs 📈

What Changes with Cost?

The same model produces different operating points when costs change.

On a Precision-Recall curve:

  • Each threshold corresponds to a (Precision, Recall) point
  • Different costs select different optimal points
  • Higher C₁₀/C₀₁ ratio → Move toward higher recall

Three Operating Points

Example with three cost scenarios:

Scenario         C₁₀/C₀₁    Optimal τ    Precision    Recall
─────────────────────────────────────────────────────────────
Conservative     1:1        0.50         85%          40%
Balanced         10:1       0.09         60%          75%
Aggressive       100:1      0.01         30%          95%

Visual Example:

PR Curve with Different Cost Points

Iso-Cost Lines: Visualizing Cost Trade-offs 📊

What are Iso-Cost Lines?

Iso-cost lines connect points with equal expected cost on the ROC or PR space.

Formula (ROC space):

Expected Cost = C₀₁ · π₀ · FPR + C₁₀ · π₁ · FNR
              = C₀₁ · π₀ · FPR + C₁₀ · π₁ · (1 - TPR)

For constant cost C:

TPR = 1 - (C - C₀₁ · π₀ · FPR) / (C₁₀ · π₁)

This is a linear equation with slope:

slope = (C₀₁ · π₀) / (C₁₀ · π₁)

Reading Iso-Cost Lines

Properties:

  • Lines closer to top-left corner = lower cost
  • Slope depends on cost ratio and priors
  • Optimal point = where ROC curve is tangent to iso-cost line

Different slopes for different tiers:

Tier       Slope = (C₀₁·π₀)/(C₁₀·π₁)
─────────────────────────────────────
Critical   (50·0.70)/(10000·0.30) = 0.012
High       (100·0.90)/(5000·0.10) = 0.18
Standard   (200·0.98)/(1000·0.02) = 9.8
Baseline   (500·0.999)/(500·0.001) = 999

Visual Example:

Iso-Cost Lines on ROC Space

Iso-cost lines help visualize how different cost structures lead to different optimal operating points on the ROC curve.


Baseline Tier Handling: Special Case 🔒

What Makes Baseline Different?

Baseline tier entities often receive special treatment:

  1. Very low prior: π₁ ≈ 0.001 or less
  2. Equal or inverted costs: C₀₁ ≥ C₁₀ (false positives hurt more)
  3. Business rules: May bypass scoring entirely

Baseline Handling Strategies

Strategy 1: High Threshold

τ_baseline = 0.90  (Only flag very high scores)

Strategy 2: Bypass

if tier == "Baseline":
    return "Auto-Approve"  # Skip scoring

Strategy 3: Score but Don't Flag

# Score for monitoring, but don't take action
if tier == "Baseline":
    tag = "Monitor-Only"

Impact on Labeling Geometry

When Baseline uses different rules:

  • The decision boundary shifts dramatically
  • Creates discontinuity in the overall labeling
  • Requires careful tracking of tier-specific metrics

Visual Example:

Baseline Tier Handling Strategies

Real-World Application: Per-Tier Threshold Configuration 🔧

Configuration Structure

Show code (31 lines)
tier_config = {
    'Critical': {
        'prior': 0.30,
        'cost_fn': 10000,  # C₁₀: cost of missing anomaly
        'cost_fp': 50,     # C₀₁: cost of false positive
        'threshold': 0.005,
        'action': 'flag_for_review'
    },
    'High': {
        'prior': 0.10,
        'cost_fn': 5000,
        'cost_fp': 100,
        'threshold': 0.020,
        'action': 'flag_for_review'
    },
    'Standard': {
        'prior': 0.02,
        'cost_fn': 1000,
        'cost_fp': 200,
        'threshold': 0.167,
        'action': 'flag_if_above'
    },
    'Baseline': {
        'prior': 0.001,
        'cost_fn': 500,
        'cost_fp': 500,
        'threshold': 0.500,
        'action': 'auto_approve'
    }
}

Applying Per-Tier Thresholds

Show code (23 lines)
def apply_tier_threshold(event, score, tier_config):
    """
    Apply tier-specific threshold to determine action.
    
    Parameters:
    - event: Event dictionary with 'priority_tier'
    - score: Model score (probability)
    - tier_config: Configuration dictionary
    
    Returns:
    - action: 'flag', 'approve', or 'monitor'
    """
    tier = event['priority_tier']
    config = tier_config[tier]
    threshold = config['threshold']
    
    if config['action'] == 'auto_approve':
        return 'approve'
    elif score >= threshold:
        return 'flag'
    else:
        return 'approve'

Exercise: Deriving the Decision Threshold 🎓

The Problem

Given:

  • Two classes with Gaussian score distributions
  • Class 0 (Normal): N(μ₀, σ²)
  • Class 1 (Anomaly): N(μ₁, σ²) where μ₁ > μ₀
  • Priors: π₀, π₁
  • Costs: C₀₁ (false positive), C₁₀ (false negative)

Derive: The optimal decision threshold τ*

Solution

Step 1: Bayes Decision Rule

Predict Class 1 (Anomaly) if:

π₁ · P(x|Class 1) · C₁₀ > π₀ · P(x|Class 0) · C₀₁

Step 2: Likelihood Ratio

Rearranging:

P(x|Class 1) / P(x|Class 0) > (C₀₁/C₁₀) · (π₀/π₁)

Let λ = (C₀₁/C₁₀) · (π₀/π₁) be the threshold likelihood ratio.

Step 3: Gaussian Likelihood Ratio

For Gaussian distributions with equal variance:

P(x|Class 1) / P(x|Class 0) = exp[-(x-μ₁)²/2σ² + (x-μ₀)²/2σ²]
                            = exp[(μ₁-μ₀)(2x-μ₁-μ₀) / 2σ²]

Step 4: Solve for Threshold

Setting likelihood ratio = λ:

(μ₁-μ₀)(2x-μ₁-μ₀) / 2σ² = ln(λ)

Solving for x:

τ* = (μ₁ + μ₀)/2 + σ² · ln(λ) / (μ₁ - μ₀)

Step 5: Simplify

Substituting λ = (C₀₁/C₁₀) · (π₀/π₁):

τ* = (μ₁ + μ₀)/2 + σ² · ln[(C₀₁/C₁₀) · (π₀/π₁)] / (μ₁ - μ₀)

Interpretation

The optimal threshold has two components:

  1. Midpoint: (μ₁ + μ₀)/2 — where distributions cross
  2. Shift: Adjustment based on costs and priors

When C₁₀ > C₀₁ (missing anomaly is worse):

  • λ < 1, ln(λ) < 0
  • Threshold shifts left (lower)
  • More events flagged

When π₁ > π₀ (anomalies are common):

  • λ smaller
  • Threshold shifts left
  • More events flagged

Visual Example:

Gaussian Score Distributions and Optimal Threshold

Best Practices for Tier-Based Thresholding ✅

1. Estimate Priors Carefully

  • Use historical data to estimate tier-specific anomaly rates
  • Update priors regularly as patterns change
  • Consider seasonal variations

2. Quantify Costs Explicitly

  • Work with business stakeholders to define costs
  • Include both direct and indirect costs
  • Document assumptions

3. Validate Thresholds Empirically

  • Backtest on historical data
  • Monitor performance by tier
  • Adjust based on observed outcomes

4. Handle Edge Cases

  • Define behavior for missing tiers
  • Plan for new/unknown priority categories
  • Set sensible defaults

5. Document Everything

  • Record threshold derivations
  • Track cost assumptions
  • Log threshold changes over time

6. Monitor Tier Drift

  • Watch for changes in tier composition
  • Alert on significant prior shifts
  • Recalibrate periodically

Summary Table 📋

Concept Formula Description
Prior Probability π₁ = P(Anomaly) Base rate of positive class
Cost Ratio C₁₀ / C₀₁ How much worse is missing anomaly vs false alarm
Bayes Threshold τ* = C₀₁ / (C₀₁ + C₁₀) Cost-optimal decision boundary
Likelihood Ratio λ = (C₀₁/C₁₀) · (π₀/π₁) Threshold in likelihood space
Iso-Cost Slope (C₀₁ · π₀) / (C₁₀ · π₁) Slope of equal-cost lines in ROC
Gaussian Threshold (μ₁+μ₀)/2 + adjustment Optimal threshold for Gaussian scores

Final Thoughts 🌟

Risk segmentation transforms a one-size-fits-all approach into a nuanced, cost-aware decision system. By treating priority tiers as carriers of prior beliefs and cost weights, we can:

  • Optimize decisions per tier
  • Balance trade-offs explicitly
  • Allocate resources efficiently
  • Communicate rationale clearly

Key Takeaways:

Priority tiers encode priors — different base rates for different tiers ✅ Costs drive thresholds — higher stakes → more aggressive flagging ✅ Bayes optimal threshold — τ* = C₀₁ / (C₀₁ + C₁₀) ✅ Iso-cost lines — visualize cost trade-offs in ROC/PR space ✅ Baseline handling — special rules for trusted tiers ✅ Gaussian derivation — τ* = midpoint + cost/prior adjustment

Match your thresholds to your costs! ⚖️🎯

Tomorrow's Preview: Day 25 - Configuration Pairing Logic and Equivalence Classes 🔗🎯


Sughosh P Dixit
Sughosh P Dixit
Data Scientist & Tech Writer
11 min read
Previous Post

Day 23: Label Post-Processing: Partitioning Flagged vs Passed Mathematically

Learn to view event tagging as rule-based classification. Understand indicator functions, piecewise partitions, and priority-level conditioning—essential tools for mathematically partitioning events into Flagged and Passed categories.

Next Post

Day 25: Configuration Pairing Logic and Equivalence Classes

Learn to pair semantically complementary configuration sets like Premium/Standard and Verified/Unverified. Understand equivalence relations, pairing consistency, and how mapping functions ensure aligned parameters across pairs.