Sughosh P Dixit
2025-11-23 · 11 min read

Day 23: Label Post-Processing: Partitioning Flagged vs Passed Mathematically

TL;DR

Quick summary

Learn to view event tagging as rule-based classification. Understand indicator functions, piecewise partitions, and priority-level conditioning—essential tools for mathematically partitioning events into Flagged and Passed categories.

See event tagging as rule-based classification. Learn to mathematically partition events using indicator functions and piecewise rules.

Label post-processing transforms raw predictions into actionable categories through mathematical rules and threshold-based partitioning.

When working with scores and event classifications, we often need to partition data into distinct categories. Understanding the mathematical foundations of these partitions helps us reason about rule behavior and ensure consistency.

💡 Note: This article uses technical terms and abbreviations. For definitions, check out the Key Terms & Glossary page.


The Problem: Event Tagging as Classification 🎯

Scenario: You have a scoring system that produces continuous scores (0-100). You need to decide:

  • Which events get Flagged → Require human review
  • Which events get Passed → Auto-approved or processed

Challenge: How do you mathematically define and reason about these partitions?

Example:

Raw scores: [23, 67, 45, 89, 12, 55, 78, 34]
Threshold: 50

Flagged (score ≥ 50): [67, 89, 55, 78] → 4 events (50%)
Passed (score < 50):  [23, 45, 12, 34] → 4 events (50%)

The question: How do we formalize this mathematically? 🤔


Indicator Functions: The Building Blocks 🧱

What is an Indicator Function?

An indicator function (also called a characteristic function) is a function that returns 1 if a condition is true, and 0 otherwise.

Notation:

𝟙{condition} = {
    1, if condition is true
    0, if condition is false
}

Also written as:

  • I(condition)
  • 1_A (indicator of set A)
  • [condition] (Iverson bracket)

Examples of Indicator Functions

Example 1: Simple Threshold

𝟙{x ≥ 50} = {
    1, if x ≥ 50
    0, if x < 50
}

For x = 67: 𝟙{67 ≥ 50} = 1 ✓
For x = 23: 𝟙{23 ≥ 50} = 0 ✗

Example 2: Range Check

𝟙{30 ≤ x < 70} = {
    1, if 30 ≤ x < 70
    0, otherwise
}

For x = 45: 𝟙{30 ≤ 45 < 70} = 1 ✓
For x = 80: 𝟙{30 ≤ 80 < 70} = 0 ✗

Visual Example:

Indicator Function Visualization

Indicator functions transform continuous values into binary decisions, forming the foundation of threshold-based classification.
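As a minimal sketch, the two examples above can be written directly in Python. The helper names `indicator_ge` and `indicator_range` are illustrative choices, not functions from the original post.

```python
def indicator_ge(x, t):
    """Return 1 if x >= t, else 0 (the indicator 1{x >= t})."""
    return 1 if x >= t else 0


def indicator_range(x, a, b):
    """Return 1 if a <= x < b, else 0 (the indicator 1{a <= x < b})."""
    return 1 if a <= x < b else 0


print(indicator_ge(67, 50))         # 1
print(indicator_ge(23, 50))         # 0
print(indicator_range(45, 30, 70))  # 1
print(indicator_range(80, 30, 70))  # 0
```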


Indicator Functions for Inequalities 📐

Common Inequality Patterns

Greater Than or Equal:

𝟙{x ≥ t} = {
    1, if x ≥ t
    0, if x < t
}

Less Than:

𝟙{x < t} = {
    1, if x < t
    0, if x ≥ t
}

Complement Property:

𝟙{x < t} = 1 - 𝟙{x ≥ t}

Double-Sided (Range):

𝟙{a ≤ x < b} = 𝟙{x ≥ a} · 𝟙{x < b}

Flagged vs Passed with Indicator Functions

Definition:

Flagged(x) = 𝟙{x ≥ threshold}
Passed(x) = 𝟙{x < threshold} = 1 - Flagged(x)

Key Property: Partition

Flagged(x) + Passed(x) = 1  (for all x)

This means every event is either Flagged or Passed—never both, never neither!
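A quick numerical check of the complement, product, and partition identities, using the raw scores from the opening example. The function names are illustrative, and a check on sample values is a sanity test rather than a proof.

```python
scores = [23, 67, 45, 89, 12, 55, 78, 34]
threshold = 50


def flagged(x, t=threshold):
    return int(x >= t)


def passed(x, t=threshold):
    return int(x < t)


for x in scores:
    # Complement: 1{x < t} = 1 - 1{x >= t}
    assert passed(x) == 1 - flagged(x)
    # Partition: exactly one of the two indicators fires
    assert flagged(x) + passed(x) == 1
    # Range as a product: 1{a <= x < b} = 1{x >= a} * 1{x < b}
    assert int(30 <= x < 70) == int(x >= 30) * int(x < 70)

print(sum(flagged(x) for x in scores))  # 4
print(sum(passed(x) for x in scores))   # 4
```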

Visual Example:

Flagged vs Passed Indicator Functions

Piecewise Partitions: Multiple Categories 📊

What is a Piecewise Partition?

A piecewise partition divides the domain into multiple non-overlapping regions, each handled by a different rule.

Example: Priority Levels

Priority Level(x) = {
    "Low",      if x < 30
    "Medium",   if 30 ≤ x < 70
    "High",     if x ≥ 70
}

Mathematical Representation

Using indicator functions:

Low(x)    = 𝟙{x < 30}
Medium(x) = 𝟙{30 ≤ x < 70}
High(x)   = 𝟙{x ≥ 70}

Partition Property:

Low(x) + Medium(x) + High(x) = 1  (for all x)

General Form

For thresholds t₁ < t₂ < ... < tₙ:

Region₀(x) = 𝟙{x < t₁}
Region₁(x) = 𝟙{t₁ ≤ x < t₂}
Region₂(x) = 𝟙{t₂ ≤ x < t₃}
...
Regionₙ(x) = 𝟙{x ≥ tₙ}

Visual Example:

Piecewise Partition

Piecewise partitions divide the score space into distinct regions, each with its own handling logic.
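One way to sketch the general form in Python is with the standard library's `bisect_right`, which returns the index of the region a score falls into when the thresholds are sorted in ascending order. The names `region_index` and `region_indicators` are hypothetical helpers introduced for this example.

```python
from bisect import bisect_right

thresholds = [30, 70]               # t1 < t2, must be sorted ascending
labels = ["Low", "Medium", "High"]  # one label per region


def region_index(x, thresholds):
    """Index k of the unique region containing x (0 .. len(thresholds))."""
    return bisect_right(thresholds, x)


def region_indicators(x, thresholds):
    """One-hot list [Region_0(x), ..., Region_n(x)]."""
    k = region_index(x, thresholds)
    return [int(i == k) for i in range(len(thresholds) + 1)]


for x in [12, 30, 45, 70, 89]:
    one_hot = region_indicators(x, thresholds)
    assert sum(one_hot) == 1  # partition property
    print(x, labels[region_index(x, thresholds)], one_hot)
```

Because `bisect_right` places a score equal to a threshold in the region to its right, the boundaries match the `t₁ ≤ x < t₂` convention used above: 30 lands in Medium and 70 lands in High.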


Priority-Level Conditioning 🎲

What is Priority-Level Conditioning?

Priority-level conditioning means applying different rules based on the priority level of an event.

Example:

Flag_rule(x, priority) = {
    𝟙{x ≥ 30},  if priority = "Low"     (lower threshold for low priority)
    𝟙{x ≥ 50},  if priority = "Medium"  (standard threshold)
    𝟙{x ≥ 70},  if priority = "High"    (higher threshold for high priority)
}

Mathematical Formulation

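Let p denote the event's priority level. It is an attribute supplied with the event alongside the score x, not the score-derived Priority Level(x) from the previous section (if it were, a "Low" event would have x < 30 and could never satisfy x ≥ 30). The conditioned Flag rule is:

Flagged(x, p) = 𝟙{p = "Low"} · 𝟙{x ≥ 30}
              + 𝟙{p = "Medium"} · 𝟙{x ≥ 50}
              + 𝟙{p = "High"} · 𝟙{x ≥ 70}

Exactly one of the priority indicators equals 1 for any given event, so the sum selects a single threshold test.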

Interpretation:

  • Low-priority events: Flagged if score ≥ 30
  • Medium-priority events: Flagged if score ≥ 50
  • High-priority events: Flagged if score ≥ 70
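The sum-of-products formulation translates almost literally into Python. This is a sketch that takes priority as a separate argument, following Flag_rule(x, priority); the name `THRESHOLDS` and the function signature are assumptions made for the example.

```python
THRESHOLDS = {"Low": 30, "Medium": 50, "High": 70}


def flagged(score, priority, thresholds=THRESHOLDS):
    """Sum over levels of 1{priority == level} * 1{score >= threshold[level]}."""
    return sum(
        int(priority == level) * int(score >= t)
        for level, t in thresholds.items()
    )


print(flagged(45, "Low"))     # 1, since 45 >= 30
print(flagged(45, "Medium"))  # 0, since 45 < 50
print(flagged(65, "High"))    # 0, since 65 < 70
print(flagged(75, "High"))    # 1, since 75 >= 70
```

Note that in this literal form an unrecognized priority level matches no term and returns 0, so such events would silently count as Passed. A production rule would probably want an explicit default or an error instead.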

Why Different Thresholds?

Rationale:

  • Low-priority segments: More conservative, flag more events
  • High-priority segments: More aggressive, focus on highest scores
  • Resource allocation: Direct review resources where they matter most

Visual Example:

Priority-Level Conditioning

Real-World Application: partition_events_by_thresholds 🔧

Purpose

The partition_events_by_thresholds function applies tuned rules and updates tags for each event.

Use Cases:

  • Classify events into Flagged/Passed categories
  • Apply priority-level specific thresholds
  • Update event labels based on post-processing rules
  • Track before/after tag distributions

Function Concept

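from collections import Counter


def compute_stats(events, priority_levels):
    """Illustrative helper (assumed here, not shown in the original): tag shares per level."""
    stats = {}
    for level in priority_levels:
        tags = Counter(e['tag'] for e in events if e['priority_level'] == level)
        total = sum(tags.values())
        stats[level] = {
            'total': total,
            'flagged_pct': 100 * tags['Flagged'] / total if total else 0.0,
        }
    return stats


def partition_events_by_thresholds(events, thresholds, priority_levels):
    """
    Partition events into Flagged/Passed based on thresholds and priority levels.

    Parameters:
    - events: List of event dictionaries with 'score' and 'priority_level'
    - thresholds: Dict mapping priority levels to thresholds
    - priority_levels: List of priority levels to report stats for

    Returns:
    - events: Updated in place with a 'tag' field (Flagged or Passed)
    - stats: Class proportions per priority level after tagging
    """
    for event in events:
        priority = event['priority_level']
        score = event['score']
        # Fall back to 50 when a priority level has no configured threshold
        threshold = thresholds.get(priority, 50)

        # Apply the indicator function 1{score >= threshold}
        event['tag'] = 'Flagged' if score >= threshold else 'Passed'

    return events, compute_stats(events, priority_levels)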

Example Application

Input:

events = [
    {'id': 1, 'score': 45, 'priority_level': 'Low'},
    {'id': 2, 'score': 55, 'priority_level': 'Medium'},
    {'id': 3, 'score': 65, 'priority_level': 'High'},
    {'id': 4, 'score': 35, 'priority_level': 'Low'},
    {'id': 5, 'score': 75, 'priority_level': 'High'},
]

thresholds = {
    'Low': 30,
    'Medium': 50,
    'High': 70
}

Output:

# Event 1: score=45, priority=Low, threshold=30 → 45 ≥ 30 → Flagged
# Event 2: score=55, priority=Medium, threshold=50 → 55 ≥ 50 → Flagged
# Event 3: score=65, priority=High, threshold=70 → 65 < 70 → Passed
# Event 4: score=35, priority=Low, threshold=30 → 35 ≥ 30 → Flagged
# Event 5: score=75, priority=High, threshold=70 → 75 ≥ 70 → Flagged

# Results:
# Flagged: [Event 1, Event 2, Event 4, Event 5] → 4 events (80%)
# Passed: [Event 3] → 1 event (20%)

Visual Example:

Partition Function Application

Before/After Class Proportions 📈

Visualizing the Impact

When you apply post-processing rules, it's crucial to visualize the before/after class proportions.

Example Scenario:

Before (Uniform Threshold = 50):

Priority Level  Total  Flagged  Passed  Flagged%
─────────────────────────────────────────────────
Low             1000     400     600      40%
Medium          2000     900    1100      45%
High            1500     800     700      53%
─────────────────────────────────────────────────
Total           4500    2100    2400      47%

After (Priority-Conditioned Thresholds):

Thresholds: Low=30, Medium=50, High=70

Priority Level  Total  Flagged  Passed  Flagged%
─────────────────────────────────────────────────
Low             1000     700     300      70%  ↑ More review
Medium          2000     900    1100      45%  = Same
High            1500     500    1000      33%  ↓ Less review
─────────────────────────────────────────────────
Total           4500    2100    2400      47%  = Same total!

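Key Insight: In this example, priority-level conditioning redistributes review load across levels while the total stays at 2100 flagged events. That outcome depends on these particular thresholds and score distributions. It is not guaranteed in general: changing thresholds usually changes the total unless the increases and decreases are tuned to offset each other.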

Visual Example:

Before/After Class Proportions

Comparing before/after distributions helps you understand the impact of rule changes on each priority segment.
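A small sketch of a before/after comparison, using the five sample events from the example application. The helper `flagged_share` is hypothetical; it counts flagged events per priority level under a given threshold mapping, falling back to a uniform default of 50.

```python
from collections import defaultdict

events = [
    {"id": 1, "score": 45, "priority_level": "Low"},
    {"id": 2, "score": 55, "priority_level": "Medium"},
    {"id": 3, "score": 65, "priority_level": "High"},
    {"id": 4, "score": 35, "priority_level": "Low"},
    {"id": 5, "score": 75, "priority_level": "High"},
]


def flagged_share(events, thresholds, default=50):
    """Return {level: (flagged, total)} under the given per-level thresholds."""
    counts = defaultdict(lambda: [0, 0])
    for e in events:
        t = thresholds.get(e["priority_level"], default)
        counts[e["priority_level"]][0] += int(e["score"] >= t)
        counts[e["priority_level"]][1] += 1
    return {level: tuple(c) for level, c in counts.items()}


before = flagged_share(events, {})  # uniform threshold of 50 via the default
after = flagged_share(events, {"Low": 30, "Medium": 50, "High": 70})

for level in ["Low", "Medium", "High"]:
    print(level, "before", before[level], "after", after[level])
```

On this five-event sample the total moves from 3 flagged to 4, a reminder that an unchanged total like the one in the tables depends on how scores are distributed within each level.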


Exercise: Monotonicity of Conjunctive Clauses 🎓

The Problem

Theorem: Adding a conjunctive clause can only shrink the Flagged set.

Claim: If we define:

Flag₁(x) = 𝟙{x ≥ t₁}
Flag₂(x) = 𝟙{x ≥ t₁} · 𝟙{x ≥ t₂}  (where t₂ > t₁)

Then: Flag₂(x) ≤ Flag₁(x) for all x.

Prove this monotonicity property.

Solution

Step 1: Analyze Flag₁

Flag₁(x) = 𝟙{x ≥ t₁} = {
    1, if x ≥ t₁
    0, if x < t₁
}

Step 2: Analyze Flag₂

Flag₂(x) = 𝟙{x ≥ t₁} · 𝟙{x ≥ t₂}

Since t₂ > t₁, the condition x ≥ t₂ implies x ≥ t₁.

Therefore:

Flag₂(x) = {
    1, if x ≥ t₂ (which implies x ≥ t₁)
    0, otherwise
}

Step 3: Compare

Case analysis for all x:

| Case | x < t₁ | t₁ ≤ x < t₂ | x ≥ t₂ |
|------|--------|-------------|--------|
| Flag₁ | 0 | 1 | 1 |
| Flag₂ | 0 | 0 | 1 |
| Flag₂ ≤ Flag₁? | ✓ (0 ≤ 0) | ✓ (0 ≤ 1) | ✓ (1 ≤ 1) |

Step 4: Conclusion

For all x: Flag₂(x) ≤ Flag₁(x)

Interpretation: Adding the extra condition x ≥ t₂ (where t₂ > t₁) can only make it harder to be Flagged, never easier. This is the monotonicity property of conjunctive clauses.
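The case analysis can also be spot-checked by brute force over integer scores. This is a sanity check under illustrative threshold values, not a substitute for the proof.

```python
def flag1(x, t1):
    return int(x >= t1)


def flag2(x, t1, t2):
    return int(x >= t1) * int(x >= t2)


t1, t2 = 30, 70  # any pair with t2 > t1 works; these values are illustrative
for x in range(0, 101):
    # Monotonicity: the extra clause never turns a 0 into a 1
    assert flag2(x, t1, t2) <= flag1(x, t1)
    # With t2 > t1 the product collapses to the stricter clause
    assert flag2(x, t1, t2) == int(x >= t2)

print("Flag2 <= Flag1 holds for all integer scores 0..100")
```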

General Principle

Monotonicity of AND:

If C = A ∧ B, then:
- C can only be true when BOTH A and B are true
- C ⊆ A and C ⊆ B
- |C| ≤ min(|A|, |B|)

Practical Implication:

  • Adding more AND conditions to a rule can only shrink the set of matching events or leave it unchanged; it never grows it
  • This is useful for understanding rule refinement

Visual Example:

Monotonicity Proof

Practical Applications 🔧

1. Rule Refinement

Scenario: Current Flagged rule catches too many false positives.

Solution: Add conjunctive clauses to narrow down:

Before: Flagged = 𝟙{score ≥ 50}
After:  Flagged = 𝟙{score ≥ 50} · 𝟙{amount ≥ 1000}

Guaranteed: New Flagged ⊆ Old Flagged (monotonicity)
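A sketch of the subset guarantee with Python sets. The four events and their `amount` values are made up for illustration; only the two rules come from the text above.

```python
events = [
    {"id": "a", "score": 67, "amount": 250},
    {"id": "b", "score": 89, "amount": 4000},
    {"id": "c", "score": 45, "amount": 9000},
    {"id": "d", "score": 55, "amount": 1000},
]

old_flagged = {e["id"] for e in events if e["score"] >= 50}
new_flagged = {e["id"] for e in events if e["score"] >= 50 and e["amount"] >= 1000}

assert new_flagged <= old_flagged  # subset: the AND clause never adds events

print(sorted(old_flagged))                # ['a', 'b', 'd']
print(sorted(new_flagged))                # ['b', 'd']
print(sorted(old_flagged - new_flagged))  # ['a']
```

The set difference lists exactly the events the new clause stops flagging, which is a convenient review list when checking whether the refinement removed false positives or real hits.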

2. A/B Testing Rules

Scenario: Testing a new threshold.

Approach:

  1. Define Flag_control and Flag_treatment
  2. Use indicator functions to track differences
  3. Measure |Flag_control ⊕ Flag_treatment| (symmetric difference)
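The three steps can be sketched with set operations, reusing the raw scores from the opening example. The event ids and the treatment threshold of 60 are made-up values for illustration.

```python
scores = {"e1": 23, "e2": 67, "e3": 45, "e4": 89,
          "e5": 12, "e6": 55, "e7": 78, "e8": 34}

control_t, treatment_t = 50, 60  # treatment threshold is a hypothetical value

flag_control = {k for k, s in scores.items() if s >= control_t}
flag_treatment = {k for k, s in scores.items() if s >= treatment_t}

# Symmetric difference: events whose tag differs between the two rules
changed = flag_control ^ flag_treatment

print(sorted(changed))             # ['e6']
print(len(changed) / len(scores))  # 0.125
```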

3. Threshold Calibration

Scenario: Adjusting thresholds per priority level.

Method:

  1. Start with uniform threshold
  2. Apply priority-level conditioning
  3. Monitor before/after proportions
  4. Iterate until balanced

Best Practices for Label Post-Processing ✅

1. Document Your Rules Mathematically

Write rules as indicator functions to:

  • Ensure clarity
  • Enable formal reasoning
  • Detect logical errors

2. Verify Partition Properties

Always check that:

∑ Category_i(x) = 1 for all x

Every event belongs to exactly one category.
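A minimal sketch of such a check: evaluate every category predicate on a grid of scores and report the values that match zero or several categories. `check_partition` is a hypothetical helper, and the "bad" rule set is deliberately broken at its boundaries to show what the check catches.

```python
def check_partition(categories, xs):
    """Return the values in xs that match zero or several category predicates."""
    return [x for x in xs if sum(int(pred(x)) for pred in categories.values()) != 1]


good = {
    "Low": lambda x: x < 30,
    "Medium": lambda x: 30 <= x < 70,
    "High": lambda x: x >= 70,
}
# Deliberately broken: 30 matches nothing and 70 matches two categories
bad = {
    "Low": lambda x: x < 30,
    "Medium": lambda x: 30 < x <= 70,
    "High": lambda x: x >= 70,
}

xs = range(0, 101)
print(check_partition(good, xs))  # []
print(check_partition(bad, xs))   # [30, 70]
```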

3. Track Before/After Proportions

Monitor class distributions:

  • Per priority level
  • Per segment
  • Over time

4. Understand Monotonicity

When adding conditions:

  • Conjunctive (AND): Shrinks the set or leaves it unchanged
  • Disjunctive (OR): Expands the set or leaves it unchanged

5. Test Edge Cases

Check behavior at:

  • Exact threshold values
  • Boundary conditions
  • Missing data scenarios
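A sketch of how these edge cases might be exercised. The explicit "Unknown" bucket for missing scores is an assumption of this example (the article itself uses only Flagged and Passed); the point is that a missing value should be handled deliberately rather than falling through a comparison.

```python
import math


def tag(score, threshold=50):
    """Tag a score, routing missing values to an explicit 'Unknown' bucket."""
    if score is None or (isinstance(score, float) and math.isnan(score)):
        return "Unknown"
    return "Flagged" if score >= threshold else "Passed"


print(tag(50))            # Flagged: the boundary belongs to Flagged (>=)
print(tag(49.999))        # Passed
print(tag(float("nan")))  # Unknown
print(tag(None))          # Unknown

# Without the guard, NaN compares False against any number,
# so a NaN score would silently be tagged Passed.
print(float("nan") >= 50)  # False
```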

6. Version Control Your Rules

Track rule changes:

  • What changed
  • When it changed
  • Why it changed
  • Impact on proportions

Summary Table 📋

| Concept | Notation | Description | Use Case |
|---------|----------|-------------|----------|
| Indicator Function | 𝟙{condition} | Returns 1 if true, 0 if false | Binary classification |
| Flagged Indicator | 𝟙{x ≥ t} | Above threshold check | Review flagging |
| Passed Indicator | 1 - 𝟙{x ≥ t} | Below threshold (complement) | Auto-processing |
| Piecewise Partition | ∑ Region_i = 1 | Non-overlapping regions | Multi-category classification |
| Priority Conditioning | Flagged(x, priority) | Different thresholds per priority | Segment-specific rules |
| Monotonicity | A ∧ B ⊆ A | AND shrinks sets | Rule refinement |

Final Thoughts 🌟

Label post-processing is fundamentally about applying mathematical rules to partition events into categories. By formalizing these rules with indicator functions, we gain:

  • Clarity: Precise definitions that eliminate ambiguity
  • Reasoning: Ability to prove properties like monotonicity
  • Consistency: Guaranteed partitions where every event has exactly one category
  • Insight: Before/after comparisons that reveal rule impact

Key Takeaways:

✅ Indicator functions are the building blocks of rule-based classification
✅ Piecewise partitions divide the score space into non-overlapping regions
✅ Priority-level conditioning allows segment-specific thresholds
✅ partition_events_by_thresholds applies tuned rules and updates tags
✅ Before/after proportions reveal the impact of rule changes
✅ Conjunctive clauses can only shrink the Flagged set (monotonicity)

Think mathematically about your rules! 🧮🎯

Tomorrow's Preview: Day 24 - Risk Segmentation: Priority Tiers as Priors and Costs ⚖️📊


Sughosh P Dixit
Data Scientist & Tech Writer