Day 23: Label Post-Processing: Partitioning Flagged vs Passed Mathematically 🏷️📊
See event tagging as rule-based classification. Learn to mathematically partition events using indicator functions and piecewise rules.
Label post-processing transforms raw predictions into actionable categories through mathematical rules and threshold-based partitioning.
When working with scores and event classifications, we often need to partition data into distinct categories. Understanding the mathematical foundations of these partitions helps us reason about rule behavior and ensure consistency.
💡 Note: This article uses technical terms and abbreviations. For definitions, check out the Key Terms & Glossary page.
The Problem: Event Tagging as Classification 🎯
Scenario: You have a scoring system that produces continuous scores (0-100). You need to decide:
- Which events get Flagged → Require human review
- Which events get Passed → Auto-approved or processed
Challenge: How do you mathematically define and reason about these partitions?
Example:
Raw scores: [23, 67, 45, 89, 12, 55, 78, 34]
Threshold: 50
Flagged (score ≥ 50): [67, 89, 55, 78] → 4 events (50%)
Passed (score < 50): [23, 45, 12, 34] → 4 events (50%)
The question: How do we formalize this mathematically? 🤔
Indicator Functions: The Building Blocks 🧱
What is an Indicator Function?
An indicator function (also called a characteristic function) is a function that returns 1 if a condition is true, and 0 otherwise.
Notation:
𝟙{condition} = {
1, if condition is true
0, if condition is false
}
Also written as:
I(condition)1_A(indicator of set A)[condition](Iverson bracket)
Examples of Indicator Functions
Example 1: Simple Threshold
𝟙{x ≥ 50} = {
1, if x ≥ 50
0, if x < 50
}
For x = 67: 𝟙{67 ≥ 50} = 1 ✓
For x = 23: 𝟙{23 ≥ 50} = 0 ✗
Example 2: Range Check
𝟙{30 ≤ x < 70} = {
1, if 30 ≤ x < 70
0, otherwise
}
For x = 45: 𝟙{30 ≤ 45 < 70} = 1 ✓
For x = 80: 𝟙{30 ≤ 80 < 70} = 0 ✗
Visual Example:
Indicator functions transform continuous values into binary decisions, forming the foundation of threshold-based classification.
Indicator Functions for Inequalities 📐
Common Inequality Patterns
Greater Than or Equal:
𝟙{x ≥ t} = {
1, if x ≥ t
0, if x < t
}
Less Than:
𝟙{x < t} = {
1, if x < t
0, if x ≥ t
}
Complement Property:
𝟙{x < t} = 1 - 𝟙{x ≥ t}
Double-Sided (Range):
𝟙{a ≤ x < b} = 𝟙{x ≥ a} · 𝟙{x < b}
Flagged vs Passed with Indicator Functions
Definition:
Flagged(x) = 𝟙{x ≥ threshold}
Passed(x) = 𝟙{x < threshold} = 1 - Flagged(x)
Key Property: Partition
Flagged(x) + Passed(x) = 1 (for all x)
This means every event is either Flagged or Passed—never both, never neither!
Visual Example:
Piecewise Partitions: Multiple Categories 📊
What is a Piecewise Partition?
A piecewise partition divides the domain into multiple non-overlapping regions, each handled by a different rule.
Example: Priority Levels
Priority Level(x) = {
"Low", if x < 30
"Medium", if 30 ≤ x < 70
"High", if x ≥ 70
}
Mathematical Representation
Using indicator functions:
Low(x) = 𝟙{x < 30}
Medium(x) = 𝟙{30 ≤ x < 70}
High(x) = 𝟙{x ≥ 70}
Partition Property:
Low(x) + Medium(x) + High(x) = 1 (for all x)
General Form
For thresholds t₁ < t₂ < ... < tₙ:
Region₀(x) = 𝟙{x < t₁}
Region₁(x) = 𝟙{t₁ ≤ x < t₂}
Region₂(x) = 𝟙{t₂ ≤ x < t₃}
...
Regionₙ(x) = 𝟙{x ≥ tₙ}
Visual Example:
Piecewise partitions divide the score space into distinct regions, each with its own handling logic.
Priority-Level Conditioning 🎲
What is Priority-Level Conditioning?
Priority-level conditioning means applying different rules based on the priority level of an event.
Example:
Flag_rule(x, priority) = {
𝟙{x ≥ 30}, if priority = "Low" (lower threshold for low priority)
𝟙{x ≥ 50}, if priority = "Medium" (standard threshold)
𝟙{x ≥ 70}, if priority = "High" (higher threshold for high priority)
}
Mathematical Formulation
Let P(x) be the priority level function. The conditioned Flag rule is:
Flagged(x) = 𝟙{P(x) = "Low"} · 𝟙{x ≥ 30}
+ 𝟙{P(x) = "Medium"} · 𝟙{x ≥ 50}
+ 𝟙{P(x) = "High"} · 𝟙{x ≥ 70}
Interpretation:
- Low-priority events: Flagged if score ≥ 30
- Medium-priority events: Flagged if score ≥ 50
- High-priority events: Flagged if score ≥ 70
Why Different Thresholds?
Rationale:
- Low-priority segments: More conservative, flag more events
- High-priority segments: More aggressive, focus on highest scores
- Resource allocation: Direct review resources where they matter most
Visual Example:
Real-World Application: partition_events_by_thresholds 🔧
Purpose
The partition_events_by_thresholds function applies tuned rules and updates tags for each event.
Use Cases:
- Classify events into Flagged/Passed categories
- Apply priority-level specific thresholds
- Update event labels based on post-processing rules
- Track before/after tag distributions
Function Concept
Show code (26 lines)
def partition_events_by_thresholds(events, thresholds, priority_levels):
"""
Partition events into Flagged/Passed based on thresholds and priority levels.
Parameters:
- events: List of event dictionaries with 'score' and 'priority_level'
- thresholds: Dict mapping priority levels to thresholds
- priority_levels: List of priority levels to consider
Returns:
- events: Updated with 'tag' field (Flagged or Passed)
- stats: Before/after class proportions per priority level
"""
for event in events:
priority = event['priority_level']
score = event['score']
threshold = thresholds.get(priority, 50) # Default threshold
# Apply indicator function
if score >= threshold:
event['tag'] = 'Flagged'
else:
event['tag'] = 'Passed'
return events, compute_stats(events, priority_levels)
Example Application
Input:
Show code (14 lines)
events = [
{'id': 1, 'score': 45, 'priority_level': 'Low'},
{'id': 2, 'score': 55, 'priority_level': 'Medium'},
{'id': 3, 'score': 65, 'priority_level': 'High'},
{'id': 4, 'score': 35, 'priority_level': 'Low'},
{'id': 5, 'score': 75, 'priority_level': 'High'},
]
thresholds = {
'Low': 30,
'Medium': 50,
'High': 70
}
Output:
Show code (10 lines)
# Event 1: score=45, priority=Low, threshold=30 → 45 ≥ 30 → Flagged
# Event 2: score=55, priority=Medium, threshold=50 → 55 ≥ 50 → Flagged
# Event 3: score=65, priority=High, threshold=70 → 65 < 70 → Passed
# Event 4: score=35, priority=Low, threshold=30 → 35 ≥ 30 → Flagged
# Event 5: score=75, priority=High, threshold=70 → 75 ≥ 70 → Flagged
# Results:
# Flagged: [Event 1, Event 2, Event 4, Event 5] → 4 events (80%)
# Passed: [Event 3] → 1 event (20%)
Visual Example:
Before/After Class Proportions 📈
Visualizing the Impact
When you apply post-processing rules, it's crucial to visualize the before/after class proportions.
Example Scenario:
Before (Uniform Threshold = 50):
Priority Level Total Flagged Passed Flagged%
─────────────────────────────────────────────────
Low 1000 400 600 40%
Medium 2000 900 1100 45%
High 1500 800 700 53%
─────────────────────────────────────────────────
Total 4500 2100 2400 47%
After (Priority-Conditioned Thresholds):
Show code (10 lines)
Thresholds: Low=30, Medium=50, High=70
Priority Level Total Flagged Passed Flagged%
─────────────────────────────────────────────────
Low 1000 700 300 70% ↑ More review
Medium 2000 900 1100 45% = Same
High 1500 500 1000 33% ↓ Less review
─────────────────────────────────────────────────
Total 4500 2100 2400 47% = Same total!
Key Insight: Priority-level conditioning redistributes review load without changing total volume!
Visual Example:
Comparing before/after distributions helps you understand the impact of rule changes on each priority segment.
Exercise: Monotonicity of Conjunctive Clauses 🎓
The Problem
Theorem: Adding a conjunctive clause can only shrink the Flagged set.
Claim: If we define:
Flag₁(x) = 𝟙{x ≥ t₁}
Flag₂(x) = 𝟙{x ≥ t₁} · 𝟙{x ≥ t₂} (where t₂ > t₁)
Then: Flag₂(x) ≤ Flag₁(x) for all x.
Prove this monotonicity property.
Solution
Step 1: Analyze Flag₁
Flag₁(x) = 𝟙{x ≥ t₁} = {
1, if x ≥ t₁
0, if x < t₁
}
Step 2: Analyze Flag₂
Flag₂(x) = 𝟙{x ≥ t₁} · 𝟙{x ≥ t₂}
Since t₂ > t₁, the condition x ≥ t₂ implies x ≥ t₁.
Therefore:
Flag₂(x) = {
1, if x ≥ t₂ (which implies x ≥ t₁)
0, otherwise
}
Step 3: Compare
Case analysis for all x:
| Case | x < t₁ | t₁ ≤ x < t₂ | x ≥ t₂ | |------|--------|-------------|--------| | Flag₁ | 0 | 1 | 1 | | Flag₂ | 0 | 0 | 1 | | Flag₂ ≤ Flag₁? | ✓ (0 ≤ 0) | ✓ (0 ≤ 1) | ✓ (1 ≤ 1) |
Step 4: Conclusion
For all x: Flag₂(x) ≤ Flag₁(x) ✓
Interpretation: Adding the extra condition x ≥ t₂ (where t₂ > t₁) can only make it harder to be Flagged, never easier. This is the monotonicity property of conjunctive clauses.
General Principle
Monotonicity of AND:
If C = A ∧ B, then:
- C can only be true when BOTH A and B are true
- C ⊆ A and C ⊆ B
- |C| ≤ min(|A|, |B|)
Practical Implication:
- Adding more conditions to a rule can only reduce the set of matching events
- Never increase it
- This is useful for understanding rule refinement
Visual Example:
Practical Applications 🔧
1. Rule Refinement
Scenario: Current Flagged rule catches too many false positives.
Solution: Add conjunctive clauses to narrow down:
Before: Flagged = 𝟙{score ≥ 50}
After: Flagged = 𝟙{score ≥ 50} · 𝟙{amount ≥ 1000}
Guaranteed: New Flagged ⊆ Old Flagged (monotonicity)
2. A/B Testing Rules
Scenario: Testing a new threshold.
Approach:
- Define Flag_control and Flag_treatment
- Use indicator functions to track differences
- Measure
|Flag_control ⊕ Flag_treatment|(symmetric difference)
3. Threshold Calibration
Scenario: Adjusting thresholds per priority level.
Method:
- Start with uniform threshold
- Apply priority-level conditioning
- Monitor before/after proportions
- Iterate until balanced
Best Practices for Label Post-Processing ✅
1. Document Your Rules Mathematically
Write rules as indicator functions to:
- Ensure clarity
- Enable formal reasoning
- Detect logical errors
2. Verify Partition Properties
Always check that:
∑ Category_i(x) = 1 for all x
Every event belongs to exactly one category.
3. Track Before/After Proportions
Monitor class distributions:
- Per priority level
- Per segment
- Over time
4. Understand Monotonicity
When adding conditions:
- Conjunctive (AND): Shrinks the set
- Disjunctive (OR): Expands the set
5. Test Edge Cases
Check behavior at:
- Exact threshold values
- Boundary conditions
- Missing data scenarios
6. Version Control Your Rules
Track rule changes:
- What changed
- When it changed
- Why it changed
- Impact on proportions
Summary Table 📋
| Concept | Notation | Description | Use Case |
|---|---|---|---|
| Indicator Function | 𝟙{condition} |
Returns 1 if true, 0 if false | Binary classification |
| Flagged Indicator | 𝟙{x ≥ t} |
Above threshold check | Review flagging |
| Passed Indicator | 1 - 𝟙{x ≥ t} |
Below threshold (complement) | Auto-processing |
| Piecewise Partition | ∑ Region_i = 1 | Non-overlapping regions | Multi-category classification |
| Priority Conditioning | Flagged(x, priority) | Different thresholds per priority | Segment-specific rules |
| Monotonicity | A ∧ B ⊆ A | AND shrinks sets | Rule refinement |
Final Thoughts 🌟
Label post-processing is fundamentally about applying mathematical rules to partition events into categories. By formalizing these rules with indicator functions, we gain:
- Clarity: Precise definitions that eliminate ambiguity
- Reasoning: Ability to prove properties like monotonicity
- Consistency: Guaranteed partitions where every event has exactly one category
- Insight: Before/after comparisons that reveal rule impact
Key Takeaways:
✅ Indicator functions are the building blocks of rule-based classification ✅ Piecewise partitions divide the score space into non-overlapping regions ✅ Priority-level conditioning allows segment-specific thresholds ✅ partition_events_by_thresholds applies tuned rules and updates tags ✅ Before/after proportions reveal the impact of rule changes ✅ Conjunctive clauses can only shrink the Flagged set (monotonicity)
Think mathematically about your rules! 🧮🎯
Tomorrow's Preview: Day 24 - Risk Segmentation: Priority Tiers as Priors and Costs ⚖️📊




