Key Terms & Glossary

A comprehensive reference for abbreviations and technical terms used throughout the blog posts.

Algorithms

Infix Notation

The standard mathematical notation where operators are placed between operands, e.g., 'a + b'. Requires parentheses for complex expressions.

Operator Precedence

The order in which operators are evaluated in an expression. For example, multiplication is evaluated before addition.

Postfix Notation

A notation where operators follow their operands, e.g., 'a b +'. Also called Reverse Polish Notation (RPN). Eliminates need for parentheses.

RPN

Reverse Polish Notation (also called Postfix notation) - a mathematical notation where operators follow their operands. It eliminates the need for parentheses and makes evaluation unambiguous.

Shunting Yard Algorithm

An algorithm for converting infix notation to postfix notation, using a stack to manage operators and precedence.

Tokenization

The process of breaking text into meaningful units (tokens) such as words, numbers, and operators for parsing and evaluation.

General

TL;DR

Too Long; Didn't Read - a brief summary of the main points.

Mathematics

Boolean Logic

A branch of algebra that deals with binary values (true/false, 1/0) and logical operations like AND, OR, and NOT.

Fuzzy Logic

An extension of Boolean logic that allows degrees of truth between 0 and 1, enabling reasoning with uncertainty and partial truth.

Gödel t-norm

A specific t-norm that uses the minimum operator to generalize AND. It preserves algebraic properties like commutativity, associativity, and monotonicity.

Monotone Transform

A function that preserves order - if x < y, then f(x) < f(y). Percentiles and ranks are invariant under monotone transforms.

T-conorm

Triangular conorm (or s-norm) - a generalization of logical OR to real-valued [0,1] degrees of truth. The Gödel t-conorm uses the maximum operator: S(x,y) = max(x,y).

T-norm

Triangular norm - a generalization of logical AND to real-valued [0,1] degrees of truth. The Gödel t-norm uses the minimum operator: T(x,y) = min(x,y).

Programming

DSL

Domain-Specific Language - a programming language specialized for a particular application domain, such as feature engineering or rule definition.

Statistics

Boxplot

A visual representation of data distribution showing quartiles, median, and outliers. It uses IQR and Tukey fences to identify outliers.

ECDF

Empirical Cumulative Distribution Function - a step function that shows the proportion of data points at or below each value. It's the foundation for computing percentiles and quantiles.

IQR

Interquartile Range - the middle 50% of your data, calculated as Q₃ − Q₁. It's a robust measure of spread that resists outliers.

Kurtosis

A measure of the tail heaviness of a distribution. High kurtosis means heavy tails with many extremes; low kurtosis means light tails.

Leptokurtic

A distribution with high kurtosis (kurtosis > 3), meaning it has heavy tails and many extreme values.

MAD

Median Absolute Deviation - a robust measure of spread that uses the median of absolute deviations from the median. It's resistant to outliers unlike standard deviation.

Median

The middle value of a sorted dataset. It's a robust measure of central tendency that resists outliers, unlike the mean.

Nonparametric

Statistical methods that don't assume a specific distribution (like Normal). Examples include median, percentiles, and IQR-based methods.

Order Statistics

The sorted values of a dataset. If you sort n values as x(1) ≤ x(2) ≤ ... ≤ x(n), these are the order statistics.

Outlier

A data point that is significantly different from other observations. Can be detected using z-scores, IQR methods, or Tukey fences.

Percentile

A value below which a given percentage of observations fall. For example, the 80th percentile is the value below which 80% of the data lies.

Percentile Rank

The percentage of values in a dataset that are at or below a given value. It maps each observation to a number between 0 and 1.

Platykurtic

A distribution with low kurtosis (kurtosis < 3), meaning it has light tails and few extreme values.

Q₁

First Quartile (25th percentile) - the value below which 25% of the data falls.

Q₂

Second Quartile (50th percentile or Median) - the middle value of the dataset.

Q₃

Third Quartile (75th percentile) - the value below which 75% of the data falls.

Quantile

A generalization of percentiles. The p-th quantile is the value below which p proportion of the data falls. Percentiles are quantiles expressed as percentages.

Robust Statistics

Statistical methods that are resistant to outliers and violations of assumptions. Examples include median (instead of mean) and MAD (instead of SD).

SD

Standard Deviation - a measure of spread that squares deviations from the mean. It's sensitive to outliers and assumes normality.

Skewness

A measure of the asymmetry of a distribution. Positive skew means a long tail to the right; negative skew means a long tail to the left.

Stratification

Dividing a population into subgroups (strata) based on some criteria, such as percentile ranks or quantiles, for sampling or analysis purposes.

Tukey Fences

Boundaries used to identify outliers in boxplots. Inner fence = Q₁ − 1.5×IQR and Q₃ + 1.5×IQR; outer fence = Q₁ − 3×IQR and Q₃ + 3×IQR.

Winsorizing

A method of handling outliers by capping extreme values at a certain percentile (e.g., replacing values above the 95th percentile with the 95th percentile value).

Z-score

A standardized score that measures how many standard deviations a value is from the mean. Robust z-scores use median and MAD instead of mean and SD.