A comprehensive reference for abbreviations and technical terms used throughout the blog posts.
The standard mathematical notation where operators are placed between operands, e.g., 'a + b'. Requires parentheses for complex expressions.
The order in which operators are evaluated in an expression. For example, multiplication is evaluated before addition.
A notation where operators follow their operands, e.g., 'a b +'. Also called Reverse Polish Notation (RPN). Eliminates need for parentheses.
Reverse Polish Notation (also called Postfix notation) - a mathematical notation where operators follow their operands. It eliminates the need for parentheses and makes evaluation unambiguous.
An algorithm for converting infix notation to postfix notation, using a stack to manage operators and precedence.
The process of breaking text into meaningful units (tokens) such as words, numbers, and operators for parsing and evaluation.
Too Long; Didn't Read - a brief summary of the main points.
A branch of algebra that deals with binary values (true/false, 1/0) and logical operations like AND, OR, and NOT.
An extension of Boolean logic that allows degrees of truth between 0 and 1, enabling reasoning with uncertainty and partial truth.
A specific t-norm that uses the minimum operator to generalize AND. It preserves algebraic properties like commutativity, associativity, and monotonicity.
A function that preserves order - if x < y, then f(x) < f(y). Percentiles and ranks are invariant under monotone transforms.
Triangular conorm (or s-norm) - a generalization of logical OR to real-valued [0,1] degrees of truth. The Gödel t-conorm uses the maximum operator: S(x,y) = max(x,y).
Triangular norm - a generalization of logical AND to real-valued [0,1] degrees of truth. The Gödel t-norm uses the minimum operator: T(x,y) = min(x,y).
Domain-Specific Language - a programming language specialized for a particular application domain, such as feature engineering or rule definition.
A visual representation of data distribution showing quartiles, median, and outliers. It uses IQR and Tukey fences to identify outliers.
Empirical Cumulative Distribution Function - a step function that shows the proportion of data points at or below each value. It's the foundation for computing percentiles and quantiles.
Interquartile Range - the middle 50% of your data, calculated as Q₃ − Q₁. It's a robust measure of spread that resists outliers.
A measure of the tail heaviness of a distribution. High kurtosis means heavy tails with many extremes; low kurtosis means light tails.
A distribution with high kurtosis (kurtosis > 3), meaning it has heavy tails and many extreme values.
Median Absolute Deviation - a robust measure of spread that uses the median of absolute deviations from the median. It's resistant to outliers unlike standard deviation.
The middle value of a sorted dataset. It's a robust measure of central tendency that resists outliers, unlike the mean.
Statistical methods that don't assume a specific distribution (like Normal). Examples include median, percentiles, and IQR-based methods.
The sorted values of a dataset. If you sort n values as x(1) ≤ x(2) ≤ ... ≤ x(n), these are the order statistics.
A data point that is significantly different from other observations. Can be detected using z-scores, IQR methods, or Tukey fences.
A value below which a given percentage of observations fall. For example, the 80th percentile is the value below which 80% of the data lies.
The percentage of values in a dataset that are at or below a given value. It maps each observation to a number between 0 and 1.
A distribution with low kurtosis (kurtosis < 3), meaning it has light tails and few extreme values.
First Quartile (25th percentile) - the value below which 25% of the data falls.
Second Quartile (50th percentile or Median) - the middle value of the dataset.
Third Quartile (75th percentile) - the value below which 75% of the data falls.
A generalization of percentiles. The p-th quantile is the value below which p proportion of the data falls. Percentiles are quantiles expressed as percentages.
Statistical methods that are resistant to outliers and violations of assumptions. Examples include median (instead of mean) and MAD (instead of SD).
Standard Deviation - a measure of spread that squares deviations from the mean. It's sensitive to outliers and assumes normality.
A measure of the asymmetry of a distribution. Positive skew means a long tail to the right; negative skew means a long tail to the left.
Dividing a population into subgroups (strata) based on some criteria, such as percentile ranks or quantiles, for sampling or analysis purposes.
Boundaries used to identify outliers in boxplots. Inner fence = Q₁ − 1.5×IQR and Q₃ + 1.5×IQR; outer fence = Q₁ − 3×IQR and Q₃ + 3×IQR.
A method of handling outliers by capping extreme values at a certain percentile (e.g., replacing values above the 95th percentile with the 95th percentile value).
A standardized score that measures how many standard deviations a value is from the mean. Robust z-scores use median and MAD instead of mean and SD.