RagTags for ResCommunes

the machines have taken over

Common Math & ML Symbols Cheat Sheet

If you’re diving into AI or machine learning without a strong math background, the hardest part isn’t always the concepts; it’s the symbols. A page of equations can look like another language: Greek letters, bold vectors, strange operators. At first, you might find yourself calling θ “circle with a line through it” before realizing it’s Theta, and that in ML it usually represents model parameters.

Why does this matter? Because in practice, you need both pieces: the name, so you can follow along in papers, tutorials, and discussions; and the function, so you actually understand what role it plays in the math. Without that, equations feel like code you can’t run.

A cheat sheet bridges that gap. Once you recognize common notations — Σ for sum, ∇ for gradient, X for dataset, ŷ for prediction — the fog lifts, and math becomes less about decoding symbols and more about learning ideas.


Greek Letters (names & common roles)

Symbol Name Common ML/Stats role (context)
α alpha Learning rate (optimization); significance level (stats); penalty weight (regularization).
η eta Learning rate (alternate symbol in some texts).
θ theta Model parameters/weights.
λ lambda Regularization strength (e.g., L2/L1); rate parameter (Poisson).
σ sigma Standard deviation; noise scale; sigmoid function σ(z) (logistic regression, neural nets).
Σ Sigma Summation operator; covariance matrix (multivariate stats).
μ mu Mean/average.
ε epsilon Small positive constant (stability); error term.
β beta Coefficients (regression/logistic).
γ gamma Discount factor (RL); RBF kernel coefficient (SVMs), where a larger γ means a narrower kernel.
π pi 3.14159…; class prior probabilities in some texts.
ρ rho Correlation; decay rate for running averages in some optimizers (e.g., RMSProp, Adadelta).
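To see a few of these letters working together, here is a minimal plain-Python sketch that standardizes a list of numbers: subtract the mean μ, divide by the standard deviation σ, and put a small ε inside the square root so a zero variance can’t cause a division by zero. This is the same pattern batch normalization uses; the function name `standardize` and the default ε = 1e-8 are illustrative choices, not taken from any particular library.

```python
import math

def standardize(xs, eps=1e-8):
    """Return (x - mu) / sqrt(sigma^2 + eps) for each x in xs.

    mu is the mean, sigma^2 the (population) variance, and eps a small
    positive constant that keeps the division stable when sigma^2 is 0.
    """
    n = len(xs)
    mu = sum(xs) / n                              # μ: mean
    var = sum((x - mu) ** 2 for x in xs) / n      # σ²: variance
    return [(x - mu) / math.sqrt(var + eps) for x in xs]

print(standardize([2.0, 4.0, 6.0]))   # symmetric values around 0
print(standardize([5.0, 5.0, 5.0]))   # σ² = 0: ε avoids a ZeroDivisionError
```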

General Math

Symbol Name Meaning / Use
x, y, z Variables A number or value (input, output, etc.)
ℝ Real numbers All points on the number line, including fractions and decimals (e.g. 3.14, −2, √2)
ℤ Integers Whole numbers: positive, negative, and zero (…, −1, 0, 1, 2, …)
∈ Element of Read “in”. Example: x ∈ ℝ → x is a real number
{ } Set A collection of elements, e.g. {1,2,3}
|A| Cardinality Size of set A, e.g. |{1,2,3}| = 3
a^b Power a raised to b, e.g. 2^3 = 8
a⁻¹ Inverse Reciprocal 1/a, or matrix inverse
i, j, k Indices Counters (like the j-th element of a vector)
Σ Summation Add things up: Σᵢ₌₁ⁿ xᵢ
Π Product Multiply things: Πᵢ₌₁ⁿ xᵢ
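In code, Σ and Π are just loops that add or multiply. A small Python sketch, assuming Python 3.8+ for `math.prod`; note that Python lists count from 0, so math’s x₁ is `x[0]`.

```python
import math

x = [2, 5, 7]            # x₁ = 2, x₂ = 5, x₃ = 7 in math notation (x[0], x[1], x[2] in Python)
n = len(x)

total = sum(x[i] for i in range(n))    # Σᵢ₌₁ⁿ xᵢ = 2 + 5 + 7
product = math.prod(x)                  # Πᵢ₌₁ⁿ xᵢ = 2 · 5 · 7
A = {1, 2, 3}
cardinality = len(A)                    # |A|
power = 2 ** 3                          # a^b with a = 2, b = 3
reciprocal = 4 ** -1                    # a⁻¹ = 1/a with a = 4

print(total, product, cardinality, power, reciprocal)  # 14 70 3 8 0.25
```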

Linear Algebra

Symbol Name Meaning / Use
x (bold) Vector Ordered list of numbers (features of one sample)
X (bold capital) Matrix 2D table of numbers (rows = samples, cols = features)
xᵢ i-th element Example: if x = [2,5,7], then x₂ = 5 (math notation counts from 1)
Xᵀ Transpose Flip rows and columns of a matrix
X⁻¹ Inverse “Undo” a matrix (if invertible)
‖x‖ Norm Length of a vector (the Euclidean/L2 norm unless a subscript like ‖x‖₁ says otherwise)
x · y Dot product Multiply two vectors elementwise and sum: x · y = Σᵢ xᵢyᵢ
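These operations are short enough to write by hand. The sketch below uses plain Python lists rather than a library like NumPy, and assumes ‖x‖ means the Euclidean (L2) norm. Watch the off-by-one: math notation usually counts from 1 and Python from 0, so x₂ is `x[1]`.

```python
import math

X = [[1, 2, 3],
     [4, 5, 6]]          # 2×3 matrix: 2 rows (samples), 3 columns (features)

def transpose(M):
    """Xᵀ: row i of the result is column i of M."""
    return [list(col) for col in zip(*M)]

def dot(u, v):
    """u · v: multiply elementwise, then sum."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """‖v‖: Euclidean (L2) length, sqrt(v · v)."""
    return math.sqrt(dot(v, v))

x = [2, 5, 7]
print(x[2 - 1])                    # x₂ in 1-indexed math notation → 5
print(transpose(X))                # [[1, 4], [2, 5], [3, 6]]
print(dot([1, 2, 3], [4, 5, 6]))   # 1·4 + 2·5 + 3·6 = 32
print(norm([3, 4]))                # 5.0
```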

Probability & Statistics

Symbol Name Meaning / Use
P(A) Probability Chance of event A
P(A|B) Conditional probability Probability of A given B
𝔼[X] Expectation Mean of random variable X
Var(X) Variance Spread of X
σ² Variance Same as above
σ Standard deviation Square root of variance
μ Mu Mean (average)
ŷ “y-hat” Predicted value from a model
θ Theta Model parameters (weights)
𝒩(μ, σ²) Normal distribution Bell curve with mean μ and variance σ²
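Here is a rough Python translation of the probability notation, using a fair six-sided die as a made-up example. `Fraction` keeps the answers exact, and `statistics.NormalDist` (Python 3.8+) gives the bell curve 𝒩(μ, σ²). The helper `prob` is a hypothetical name for illustration.

```python
from fractions import Fraction
from statistics import NormalDist

outcomes = range(1, 7)
P = {x: Fraction(1, 6) for x in outcomes}   # fair die: P(X = x) = 1/6

def prob(event):
    """P(A): total probability of the outcomes in event A."""
    return sum(P[x] for x in event)

A = {x for x in outcomes if x % 2 == 0}     # event A: even roll
B = {x for x in outcomes if x > 3}          # event B: roll above 3
p_A_given_B = prob(A & B) / prob(B)         # P(A|B) = P(A ∩ B) / P(B)

mu = sum(x * P[x] for x in outcomes)                # 𝔼[X] = Σ x · P(X = x)
var = sum((x - mu) ** 2 * P[x] for x in outcomes)   # Var(X) = σ² = 𝔼[(X − μ)²]
print(prob(A), p_A_given_B, mu, var)   # 1/2 2/3 7/2 35/12

normal = NormalDist(mu=0.0, sigma=1.0)  # 𝒩(0, 1)
print(normal.pdf(0.0))                  # peak of the bell curve, ≈ 0.3989
```

One catch: `NormalDist` takes the standard deviation σ, not the variance σ², so 𝒩(0, 4) would be `NormalDist(0, 2)`.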

Calculus

Symbol Name Meaning / Use
f(x) Function Maps input to output
f'(x) or df/dx Derivative Rate of change
∇f(x) Gradient Vector of slopes in many dimensions
∂f/∂xᵢ Partial derivative Derivative with respect to xᵢ, holding the other variables fixed
∫ f(x) dx Integral Area under curve
lim_{x→∞} f(x) Limit Value f(x) approaches as x grows without bound
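Derivatives, gradients, integrals, and limits can all be approximated numerically, which is a handy way to check your intuition. This sketch uses central finite differences and the midpoint rule; these are swapped-in numerical techniques, not how ML libraries compute gradients (those use automatic differentiation). The step size h and rectangle count n are illustrative.

```python
import math

def derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference: (f(x + h) - f(x - h)) / 2h."""
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient(f, x, h=1e-5):
    """Approximate ∇f(x): one partial derivative ∂f/∂xᵢ per coordinate,
    nudging xᵢ by ±h while the other coordinates stay fixed."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def integral(f, a, b, n=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    width = (b - a) / n
    return sum(f(a + (k + 0.5) * width) for k in range(n)) * width

print(derivative(lambda x: x ** 2, 3.0))                      # ≈ 6, since d/dx x² = 2x
print(gradient(lambda v: v[0] ** 2 + 3 * v[1], [1.0, 2.0]))   # ≈ [2, 3]
print(integral(lambda x: x ** 2, 0.0, 1.0))                   # ≈ 1/3
for x in (10, 1_000, 100_000):
    print(x, (1 + 1 / x) ** x)   # creeps toward e as x → ∞
print(math.e)                    # the limit itself, ≈ 2.71828
```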

Machine Learning Conventions

Symbol Name Meaning / Use
y True label Ground truth
ŷ Prediction Model’s predicted label/value
θ, w, β Parameters Model weights
α Alpha (learning rate) Step size in gradient-based optimization
L(θ) Loss function How wrong the model is
argmin Argument of minimum Value of θ that minimizes a function
argmax Argument of maximum Value of θ that maximizes a function
ℒ(θ) or p(X | θ) Likelihood Probability of the data given parameters
log Logarithm Usually the natural log (base e); common in ML (log-likelihoods, cross-entropy loss, log-softmax)
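Putting the ML symbols together: the sketch below fits a one-feature linear model ŷ = θx by gradient descent on a mean-squared-error loss L(θ), with learning rate α, then shows argmin over a small grid of candidate θ values. The toy data, α = 0.05, and the 100 iterations are made-up choices, and the model has no intercept to keep the math short.

```python
def predict(theta, x):
    """ŷ = θ · x for a one-feature linear model (no intercept)."""
    return theta * x

def loss(theta, xs, ys):
    """L(θ): mean squared error between predictions ŷ and true labels y."""
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad_loss(theta, xs, ys):
    """dL/dθ for the MSE loss above: (2/n) Σ (ŷᵢ − yᵢ) xᵢ."""
    return 2 * sum((predict(theta, x) - y) * x for x, y in zip(xs, ys)) / len(xs)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # toy data where y = 2x exactly
theta, alpha = 0.0, 0.05                     # initial θ and learning rate α
for _ in range(100):
    theta -= alpha * grad_loss(theta, xs, ys)    # θ ← θ − α ∇L(θ)

print(round(theta, 4))   # 2.0

# argmin returns the θ; min of L would return the loss value at that θ.
candidates = [0.5 * k for k in range(9)]                # θ ∈ {0, 0.5, …, 4}
best = min(candidates, key=lambda t: loss(t, xs, ys))   # argmin over the grid
print(best, loss(best, xs, ys))   # 2.0 0.0
```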

Common Letters (dataset shapes)

Symbol Typical meaning Example
n Number of samples/rows X ∈ ℝ^{n×d} has n rows (observations).
d Number of features/columns (dimension) Each x ∈ ℝ^d has d features.
m Alternative for number of samples m training examples.
k Number of clusters/classes/components k-means, k classes.
K Total number of classes (multiclass) y ∈ {1,…,K}.
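In code, n and d are simply the dataset’s row and column counts. A minimal sketch with a made-up 4×3 dataset stored as nested Python lists (a NumPy array’s `.shape` would report the same (n, d) pair). Counting K from the labels assumes every class actually appears in y.

```python
# A made-up dataset X ∈ ℝ^{n×d}, stored as a list of rows (one row per sample).
X = [
    [1.0, 0.5, 2.0],
    [0.9, 0.4, 2.1],
    [3.2, 1.8, 0.7],
    [3.0, 1.9, 0.6],
]
y = [0, 0, 1, 1]                  # one label per row

n = len(X)                        # number of samples (rows)
d = len(X[0])                     # number of features (columns)
K = len(set(y))                   # distinct classes seen in y (assumes all appear)
assert all(len(row) == d for row in X), "every x must live in ℝ^d"
assert len(y) == n, "one label per sample"
print(n, d, K)                    # 4 3 2
```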

Sets & Types you’ll see in ML

Notation Read as Meaning
x ∈ ℝ x is in the reals x is a real number.
x ∈ ℝ^d x is a d-dimensional real vector Feature vector with d numbers.
X ∈ ℝ^{n×d} X is an n-by-d real matrix Dataset with n rows and d columns.
y ∈ {0,1} y is in the set {0, 1} Binary label.
y ∈ {1,…,K} y is one of 1 through K Multiclass label.
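Set-membership notation translates almost word for word into checks like these. The helper names are hypothetical, and `is_real` is only a loose stand-in for x ∈ ℝ: floats can’t represent every real number, so it just accepts finite ints and floats.

```python
import math

def is_real(x):
    """Loose check for x ∈ ℝ: a finite int or float (rejects NaN and ±inf)."""
    return isinstance(x, (int, float)) and math.isfinite(x)

def is_real_vector(x, d):
    """x ∈ ℝ^d: exactly d real entries."""
    return len(x) == d and all(is_real(v) for v in x)

def is_binary_label(y):
    """y ∈ {0, 1}."""
    return y in {0, 1}

def is_class_label(y, K):
    """y ∈ {1, …, K}: one of 1 through K inclusive."""
    return y in range(1, K + 1)

print(is_real_vector([0.5, -1.0, 3.0], 3))              # True
print(is_binary_label(2))                                # False
print(is_class_label(3, K=3), is_class_label(0, K=3))    # True False
```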

Notes on Exponents

  • x^j → usually x raised to the j-th power. Watch for parenthesized superscripts: x^(i) often means the i-th training example, not a power.
  • x_j → the j-th element of vector x.
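  • e^(iπ) = -1 → Euler’s identity; here i is the imaginary unit (√−1), not an index. Complex numbers like this appear mainly in signal processing (e.g., Fourier transforms).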
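A quick way to keep superscripts and subscripts straight is to write them out in code. This sketch uses 1-indexed math notation and converts to Python’s 0-indexed lists by subtracting 1; the x^(i) convention shown is common in course notes but not universal, so treat it as an assumption.

```python
import cmath
import math

s = 3
j = 2
print(s ** j)            # s^j: 3 squared → 9

x = [2, 5, 7]
print(x[j - 1])          # x_j: the 2nd element (math counts from 1) → 5

# Parenthesized superscript: x^(i) is the i-th training example (row i of X),
# and x^(i)_j is feature j of that example.
X = [[1, 2], [3, 4], [5, 6]]
i = 3
print(X[i - 1][j - 1])   # x^(3)_2 → 6

# Euler's identity: Python writes the imaginary unit as 1j, not as an index.
print(cmath.exp(1j * math.pi))   # ≈ -1, plus a tiny imaginary rounding error
```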