Common Math & ML Symbols Cheat Sheet
If you’re diving into AI or machine learning without a strong math background, the hardest part isn’t always the concepts — it’s the symbols. A page of equations can look like another language: Greek letters, bold vectors, strange operators. At first, you might find yourself calling θ “circle with a dot” before realizing it’s Theta, and that in ML it usually represents model parameters.
Why does this matter? Because in practice, you need both pieces: the name, so you can follow along in papers, tutorials, and discussions; and the function, so you actually understand what role it plays in the math. Without that, equations feel like code you can’t run.
A cheat sheet bridges that gap. Once you recognize common notations — Σ for sum, ∇ for gradient, X for dataset, ŷ for prediction — the fog lifts, and math becomes less about decoding symbols and more about learning ideas.
Greek Letters (names & common roles)
| Symbol | Name | Common ML/Stats role (context) |
|---|---|---|
| α | alpha | Learning rate (optimization); significance level (stats); penalty weight (regularization). |
| η | eta | Learning rate (alternate symbol in some texts). |
| θ | theta | Model parameters/weights. |
| λ | lambda | Regularization strength (e.g., L2/L1); rate parameter (Poisson). |
| σ | sigma | Standard deviation; noise scale. |
| Σ | Sigma | Summation operator. |
| μ | mu | Mean/average. |
| ε | epsilon | Small positive constant (numerical stability); error term. |
| β | beta | Coefficients (regression/logistic). |
| γ | gamma | Discount factor (RL); kernel/RBF width (SVMs). |
| π | pi | 3.14159…; class prior probabilities in some texts. |
| ρ | rho | Correlation; momentum parameter in some optimizers. |
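Several of these letters meet in a single gradient-descent update, θ ← θ − α(∇L(θ) + λθ). A minimal Python sketch, using a made-up one-parameter quadratic loss purely for illustration (`gd_step`, the loss, and all constants are assumptions, not a standard API):

```python
# One gradient-descent step for a toy loss L(theta) = (theta - 3)**2,
# with learning rate alpha (α) and L2 penalty weight lam (λ).
def gd_step(theta, alpha=0.1, lam=0.01):
    grad = 2 * (theta - 3)                 # dL/dtheta for the toy loss
    return theta - alpha * (grad + lam * theta)

theta = 0.0
for _ in range(100):
    theta = gd_step(theta)

# theta converges near 3, pulled slightly toward 0 by the λ penalty
print(theta)
```

Raising `lam` pulls the converged θ further toward zero, which is exactly what "regularization strength" means in the table above.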
General Math
| Symbol | Name | Meaning / Use |
|---|---|---|
| x, y, z | Variables | A number or value (input, output, etc.) |
| ℝ | Real numbers | Every number on the continuous number line (e.g. 3.14, −2) |
| ℤ | Integers | Whole numbers: positive, negative, and zero |
| ∈ | "In" | Element of a set. Example: x ∈ ℝ → x is a real number |
| { } | Set | A collection of elements, e.g. {1, 2, 3} |
| \|A\| | Cardinality | Size of set A, e.g. \|{1,2,3}\| = 3 |
| a^b | Power | a raised to the b-th power, e.g. 2^3 = 8 |
| a⁻¹ | Inverse | Reciprocal 1/a, or matrix inverse |
| i, j, k | Indices | Counters (like the j-th element of a vector) |
| Σ | Summation | Add terms up: Σᵢ₌₁ⁿ xᵢ |
| Π | Product | Multiply terms: Πᵢ₌₁ⁿ xᵢ |
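Σ and Π map directly onto Python built-ins, which makes the notation concrete:

```python
import math

x = [2, 5, 7]

total = sum(x)          # Σ_{i=1}^{n} x_i  → 2 + 5 + 7 = 14
product = math.prod(x)  # Π_{i=1}^{n} x_i  → 2 * 5 * 7 = 70

print(total, product)   # 14 70
```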
Linear Algebra
| Symbol | Name | Meaning / Use |
|---|---|---|
| **x** (bold) | Vector | Ordered list of numbers (features of one sample) |
| **X** (bold capital) | Matrix | 2D table of numbers (rows = samples, cols = features) |
| xᵢ | i-th element | Example: if x = [2, 5, 7], then x₂ = 5 |
| Xᵀ | Transpose | Flip rows and columns of a matrix |
| X⁻¹ | Inverse | "Undo" a matrix under multiplication (if invertible) |
| ‖x‖ | Norm | Length of a vector |
| · | Dot product | Multiply two vectors elementwise, then sum |
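These operations are easy to verify with plain Python lists, no linear-algebra library assumed:

```python
import math

x = [1.0, 2.0, 2.0]
y = [3.0, 0.0, 4.0]

dot = sum(a * b for a, b in zip(x, y))     # x · y = 3 + 0 + 8 = 11
norm_x = math.sqrt(sum(a * a for a in x))  # ‖x‖ = √(1 + 4 + 4) = 3

X = [[1, 2, 3],
     [4, 5, 6]]                            # a 2x3 matrix
X_T = [list(row) for row in zip(*X)]       # Xᵀ, a 3x2 matrix

print(dot, norm_x, X_T)
```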
Probability & Statistics
| Symbol | Name | Meaning / Use |
|---|---|---|
| P(A) | Probability | Chance of event A |
| P(A\|B) | Conditional probability | Probability of A given B |
| 𝔼[X] | Expectation | Mean of random variable X |
| Var(X) | Variance | Spread of X around its mean |
| σ² | Variance | Same as above |
| σ | Standard deviation | Square root of variance |
| μ | Mu | Mean (average) |
| ŷ | "y-hat" | Predicted value from a model |
| θ | Theta | Model parameters (weights) |
| 𝒩(μ, σ²) | Normal distribution | Bell curve with mean μ and variance σ² |
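The standard library's `statistics` module computes μ, σ², and σ directly; a quick check on a small sample:

```python
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(x)        # μ: (2+4+4+4+5+5+7+9) / 8 = 5
var = statistics.pvariance(x)  # σ²: population variance = 4
sigma = statistics.pstdev(x)   # σ: √4 = 2.0

print(mu, var, sigma)          # 5 4 2.0
```

Note `pvariance`/`pstdev` divide by n (population); `variance`/`stdev` divide by n−1 (sample), which is what you usually want for data drawn from a larger population.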
Calculus
| Symbol | Name | Meaning / Use |
|---|---|---|
| f(x) | Function | Maps an input to an output |
| f′(x) or df/dx | Derivative | Rate of change |
| ∇f(x) | Gradient | Vector of slopes in many dimensions |
| ∂f/∂xᵢ | Partial derivative | Derivative with respect to one variable, others held fixed |
| ∫ f(x) dx | Integral | Area under the curve |
| limₓ→∞ | Limit | Value approached as x grows |
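Derivatives and gradients can be approximated numerically with central differences, a handy way to make the notation concrete (the helper names and example functions here are illustrative assumptions):

```python
def derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient(f, x, h=1e-6):
    # ∇f(x): one partial derivative ∂f/∂x_i per coordinate
    grads = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grads.append((f(xp) - f(xm)) / (2 * h))
    return grads

f = lambda v: v[0] ** 2 + 3 * v[1]         # f(x, y) = x² + 3y
print(derivative(lambda t: t ** 2, 2.0))   # d/dt t² at t=2 → ≈ 4
print(gradient(f, [2.0, 1.0]))             # ∇f at (2, 1) → ≈ [4, 3]
```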
Machine Learning Conventions
| Symbol | Name | Meaning / Use |
|---|---|---|
| y | True label | Ground truth |
| ŷ | Prediction | Model's predicted label/value |
| θ, w, β | Parameters | Model weights |
| α | Alpha (learning rate) | Step size in gradient-based optimization |
| L(θ) | Loss function | How wrong the model is |
| argmin | Argument of minimum | The value of θ that minimizes a function |
| argmax | Argument of maximum | The value that maximizes a function |
| ℒ | Likelihood | Probability of the data given the parameters |
| log | Logarithm | Common in ML (losses, likelihoods, softmax) |
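argmin becomes concrete with `min(..., key=...)`. Here is a toy grid search for the θ that minimizes a mean-squared-error loss; the constant model ŷ = θ is an illustrative assumption:

```python
y = [1.0, 2.0, 3.0]              # true labels

def loss(theta):
    # L(θ): mean squared error of the constant model ŷ = θ
    return sum((yi - theta) ** 2 for yi in y) / len(y)

candidates = [i / 10 for i in range(0, 41)]   # grid: 0.0, 0.1, ..., 4.0
best_theta = min(candidates, key=loss)        # argmin_θ L(θ)
print(best_theta)                             # 2.0 — the mean minimizes MSE
```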
Common Letters (dataset shapes)
| Symbol | Typical meaning | Example |
|---|---|---|
| n | Number of samples/rows | X ∈ ℝ^{n×d} has n rows (observations). |
| d | Number of features/columns (dimension) | Each x ∈ ℝ^d has d features. |
| m | Alternative for number of samples | m training examples. |
| k | Number of clusters/classes/components | k-means, k classes. |
| K | Total number of classes (multiclass) | y ∈ {1,…,K}. |
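Reading shapes off a nested-list dataset makes n and d concrete:

```python
# A dataset X with n = 3 samples (rows) and d = 2 features (columns),
# i.e. X ∈ ℝ^{3×2}. The numbers are arbitrary examples.
X = [[5.1, 3.5],
     [4.9, 3.0],
     [6.2, 2.9]]

n = len(X)     # number of rows (samples)
d = len(X[0])  # number of columns (features)
print(n, d)    # 3 2
```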
Sets & Types you’ll see in ML
| Notation | Read as | Meaning |
|---|---|---|
| x ∈ ℝ | x is in the reals | x is a real number. |
| x ∈ ℝ^d | x is a d-dimensional real vector | Feature vector with d numbers. |
| X ∈ ℝ^{n×d} | X is an n-by-d real matrix | Dataset with n rows and d columns. |
| y ∈ {0,1} | y is zero or one | Binary label. |
| y ∈ {1,…,K} | y is one of 1 through K | Multiclass label. |
Notes on Exponents
x^j → x raised to the j-th power (though in some texts a superscript is an index, so check the context).
x_j → the j-th element of vector x.
e^(iπ) = −1 → Euler's identity (complex numbers, mainly relevant in signal processing).
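The superscript/subscript distinction, plus Euler's identity, in a few lines of Python (the particular numbers are arbitrary examples):

```python
import cmath
import math

x = [2, 5, 7]
j = 2

power = 3 ** j       # superscript reading: 3^j = 3² = 9
element = x[j - 1]   # subscript reading: x_j, the j-th element (1-based) = 5

euler = cmath.exp(1j * math.pi)   # e^(iπ), numerically ≈ -1 + 0j
print(power, element, euler.real)
```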