Common Math & ML Symbols Cheat Sheet
If you’re diving into AI or machine learning without a strong math background, the hardest part isn’t always the concepts — it’s the symbols. A page of equations can look like another language: Greek letters, bold vectors, strange operators. At first, you might find yourself calling θ “circle with a dot” before realizing it’s Theta, and that in ML it usually represents model parameters.
Why does this matter? Because in practice, you need both pieces: the name, so you can follow along in papers, tutorials, and discussions; and the function, so you actually understand what role it plays in the math. Without that, equations feel like code you can’t run.
A cheat sheet bridges that gap. Once you recognize common notations — Σ for sum, ∇ for gradient, X for dataset, ŷ for prediction — the fog lifts, and math becomes less about decoding symbols and more about learning ideas.
Greek Letters (names & common roles)
| Symbol | Name | Common ML/Stats role (context) |
|---|---|---|
| α | alpha | Learning rate (optimization); significance level (stats); penalty weight (regularization). |
| η | eta | Learning rate (alternate symbol in some texts). |
| θ | theta | Model parameters/weights. |
| λ | lambda | Regularization strength (e.g., L2/L1); rate parameter (Poisson). |
| σ | sigma | Standard deviation; noise scale. |
| Σ | Sigma | Summation operator. |
| μ | mu | Mean/average. |
| ε | epsilon | Small positive constant (numerical stability); error term. |
| β | beta | Coefficients (regression/logistic). |
| γ | gamma | Discount factor (RL); kernel/RBF width (SVMs). |
| π | pi | 3.14159…; class prior probabilities in some texts. |
| ρ | rho | Correlation; momentum parameter in some optimizers. |
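Several of these letters meet in a single gradient-descent update, θ ← θ − α(∇L(θ) + λθ). A minimal Python sketch, using a made-up one-parameter quadratic loss purely for illustration (`gd_step`, the loss, and all constants are assumptions, not a standard API):

```python
# One gradient-descent step for a toy loss L(theta) = (theta - 3)**2,
# with learning rate alpha (α) and L2 penalty weight lam (λ).
def gd_step(theta, alpha=0.1, lam=0.01):
    grad = 2 * (theta - 3)                 # dL/dtheta for the toy loss
    return theta - alpha * (grad + lam * theta)

theta = 0.0
for _ in range(100):
    theta = gd_step(theta)

# theta converges near 3, pulled slightly toward 0 by the λ penalty
print(theta)
```

Raising `lam` pulls the converged θ further toward zero, which is exactly what "regularization strength" means in the table above.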
General Math
| Symbol | Name | Meaning / Use |
|---|---|---|
| x, y, z | Variables | A number or value (input, output, etc.) |
| ℝ | Real numbers | Every number on the continuous number line (e.g. 3.14, −2) |
| ℤ | Integers | Whole numbers: positive, negative, and zero |
| ∈ | "In" | Element of a set. Example: x ∈ ℝ → x is a real number |
| { } | Set | A collection of elements, e.g. {1, 2, 3} |
| \|A\| | Cardinality | Size of set A, e.g. \|{1,2,3}\| = 3 |
| a^b | Power | a raised to the b-th power, e.g. 2^3 = 8 |
| a⁻¹ | Inverse | Reciprocal 1/a, or matrix inverse |
| i, j, k | Indices | Counters (like the j-th element of a vector) |
| Σ | Summation | Add terms up: Σᵢ₌₁ⁿ xᵢ |
| Π | Product | Multiply terms: Πᵢ₌₁ⁿ xᵢ |
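Σ and Π map directly onto Python built-ins, which makes the notation concrete:

```python
import math

x = [2, 5, 7]

total = sum(x)          # Σ_{i=1}^{n} x_i  → 2 + 5 + 7 = 14
product = math.prod(x)  # Π_{i=1}^{n} x_i  → 2 * 5 * 7 = 70

print(total, product)   # 14 70
```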
Linear Algebra
| Symbol | Name | Meaning / Use |
|---|---|---|
| **x** (bold) | Vector | Ordered list of numbers (features of one sample) |
| **X** (bold capital) | Matrix | 2D table of numbers (rows = samples, cols = features) |
| xᵢ | i-th element | Example: if x = [2, 5, 7], then x₂ = 5 |
| Xᵀ | Transpose | Flip rows and columns of a matrix |
| X⁻¹ | Inverse | "Undo" a matrix under multiplication (if invertible) |
| ‖x‖ | Norm | Length of a vector |
| · | Dot product | Multiply two vectors elementwise, then sum |
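These operations are easy to verify with plain Python lists, no linear-algebra library assumed:

```python
import math

x = [1.0, 2.0, 2.0]
y = [3.0, 0.0, 4.0]

dot = sum(a * b for a, b in zip(x, y))     # x · y = 3 + 0 + 8 = 11
norm_x = math.sqrt(sum(a * a for a in x))  # ‖x‖ = √(1 + 4 + 4) = 3

X = [[1, 2, 3],
     [4, 5, 6]]                            # a 2x3 matrix
X_T = [list(row) for row in zip(*X)]       # Xᵀ, a 3x2 matrix

print(dot, norm_x, X_T)
```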
Probability & Statistics
| Symbol | Name | Meaning / Use |
|---|---|---|
| P(A) | Probability | Chance of event A |
| P(A\|B) | Conditional probability | Probability of A given B |
| 𝔼[X] | Expectation | Mean of random variable X |
| Var(X) | Variance | Spread of X around its mean |
| σ² | Variance | Same as above |
| σ | Standard deviation | Square root of variance |
| μ | Mu | Mean (average) |
| ŷ | "y-hat" | Predicted value from a model |
| θ | Theta | Model parameters (weights) |
| 𝒩(μ, σ²) | Normal distribution | Bell curve with mean μ and variance σ² |
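The standard library's `statistics` module computes μ, σ², and σ directly; a quick check on a small sample:

```python
import statistics

x = [2, 4, 4, 4, 5, 5, 7, 9]

mu = statistics.mean(x)        # μ: (2+4+4+4+5+5+7+9) / 8 = 5
var = statistics.pvariance(x)  # σ²: population variance = 4
sigma = statistics.pstdev(x)   # σ: √4 = 2.0

print(mu, var, sigma)          # 5 4 2.0
```

Note `pvariance`/`pstdev` divide by n (population); `variance`/`stdev` divide by n−1 (sample), which is what you usually want for data drawn from a larger population.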
Calculus
| Symbol | Name | Meaning / Use |
|---|---|---|
| f(x) | Function | Maps an input to an output |
| f′(x) or df/dx | Derivative | Rate of change |
| ∇f(x) | Gradient | Vector of slopes in many dimensions |
| ∂f/∂xᵢ | Partial derivative | Derivative with respect to one variable, others held fixed |
| ∫ f(x) dx | Integral | Area under the curve |
| limₓ→∞ | Limit | Value approached as x grows |
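Derivatives and gradients can be approximated numerically with central differences, a handy way to make the notation concrete (the helper names and example functions here are illustrative assumptions):

```python
def derivative(f, x, h=1e-6):
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient(f, x, h=1e-6):
    # ∇f(x): one partial derivative ∂f/∂x_i per coordinate
    grads = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        grads.append((f(xp) - f(xm)) / (2 * h))
    return grads

f = lambda v: v[0] ** 2 + 3 * v[1]         # f(x, y) = x² + 3y
print(derivative(lambda t: t ** 2, 2.0))   # d/dt t² at t=2 → ≈ 4
print(gradient(f, [2.0, 1.0]))             # ∇f at (2, 1) → ≈ [4, 3]
```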
Machine Learning Conventions
| Symbol | Name | Meaning / Use |
|---|---|---|
| y | True label | Ground truth |
| ŷ | Prediction | Model's predicted label/value |
| θ, w, β | Parameters | Model weights |
| α | Alpha (learning rate) | Step size in gradient-based optimization |
| L(θ) | Loss function | How wrong the model is |
| argmin | Argument of minimum | The value of θ that minimizes a function |
| argmax | Argument of maximum | The value that maximizes a function |
| ℒ | Likelihood | Probability of the data given the parameters |
| log | Logarithm | Common in ML (losses, likelihoods, softmax) |
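argmin becomes concrete with `min(..., key=...)`. Here is a toy grid search for the θ that minimizes a mean-squared-error loss; the constant model ŷ = θ is an illustrative assumption:

```python
y = [1.0, 2.0, 3.0]              # true labels

def loss(theta):
    # L(θ): mean squared error of the constant model ŷ = θ
    return sum((yi - theta) ** 2 for yi in y) / len(y)

candidates = [i / 10 for i in range(0, 41)]   # grid: 0.0, 0.1, ..., 4.0
best_theta = min(candidates, key=loss)        # argmin_θ L(θ)
print(best_theta)                             # 2.0 — the mean minimizes MSE
```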
Common Letters (dataset shapes)
| Symbol | Typical meaning | Example |
|---|---|---|
| n | Number of samples/rows | X ∈ ℝ^{n×d} has n rows (observations). |
| d | Number of features/columns (dimension) | Each x ∈ ℝ^d has d features. |
| m | Alternative for number of samples | m training examples. |
| k | Number of clusters/classes/components | k-means, k classes. |
| K | Total number of classes (multiclass) | y ∈ {1,…,K}. |
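Reading shapes off a nested-list dataset makes n and d concrete:

```python
# A dataset X with n = 3 samples (rows) and d = 2 features (columns),
# i.e. X ∈ ℝ^{3×2}. The numbers are arbitrary examples.
X = [[5.1, 3.5],
     [4.9, 3.0],
     [6.2, 2.9]]

n = len(X)     # number of rows (samples)
d = len(X[0])  # number of columns (features)
print(n, d)    # 3 2
```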
Sets & Types you’ll see in ML
| Notation | Read as | Meaning |
|---|---|---|
| x ∈ ℝ | x is in the reals | x is a real number. |
| x ∈ ℝ^d | x is a d-dimensional real vector | Feature vector with d numbers. |
| X ∈ ℝ^{n×d} | X is an n-by-d real matrix | Dataset with n rows and d columns. |
| y ∈ {0,1} | y is zero or one | Binary label. |
| y ∈ {1,…,K} | y is one of 1 through K | Multiclass label. |
Notes on Exponents
x^j → x raised to the j-th power (though in some texts a superscript is an index, so check the context).
x_j → the j-th element of vector x.
e^(iπ) = −1 → Euler's identity (complex numbers, mainly relevant in signal processing).
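The superscript/subscript distinction, plus Euler's identity, in a few lines of Python (the particular numbers are arbitrary examples):

```python
import cmath
import math

x = [2, 5, 7]
j = 2

power = 3 ** j       # superscript reading: 3^j = 3² = 9
element = x[j - 1]   # subscript reading: x_j, the j-th element (1-based) = 5

euler = cmath.exp(1j * math.pi)   # e^(iπ), numerically ≈ -1 + 0j
print(power, element, euler.real)
```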