Central Limit Theorem

The central limit theorem explains why normal distributions appear so often in probability. It says that many small independent random effects, when added together and normalized, have an approximately Gaussian distribution.
CENTRAL LIMIT THEOREM
Let $X_1,X_2,\ldots$ be independent, identically distributed random variables
with expectation $\mathbb{E}X_1=\mu$ and finite
variance $\operatorname{Var}(X_1)=\sigma^2>0$. If $S_n=X_1+\cdots+X_n$, then
\begin{equation*}\frac{S_n-n\mu}{\sigma\sqrt{n}}\xrightarrow{d}N(0,1),\qquad n\to\infty.\end{equation*}
Equivalently, for every pair of real numbers $a<b$,
\begin{equation*}\mathbb{P}\left(a\leq \frac{S_n-n\mu}{\sigma\sqrt{n}}\leq b\right)\to\frac{1}{\sqrt{2\pi}}\int_a^b e^{-x^2/2}\,\mathrm{d}x.\end{equation*}
The normalization subtracts the expected value $n\mu$ and divides by the natural scale of fluctuations, $\sigma\sqrt{n}$.
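The interval form of the theorem can be checked by simulation. The sketch below is only an illustration under assumptions that do not come from the text: the summands are Exponential(1) variables (so $\mu=1$ and $\sigma=1$), and the values $n=100$, $4000$ trials, $a=-1$, $b=1$ and the random seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def normalized_sum(n, mu, sigma, rng):
    """Return (S_n - n*mu) / (sigma*sqrt(n)) for one sample of n i.i.d. draws.

    Assumption for this sketch: the draws are Exponential(1), so the caller
    should pass mu = 1 and sigma = 1.
    """
    s = sum(rng.expovariate(1.0) for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))


def empirical_probability(a, b, n, trials, rng):
    """Estimate P(a <= normalized sum <= b) by repeated sampling."""
    hits = sum(a <= normalized_sum(n, 1.0, 1.0, rng) <= b for _ in range(trials))
    return hits / trials


def normal_probability(a, b):
    """Limit value: (1/sqrt(2*pi)) * integral of exp(-x^2/2) over [a, b]."""
    z = NormalDist()  # standard normal, mean 0 and standard deviation 1
    return z.cdf(b) - z.cdf(a)


rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = empirical_probability(-1.0, 1.0, n=100, trials=4000, rng=rng)
limit = normal_probability(-1.0, 1.0)
print(f"empirical: {estimate:.3f}   normal limit: {limit:.3f}")
```

The two printed numbers should agree up to Monte Carlo noise and the finite-$n$ error of the approximation; neither is controlled rigorously here.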
A standard example is the sum of dice rolls. Let $Y_1,\ldots,Y_n$ be independent rolls of a fair six-sided die. Then
\begin{equation*}\mathbb{E}Y_1=\frac{7}{2},\qquad\operatorname{Var}(Y_1)=\frac{35}{12}.\end{equation*}
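These two values follow directly from the definition of expectation and variance for a uniform distribution on $\{1,\ldots,6\}$. A short exact computation with rational arithmetic, given as a sketch (the variable names are illustrative):

```python
from fractions import Fraction

faces = range(1, 7)   # outcomes of a fair six-sided die
p = Fraction(1, 6)    # each face has probability 1/6

# E[Y] = sum of k * P(Y = k)
mean = sum(p * k for k in faces)
# Var(Y) = sum of (k - E[Y])^2 * P(Y = k)
variance = sum(p * (k - mean) ** 2 for k in faces)

print(mean, variance)  # prints: 7/2 35/12
```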
For $T_n=Y_1+\cdots+Y_n$, the central limit theorem gives
\begin{equation*}\frac{T_n-\frac{7n}{2}}{\sqrt{35n/12}}\xrightarrow{d}N(0,1).\end{equation*}
Repeated sums of $10$ dice stack into a bell-shaped histogram centered near $35$.
In the animation, each brick records one sum of ten dice. Individual rolls are discrete and bounded, but the histogram of many sums is already close to the normal curve with mean $35$ and variance $10\cdot 35/12=175/6$.
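The closeness of the ten-dice distribution to the normal curve can also be inspected without sampling. The following sketch computes the exact distribution of the sum by repeated convolution and prints it next to the normal density with the same mean and variance; the function name `sum_distribution` and the choice of sample points are illustrative.

```python
from fractions import Fraction
from statistics import NormalDist


def sum_distribution(n_dice):
    """Exact pmf of the sum of n_dice fair dice, built by repeated convolution."""
    pmf = {0: Fraction(1)}
    for _ in range(n_dice):
        nxt = {}
        for total, prob in pmf.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, Fraction(0)) + prob / 6
        pmf = nxt
    return pmf


pmf = sum_distribution(10)
mean = sum(k * p for k, p in pmf.items())                     # exactly 35
variance = sum((k - mean) ** 2 * p for k, p in pmf.items())   # exactly 175/6

# Normal curve with the same mean and variance, for comparison.
normal = NormalDist(mu=float(mean), sigma=float(variance) ** 0.5)
for k in (25, 30, 35, 40, 45):
    print(k, round(float(pmf[k]), 4), round(normal.pdf(k), 4))
```

Because the sum takes integer values with spacing $1$, the probability of each value is compared directly with the density at that point.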
A Galton board gives another concrete example of the same phenomenon. Each ball makes a sequence of independent left-or-right choices, so the final bucket is determined by a binomial count. With many balls, the bucket heights begin to form the same bell-shaped profile predicted by the central limit theorem.
A Galton board turns repeated independent binary choices into a binomial histogram.
With more rows and many more balls, the same binomial mechanism produces a smoother approximation to the normal density.
A larger Galton board makes the normal approximation visually sharper.
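A Galton board is easy to imitate in code. The sketch below assumes a symmetric board, meaning each ball moves right with probability $1/2$ at every row (the text does not state the probability), and uses arbitrary sizes of $20$ rows and $5000$ balls. It prints the simulated bucket frequencies next to the binomial probabilities and the normal density with mean $\text{rows}/2$ and variance $\text{rows}/4$.

```python
import math
import random
from collections import Counter


def galton_buckets(rows, balls, rng):
    """Drop `balls` balls through `rows` rows; bucket index = number of right moves."""
    return Counter(
        sum(rng.random() < 0.5 for _ in range(rows)) for _ in range(balls)
    )


def binomial_pmf(rows, k):
    """P(exactly k right moves in `rows` fair left-or-right choices)."""
    return math.comb(rows, k) / 2 ** rows


def normal_density(x, mean, var):
    """Density of the normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


rows, balls = 20, 5000
counts = galton_buckets(rows, balls, random.Random(1))
mean, var = rows / 2, rows / 4  # mean and variance of Binomial(rows, 1/2)

for k in range(rows + 1):
    print(
        f"{k:2d}  simulated {counts[k] / balls:.4f}"
        f"  binomial {binomial_pmf(rows, k):.4f}"
        f"  normal {normal_density(k, mean, var):.4f}"
    )
```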
We prove the theorem using characteristic functions. Replacing $X_i$ by $X_i-\mu$, it is enough to prove the result in the centered case $\mu=0$. Let
\begin{equation*}\varphi(t)=\mathbb{E}e^{itX_1}\end{equation*}
be the characteristic function of $X_1$. Since $\mathbb{E}X_1=0$ and $\operatorname{Var}(X_1)=\sigma^2$, the characteristic function has the expansion
\begin{equation*}\varphi(t)=1-\frac{\sigma^2t^2}{2}+o(t^2),\qquad t\to 0.\end{equation*}
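One standard route to this expansion, sketched here as a reminder and not necessarily the argument intended by the text, uses the fact that a finite second moment makes $\varphi$ twice continuously differentiable, with derivatives at the origin given by the moments:
\begin{equation*}\varphi(0)=1,\qquad\varphi'(0)=i\,\mathbb{E}X_1=0,\qquad\varphi''(0)=-\mathbb{E}X_1^2=-\sigma^2.\end{equation*}
Taylor's formula at $t=0$ then gives
\begin{equation*}\varphi(t)=\varphi(0)+\varphi'(0)\,t+\frac{\varphi''(0)}{2}\,t^2+o(t^2)=1-\frac{\sigma^2t^2}{2}+o(t^2).\end{equation*}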
For $S_n=X_1+\cdots+X_n$, independence gives
\begin{equation*}\mathbb{E}\exp\left(it\frac{S_n}{\sigma\sqrt{n}}\right)=\left[\varphi\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n=\left(1-\frac{t^2}{2n}+o\left(\frac{1}{n}\right)\right)^n.\end{equation*}
Thus the desired limiting characteristic function should be $e^{-t^2/2}$, which is the characteristic function of the standard normal distribution. The only point needing care is that the expression above is complex-valued, so we use the following elementary extension of the familiar limit $(1+c/n)^n\to e^c$.
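A common formulation of such an extension, given here as an assumption about the intended statement, is: if $z_n$ are complex numbers with $z_n\to z$, then
\begin{equation*}\left(1+\frac{z_n}{n}\right)^n\to e^{z},\qquad n\to\infty.\end{equation*}
For fixed $t$, write the $o(1/n)$ term in the expansion above as $\varepsilon_n/n$ with $\varepsilon_n\to 0$, and take $z_n=-t^2/2+\varepsilon_n$, so that $z_n\to -t^2/2$. This yields
\begin{equation*}\left[\varphi\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n=\left(1+\frac{z_n}{n}\right)^n\to e^{-t^2/2}\end{equation*}
for every real $t$.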
By the continuity theorem for characteristic functions, this convergence of characteristic functions implies
\begin{equation*}\frac{S_n}{\sigma\sqrt{n}}\xrightarrow{d}N(0,1)\end{equation*}
in the centered case.
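The convergence of characteristic functions used in the proof can be observed numerically. As an illustration only, the sketch below takes the centered die roll $Y_1-7/2$ from the earlier example, for which the characteristic function is a finite sum, and compares $[\varphi(t/(\sigma\sqrt{n}))]^n$ with $e^{-t^2/2}$ at the arbitrarily chosen point $t=1.5$.

```python
import cmath
import math


def centered_die_cf(t):
    """Characteristic function of Y - 7/2, where Y is a fair die roll."""
    return sum(cmath.exp(1j * t * (k - 3.5)) for k in range(1, 7)) / 6


def normalized_sum_cf(t, n, sigma):
    """Characteristic function of S_n / (sigma*sqrt(n)) for n centered rolls."""
    return centered_die_cf(t / (sigma * math.sqrt(n))) ** n


sigma = math.sqrt(35 / 12)        # standard deviation of one die roll
t = 1.5                           # arbitrary evaluation point
target = math.exp(-t ** 2 / 2)    # standard normal characteristic function at t

for n in (1, 10, 100, 1000):
    error = abs(normalized_sum_cf(t, n, sigma) - target)
    print(f"n = {n:4d}   |cf - exp(-t^2/2)| = {error:.2e}")
```

The printed error should shrink as $n$ grows, in line with the limit established above.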