Strong Law of Large Numbers

Last updated: 2026-04-12

The Strong Law of Large Numbers (SLLN) is a fundamental result in probability theory that describes the long-term behavior of the average of a sequence of random variables.
The SLLN states that, under certain conditions, the sample average of a sequence of independent and identically distributed (i.i.d.) random variables converges almost surely to the expected value (mean) of those variables as the sample size goes to infinity.
Mathematically, let $X_1, X_2, \ldots$ be a sequence of i.i.d. random variables with a finite first moment, i.e., $\mathbb{E}|X_1| < \infty$. The SLLN asserts that the sample average of these variables converges almost surely to the expected value:
\begin{equation*}\frac{X_1 + X_2 + \dots + X_n}{n} \xrightarrow{\text{a.s.}} \mathbb{E} X_1, \quad \text{as } n \rightarrow \infty.\end{equation*}
This means that as the number of observations $n$ increases, the sample average will almost surely converge to the true mean $\mathbb{E} X_1$.

Motivation

Coin Flip Experiment

Consider a fair coin flip experiment where we assign:
\begin{equation*}X_i= \begin{cases}1, & \text{if heads}, \\ 0, & \text{if tails}.\end{cases}\end{equation*}
The expected value is clearly $\mathbb{E}[X_i] = \frac{1}{2}$. But what happens when we perform this experiment repeatedly?
Fair coin flip experiment
In our animation, we observe:
  • The horizontal line at $y = 0.5$ represents $\mathbb{E}[X_i]$
  • The blue path shows the running average $\frac{S_n}{n}$, where $S_n = X_1 + \cdots + X_n$, as $n$ increases
  • Early fluctuations are larger due to the $\frac{1}{n}$ scaling
  • As $n \to \infty$, the path stabilizes near $\mu = 0.5$
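The observations above can be reproduced with a short simulation. This is a minimal sketch in Python using only the standard library; the sample size `n` and the seed are our arbitrary choices, not from the text.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

n = 100_000
heads = 0
averages = []  # running values of S_i / i
for i in range(1, n + 1):
    heads += random.random() < 0.5  # X_i = 1 (heads) with probability 1/2
    averages.append(heads / i)

# Early averages fluctuate widely; late averages settle near E[X_i] = 0.5
print(averages[9], averages[999], averages[-1])
```

The printed values show the running mean drifting toward $0.5$ as $n$ grows.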

Monte-Carlo Method

One of the most important applications of the SLLN is the Monte Carlo method, which is used to numerically estimate quantities that are difficult or impossible to calculate analytically. The accuracy of the Monte Carlo method depends on the convergence of the sample averages to the true values, which is guaranteed by the SLLN.
Let's define a random variable $X$ based on randomly chosen points $(U_1, U_2)$ in a $6 \times 6$ square:
\begin{equation*}X = \begin{cases} 1, & \text{if } U_1^2 + U_2^2 \leq 9 \\ 0, & \text{otherwise} \end{cases}\end{equation*}
where $U_1, U_2 \sim \text{Uniform}(-3, 3)$.
Calculating $\pi$ with Monte-Carlo
The expected value of $X$ is (with circle radius $r = 3$):
\begin{equation*}\mathbb{E}[X] = \mathbb{P}(U_1^2 + U_2^2 \leq 9) = \frac{\text{Area of Circle}}{\text{Area of Square}} = \frac{\pi r^2}{(2r)^2} = \frac{\pi}{4}\end{equation*}
Let $\{X_i\}_{i=1}^n$ be i.i.d. copies of $X$. By the Strong Law of Large Numbers:
\begin{equation*}\frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{\text{a.s.}} \mathbb{E}[X] = \frac{\pi}{4}\end{equation*}
Therefore:
\begin{equation*}4 \cdot \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{\text{a.s.}} \pi\end{equation*}
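The estimator $4 \cdot \frac{1}{n}\sum_{i=1}^n X_i$ can be simulated directly. A minimal Python sketch using only the standard library; the sample size and seed are our choices:

```python
import random

random.seed(1)

n = 200_000
inside = 0
for _ in range(n):
    u1 = random.uniform(-3, 3)  # U_1 ~ Uniform(-3, 3)
    u2 = random.uniform(-3, 3)  # U_2 ~ Uniform(-3, 3)
    inside += u1 * u1 + u2 * u2 <= 9  # X = 1 iff the point lies in the circle

pi_estimate = 4 * inside / n  # converges a.s. to pi by the SLLN
print(pi_estimate)
```

With $n = 200{,}000$ points the estimate typically lands within a few hundredths of $\pi$.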

Strong Law of Large Numbers with finite 4th moment

The following example is Theorem 2.3.5, page 59, from [Durrett2019].

One of the earliest results in this direction is the following theorem, which can be easily proved with Chebyshev's inequality and the Borel-Cantelli lemma.
Theorem (Cantelli)
Let $X_1, X_2, \ldots$ be i.i.d. with $\mathbb{E}X = \mu$ and $\mathbb{E}X^4 < \infty$. If $S_n = X_1 + \cdots + X_n$, then $S_n/n \to \mu$ a.s.
\begin{proof}
By letting $X_i' = X_i - \mu$, we can suppose without loss of generality that $\mu = 0$. Now
\begin{equation*}\mathbb{E}S_n^4 = \mathbb{E}\left( \sum_{i=1}^n X_i \right)^4 = \mathbb{E} \sum_{1 \leq i,j,k,l \leq n} X_i X_j X_k X_l\end{equation*}
Terms in the sum of the form $\mathbb{E}(X_i^3 X_j)$, $\mathbb{E}(X_i^2 X_j X_k)$, and $\mathbb{E}(X_i X_j X_k X_l)$ (with $i, j, k, l$ distinct) are $0$, since by independence the expectation of the product is the product of the expectations, and in each case one of the factors has expectation $0$. The only terms that do not vanish are those of the form $\mathbb{E}X_i^4$ and $\mathbb{E}X_i^2 X_j^2 = (\mathbb{E}X_i^2)^2$. There are $n$ and $3n(n - 1)$ of these terms, respectively: in the second case, we can pick the two indices in $n(n - 1)/2$ ways, and with the indices fixed, the term can arise in a total of six ways, according to which pair of positions the first index occupies: $\{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}$, and $\{3, 4\}$. The last observation implies
\begin{equation*}\mathbb{E}S_n^4 = n\mathbb{E}X_1^4 + 3(n^2 - n)(\mathbb{E}X_1^2)^2 \leq Cn^2\end{equation*}
where $C < \infty$. Chebyshev's inequality gives us
\begin{equation*}P\left(\frac{|S_n|}{n} > \epsilon\right) \leq \frac{\mathbb{E}(S_n^4)}{(n\epsilon)^4} \leq \frac{C}{n^2 \epsilon^4}\end{equation*}
Summing over $n$ and using the Borel-Cantelli lemma gives $P(|S_n| > n\epsilon \text{ i.o.}) = 0$; that is, almost surely there exists $n_0$ such that $\frac{|S_n|}{n} \leq \epsilon$ for all $n \geq n_0$. Since $\epsilon$ is arbitrary, the proof is complete. $\Box$
\end{proof}
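The moment computation in the proof can be sanity-checked numerically. A sketch assuming $X_i \sim \text{Uniform}(-1, 1)$ (our choice, so $\mu = 0$, $\mathbb{E}X^2 = 1/3$, $\mathbb{E}X^4 = 1/5$); we compare a Monte Carlo estimate of $\mathbb{E}S_n^4$ with the exact value $n\,\mathbb{E}X^4 + 3(n^2 - n)(\mathbb{E}X^2)^2$ from the proof:

```python
import random

random.seed(2)

n = 50           # number of summands in S_n
trials = 20_000  # independent replications of S_n

total = 0.0
for _ in range(trials):
    s = sum(random.uniform(-1, 1) for _ in range(n))  # one realization of S_n
    total += s ** 4
estimate = total / trials  # Monte Carlo estimate of E[S_n^4]

# Exact value from the proof: n * E[X^4] + 3(n^2 - n) * (E[X^2])^2
exact = n * (1 / 5) + 3 * (n * n - n) * (1 / 3) ** 2
print(estimate, exact)
```

Both numbers are of order $n^2$, as the bound $\mathbb{E}S_n^4 \leq Cn^2$ predicts.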

Toeplitz and Kronecker Lemmas

In this section, we introduce the Toeplitz and Kronecker Lemmas, which play crucial roles in proving the Strong Law of Large Numbers with finite absolute first moment.

Toeplitz Lemma

The Toeplitz Lemma provides conditions under which the averages of a sequence $\{x_n\}_{n\geq 1}$ converge to a limit.
Lemma (Toeplitz)
Let $\{a_n\}$ be a sequence of nonnegative numbers, $b_n = \sum_{i=1}^n a_i$, $b_1 = a_1 > 0$, and $b_n \nearrow \infty$ as $n \to \infty$. Let $\{x_n\}_{n\geq 1}$ be a sequence of numbers converging to $x$. Then
\begin{equation*}\frac{1}{b_n} \sum_{j=1}^n a_j x_j \to x. \tag{7}\end{equation*}
\quad Proof. Let $\varepsilon > 0$, and let $n_0 = n_0(\varepsilon)$ be such that $\left|x_n - x\right| \leq \varepsilon/2$ for all $n \geq n_0$. Choose $n_1 > n_0$ such that
\begin{equation*}\frac{1}{b_{n_1}} \sum_{j=1}^{n_0} a_j \left| x_j - x \right| < \frac{\varepsilon}{2}.\end{equation*}
Then, for any $n > n_1$,
\begin{align*}\left|\frac{1}{b_n} \sum_{j=1}^n a_j x_j - x\right| & = \left|\frac{1}{b_n} \sum_{j=1}^n a_j x_j - \frac{b_n x}{b_n} \right| = \left|\frac{1}{b_n} \sum_{j=1}^n a_j x_j - \frac{\sum_{j=1}^n a_j x}{b_n} \right| \\& \stackrel{\text{triangle inequality}}{\leq} \frac{1}{b_n} \sum_{j=1}^n a_j \left|x_j - x\right| \\&= \frac{1}{b_n} \sum_{j=1}^{n_0} a_j \left|x_j - x\right| + \frac{1}{b_n} \sum_{j=n_0 + 1}^n a_j \left|x_j - x\right| \\& \stackrel{b_{n_1} \leq b_n }{\leq} \frac{1}{b_{n_1}} \sum_{j=1}^{n_0} a_j \left|x_j - x\right| + \frac{1}{b_n} \sum_{j=n_0 + 1}^n a_j \left|x_j - x\right| \\&\leq \frac{\varepsilon}{2} + \frac{b_n - b_{n_0}}{b_n} \frac{\varepsilon}{2} \leq \varepsilon.\end{align*}
This completes the proof of the lemma. \Box
Remark
In particular, if $a_n = 1, \; n \geq 1$ and $b_n = n$, then
\begin{equation*}\frac{x_1 + \ldots + x_n}{n} \to x.\end{equation*}
We will use the Toeplitz Lemma in this setting later on.
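A quick numerical illustration of the lemma in Python; the particular weights $a_j = j$ and sequence $x_j = 1 + 1/j$ are our choices, not from the text:

```python
def toeplitz_average(a, x):
    """Weighted average (1/b_n) * sum_j a_j * x_j with b_n = sum_j a_j."""
    b_n = sum(a)
    return sum(aj * xj for aj, xj in zip(a, x)) / b_n

n = 10_000
x = [1 + 1 / j for j in range(1, n + 1)]  # x_j -> 1
a = [j for j in range(1, n + 1)]          # nonnegative weights with b_n -> infinity

avg = toeplitz_average(a, x)
print(avg)  # close to the limit x = 1
```

Here the weighted average equals $1 + 2/(n+1)$ exactly, so it visibly approaches the limit $1$.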

Kronecker Lemma

Lemma (series and averages, Kronecker)
Let $\{b_n\}$ be a sequence of positive increasing numbers with $b_n \nearrow \infty$ as $n \to \infty$, and let $\{x_n\}$ be a sequence of numbers such that $\sum x_n$ converges. Then
\begin{equation*}\frac{1}{b_n} \sum_{j=1}^n b_j x_j \to 0, \quad n \to \infty.\end{equation*}
\quad Proof. Let $b_0 = 0$, $S_0 = 0$, $S_n = \sum_{j=1}^n x_j$. Then, by summation by parts,
\begin{equation*}\sum_{j=1}^n b_j x_j = \sum_{j=1}^n b_j (S_j - S_{j-1}) = b_n S_n - b_0 S_0 - \sum_{j=1}^n S_{j-1} (b_j - b_{j-1})\end{equation*}
and therefore, setting $a_j = b_j - b_{j-1}$,
\begin{equation*}\frac{1}{b_n} \sum_{j=1}^n b_j x_j = S_n - \frac{1}{b_n} \sum_{j=1}^n S_{j-1} a_j \rightarrow 0,\end{equation*}
since $S_n \rightarrow x$ for some finite $x$ (the series $\sum x_n$ converges) and, by Toeplitz's lemma,
\begin{equation*}\frac{1}{b_n} \sum_{j=1}^n S_{j-1} a_j \rightarrow x.\end{equation*}
This establishes the lemma. \Box
Remark
If $b_n = n$, $x_n = y_n / n$, and $\sum (y_n / n)$ converges, then
\begin{equation*}\frac{y_1 + \ldots + y_n}{n} \to 0, \quad n \to \infty.\end{equation*}
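A numerical sketch of the Kronecker Lemma in Python; the convergent series $x_n = 1/n^2$ and the weights $b_n = n$ are our choices, not from the text:

```python
n = 100_000
x = [1 / k ** 2 for k in range(1, n + 1)]  # sum x_k converges (to pi^2 / 6)
b = list(range(1, n + 1))                  # b_n = n, positive and increasing to infinity

# Kronecker: (1/b_n) * sum_{j<=n} b_j * x_j -> 0
value = sum(bj * xj for bj, xj in zip(b, x)) / b[-1]
print(value)  # roughly (log n) / n, already small
```

Here $\frac{1}{n}\sum_{j \leq n} j \cdot \frac{1}{j^2}$ is the harmonic sum divided by $n$, which tends to $0$.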

Proof of Strong Law of Large Numbers

The following proof is taken from [Shiryaev2], pages 16-17.

Here we present the strongest version of the SLLN, which was proved by Kolmogorov and requires only finiteness of the expectation of the absolute value.
Theorem (Strong law of large numbers, Kolmogorov)
Let $X_1, X_2, \ldots$ be i.i.d. with $\mathbb{E}X = \mu$ and $\mathbb{E}|X| < \infty$. If $S_n = X_1 + \cdots + X_n$, then $S_n/n \to \mu$ a.s.
We break down the proof of the theorem into several steps. First, we consider the truncated sequence $\{X_n^{\prime}\}$, noting that the convergence of its sample average is equivalent to the convergence of the sample average of $\{X_n\}$. Next, we apply Kronecker's Lemma and Kolmogorov's Two-Series Theorem to derive the final result.

Truncated sequence

\quad Proof. Assume without loss of generality that $\mu = 0$ (otherwise replace $X_n$ by $X_n - \mu$). Define $X_n^{\prime}=X_n \mathbf{1}\left\{\left|X_n\right| \leq n\right\}$ and note that
\begin{align*}\sum_n P\left\{X_n^{\prime} \neq X_n\right\} & =\sum_n P\left\{|X_1|>n\right\} \\& \leq \int_0^{\infty} P\left\{|X_1|>t\right\} \, dt = \mathbb{E}|X_1|<\infty .\end{align*}
Hence, the Borel-Cantelli lemma yields $P\left\{X_n^{\prime} \neq X_n \text{ i.o.}\right\}=0$, and so $X_n^{\prime}=X_n$ for all but finitely many $n \in \mathbb{N}$ a.s.

Equivalent convergence of the truncated sequence

Because $X_n^{\prime} = X_n$ for all but finitely many $n \in \mathbb{N}$ a.s., the convergence $\frac{\sum_{k \leq n} X_k}{n} \rightarrow 0$ a.s. is equivalent to $\frac{\sum_{k \leq n} X_k^{\prime}}{n} \rightarrow 0$ a.s.
Note that in general $\mathbb{E} X_n^{\prime} \neq 0$, but
\begin{equation*}\mathbb{E} X_n^{\prime} = \mathbb{E} X_n \mathbf{1}_{(|X_n| \leq n)} = \mathbb{E} X_1 \mathbf{1}_{(|X_1| \leq n)} \to \mathbb{E} X_1 = 0.\end{equation*}
Hence, by the Toeplitz Lemma with $a_n = 1, \; n \geq 1$ and $b_n = n$,
\begin{equation*}\frac{1}{n} \sum_{k=1}^n \mathbb{E} X_k^{\prime} \to 0, \quad n \to \infty,\end{equation*}
so it is enough to show that $\frac{1}{n} \sum_{k \leq n} \left(X_k^{\prime} - \mathbb{E} X_k^{\prime}\right) \rightarrow 0$ a.s. By the Kronecker Lemma (with $b_n = n$) it suffices to prove instead that
\begin{equation*}\sum_{k \geq 1} \frac{X_k^{\prime} - \mathbb{E} X_k^{\prime}}{k}\end{equation*} converges a.s.

The convergence of variances

By the Kolmogorov-Khinchin theorem on the convergence of series of independent mean-zero random variables, it suffices to show that
\begin{align*}\sum_{n=1}^{\infty} \frac{\operatorname{Var} X_n^{\prime}}{n^2} &\leq \sum_{n=1}^{\infty} \frac{\mathbb{E} (X_n^{\prime})^2}{n^2} = \sum_{n=1}^{\infty} \frac{1}{n^2} \mathbb{E}\left(X_n \mathbf{1}_{(|X_n| \leq n)}\right)^2 \\& = \sum_{n=1}^{\infty} \frac{1}{n^2} \mathbb{E}\left(X_n^2 \mathbf{1}_{(|X_n| \leq n)}\right) \\& = \sum_{n=1}^{\infty} \frac{1}{n^2} \sum_{k=1}^n \mathbb{E}\left(X^2_{1} \mathbf{1}_{(k-1 < |X_1| \leq k)}\right) \\& = \sum_{k=1}^{\infty} \mathbb{E}\left(X^2_{1} \mathbf{1}_{(k-1 < |X_1| \leq k)}\right) \sum_{n=k}^{\infty} \frac{1}{n^2} \\& \leq 2 \sum_{k=1}^{\infty} \frac{1}{k} \mathbb{E}\left(X^2_{1} \mathbf{1}_{(k-1 < |X_1| \leq k)}\right) \\& \leq 2 \sum_{k=1}^{\infty} \mathbb{E}\left(|X_1| \mathbf{1}_{(k-1 < |X_1| \leq k)}\right) = 2\mathbb{E} |X_1| < \infty.\end{align*}
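To see the variance series converge for a distribution with a finite first moment but infinite variance, consider (our choice of example, not from the text) a Pareto variable on $[1, \infty)$ with density $\frac{3}{2} x^{-5/2}$: then $\mathbb{E}|X| = 3$ but $\mathbb{E}X^2 = \infty$, while the truncated second moment is $\mathbb{E}[X^2 \mathbf{1}_{(X \leq n)}] = 3(\sqrt{n} - 1)$. A Python sketch of the partial sums of the series bounded above:

```python
import math

def partial_sum(N):
    """Partial sum of sum_n E[X^2 1{X <= n}] / n^2 for the Pareto(3/2) example."""
    return sum(3 * (math.sqrt(n) - 1) / n ** 2 for n in range(1, N + 1))

s1 = partial_sum(10_000)
s2 = partial_sum(100_000)
print(s1, s2)  # nearly equal partial sums: the series converges
```

The terms decay like $3 n^{-3/2}$, so the partial sums stabilize even though the untruncated variance is infinite.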

Application of Strong Law of Large Numbers

The Monte Carlo method

Let $f(x)$ be a continuous function defined on $[0, 1]$, with values in $[0, 1]$. The following idea is the foundation of the statistical method of calculating
\begin{equation*}\int_0^1 f(x) \, \text{d} x,\end{equation*}
called the Monte Carlo method. Let $X_1, X_2, X_3, \ldots$ be a sequence of independent random variables uniformly distributed on $[0, 1]$. Define a new sequence of i.i.d. random variables
\begin{equation*}Y_n = f(X_n)\end{equation*}
with the mathematical expectation
\begin{equation*}\mathbb{E}Y_n = \int_0^1 f(x) \: \text{d}x \quad \forall n \geq 1.\end{equation*}
By the strong law of large numbers (Theorem 3),
\begin{equation*}\frac{1}{n} \sum_{i=1}^n f(X_i) \xrightarrow[n \to \infty]{\text{a.s.}} \int_0^1 f(x) \: \text{d}x.\end{equation*}
Consequently, we can approximate the integral $\int_0^1 f(x) \: \text{d}x$ by simulating the random variables $X_i$, $i \geq 1$, and then computing $f(X_i)$, $i \geq 1$, and the average $\frac{1}{n} \sum_{i=1}^n f(X_i)$.
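A minimal Python sketch of this procedure; the test integrand $f(x) = x^2$ (with $\int_0^1 x^2 \, \text{d}x = 1/3$), the sample size, and the seed are our choices, not from the text:

```python
import random

random.seed(3)

def mc_integral(f, n):
    """Monte Carlo estimate of the integral of f over [0, 1]."""
    return sum(f(random.random()) for _ in range(n)) / n

estimate = mc_integral(lambda x: x * x, 200_000)
print(estimate)  # close to 1/3
```

By the SLLN the estimate converges almost surely to the true integral as the sample size grows.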

References