Skip to content

Probability Inequalities

This article collects several fundamental inequalities that are central to probability theory and statistics.
The inequalities discussed in this article are:
  • Markov's inequality
  • Chebyshev's inequality
  • Jensen's inequality
  • Cauchy–Schwarz inequality
  • Paley–Zygmund inequality
  • Chernoff bound
  • Hoeffding's inequality
  • Bernstein's inequality
  • Azuma–Hoeffding inequality
  • Kolmogorov’s maximal inequality
ε1{Xε}X\varepsilon\mathbf{1}_{\{X \geq \varepsilon\}} \leq X
MARKOV'S INEQUALITY
Let X0X \geq 0 be a random variable with mean EX<\mathbb{E}X<\infty, and let ε>0\varepsilon>0. Then
P(Xε)EXε.\begin{equation*}\mathbb{P}(X \geq \varepsilon) \leq \frac{\mathbb{E}X}{\varepsilon}.\end{equation*}
\quad Proof. We can notice inequality holds true
ε1{Xε}X.\begin{equation*}\varepsilon\mathbf{1}_{\{X \geq \varepsilon\}} \leq X.\end{equation*}
Taking expectation yields
εP(Xε)=E[ε1{Xε}].\begin{equation*}\varepsilon\mathbb{P}(X \geq \varepsilon)=\mathbb{E}\left[\varepsilon\mathbf{1}_{\{X \geq \varepsilon\}}\right].\end{equation*}
Dividing by ε\varepsilon proves the claim. \Box
CHEBYSHEV'S INEQUALITY
Let XX be a random variable with mean EX\mathbb{E}X and finite variance Var(X)\operatorname{Var}(X). Then for every ε>0\varepsilon>0,
P(XEXε)Var(X)ε2.\begin{equation*}\mathbb{P}\left(|X-\mathbb{E}X| \geq \varepsilon\right) \leq \frac{\operatorname{Var}(X)}{\varepsilon^2}.\end{equation*}
\quad Proof. Apply Markov's inequality to the nonnegative random variable Y=(XEX)2Y=(X-\mathbb{E}X)^2:
P(XEXε)=P((XEX)2ε2)E(XEX)2ε2=Var(X)ε2.\begin{equation*}\mathbb{P}\left(|X-\mathbb{E}X| \geq \varepsilon\right)=\mathbb{P}\left((X-\mathbb{E}X)^2 \geq \varepsilon^2\right)\leq \frac{\mathbb{E}(X-\mathbb{E}X)^2}{\varepsilon^2}=\frac{\operatorname{Var}(X)}{\varepsilon^2}. \qquad \Box\end{equation*}
ax+bφ(x)ax+b\leq \varphi(x)
JENSEN'S INEQUALITY
Let XX be with mean EX<\mathbb{E}|X| < \infty, and let φ:RR\varphi:\mathbb{R}\to\mathbb{R} be convex with Eφ(X)<\mathbb{E}|\varphi(X)|<\infty. Then
φ(EX)Eφ(X).\begin{equation*}\varphi(\mathbb{E}X) \leq \mathbb{E}\varphi(X).\end{equation*}
\quad Proof. By convexity, there exists a linear function ax+bax+b such that ax+bφ(x)ax+b\leq \varphi(x) for all xx and aEX+b=φ(EX)a\mathbb{E}X + b=\varphi(\mathbb{E}X). Therefore
φ(EX)=aEX+b=E(aX+b)Eφ(X).\begin{equation*}\varphi(\mathbb{E}X)=a\mathbb{E}X+b=\mathbb{E}(aX+b)\leq \mathbb{E}\varphi(X).\end{equation*}
This proves Jensen's inequality. \Box
CAUCHY–SCHWARZ INEQUALITY
If XX and YY are square-integrable random variables, then
E(XY)(EX2)1/2(EY2)1/2.\begin{equation*}\left|\mathbb{E}(XY)\right| \leq \left(\mathbb{E}X^2\right)^{1/2}\left(\mathbb{E}Y^2\right)^{1/2}.\end{equation*}
\quad Proof. For every λR\lambda\in\mathbb{R},
0E(λXY)2  =EX2aλ2+(2E(XY))bλ+EY2c.\begin{equation*}0\leq \mathbb{E}(\lambda X-Y)^2\;=\underbrace{\mathbb{E}X^2}_{a}\lambda^2+\underbrace{\left(-2\mathbb{E}(XY)\right)}_{b}\lambda+\underbrace{\mathbb{E}Y^2}_{c}.\end{equation*}
For a quadratic polynomial aλ2+bλ+ca\lambda^2+b\lambda+c to be nonnegative on all R\mathbb{R}, its discriminant must be nonpositive:
b24ac0(2E(XY))24EX2EY20.\begin{equation*}b^2-4ac\leq 0\quad\Longrightarrow\quad\left(-2\mathbb{E}(XY)\right)^2-4\mathbb{E}X^2\,\mathbb{E}Y^2\leq 0.\end{equation*}
Therefore
(E(XY))2EX2EY2Taking square rootsE(XY)(EX2)1/2(EY2)1/2.\begin{equation*}\left(\mathbb{E}(XY)\right)^2\leq \mathbb{E}X^2\,\mathbb{E}Y^2\overset{\text{Taking square roots}}{\Longrightarrow}\left|\mathbb{E}(XY)\right|\leq \left(\mathbb{E}X^2\right)^{1/2}\left(\mathbb{E}Y^2\right)^{1/2}. \quad \Box\end{equation*}
PALEY–ZYGMUND INEQUALITY
Let X0X\geq 0 be a random variable with EX>0\mathbb{E}X>0 and EX2<\mathbb{E}X^2<\infty. Then for every θ(0,1)\theta\in(0,1),
P ⁣(XθEX)(1θ)2(EX)2EX2.\begin{equation*}\mathbb{P}\!\left(X\geq \theta\mathbb{E}X\right)\geq (1-\theta)^2\frac{(\mathbb{E}X)^2}{\mathbb{E}X^2}.\end{equation*}
\quad Proof. Let A:={XθEX}A:=\{X\geq \theta\mathbb{E}X\} and decompose XX and EX\mathbb{E}X
X=X1A+X1AcEX=E(X1A)+E(X1Ac).\begin{equation*}X=X\mathbf{1}_{A}+X\mathbf{1}_{A^c} \quad\Longrightarrow\quad \mathbb{E}X=\mathbb{E}(X\mathbf{1}_{A})+\mathbb{E}(X\mathbf{1}_{A^c}).\end{equation*}
On AcA^c we have X<θEXX<\theta\mathbb{E}X, so
E(X1Ac)θEX.E(X1A)(1θ)EX.\begin{equation*}\mathbb{E}(X\mathbf{1}_{A^c})\leq \theta\mathbb{E}X. \quad\Longrightarrow\quad \mathbb{E}(X\mathbf{1}_{A})\geq (1-\theta)\mathbb{E}X.\end{equation*}
Apply Cauchy–Schwarz to X1AX\mathbf{1}_{A} and 1A\mathbf{1}_{A}:
E(X1A)2E(X2)E(1A2)=E(X2)P(A).\begin{equation*}\mathbb{E}(X\mathbf{1}_{A})^2 \leq \mathbb{E}(X^2)\mathbb{E}(\mathbf{1}_{A}^2)=\mathbb{E}(X^2)\mathbb{P}(A).\end{equation*}
Combining with E(X1A)(1θ)EX\mathbb{E}(X\mathbf{1}_{A})\geq (1-\theta)\mathbb{E}X gives
(1θ)2(EX)2E(X2)P(A)P ⁣(XθEX)=P(A)(1θ)2(EX)2EX2\begin{equation*}(1-\theta)^2(\mathbb{E}X)^2\leq \mathbb{E}(X^2)\mathbb{P}(A)\quad\Longrightarrow\quad\mathbb{P}\!\left(X\geq \theta\mathbb{E}X\right)=\mathbb{P}(A)\geq (1-\theta)^2\frac{(\mathbb{E}X)^2}{\mathbb{E}X^2} \quad \Box\end{equation*}
CHERNOFF BOUND
Let XX be a random variable such that EeλX<\mathbb{E}e^{\lambda X}<\infty for some λ>0\lambda>0. Then for every ε>0\varepsilon>0,
P(Xε)eλεEeλX.\begin{equation*}\mathbb{P}(X \geq \varepsilon) \leq e^{-\lambda \varepsilon}\mathbb{E}e^{\lambda X}.\end{equation*}
Consequently,
P(Xε)infλ>0(eλεEeλX).\begin{equation*}\mathbb{P}(X \geq \varepsilon) \leq \inf_{\lambda>0} \left(e^{-\lambda \varepsilon}\mathbb{E}e^{\lambda X}\right).\end{equation*}
\quad Proof. Apply Markov's inequality to the nonnegative random variable eλXe^{\lambda X}:
P(Xε)=P(eλXeλε)EeλXeλε.\begin{equation*}\mathbb{P}(X \geq \varepsilon)=\mathbb{P}\left(e^{\lambda X}\geq e^{\lambda \varepsilon}\right)\leq \frac{\mathbb{E}e^{\lambda X}}{e^{\lambda \varepsilon}}.\end{equation*}
Because this bound holds for each λ>0\lambda>0, we can take the infimum over λ>0\lambda>0 and get the sharpest bound:
P(Xε)infλ>0(eλεEeλX)\begin{equation*}\mathbb{P}(X \geq \varepsilon)\leq \inf_{\lambda>0}\left(e^{-\lambda\varepsilon}\mathbb{E}e^{\lambda X}\right) \quad \Box\end{equation*}
Let XN(μ,σ2)X \sim \mathcal{N}(\mu,\sigma^2) on R\mathbb{R}, and fix ε>0\varepsilon>0. Apply Chernoff to the centered variable Z:=XμZ:=X-\mu:
P(Xμε)=P(Zε)eλεEeλZ,λ>0.\begin{equation*}\mathbb{P}(X-\mu \geq \varepsilon)=\mathbb{P}(Z \geq \varepsilon)\leq e^{-\lambda\varepsilon}\mathbb{E}e^{\lambda Z}, \qquad \lambda>0.\end{equation*}
For a Gaussian, the centered moment generating function is
EeλZ=exp(λ2σ22)P(Xμε)exp(λε+λ2σ22),λ>0.\begin{equation*}\mathbb{E}e^{\lambda Z}=\exp\left(\frac{\lambda^2\sigma^2}{2}\right) \quad \Longrightarrow \quad \mathbb{P}(X-\mu \geq \varepsilon)\leq \exp\left(-\lambda\varepsilon+\frac{\lambda^2\sigma^2}{2}\right), \qquad \lambda>0.\end{equation*}
Now, we need to minimize
f(λ):=λε+λ2σ22,f(λ)=ε+λσ2,f(λ)=σ2>0,\begin{equation*}f(\lambda):=-\lambda\varepsilon+\frac{\lambda^2\sigma^2}{2},\qquad f'(\lambda)=-\varepsilon+\lambda\sigma^2,\qquad f''(\lambda)=\sigma^2>0,\end{equation*}
so the minimizer is
λ=εσ2.\begin{equation*}\lambda_*=\frac{\varepsilon}{\sigma^2}.\end{equation*}
Substituting λ\lambda_* gives the optimized Chernoff bound
P(Xμε)exp(ε22σ2).\begin{equation*}\mathbb{P}(X-\mu \geq \varepsilon)\leq \exp\left(-\frac{\varepsilon^2}{2\sigma^2}\right).\end{equation*}
Before proving the Hoeffding's inequality we need the following lemma.
HOEFFDING'S LEMMA
If a random variable YY satisfies EY=0\mathbb{E}Y=0 and aYba \leq Y \leq b almost surely, then for every λR\lambda \in \mathbb{R},
EeλYexp(λ2(ba)28).\begin{equation*}\mathbb{E}e^{\lambda Y} \leq \exp\left(\frac{\lambda^2(b-a)^2}{8}\right).\end{equation*}
\quad Proof. Fix λR\lambda \in \mathbb{R}. By convexity of xeλxx \mapsto e^{\lambda x}, for each y[a,b]y \in [a,b],
bybaeλa+yabaeλb=1eλybybaeλa+yabaeλb.\begin{equation*}\frac{b-y}{b-a}e^{\lambda a}+\frac{y-a}{b-a}e^{\lambda b} = 1 \quad \Longrightarrow \quad e^{\lambda y} \leq \frac{b-y}{b-a}e^{\lambda a}+\frac{y-a}{b-a}e^{\lambda b}.\end{equation*}
Taking expectation and using EY=0\mathbb{E}Y=0,
EeλYbbaeλaabaeλb.\begin{equation*}\mathbb{E}e^{\lambda Y} \leq \frac{b}{b-a}e^{\lambda a}-\frac{a}{b-a}e^{\lambda b}.\end{equation*}
Set
p:=aba[0,1],u:=λ(ba).\begin{equation*}p:=-\frac{a}{b-a}\in[0,1],\qquad u:=\lambda(b-a).\end{equation*}
Then the right-hand side is
(1p)epu+pe(1p)u.\begin{equation*}(1-p)e^{-pu}+pe^{(1-p)u}.\end{equation*}
Define
h(u):=log((1p)epu+pe(1p)u).\begin{equation*}h(u):=\log\left((1-p)e^{-pu}+pe^{(1-p)u}\right).\end{equation*}
We have h(0)=0h(0)=0 and h(0)=0h'(0)=0. A direct computation gives
h(u)=p(1p)eu(1p+peu)214.\begin{equation*}h''(u)=\frac{p(1-p)e^u}{(1-p+pe^u)^2}\leq \frac14.\end{equation*}
Hence, by Taylor's formula with remainder bound,
h(u)u28.\begin{equation*}h(u)\leq \frac{u^2}{8}.\end{equation*}
Therefore
EeλYeh(u)exp(u28)=exp(λ2(ba)28).\begin{equation*}\mathbb{E}e^{\lambda Y}\leq e^{h(u)}\leq \exp\left(\frac{u^2}{8}\right)=\exp\left(\frac{\lambda^2(b-a)^2}{8}\right).\end{equation*}
The lemma is proved. \Box
HOEFFDING'S INEQUALITY
Let X1,,XnX_1,\ldots,X_n be independent random variables such that aiXibia_i \leq X_i \leq b_i almost surely. Let μi:=EXi\mu_i:=\mathbb{E}X_i and define
Sn:=i=1n(Xiμi).\begin{equation*}S_n:=\sum_{i=1}^n\left(X_i-\mu_i\right).\end{equation*}
Then for every ε>0\varepsilon>0,
P(Snε)exp(2ε2i=1n(biai)2),\begin{equation*}\mathbb{P}(S_n \geq \varepsilon) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n(b_i-a_i)^2}\right),\end{equation*}
\quad Proof of Hoeffding's inequality. Let
Yi:=Xiμi,Sn=i=1nYi.\begin{equation*}Y_i:=X_i-\mu_i,\qquad S_n=\sum_{i=1}^n Y_i.\end{equation*}
Then
aiμiYibiμi,(biμi)(aiμi)=biai.\begin{equation*}a_i-\mu_i \leq Y_i \leq b_i-\mu_i,\qquad (b_i-\mu_i)-(a_i-\mu_i)=b_i-a_i.\end{equation*}
For λ>0\lambda>0, Chernoff's bound gives
P(Snε)eλεEeλSn=eλεi=1nEeλYi,\begin{equation*}\mathbb{P}(S_n\geq \varepsilon)\leq e^{-\lambda\varepsilon}\mathbb{E}e^{\lambda S_n}=e^{-\lambda\varepsilon}\prod_{i=1}^n \mathbb{E}e^{\lambda Y_i},\end{equation*}
where we used independence. Applying Hoeffding's lemma to each YiY_i:
EeλYiexp(λ2(biai)28)P(Snε)exp(λε+λ28i=1n(biai)2B).\begin{equation*}\mathbb{E}e^{\lambda Y_i}\leq \exp\left(\frac{\lambda^2(b_i-a_i)^2}{8}\right)\Longrightarrow\mathbb{P}(S_n\geq \varepsilon)\leq \exp\left(-\lambda\varepsilon+\frac{\lambda^2}{8}\underbrace{\sum_{i=1}^n(b_i-a_i)^2}_{B}\right).\end{equation*}
The exponent is minimized at λ=4ε/B\lambda=4\varepsilon/B, and substitution yields
P(Snε)exp(2ε2B)\begin{equation*}\mathbb{P}(S_n\geq \varepsilon)\leq \exp\left(-\frac{2\varepsilon^2}{B}\right) \qquad \Box\end{equation*}
BERNSTEIN BOUND
Let XX satisfy EX=0\mathbb{E}X=0, XM|X|\leq M almost surely, and let σ2=Var(X)\sigma^2=\operatorname{Var}(X). Then for every λ[0,3/M)\lambda \in [0,3/M),
EeλXexp(λ2σ22(1λM/3)).\begin{equation*}\mathbb{E}e^{\lambda X}\leq \exp\left(\frac{\lambda^2\sigma^2}{2(1-\lambda M/3)}\right).\end{equation*}
\quad Proof. Since EX=0\mathbb{E}X=0,
EeλX=1+k=2λkE(Xk)k!1+k=2λkE(Xk)k!.\begin{equation*}\mathbb{E}e^{\lambda X}=1+\sum_{k=2}^{\infty}\frac{\lambda^k\mathbb{E}(X^k)}{k!}\leq 1+\sum_{k=2}^{\infty}\frac{\lambda^k\mathbb{E}(|X|^k)}{k!}.\end{equation*}
Because XM|X|\leq M, for k2k\geq 2 we have XkMk2X2|X|^k\leq M^{k-2}X^2, hence
E(Xk)Mk2E(X2)=Mk2σ2.\begin{equation*}\mathbb{E}(|X|^k)\leq M^{k-2}\mathbb{E}(X^2)=M^{k-2}\sigma^2.\end{equation*}
Therefore
EeλX1+σ2M2k=2(λM)kk!=1+σ2M2(eu1u),u:=λM.\begin{equation*}\mathbb{E}e^{\lambda X}\leq 1+\frac{\sigma^2}{M^2}\sum_{k=2}^{\infty}\frac{(\lambda M)^k}{k!}=1+\frac{\sigma^2}{M^2}\left(e^{u}-1-u\right),\qquad u:=\lambda M.\end{equation*}
Now for u[0,3)u\in[0,3),
eu1u=k=2ukk!k=2uk23k2=u22j=0(u3)j=u22(1u/3),\begin{equation*}e^u-1-u=\sum_{k=2}^{\infty}\frac{u^k}{k!}\leq \sum_{k=2}^{\infty}\frac{u^k}{2\cdot 3^{k-2}}=\frac{u^2}{2}\sum_{j=0}^{\infty}\left(\frac{u}{3}\right)^j=\frac{u^2}{2(1-u/3)},\end{equation*}
because k!23k2k!\geq 2\cdot 3^{k-2} for all k2k\geq 2. Plugging this in,
EeλX1+λ2σ22(1λM/3)exp(λ2σ22(1λM/3)).\begin{equation*}\mathbb{E}e^{\lambda X}\leq 1+\frac{\lambda^2\sigma^2}{2(1-\lambda M/3)}\leq \exp\left(\frac{\lambda^2\sigma^2}{2(1-\lambda M/3)}\right).\end{equation*}
The lemma is proved. \Box
BERNSTEIN'S INEQUALITY
Let X1,,XnX_1,\ldots,X_n be independent random variables with means EXi=0\mathbb{E}X_i=0 and XiM|X_i|\leq M almost surely. Put
Sn:=i=1nXi,Vn:=i=1nVar(Xi).\begin{equation*}S_n:=\sum_{i=1}^n X_i,\qquad V_n:=\sum_{i=1}^n \operatorname{Var}(X_i).\end{equation*}
Then for every ε>0\varepsilon>0,
P(Snε)exp(ε22(Vn+Mε3)).\begin{equation*}\mathbb{P}(S_n \geq \varepsilon) \leq \exp\left(-\frac{\varepsilon^2}{2\left(V_n + \frac{M\varepsilon}{3}\right)}\right).\end{equation*}
\quad Proof of Bernstein's inequality. For λ(0,3/M)\lambda\in(0,3/M), Chernoff's bound gives
P(Snε)eλεEeλSn=eλεi=1nEeλXi.\begin{equation*}\mathbb{P}(S_n\geq\varepsilon)\leq e^{-\lambda\varepsilon}\mathbb{E}e^{\lambda S_n}=e^{-\lambda\varepsilon}\prod_{i=1}^n\mathbb{E}e^{\lambda X_i}.\end{equation*}
Applying the lemma to each XiX_i (with σi2=Var(Xi)\sigma_i^2=\operatorname{Var}(X_i)),
P(Snε)exp(λε+λ22(1λM/3)i=1nσi2)=exp(λε+λ2Vn2(1λM/3)).\begin{equation*}\mathbb{P}(S_n\geq\varepsilon)\leq\exp\left(-\lambda\varepsilon+\frac{\lambda^2}{2(1-\lambda M/3)}\sum_{i=1}^n\sigma_i^2\right)=\exp\left(-\lambda\varepsilon+\frac{\lambda^2V_n}{2(1-\lambda M/3)}\right).\end{equation*}
Choose
λ:=εVn+Mε/3,\begin{equation*}\lambda_*:=\frac{\varepsilon}{V_n+M\varepsilon/3},\end{equation*}
for which λ<3/M\lambda_*<3/M. Substituting λ\lambda_* yields
P(Snε)exp(ε22(Vn+Mε3)).\begin{equation*}\mathbb{P}(S_n\geq\varepsilon)\leq\exp\left(-\frac{\varepsilon^2}{2\left(V_n+\frac{M\varepsilon}{3}\right)}\right).\end{equation*}
Applying the same argument to Xi-X_i and using the union bound gives the two-sided inequality. \Box
AZUMA–HOEFFDING INEQUALITY
Let {(Mk,Fk)}k=0n\{(M_k,\mathcal{F}_k)\}_{k=0}^n be a martingale and suppose there are constants ckc_k such that
MkMk1cka.s., 1kn.\begin{equation*}|M_k - M_{k-1}| \leq c_k \quad \text{a.s., } 1 \leq k \leq n.\end{equation*}
Then for every ε>0\varepsilon>0,
P(MnM0ε)exp(ε22k=1nck2).\begin{equation*}\mathbb{P}(M_n-M_0 \geq \varepsilon) \leq \exp\left(-\frac{\varepsilon^2}{2\sum_{k=1}^n c_k^2}\right).\end{equation*}
\quad Proof. Define martingale differences for 1kn 1\leq k\leq n
Dk:=MkMk1E(DkFk1)=0.\begin{equation*}D_k:=M_k-M_{k-1} \quad\Longrightarrow\quad \mathbb{E}(D_k\mid\mathcal{F}_{k-1})=0.\end{equation*}
Applying Hoeffding's lemma on the interval [ck,ck][-c_k,c_k] gives, for every λ>0\lambda>0,
E ⁣(eλDkFk1)exp(λ2ck22).\begin{equation*}\mathbb{E}\!\left(e^{\lambda D_k}\mid\mathcal{F}_{k-1}\right)\leq\exp\left(\frac{\lambda^2c_k^2}{2}\right).\end{equation*}
Now iterate:
Eeλ(MnM0)=Eeλk=1nDk=E ⁣[eλk=1n1DkE ⁣(eλDnFn1)]exp(λ2cn22)Eeλk=1n1Dkexp(λ22k=1nck2).\begin{equation*}\mathbb{E}e^{\lambda(M_n-M_0)} =\mathbb{E}e^{\lambda\sum_{k=1}^n D_k}=\mathbb{E}\!\left[e^{\lambda\sum_{k=1}^{n-1}D_k}\mathbb{E}\!\left(e^{\lambda D_n}\mid\mathcal{F}_{n-1}\right)\right]\leq \exp\left(\frac{\lambda^2c_n^2}{2}\right)\mathbb{E}e^{\lambda\sum_{k=1}^{n-1}D_k}\leq \cdots \leq\exp\left(\frac{\lambda^2}{2}\sum_{k=1}^n c_k^2\right).\end{equation*}
Applying Chernoff's bound:
P(MnM0ε)exp(λε+λ22k=1nck2).\begin{equation*}\mathbb{P}(M_n-M_0\geq\varepsilon)\leq\exp\left(-\lambda\varepsilon+\frac{\lambda^2}{2}\sum_{k=1}^n c_k^2\right).\end{equation*}
Minimizing over λ>0\lambda>0 with λ=ε/k=1nck2\lambda=\varepsilon/\sum_{k=1}^n c_k^2 gives
P(MnM0ε)exp(ε22k=1nck2)\begin{equation*}\mathbb{P}(M_n-M_0\geq\varepsilon)\leq\exp\left(-\frac{\varepsilon^2}{2\sum_{k=1}^n c_k^2}\right) \qquad \Box\end{equation*}
Kolmogorov's inequality is one of the main inequalities in probability theory; it gives an upper bound for the maximum of partial sums of independent mean-zero random variables.
KOLMOGOROV’S MAXIMAL INEQUALITY
Let X1,X2,,XnX_1, X_2, \ldots, X_n be independent random variables with EXi=0,EXi2<\mathbb{E} X_i=0, \, \mathbb{E} X_i^2<\infty, ini \leq n. If Sn=X1++XnS_n = X_1 + \dots + X_n then
P(max1knSkε)ESn2ε2.\begin{equation*}\mathbb{P} \left( \max _{1 \leq k \leq n}\left|S_k \right| \geq \varepsilon \right) \leq \frac{\mathbb{E} S_n^2}{\varepsilon^2} .\end{equation*}
\quad Proof. We put
A={max1knSkε},Ak={Si<ε,i=1,,k1,Skε},1kn,\begin{align*}A & =\left\{\max_{1 \leq k \leq n} \left|S_k\right| \geq \varepsilon\right\}, \\A_k & =\left\{\left|S_i\right|<\varepsilon, i=1, \ldots, k-1,\left|S_k\right| \geq \varepsilon\right\}, \quad 1 \leq k \leq n ,\end{align*}
i.e., we break things down according to the time that Sk\left|S_k\right| first exceeds ε\varepsilon. Then AkAj=,jkA_k \cap A_j = \emptyset, \, j \neq k and A=k=1nAkA= \bigcup_{k=1}^{n} A_k,
ESn2ESn2IA=k=1nESn2IAk\begin{equation*}\mathbb{E} S_n^2 \geq \mathbb{E} S_n^2 I_A=\sum_{k = 1}^{n} \mathbb{E} S_n^2 I_{A_k}\end{equation*}
But
ESn2IAk=E(Sk+(Xk+1++Xn))2IAk=ESk2IAk+2ESk(Xk+1++Xn)IAk=0+E(Xk+1++Xn)2IAk0ESk2IAk.\begin{equation*}\mathbb{E} S_n^2 I_{A_k}=\mathbb{E}\left(S_k+\left(X_{k+1}+\cdots+X_n\right)\right)^2 I_{A_k}=\mathbb{E} S_k^2 I_{A_k}+\underbrace{2 \mathbb{E} S_k\left(X_{k+1}+\cdots+X_n\right) I_{A_k}}_{=0}+\underbrace{\mathbb{E}\left(X_{k+1}+\cdots+X_n\right)^2 I_{A_k}}_{\ge 0}\geq \mathbb{E} S_k^2 I_{A_k}.\end{equation*}
since
ESk(Xk+1++Xn)IAk=ESkIAkE(Xk+1++Xn)=0\begin{equation*}\mathbb{E} S_k\left(X_{k+1}+\cdots+X_n\right) I_{A_k}=\mathbb{E} S_k I_{A_k} \cdot \mathbb{E}\left(X_{k+1}+\cdots+X_n\right)=0\end{equation*}
because of independence and the conditions EXi=0,in\mathbb{E} X_i=0, i \leq n. Hence
ESn2k=1nESk2IAkε2k=1nP(Ak)=ε2P(max1knSkε)\begin{equation*}\mathbb{E} S_n^2 \geq \sum_{k = 1}^{n} \mathbb{E}S_k^2 I_{A_k} \geq \varepsilon^2 \sum_{k = 1}^{n} \mathbb{P}\left(A_k\right)=\varepsilon^2 \mathbb{P} \left( \max _{1 \leq k \leq n}\left|S_k \right| \geq \varepsilon \right) \qquad \Box\end{equation*}

References