Boaz Barak (instructor), Brian Sapozhnikov (Head TF), Albert Chalom, Alexis Ross, Charles O’Mara, Daniel Chen, David Li , Jambay Kinley, Jane Ahn, John Shen, Josh Seides, Laura Pierson, Pranay Tankala, Raymond Lin, Theresa Nguyen, Wanqian Yang, William Fu, Hikari Sorensen (Extension TF), Eric Lu (Patel Fellow)
Laptops/phones only on last 5 rows please.
For us: \(\Omega = \{0,1\}^n\): tossing \(n\) coins. \(\mathbb{P}[A]=|A|/2^n\).
What is the sample space (usually \(\{0,1\}^m\))
Events, union bound.
Random variables and expectation.
Conditioning and independence, Bayes’ rule
Concentration: Markov and Chernoff
Amplification
Less important for us: Continuous random variables, probability density functions, and specific families such as Binomial, Geometric, Exponential, Negative Binomial, Hypergeometric, Poisson, Uniform, Normal, Gamma, Beta, chi-squared, and Student's t.
What is \(\mathbb{P}_{x \sim \{0,1\}^n}[ \text{$4$ divides $n(x)$}]\) (\(n(x)\): number \(x\) represents in binary)
a. \(1/2\)
b. \(1/4\)
c. \(1/8\)
d. \(1/3\)
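One quick way to sanity-check such questions is to enumerate the whole sample space. A throwaway Python sketch (assuming \(n \geq 2\) and the usual binary reading of \(x\)):

```python
from itertools import product

n = 10  # any n >= 2 gives the same answer
# Enumerate {0,1}^n and count the strings whose value is divisible by 4.
count = sum(1 for x in product([0, 1], repeat=n)
            if int("".join(map(str, x)), 2) % 4 == 0)
print(count / 2**n)  # prints 0.25: only the two least significant bits matter
```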
Thm: If \(\mathbb{P}[A] \leq 0.1\) and \(\mathbb{P}[B] \leq 0.1\) then \(\mathbb{P}[A \cup B] \leq 0.2\)
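One way to see this, using \(\mathbb{P}[A]=|A|/2^n\) from above:
\[ \mathbb{P}[A \cup B] = \frac{|A \cup B|}{2^n} \leq \frac{|A| + |B|}{2^n} = \mathbb{P}[A] + \mathbb{P}[B] \leq 0.1 + 0.1 = 0.2 .\]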
Def: Random variable is \(X:\{0,1\}^n \rightarrow \mathbf{R}\)
Expectation: \(\mathbb{E}[X] = 2^{-n}\sum_{x\in \{0,1\}^n} X(x)\)
Problem: Prove that if \(X(x) \leq Y(x)\) for every \(x\) then \(\mathbb{E}[X] \leq \mathbb{E}[Y]\).
Proof: \(\mathbb{E}[Y]-\mathbb{E}[X] = 2^{-n}\sum_{x\in \{0,1\}^n}\bigl(Y(x)-X(x)\bigr) \geq 0\), since every term in the sum is nonnegative.
Expectation / mean: \(\mathbb{E}[X] = 2^{-n}\sum_{x\in \{0,1\}^n} X(x)\)
Median: Smallest \(\alpha\in \mathbf{R}\) such that \(\mathbb{P}[ X \leq \alpha ] \geq 1/2\)
Mode(s): \(\alpha \in \mathbf{R}\) maximizing \(\mathbb{P}[ X = \alpha]\)
95% Confidence interval: minimum-length interval \([\alpha,\beta]\) such that \(\mathbb{P}[ \alpha \leq X \leq \beta ] \geq 0.95\)
If \(X\) is “nice” then these are all similar. Proven using tail bounds
Often: With good probability \(X\) is within \(\mathbb{E}[X] \pm \sigma[X]\), where \(\sigma[X] = \sqrt{Var[X]}\). (\(Var[X] = \mathbb{E}[(X-\mu)^2]\) for \(\mu = \mathbb{E}[X]\).)
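As a concrete illustration (my example, not from the slides): if \(X\) is the number of heads in \(n\) independent fair coin tosses, then \(\mathbb{E}[X] = n/2\) and
\[ Var[X] = \sum_{i=0}^{n-1} Var[X_i] = \tfrac{n}{4}, \qquad \sigma[X] = \tfrac{\sqrt{n}}{2}, \]
so with good probability \(X\) is within \(n/2 \pm O(\sqrt{n})\); for even \(n\), the median and mode are also \(n/2\).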
Thm: If \(X \geq 0\) and \(\mu = \mathbb{E}[X]\) then \(\mathbb{P}[X \geq k\mu ] \leq 1/k\)
Problem: Prove special case: If \(X \geq 0\) and \(\mathbb{E}[X]=1\) then \(\mathbb{P}[X \geq 100 ] \leq 0.1\)
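One way to prove the special case, using the monotonicity problem from above: since \(X \geq 0\), we have \(X \geq 100 \cdot \mathbf{1}[X \geq 100]\) pointwise, so
\[ 1 = \mathbb{E}[X] \geq 100 \cdot \mathbb{P}[X \geq 100], \qquad \text{i.e.,} \qquad \mathbb{P}[X \geq 100] \leq \tfrac{1}{100} \leq 0.1 .\]
The same argument with \(k\mu\) in place of \(100\) gives the general theorem.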
Line with locations \(\ldots,-3,-2,-1,0,+1,+2,+3,\ldots\)
Person starts at position \(0\), and each step goes right (\(+1\)) or left (\(-1\)) with probability \(1/2\) independently.
Interesting fact: After \(m\) steps, the person will likely be at a position that is roughly \(\pm \sqrt{m}\).
Intuition: Define \(X_i = \begin{cases} +1 & \text{step $i$ is to the right} \\ -1 & \text{otherwise} \end{cases}\)
Position at time \(m\) is r.v. \(X = \sum_{i=0}^{m-1} X_i\).
Since \(\mathbb{E}[X]=0\) and \(Var[X] = \sum_i Var[X_i] = m\) (so \(\sigma[X]=\sqrt{m}\)), if \(X\) is “nice” then it would “typically” be in the range \(0 \pm O(\sqrt{m})\).
Thm 1: After \(m\) steps, \(\mathbb{P}[ |\text{position}| \lt 1000 \sqrt{m}] \geq 0.99\)
Thm 2: After \(m\) steps, \(\mathbb{P}[ |\text{position}| \gt 0.001 \sqrt{m}] \geq 0.99\)
Proof of Theorem 1: Since \(\mathbb{E}[X^2] = m\), Markov gives \(\mathbb{P}[ X^2 \gt 10^6 m] \leq 10^{-6} \lt 0.001\), and \(X^2 \leq 10^6 m\) means \(|X| \leq 1000\sqrt{m}\).
Proof of Theorem 2: The version with \(0.1\) instead of \(0.99\) follows from the Paley-Zygmund inequality: for \(Y\geq 0\), \(0 \leq \epsilon \leq 1\), \(\mathbb{P}[ Y \geq \epsilon \mathbb{E}Y ] \geq (1-\epsilon)^2\tfrac{\mathbb{E}[Y]^2}{\mathbb{E}[Y^2]}\).
Use this for \(Y = X^2\), \(\epsilon = 1/100\), together with \((*)\) \(\mathbb{E}[Y^2]=\mathbb{E}[X^4]\leq 3m^2\).
(Prove \((*)\) using a calculation similar to the one for \(\mathbb{E}[X^2]\).)
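A quick simulation of the walk (purely illustrative; the parameters are my own choices) to check that the typical displacement is on the order of \(\sqrt{m}\) and that both theorems hold comfortably:

```python
import random

def walk(m):
    """Position after m steps of +1/-1, each with probability 1/2."""
    return sum(random.choice((-1, 1)) for _ in range(m))

m, trials = 10_000, 500
positions = [walk(m) for _ in range(trials)]
sqrt_m = m ** 0.5  # 100

# Thm 1: |position| < 1000*sqrt(m) essentially always.
print(sum(abs(p) < 1000 * sqrt_m for p in positions) / trials)
# Thm 2: |position| > 0.001*sqrt(m) almost always.
print(sum(abs(p) > 0.001 * sqrt_m for p in positions) / trials)
# Typical magnitude of the position is on the order of sqrt(m).
print(sum(abs(p) for p in positions) / trials)
```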
Input: 2SAT formula \(\varphi\) with \(m\) constraints on \(n\) variables.
Operation:
Choose \(x \sim \{0,1\}^n\)
Repeat \(100 n^2\) times:
If all clauses satisfied then return \(x\).
Pick a random unsatisfied clause \((\neg)x_i \vee (\neg) x_j\) and flip at random either \(x_i\) or \(x_j\)
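A minimal Python sketch of the procedure above; the clause encoding (a clause is a pair of literals, where a literal `(i, positive)` stands for \(x_i\) or \(\neg x_i\)) is my own choice and not fixed by the slides:

```python
import random

def solve_2sat(clauses, n):
    """Random-walk 2SAT: clauses is a list of 2-tuples of literals (i, positive)."""
    x = [random.randint(0, 1) for _ in range(n)]
    for _ in range(100 * n * n):
        unsat = [c for c in clauses
                 if not any(x[i] == int(pos) for (i, pos) in c)]
        if not unsat:
            return x                                # all clauses satisfied
        i, _ = random.choice(random.choice(unsat))  # random literal of a random unsatisfied clause
        x[i] ^= 1                                   # flip that variable
    return None                                     # give up

# Example: (x0 or x1) and (not x0 or x1) -- any assignment with x1 = 1 works.
print(solve_2sat([((0, True), (1, True)), ((0, False), (1, True))], n=2))
```

On an unsatisfiable formula this always returns `None`; on a satisfiable one, the analysis below shows it succeeds with constant probability.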
Analysis: Suppose \(\varphi\) has a satisfying assignment \(x^*\). Let \(Z_i\) be the number of variables on which \(x\) and \(x^*\) agree at step \(i\) of the computation. If we don't halt, then \(Z_{i+1}= Z_i \pm 1\).
If clause \((\neg)x_i \vee (\neg) x_j\) is unsatisfied then \(x\) disagrees with \(x^*\) in at least one of \(x_i,x_j\). \(\Rightarrow \mathbb{P}[Z_{i+1}=Z_i+1] \geq 1/2\)
Worst case: \(Z_{i+1}=Z_i - 1\) w.p. \(1/2\) and \(Z_{i+1}=Z_i+1\) w.p. \(1/2\): a drunkard's walk!
With constant probability, the walk will reach distance \(n\) from its starting point within \(100n^2\) steps.
\(\mathbb{P}[ \text{Trump wins PA ∧ FL ∧ OH ∧ MI}] \sim 0.4^4 \sim 0.0256\)
Let \(X_0,\ldots,X_{n-1}\) be i.i.d. \(0/1\) r.v.’s such that \(\mathbb{P}[X_i=1]=1/2\) for every \(i\), and let \(X = \sum_i X_i\).
Let \(Y_0,\ldots,Y_{n-1}\) be perfectly correlated \(0/1\) r.v.’s (i.e., \(Y_0=Y_1=\cdots=Y_{n-1}\)) such that \(\mathbb{P}[Y_i=1]=1/2\) for every \(i\), and let \(Y = \sum_i Y_i\).
Which of the following is true:
a. \(\mathbb{E}[X] \gt \mathbb{E}[Y]\)
b. \(\mathbb{E}[X] \lt \mathbb{E}[Y]\)
c. \(\mathbb{E}[X]=\mathbb{E}[Y]\) but \(\mathbb{P}[ |X-\mathbb{E}[X]| \gt n/4 ] \gt \mathbb{P}[ |Y - \mathbb{E}[Y] | \gt n/4 ]\)
d. \(\mathbb{E}[X]=\mathbb{E}[Y]\) but \(\mathbb{P}[ |X-\mathbb{E}[X]| \gt n/4 ] \lt \mathbb{P}[ |Y - \mathbb{E}[Y] | \gt n/4 ]\)
\(\mathbb{E}[X] = \sum \mathbb{E}[X_i] = n(1/2)= n/2\)
\(\mathbb{E}[Y] = \sum_i \mathbb{E}[Y_i] = n(1/2)=n/2\).
\(\mathbb{P}[|Y-n/2| \gt n/4] = 1\).
By Chernoff, \(\mathbb{P}[|X-n/2| \gt n/4] = 2^{-\Omega(n)}\), so answer d. is correct.
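An empirical version of this comparison (a throwaway sketch; \(n\) and the number of trials are arbitrary):

```python
import random

n, trials = 100, 10_000

def deviates(bits):                      # is |sum - n/2| > n/4 ?
    return abs(sum(bits) - n / 2) > n / 4

# X: n i.i.d. fair bits.  Y: n perfectly correlated fair bits (all equal).
x_dev = sum(deviates([random.randint(0, 1) for _ in range(n)]) for _ in range(trials))
y_dev = sum(deviates([random.randint(0, 1)] * n) for _ in range(trials))
print(x_dev / trials, y_dev / trials)    # roughly 0.0 versus exactly 1.0
```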
Thm: If \(X_0,\ldots,X_{n-1}\) are i.i.d. with \(0 \leq X_i \leq 1\), then
\[\mathbb{P}\bigl[ \bigl|\sum_{i=0}^{n-1}X_i - \mathbb{E}[\sum X_i] \bigr| \geq \epsilon n \bigr] \leq 2e^{-2\epsilon^2 n}\]
Learning / hypothesis testing interpretation: From samples of a \(\mu\)-coin, rule out the null hypothesis that the coin's bias is \(\mu -\epsilon\) after \(O(1/\epsilon^2)\) samples.
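A sketch of that interpretation in code; the constant `C`, the true bias, and `eps` are my own choices:

```python
import random

def estimate_bias(mu, eps, C=2):
    """Average about C/eps^2 tosses of a mu-coin.  By Chernoff, the estimate is
    off by more than eps with probability at most 2*exp(-2*C)."""
    k = int(C / eps ** 2)
    return sum(random.random() < mu for _ in range(k)) / k

mu, eps = 0.5, 0.05
est = estimate_bias(mu, eps)
print(est, abs(est - mu) < eps)  # within eps of mu, except with probability ~2e^{-4}
```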
\(Var[X] = \sum Var[X_i] \leq n\). \(\sigma[X] \leq \sqrt{n}\)
\(\mathbb{P}[ |X - \mu | \gt k\sigma ] \leq exp(-k^2)\) (set \(k=\epsilon \sqrt{n}\))
Alg \(SOLVE2SAT\) finds a satisfying assignment with probability at least \(0.1\).
Crucial observation: The sample space for this probability is the coins of the algorithm, not the inputs!
If Alg uses \(m\) coins, then for every satisfiable \(\varphi\) there is \(BAD_{\varphi} \subseteq \{0,1\}^m\) s.t.
\(|BAD_\varphi| \leq 0.9 \cdot 2^m\)
If \(r \not\in BAD_\varphi\) then \(SOLVE2SAT(\varphi;r)\) satisfies \(\varphi\).
Corollary: If we repeat \(k\) times with independent coins then we will find a satisfying assignment with probability at least \(1-0.9^k\).
Proof: Choose independent \(r_0,\ldots,r_{k-1} \in \{0,1\}^m\), and let \(B_i\) be the event that \(r_i \in BAD_\varphi\).
\(B_0,\ldots,B_{k-1}\) depend on disjoint coins \(\Rightarrow\) are mutually independent
\(\mathbb{P}[B_0 \wedge B_1 \wedge \cdots \wedge B_{k-1} ] \leq 0.9^k\).
Thm: Suppose \(ALG\) is a probabilistic algorithm s.t. for every \(x\), \(\mathbb{P}[ALG(x)=F(x)] \geq 2/3\), where the probability is over the coins of \(ALG\). Then if we run \(ALG(x)\) \(k\) independent times, \(\mathbb{P}[ \text{majority answer is $F(x)$}]\gt1-2e^{-0.01k}\)
Proof: Fix \(x\in \{0,1\}^*\), and define \(X_i=1\) if run \(i\) of \(ALG(x)\) is correct and \(X_i=0\) otherwise. Then \(X_0,\ldots,X_{k-1}\) are i.i.d. with \(\mu = \mathbb{E}[X_i]\geq 2/3\)
By Chernoff \(\mathbb{P}[ | \sum_{i=0}^{k-1} X_i - \mu k | \gt \epsilon k ] \leq 2e^{-\epsilon^2 k/2}\)
Since \(\mu k \geq (2/3)k\), to get the majority wrong we need \(|\sum X_i - \mu k| \gt k/6\), i.e., \(\epsilon \geq 1/6\). The bound comes out to \(2e^{-k/72}\).
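A generic sketch of this majority-vote amplification; `noisy_parity` is a toy stand-in for \(ALG\), not anything from the lecture:

```python
import random
from collections import Counter

def amplify(alg, x, k):
    """Run a randomized algorithm k times and return the majority answer."""
    answers = Counter(alg(x) for _ in range(k))
    return answers.most_common(1)[0][0]

def noisy_parity(x):
    """Toy 2/3-correct algorithm: returns x mod 2 with probability 2/3."""
    return x % 2 if random.random() < 2 / 3 else 1 - x % 2

# Fraction of times the k=51 majority vote gets the right answer (close to 1).
print(sum(amplify(noisy_parity, 7, k=51) == 1 for _ in range(1_000)) / 1_000)
```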
Thm (Paley-Zygmund): For \(Y\geq 0\) and \(0 \leq \epsilon \leq 1\), let \(\mu = \mathbb{E}[Y]\) and \(p=\mathbb{P}[ Y \geq \epsilon \mu ]\). Then \(p \geq (1-\epsilon)^2\tfrac{\mu^2}{\mathbb{E}[Y^2]}\).
Proof: Let \(\mu = \mathbb{E}Y\) and \(Z = \begin{cases} 1 & Y \geq \epsilon\mu \\ 0 & \text{otherwise} \end{cases}\). \(p=\mathbb{E}[Z]\)
\(\mu = \mathbb{E}[Y] = \mathbb{E}[Y(Z+(1-Z))] = \mathbb{E}[YZ] + \mathbb{E}[Y(1-Z)]\) \(\leq \mathbb{E}[YZ] + \epsilon \mu\) \(\leq \sqrt{\mathbb{E}[Y^2]\,\mathbb{E}[Z^2]}+\epsilon \mu\) (using Cauchy-Schwarz in the last step)
Since \(Z=Z^2\) and \(\mathbb{E}[Z]=p\) we get
\(\sqrt{\mathbb{E}[Y^2]\cdot p} \geq \mu - \epsilon\mu = (1-\epsilon)\mu\), and squaring and dividing by \(\mathbb{E}[Y^2]\) gives \(p \geq (1-\epsilon)^2\tfrac{\mu^2}{\mathbb{E}[Y^2]}\)
Thm: \(\mathbb{P}[ |\text{position after $n$ steps}| \gt \sqrt{n}/10 ] \geq 1/10\)
Proof: Follows from the Paley-Zygmund inequality: for \(Y\geq 0\), \(0 \leq \epsilon \leq 1\), \(\mathbb{P}[ Y \geq \epsilon \mathbb{E}Y ] \geq (1-\epsilon)^2\tfrac{\mathbb{E}[Y]^2}{\mathbb{E}[Y^2]}\).
Use this for \(Y = X^2\) where \(X = \sum_{i=0}^{n-1} X_i\), \(\epsilon = 1/100\), together with \((*)\) \(\mathbb{E}[Y^2]=\mathbb{E}[X^4]\leq 3n^2\).
Proof of \((*)\):
\(\mathbb{E}[Y^2] = \mathbb{E}[ (\sum_i X_i)^4] = \mathbb{E}[\sum_{i,j,k,\ell}X_iX_jX_kX_\ell] = \sum_{i,j,k,\ell}\mathbb{E}[X_iX_jX_kX_\ell]\)
\(\mathbb{E}[X_iX_jX_kX_\ell]=1\) iff (\(i=j\) and \(k=\ell\)) or (\(i=k\) and \(j=\ell\)) or (\(i=\ell\) and \(j=k\)); otherwise it is \(0\).
Hence \(\mathbb{E}[Y^2] \leq 3\sum_{a=0}^{n-1}\sum_{b=0}^{n-1} 1 = 3n^2\) (each of the three pairings contributes at most \(n^2\) terms).
Hence \(\mathbb{P}[Y \geq n/100 ] \geq 0.99^2 \tfrac{n^2}{3n^2} \sim 0.32\)
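One can also sanity-check \((*)\) and the final bound numerically (a rough sketch; parameters arbitrary):

```python
import random

n, trials = 200, 10_000
samples = [sum(random.choice((-1, 1)) for _ in range(n)) for _ in range(trials)]

print(sum(s ** 4 for s in samples) / trials / n ** 2)   # about 3 (exactly 3 - 2/n)
print(sum(s * s >= n / 100 for s in samples) / trials)  # comfortably above 0.32
```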