N or Space for next slide, P for previous slide, F for full screen, M for menu
Boaz Barak (instructor), Brian Sapozhnikov (Head TF), Albert Chalom, Alexis Ross, Charles O’Mara, Daniel Chen, David Li , Jambay Kinley, Jane Ahn, John Shen, Josh Seides, Laura Pierson, Pranay Tankala, Raymond Lin, Theresa Nguyen, Wanqian Yang, William Fu, Hikari Sorensen (Extension TF), Eric Lu (Patel Fellow)
Laptops/phones only on last 5 rows please.
Suggestions if midterm grade lower than pset grade:
Phase 0: Strings, functions, sets, representations.
Take away: Comfort manipulating, sets, functions, functions as objects and inputs.
Phase 1: Computing finite aka fixed input functions (NAND, Boolean circuits)
T/A: Distinguish functions and programs, programs as strings and inputs to other programs.
Phase 2: Computing infinite aka unbounded input functions (NAND++,NAND<<,Turing machines,..)
T/A 1: Still, distinguish functions and programs, and programs as strings. Some functions can’t even be computed!
T/A 2: Unimportance of computational model. Think in algorithms not NAND++ (or Python/C/OCaml)
T/A 3: Reductions as way to relate problems’ difficulty.
Next up: Phase 3: Efficient computation, P and NP and Phase 4: Probabilistic computation.
Phases 2 (infinite functions) and 3 (efficient computation, P and NP) well covered in Sisper’s book.
Translate Languages to Functions
This course | Sipser | |
Computational task: | Computing a function \(F:\{0,1\}^*\rightarrow \{0,1\}\) | Deciding a language \(L \subseteq \{0,1\}^*\) |
Input: | \(x\in \{0,1\}^*\) | \(x\in \{0,1\}^*\) |
Goal: | Compute \(F(x)\) | Decide if \(x\in L\) (output \(1\) for “Yes”, \(0\) for “No”) |
NAND++, NAND<<, Turing machines, RAM machines, \(\lambda\) calculus, Game of life, Scratch, C, Fortran, Javascript, PHP, Python, Java, Julia, Swift, Ruby, Rust, Go, Perl, Fortran, Lisp, Haskell, OCaml, ..
Domain Specific Languages (DSL): HTML, SQL, Matlab, bash, make, UnrealScript, ANTLR , JSON, EMACS Lisp, Map-reduce, Halide, Graphviz dot, XML, UML, VHDL, LaTeX, Solidity, Michelson, ..
“Sub languages”: C preprocessor, type system, templates, regular expressions, list comprehensions, format strings, ..
Why so many languages?
We want assurance that:
Search for a string will terminate. (regular expressions.)
Compiler runs in finite time. (context free grammars.)
Program respects type safety. (type system)
Access is granted to right people. (access control rules.)
Database query terminates. (SQL)
Web page is rendered in finite time (HTML)
…
Want formalisms that are:
Two examples of a restricted formalisms: Regular Expressions and Context Free Grammars:
Def: Regular expression \(exp\) is one of:
\(\Phi_{exp}(x)=1\) iff \(exp\) matches \(x\).
Example: \((00|11)^*\) corresponds to \(\Phi(x)=1\) iff \(x_{2i+1}=x_{2i}\) for all \(i\).
Which of the following regular expressions captures the set of strings that contain \(010\) as a substring:
a. \((010)^*\)
b. \((001100)^*\)
c. \((010)(0|1)^*\)
d. \((0|1)^*(010)(0|1)^*\)
Which of the following regular expressions captures the set of strings that contain \(010\) as a substring:
a. \((010)^*\)
b. \((001100)^*\)
c. \((010)(0|1)^*\)
d. \((0|1)^*(010)(0|1)^*\)
Give a regular expression for the function \(F:\{0,1\}^* \rightarrow \{0,1\}\) s.t. \(F(x)=1\) iff \(x\) does not contain \(010\) as a substring. Hint: Not containing \(010\) means every \(1\) cannot be “isolated” apart from at beginning and end.
Give a regular expression for the function \(F:\{0,1\}^* \rightarrow \{0,1\}\) s.t. \(F(x)=1\) iff \(x\) does not contain \(010\) as a substring. Hint: Not containing \(010\) means every \(1\) cannot be “isolated” apart from at beginning and end.
Answer:
Thm 1: There is an algorithm \(P\) such that \(P(exp,x)=1\) iff \(\Phi_{exp}(x)=1\).
Proof: In text - directly following definition.
Thm 2: There is an algorithm \(P\) such that \(P(exp,x)=1\) iff \(\Phi_{exp}(x)=1\). Moreover, \(P\) runs in \(O(n)\) time where \(n=|exp|\). More-moreover \(P\) uses \(O(1)\) space and make single pass - deterministic finite automaton.
For regexp \(exp\), \(exp[0]\) matches \(x\) if and only if \(exp\) matches \(x0\).
Example 1: \(exp=10011|100\), \(exp[0]=10\), \(exp[1]=1001\).
Example 2: \(exp=(100)*11|100\), \(exp[1]=(100)*1\), \(exp[0]=10\). \((exp[1])[0] = \emptyset\)
Question: If \(exp = (011010)^*\), what is \(exp[0]\)?
a. \((01101)^*\)
b. \((01101) | (011010)^*\)
c. \((011010)^*(01101)\)
d. \((01101)(011010)^*\)
For regexp \(exp\), \(exp[0]\) matches \(x\) if and only if \(exp\) matches \(x0\).
Example 1: \(exp=10011|100\), \(exp[0]=10\), \(exp[1]=1001\).
Example 2: \(exp=(100)*11|100\), \(exp[1]=(100)*1\), \(exp[0]=10\). \((exp[1])[0] = \emptyset\)
Question: If \(exp = (011010)^*\), what is \(exp[0]\)?
a. \((01101)^*\)
b. \((01101) | (011010)^*\)
c. \((011010)^*(01101)\)
d. \((01101)(011010)^*\)
For regexp \(exp\), \(exp[0]\) matches \(x\) if and only if \(exp\) matches \(x0\).
Example 1: \(exp=10011|100\), \(exp[0]=10\), \(exp[1]=1001\).
Example 2: \(exp=(100)*11|100\), \(exp[1]=(100)*1\), \(exp[0]=10\). \((exp[1])[0] = \emptyset\)
Lemma: For every regular expression \(exp\) and \(\sigma \in \{0,1\}\), can find expression \(exp[\sigma]\) that matches \(x\) iff \(exp\) matches \(x\sigma\). Moreover, \(C(exp[\sigma]) \leq C(exp)\).*
Proof by recursive algorithm in text
Algorithm \(MATCH\):
Input: \(exp\), \(x\in \{0,1\}^n\).
Operation:
If \(|x|=0\), return \(1\) iff \(exp\) contains ""
Return \(MATCH(exp[x_{n-1}],x_0\cdots x_{n-1})\).
Correctness proof by induction:
\(MATCH(exp,x_0 \cdots x_{n-2}x_{n-1})\) \(= MATCH(exp[x_{n-1}],x_0\cdots x_{n-2})\) (ind. hyp.) \(= \Phi_{\exp[x_{n-1}]}(x_0\cdots x_{n-2})\) (lemma) \(=\Phi_{exp}(x_0 \cdots x_{n-2} x_{n-1})\)
\(TIME(n) = TIME(n-1) + O(1)\).
Thm 3: \(F\) is regular if and only if it can be computed by a one-pass constant-space algorithm (i.e., DFA).
One direction follows from memoization: turing recursive algorithm to dynamic program.
Many tools to perform this conversion.
Corollary: \(\forall\) \(exp\) \(\exists\) \(\overline{exp}\) s.t. \(\Phi_{\overline{exp}} = \neg \Phi_{exp}\). can you see why?
Corollary: \(\forall\) \(exp',exp''\), \(\exists\) \(exp\) s.t. \(\Phi_{exp} = \Phi_{exp'} \wedge \Phi_{exp''}\). can you see why?
\(a \wedge b = \overline{\overline{a} \vee \overline{b}}\)
Corollary: Can add \(\neg\) and \(\wedge\) syntactic sugar to reg exps.
Thm: There is algorithm \(EQ(exp',exp'')\) that outputs \(1\) iff \(\Phi_{exp'}=\Phi_{exp''}\).
Proof: Given \(exp',exp''\), can build \(exp\) s.t. \(\Phi_{exp}(x)=1\) iff \(\Phi_{exp'}(x) \neq \Phi_{exp''}(x)\). (Can express \(\neq\) using \(\wedge,\vee,\neg\).)
Reduces to deciding emptiness: Given \(exp\), is \(\Phi_{exp} \equiv 0\). ✔
Corollary: Decide semantic properties of finite state machines.
Pumping lemma: (informal) If \(w\) matching \(exp\) is long enough can repeat substring of \(w\) arbitrarily many times and still match.
Proof idea: You can’t match a long string without using \(*\) operator.
Pumping Lemma: If \(\Phi_{exp}(w)=1\) and \(|w| \gt 2|exp|\) then \(w=xyz\) with \(|xy| \leq n_0\), \(|y| \geq 1\) and \(\Phi(xy^kz)=1\) for all \(k\in \mathbb{N}\).
Example: \(PAREN: \{ \langle, \rangle \}^* \rightarrow \{0,1\}\)
Question: Let \(F:\{a-z,(,),.,\)\(\} \rightarrow \{0,1\}\) be the function such that \(F(x)=1\) if \(x\) can be generated from \(\langle expression \rangle\) using grammar above. Describe \(F\) in words.
Question: Let \(F:\{a-z,(,),.,\)\(\} \rightarrow \{0,1\}\) be the function such that \(F(x)=1\) if \(x\) can be generated from \(\langle expression \rangle\) using grammar above. Describe \(F\) in words.
In practice: Not just decide if \(x\) is generated by \(G\), but also produce the sequence of derivations known as parse tree.
If \(F\) is regular then \(F\) is CFG.
If \(F\) is CFG then \(F\) is computable. (recursive algorithm following definition)
Neither of those is “if and only if”
Can decide emptiness for CFG.
Cannot decide equivalence: \(EQ(G,G')=1\) iff \(\Phi_G = \Phi_{G'}\) is uncomputable
Thm: Let \(EQCFG:\{0,1\}^* \rightarrow \{0,1\}\) such that
\[EQCFG(G,G') = \begin{cases}1 & \Phi_G = \Phi_{G'} \\ 0 & \text{otherwise} \end{cases}\]then \(EQCFG\) is uncomputable.
Lemma: Let \(F(x)=1\) iff \(x = u\|v^R\) such that \(u \neq v\). Then \(F\) is CFG.
Proof:
Thm: Let \(EQCFG:\{0,1\}^* \rightarrow \{0,1\}\) such that \(EQCFG(G,G') = 1\) iff \(\Phi_G = \Phi_{G'}\). Then \(EQCFG\) is uncomputable.
Proof: Let \(P\) be NAND++ program, and \(x\) be input. A computation history of \(P\) on \(x\) is the sequence \(\sigma_0,\ldots,\sigma_{t-1}\) such that \(\sigma_i\) encodes configuration of \(P(x)\) at step \(i\).
We say \(\sigma_0,\ldots, \sigma_{t-1}\) is SENSICAL if \(\alpha_0\) is starting, \(\sigma_{t-1}\) is halting and every \(\sigma_i\) has the right format (string of symbols in \(\Sigma_{small}=\{0,1\}^a\) with a single symbol in \(\Sigma_{big} = \{0,1\}^{a+b}\))
CLAIM: There is regexp to decide if \(\sigma_0,\ldots, \sigma_{t-1}\) is SENSICAL.
We say that \(\sigma_0,\ldots,\sigma_{t-1}\) is VALID if for every \(i\), \(\sigma_{i+1}^\top = NEXT_P(\sigma_i^\top)\) where \(\cdot^\top\) means that we reverse strings for odd \(i\).
CLAIM: There is context free language to decide if \(\sigma_0,\ldots,\sigma_{t-1}\) is INVALID.
\(P\) does not halt on \(x\) iff every SENSICAL configuration is INVALID.
Model | Halting | Emptiness | Equivalence |
---|---|---|---|
Regular Expressions | Decidable | Decidable | Decidable |
Context Free Grammars | Decidable | Decidable | Undecidable |
Turing complete models | Undecidable | Undecidable | Undecidable |