update notes in midterm1

This commit is contained in:
2025-10-27 04:25:46 +09:00
parent 4a11a52f1a
commit 13b27cc21e
2 changed files with 211 additions and 15 deletions


@@ -1,5 +1,4 @@
# Lexical Analysis
Fortran removes all whitespace.


@@ -1,30 +1,72 @@
# Syntax Analysis
## Specification
### Context-Free Grammars (CFG)
There are four main components of a CFG:
* Terminal Symbols
* Non-terminal Symbols
* Start Symbol $S$
* Production
The language generated by a CFG is the set of terminal strings obtained by repeatedly applying productions to the non-terminals; $L(G)$ denotes the language generated by the grammar $G$.
We can use CFGs to express the syntax of the target programming language: the parser decides whether the token stream from the lexer is valid or invalid.
**We cannot rely on regex to specify the syntax:** regular expressions are not expressive enough to describe valid syntax (e.g. nested parentheses).
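For instance, a minimal sketch of a grammar for nested parentheses, which no regular expression can describe:
```txt
S -> (S) | e
```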
## Recognition
### Parse Tree
A tree representation of the derivation.
A parse tree has `terminals` at the leaves and `non-terminals` at the interior nodes.
An in-order traversal of the leaves yields the original input.
We can apply productions to the non-terminals in any order:
* leftmost derivation
* rightmost derivation
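For example, with the grammar `S -> E + S | E`, `E -> num` (used again in the Top-Down Parsing section) and the input `num + num`:
```txt
leftmost:   S => E + S => num + S => num + E   => num + num
rightmost:  S => E + S => E + E   => E + num   => num + num
```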
#### Ambiguity
A grammar $G$ is ambiguous if it can produce different parse trees for the same string depending on the order in which productions are applied (e.g. `A + B * C`).
Ambiguity should be **resolved** to construct a useful parser.
**Removing ambiguity:**
1. Precedence:
Productions at higher levels of the grammar get the operators with lower precedence (and vice versa).
We can insert extra non-terminals to enforce precedence (a grammar sketch follows after this list).
2. Associativity:
We should determine where to place recursion depending on the associativity.
* left associative: place the recursion on the left
```txt
S -> S - T | T
T -> id
```
* right associative: place the recursion on the right
```txt
S -> T ^ S | T
T -> id
```
* non associative: do not use recursion
```txt
S -> A < A | A
A -> id
```
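For precedence (item 1 above), a minimal sketch of the usual layering, assuming `+` binds looser than `*`:
```txt
S -> S + T | T      (lower-precedence operator at the higher level)
T -> T * F | F      (higher-precedence operator at the lower level)
F -> id | (S)
```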
### AST (Abstract Syntax Tree)
An AST discards information that is no longer needed after syntax analysis: it removes the intermediate non-terminals (and punctuation such as parentheses) from the parse tree.
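A rough sketch, assuming the grammar `S -> E + S | E`, `E -> num | (S)` (used in the Top-Down Parsing section below) and the input `(1 + 2)`:
```txt
input:       (1 + 2)
parse tree:  S( E( "("  S( E(num 1)  "+"  S( E(num 2) ) )  ")" ) )
AST:         Add(Num 1, Num 2)      -- parentheses and single-child chains are gone
```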
### Error Handling
@@ -32,24 +74,179 @@
One of the purposes of the compiler is error handling:
- to detect non-valid programs
- and to translate the valid ones
Therefore, the error handler should:
* report errors accurately and clearly
* recover quickly from an error
* not slow down compilation of valid code
Why do we no longer need a complex error recovery procedure?
* back in the day, compilers were extremely slow, so recovering from as many errors as possible per run mattered
* nowadays recompilation is quick and users tend to correct only a few errors at once
* so a complex error recovery procedure is not needed
## Parsing
How do we generate a parse tree from a CFG?
### Top-Down Parsing
Construct a leftmost derivation of the string while reading the token stream.
e.g.
```txt
S -> E + S | E
E -> num | (S)
```
We can implement this as **recursive descent parsing**:
try the rules in order and backtrack whenever a production fails to match the token stream.
#### Predictive Parsing and LL(1)
But plain recursive descent needs **backtracking**, so we introduce **predictive parsing**.
Predictive parsing uses a parsing table to apply a single production without backtracking: an LL(1) grammar allows **at most a single production** per (non-terminal, lookahead) pair, which eliminates the multiple candidate matches of top-down (recursive-descent) parsing.
### Parser Implementation
#### Recursive Descent Parser by LL(1)
```text
S -> ES'
S' -> +ES' | e
E -> num | (S)
```
We can use the following table to implement the parser (`S_` stands for `S'`, `e` for the empty string):
|     | num | +    | (   | ) | $ (EOF) |
| --- | --- | ---- | --- | --- | ------ |
| S   | ES_ |      | ES_ |   |   |
| S_  |     | +ES_ |     | e | e |
| E   | num |      | (S) |   |   |
```c
void parse_S() {               // S -> E S'
  switch(token) {
    case num: parse_E(); parse_S_(); return;
    case '(': parse_E(); parse_S_(); return;
    default: ParseError();
  }
}
void parse_S_() {              // S' -> + E S' | e
  switch(token) {
    case '+': token = input.next(); parse_E(); parse_S_(); return;
    case ')': return;          // S' -> e  (')' is in Follow(S'))
    case EOF: return;          // S' -> e  ($ is in Follow(S'))
    default: ParseError();
  }
}
void parse_E() {               // E -> num | ( S )
  switch(token) {
    case num: token = input.next(); return;
    case '(': token = input.next(); parse_S(); if(token != ')') ParseError(); token = input.next(); return;
    default: ParseError();
  }
}
```
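A minimal driver sketch under the same assumptions as the code above (a global `token`, an `input.next()` lexer call, and `ParseError()`); the `parse()` wrapper name is illustrative:
```c
void parse() {
  token = input.next();            // prime the single-token lookahead
  parse_S();                       // S is the start symbol
  if (token != EOF) ParseError();  // all input must be consumed
}
```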
#### Parsing Tables
How do we construct the parsing table? Three properties of the grammar are used to generate it:
* $x$ is nullable if it can derive the empty string.
* $\text{First}(x)$ is the set of terminals that can appear in the first position of a string derived from $x$.
* $\text{Follow}(x)$ is the set of terminals that can appear immediately after $x$ in at least one derivation.
Computing Nullable
1. $x$ is nullable if there is a production $x \to A_1 \dots A_n$ where every $A_i$ is nullable (in particular, the production $x \to \epsilon$).
Computing First
1. $\text{First}(t) = \set{t}$
2. For a production $x \to A_1 A_2 \dots A_n$ where $A_1 \dots A_{i-1}$ are nullable, then $\text{First}(x) += \text{First}(A_{i})$
Computing Follow
1. $\text{Follow}(S) = \set{ \$ }$, ($S$ is the start symbol)
2. For a production $x \to A_1 A_2 \dots A_n$ where $A_{i+1}\dots A_{n}$ are nullable, then $\text{Follow}(A_{i}) += \text{Follow}(x)$
3. For a production $x \to A_1 A_2 \dots A_n$ where $A_{i+1}A_{i+2}\dots A_{j-1}$ are nullable ($i < j$), $\text{Follow}(A_{i}) += \text{First}(A_{j})$
Combining them, all three can be computed with a single fixed-point iteration:
```py
# Fixed-point computation of Nullable / First / Follow.
# Assumes: sets `nonterminals` and `terminals`, `productions` as (X, rhs) pairs, start symbol `S`.
nullable = {x: False for x in nonterminals}
first = {x: set() for x in nonterminals} | {t: {t} for t in terminals}
follow = {x: set() for x in nonterminals}
follow[S] = {"$"}

def all_nullable(symbols):  # every symbol in the sequence can derive the empty string
    return all(nullable.get(a, False) for a in symbols)

changed = True
while changed:  # repeat until a full pass adds nothing new
    before = (sum(nullable.values()), sum(map(len, first.values())), sum(map(len, follow.values())))
    for X, A in productions:  # production X -> A[0] ... A[n-1]; an epsilon production has an empty rhs
        if all_nullable(A):
            nullable[X] = True
        for i in range(len(A)):
            if all_nullable(A[:i]):                       # A[0..i-1] all nullable
                first[X] |= first[A[i]]
            if A[i] in follow and all_nullable(A[i+1:]):  # A[i+1..n-1] all nullable
                follow[A[i]] |= follow[X]
            for j in range(i + 1, len(A)):
                if A[i] in follow and all_nullable(A[i+1:j]):  # A[i+1..j-1] all nullable
                    follow[A[i]] |= first[A[j]]
    changed = (sum(nullable.values()), sum(map(len, first.values())), sum(map(len, follow.values()))) != before
```
We can then combine `First`, `Follow`, and `Nullable` to build the actual **parsing table**: for each production $X \to A$, add it to the entry $(X, t)$ for every $t \in \text{First}(A)$, and, if $A$ is nullable, also for every $t \in \text{Follow}(X)$.
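As a check, for the grammar used above (`S -> ES'`, `S' -> +ES' | e`, `E -> num | (S)`), these sets reproduce the table from the Recursive Descent section:
```txt
Nullable:  S'
First(E)  = { num, ( }    First(S)   = { num, ( }    First(S') = { + }
Follow(S) = { $, ) }      Follow(S') = { $, ) }      Follow(E) = { +, $, ) }

S  -> ES'   goes into (S, num), (S, ()      since First(ES') = { num, ( }
S' -> +ES'  goes into (S', +)
S' -> e     goes into (S', )), (S', $)      since Follow(S') = { ), $ }
E  -> num   goes into (E, num);   E -> (S)  goes into (E, ()
```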
### Bottom-Up Parsing
Bottom-up parsing is more general than top-down parsing (and just as efficient): it uses **LR grammars**, where LR stands for **L**eft-to-right scanning and **R**ightmost derivation (in reverse), and it can handle left-recursive grammars.
It relies on **Shift-reduce Parsers**.
Shift-reduce parsing:
A bottom-up parser traces a rightmost derivation in reverse while scanning the input terminals left to right.
The parser splits the string into two parts: the sequence of symbols to reduce (the **stack**) and the remaining tokens (the **input**).
Shift-reduce parsing uses two actions: **shift** (push the next input token onto the stack) and **reduce** (replace the RHS of a production on top of the stack with its LHS non-terminal).
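A small trace sketch, assuming the grammar `S -> S + E | E`, `E -> num` and the input `1 + 2`:
```txt
stack        | input     | action
             | 1 + 2 $   | shift
1            | + 2 $     | reduce E -> num
E            | + 2 $     | reduce S -> E
S            | + 2 $     | shift
S +          | 2 $       | shift
S + 2        | $         | reduce E -> num
S + E        | $         | reduce S -> S + E
S            | $         | accept
```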
The important challenge is the **action selection problem**:
for a given state (stack + input) there can be multiple possible actions, i.e. conflicts:
* shift-reduce conflict
* reduce-reduce conflict
### LR Grammars
* LR(k): **L**eft-to-right scanning, **R**ightmost derivation, and $k$-symbol lookahead
* LR(0) Grammar
LR(0) grammars can determine the action without any lookahead.
They have **no reduce-reduce or shift-reduce conflicts**, because the action is determined by the stack alone.
### NFA Representations
We can represent shift-reduce parsing using an **NFA** whose states are productions with a '`.`' separator marking a position in the RHS (items). We also add a dummy production `S' -> S$` that provides the start and accepting states.
There are two types of transitions between the states:
* **shift transition**: a transition taken by a shift action; the '`.`' moves past the next symbol.
* **$\epsilon$ transition**: the parser expands its expectations without consuming any input token; if the symbol right after the '`.`' is a non-terminal, there is an $\epsilon$-transition to every item for that non-terminal with the '`.`' at the beginning of its RHS.
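A minimal sketch, assuming the grammar `S' -> S$`, `S -> (S) | num`:
```txt
items (states): S' -> .S$   S' -> S.$   S' -> S$.
                S -> .(S)   S -> (.S)   S -> (S.)   S -> (S).
                S -> .num   S -> num.

shift transitions (the '.' moves past the next symbol):
  S' -> .S$  --S-->   S' -> S.$        S' -> S.$  --$-->   S' -> S$.
  S -> .(S)  --(-->   S -> (.S)        S -> (.S)  --S-->   S -> (S.)
  S -> (S.)  --)-->   S -> (S).        S -> .num  --num--> S -> num.

epsilon transitions (the '.' is in front of the non-terminal S):
  S' -> .S$  --e-->  S -> .(S),  S -> .num
  S -> (.S)  --e-->  S -> .(S),  S -> .num
```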
#### NFA to DFA
The NFA can be fully converted to a DFA (via the subset construction). Using the DFA, the parser determines whether to shift or reduce: it uses the symbols in the stack to **traverse the states** and **decides whether to shift or reduce from the destination state**.
### Parsing Table And LR(0)
DFA traversal is implemented by flattening the DFA into an **LR parsing table**.
Store the states along with the symbols in the stack: <`symbol`, `state`>
There are two different types of tables: `goto`, `action`.
* `goto`: determine the next state using the top state and a **non-terminal** (consulted after a reduce).
* `action`: determine the action using the top state and an input terminal.
The `action` table entries consist of four different actions:
* `shift x`: push <`a`, `x`> onto the stack (`a` is the current input token and `x` is a state)
* `reduce X -> a`: pop the symbols of `a` from the stack and push <`X`, `goto[top_state, X]`>, where `top_state` is the state left on top after popping
* accept (on `S' -> S$.`) / error
DFA states become the row indices of the tables.
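A sketch of the resulting LR(0) table for the grammar `S' -> S$`, `S -> (S) | num` from the NFA section (state numbers are one arbitrary numbering of the DFA states; `sN` means shift and go to state N, `r P` means reduce by production P):

| state | num | ( | ) | $ | goto S |
| ----- | --- | --- | --- | --- | ------ |
| 0 | s3 | s2 |  |  | 1 |
| 1 |  |  |  | accept |  |
| 2 | s3 | s2 |  |  | 4 |
| 3 | r S->num | r S->num | r S->num | r S->num |  |
| 4 |  |  | s5 |  |  |
| 5 | r S->(S) | r S->(S) | r S->(S) | r S->(S) |  |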
### SLR(1) Parsing
A simple extension of LR(0).
For each reduction `X -> b`, look at the next input symbol `c` and apply the reduction only if `c` is in `Follow(X)`.
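For example, a sketch with the grammar `S -> E$`, `E -> T + E | T`, `T -> id`:
```txt
state after parsing T:   { E -> T. ,  E -> T. + E }     (LR(0): shift-reduce conflict)
SLR(1):  Follow(E) = { $ }  =>  reduce E -> T only on lookahead $, shift on +
```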