2025-02-Compiler/notes/4.md
2025-10-02 10:07:21 +09:00


# Syntax Analysis 2
## Bottom-Up Parsing
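Bottom-up parsing is more general than top-down parsing: it handles a larger class of grammars, including left-recursive ones, and is just as efficient.
It uses LR grammars: **L**eft-to-right scan of the input, **R**ightmost derivation (constructed in reverse).
It relies on **shift-reduce parsers**.
Example: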
```c
E -> T | T + E
T -> int | int * T | (E)
```
```c
// sentential form | reduction applied next
int * int + int    | T -> int
int * T + int      | T -> int * T
T + int            | T -> int
T + T              | E -> T
T + E              | E -> T + E
E                  |
// Read bottom-up, these steps form a rightmost derivation of the input.
```
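The claim that the reductions, read in reverse, form a rightmost derivation can be checked mechanically. The following is a minimal sketch, not part of the lecture material; the names `REDUCTIONS` and `rightmost_derivation` are made up for illustration. It undoes the reductions from last to first, always expanding the rightmost non-terminal, and asserts that this is exactly the non-terminal each production expects.

```python
# Reductions in the order the bottom-up parser applied them (from the trace above).
REDUCTIONS = [
    ("T", ["int"]),
    ("T", ["int", "*", "T"]),
    ("T", ["int"]),
    ("E", ["T"]),
    ("E", ["T", "+", "E"]),
]
NONTERMINALS = {"E", "T"}


def rightmost_derivation(start, reductions):
    """Undo the reductions last-to-first, expanding the rightmost non-terminal each time."""
    form = [start]
    forms = [list(form)]
    for lhs, rhs in reversed(reductions):
        # Position of the rightmost non-terminal in the current sentential form.
        i = max(k for k, sym in enumerate(form) if sym in NONTERMINALS)
        assert form[i] == lhs, "this step would not be a rightmost derivation step"
        form[i:i + 1] = rhs
        forms.append(list(form))
    return forms


for sentential_form in rightmost_derivation("E", REDUCTIONS):
    print(" ".join(sentential_form))
```

Every assertion passes, and the last form printed is the original input `int * int + int`.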
### Shift-Reduce Parsing
```c
E -> T | T + E
T -> int | int * T | (E)
```
| Stack | Remaining Input | Action |
| ----------- | ----------------- | --------------------- |
| | `int * int + int` | shift |
| `int` | `* int + int` | shift |
| `int *` | `int + int` | shift |
| `int * int` | `+ int` | reduce `T -> int` |
| `int * T` | `+ int` | reduce `T -> int * T` |
| `T` | `+ int` | shift |
| `T +` | `int` | shift |
| `T + int` | | reduce `T -> int` |
| `T + T` | | reduce `E -> T` |
| `T + E` | | reduce `E -> T + E` |
| `E` | | accept |
**Action Selection Problem**
At each step the parser has to decide which action to take: when to shift, when to reduce, and which production to apply when reducing.
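To make the mechanics concrete, the trace in the table can be replayed with a small script. This is only a sketch under a strong simplification: the sequence of actions is supplied by hand (the hypothetical `ACTIONS` list), whereas a real parser would choose them from a parsing table, which is exactly the action selection problem. The helper name `replay` is also an assumption for illustration.

```python
def replay(tokens, actions):
    """Apply a scripted list of shift/reduce actions and record each configuration."""
    stack, rest = [], list(tokens)
    trace = []
    for action in actions:
        trace.append((" ".join(stack), " ".join(rest), action))
        if action == "shift":
            stack.append(rest.pop(0))  # move the next input token onto the stack
        else:
            lhs, rhs = action  # reduce by the production lhs -> rhs
            start = len(stack) - len(rhs)
            assert stack[start:] == rhs, "the handle is not on top of the stack"
            del stack[start:]
            stack.append(lhs)
    return stack, rest, trace


# Actions taken from the table above, in order.
ACTIONS = [
    "shift", "shift", "shift",
    ("T", ["int"]),
    ("T", ["int", "*", "T"]),
    "shift", "shift",
    ("T", ["int"]),
    ("E", ["T"]),
    ("E", ["T", "+", "E"]),
]

stack, rest, trace = replay("int * int + int".split(), ACTIONS)
for row in trace:
    print(row)
# Afterwards: stack == ["E"] and rest == []
```

The assertion inside the reduce branch fails whenever the scripted production does not match the top of the stack, so a wrong action sequence is rejected rather than silently accepted.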
#### LR-Style Grammars
* LR(k): left-to-right scanning, rightmost derivation, and k symbols of lookahead
* LR(0) Grammar
An LR(0) grammar is one whose parser can choose every action without any lookahead: there are no reduce-reduce or shift-reduce conflicts when deciding from **only the symbols on the stack**.
Shift-reduce parsing can be represented as an **NFA** whose states are productions with a separator '`.`' on the RHS.
For example, the production `T -> (E)` has four states: `T -> .(E)`, `T -> (.E)`, `T -> (E.)`, `T -> (E).`.
Symbols before the `.` are already on the stack; the symbol right after the `.` is what the parser expects next.
An additional dummy production `S' -> S$` provides the start and end states.
There are two types of transitions between the states:
- shift transition: moves the `.` over one symbol `X`, from `A -> α.Xβ` to `A -> αX.β`
- $\epsilon$ transition: from `A -> α.Xβ` to `X -> .γ`, for every production `X -> γ` of a non-terminal `X`
example:
```
S -> (L) | id
L -> S | L,S
```
```
S' -> S$
S -> (L) | id
L -> S | L,S
```
It can be represented as an NFA:
```python {cmd matplotlib hide}
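import matplotlib.pyplot as plt
import pymupdf
from PIL import Image
# Render one page of the lecture slides as an image (page indices are 0-based).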
doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[22].get_pixmap(dpi=360)
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
```
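The same NFA can also be built programmatically from the augmented grammar. The sketch below is an illustration rather than the lecture's construction: it represents a state as a pair `(production index, dot position)`, and the names `GRAMMAR`, `item_str`, and `build_nfa` are assumptions. It generates the two kinds of transitions described above.

```python
# Augmented grammar from the notes, one (lhs, rhs) pair per production.
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("(", "L", ")")),
    ("S", ("id",)),
    ("L", ("S",)),
    ("L", ("L", ",", "S")),
]


def item_str(grammar, prod, dot):
    """Render an NFA state such as 'S -> (.L)'."""
    lhs, rhs = grammar[prod]
    return f"{lhs} -> {''.join(rhs[:dot])}.{''.join(rhs[dot:])}"


def build_nfa(grammar):
    """Return (shift_transitions, epsilon_transitions) over states (prod, dot)."""
    nonterminals = {lhs for lhs, _ in grammar}
    shift, eps = [], []
    for p, (_, rhs) in enumerate(grammar):
        for dot, sym in enumerate(rhs):
            # Shift transition: move the dot across the symbol that follows it.
            shift.append(((p, dot), sym, (p, dot + 1)))
            # Epsilon transition: if a non-terminal follows the dot,
            # jump to the initial item of each of its productions.
            if sym in nonterminals:
                for q, (lhs, _) in enumerate(grammar):
                    if lhs == sym:
                        eps.append(((p, dot), (q, 0)))
    return shift, eps


shift, eps = build_nfa(GRAMMAR)
for src, sym, dst in shift:
    print(f"{item_str(GRAMMAR, *src)}  --{sym}-->  {item_str(GRAMMAR, *dst)}")
for src, dst in eps:
    print(f"{item_str(GRAMMAR, *src)}  --eps-->  {item_str(GRAMMAR, *dst)}")
```

Storing states as index pairs keeps them hashable, which would make a later subset construction into a DFA straightforward.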
* SLR(1) Parsing
* LR(1) Grammar
```python {cmd matplotlib hide}
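import matplotlib.pyplot as plt
import pymupdf
from PIL import Image
# Render one page of the lecture slides as an image (page indices are 0-based).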
doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[47].get_pixmap(dpi=360)
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
```
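LR(1) Parsing Table
The LR(1) parsing table is constructed in the same way as the LR(0) table except for reductions: a reduce by `A -> α` is entered only under the lookahead terminal `a` carried by the completed item `[A -> α., a]`, not under every terminal.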
* LALR(1) Grammar
LALR(1) generally has the same number of states as SLR, far fewer than LR(1).
For a language like Pascal, SLR and LALR(1) require several hundred states, whereas LR(1) requires several thousand.
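The difference in how reductions are entered into the LR(0) and LR(1) tables can be sketched as follows. This is a simplified illustration of a single table row; the function names and the lookahead set in the usage lines are hypothetical, not taken from the slides.

```python
def lr0_reduce_row(production, terminals):
    """LR(0): a state holding a completed item reduces on every terminal."""
    return {t: ("reduce", production) for t in terminals}


def lr1_reduce_row(production, lookaheads):
    """LR(1): reduce only on the lookaheads attached to the completed item."""
    return {t: ("reduce", production) for t in lookaheads}


TERMINALS = ["int", "+", "*", "(", ")", "$"]

# Hypothetical state containing the completed item for E -> T.
print(lr0_reduce_row("E -> T", TERMINALS))
print(lr1_reduce_row("E -> T", [")", "$"]))  # assumed lookahead set
```

Because the LR(1) row leaves the other columns free, a shift on one of those terminals in the same state no longer collides with the reduction.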
#### Ambiguous Grammar
Ambiguity mainly comes from:
* Precedence
  * Productions at higher levels of the grammar (closer to the start symbol) hold the operators with lower precedence, and vice versa.
  * We can insert non-terminals to enforce precedence.
* Associativity
  * We decide where to place the recursion (left or right) depending on the associativity.

A classic example of ambiguity: `if-then-else` (the dangling `else`).
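To see the ambiguity in numbers, the sketch below counts the parse trees of a token string under the flat grammar `E -> E + E | E * E | int`. That grammar is an assumption chosen for illustration (it does not appear in these notes), as is the function name `count_parses`.

```python
from functools import lru_cache


def count_parses(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | int."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways to derive tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "int" else 0
        total = 0
        for k in range(i + 1, j - 1):  # candidate position of the top-level operator
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parses("int + int * int".split()))        # precedence is undecided
print(count_parses("int + int + int + int".split()))  # associativity is undecided
```

A string with more than one operator gets more than one tree here. The layered grammar `E -> T | T + E`, `T -> int | int * T | (E)` used earlier avoids this by placing `*` one level below `+` and fixing where the recursion occurs.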
**Automatic Disambiguation**
We can declare operator precedence (and associativity) so that the parser generator resolves the shift-reduce conflicts of an ambiguous grammar automatically.
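One common way such declarations are applied, in the style of yacc-like parser generators, is to compare the operator of the handle on the stack with the lookahead operator. The sketch below assumes a hypothetical `PRECEDENCE` table and helper `resolve`; it illustrates the rule and is not the lecture's implementation.

```python
# Hypothetical declarations: a larger level binds tighter; both operators are left-associative.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}


def resolve(stack_op, lookahead):
    """Choose 'shift' or 'reduce' when the handle's operator is `stack_op`."""
    top_level, top_assoc = PRECEDENCE[stack_op]
    next_level, _ = PRECEDENCE[lookahead]
    if top_level > next_level:
        return "reduce"  # the handle binds tighter, so finish it first
    if top_level < next_level:
        return "shift"   # the incoming operator binds tighter
    # Same level: associativity decides.
    return "reduce" if top_assoc == "left" else "shift"


print(resolve("*", "+"))  # stack: E * E, lookahead +
print(resolve("+", "*"))  # stack: E + E, lookahead *
print(resolve("+", "+"))  # stack: E + E, lookahead +
```

With these declarations, `E * E` is reduced before a following `+`, `E + E` waits for a following `*`, and equal-precedence operators group to the left.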
## AST
### AST Construction LL
```c
// Parse S and return its AST node (C-like pseudocode).
expr parse_S() {
    switch (token) {
    case num:
    case '(': {  // both tokens can start S
        expr child1 = parse_E();
        expr child2 = parse_Sp();
        return new S(child1, child2);
    }
    default:
        parseError();
    }
}
```
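A runnable version of the same idea is sketched below. The notes do not state the grammar behind `parse_S`, so the sketch assumes `S -> E S'`, `S' -> + S | ε`, `E -> num | ( S )`; the tuple-based node shapes and the helper names are likewise assumptions. Unlike the pseudocode, it folds the `S'` node into its parent so the result is an abstract rather than a concrete tree.

```python
def parse(tokens):
    """Recursive-descent (LL) parse that returns a nested-tuple AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "$"

    def advance():
        nonlocal pos
        pos += 1

    def parse_S():  # S -> E S'
        left = parse_E()
        return parse_Sp(left)

    def parse_Sp(left):  # S' -> + S | epsilon
        if peek() == "+":
            advance()
            return ("+", left, parse_S())
        return left  # epsilon: nothing to add to the tree

    def parse_E():  # E -> num | ( S )
        tok = peek()
        if isinstance(tok, int):
            advance()
            return ("num", tok)
        if tok == "(":
            advance()
            inner = parse_S()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner  # parentheses leave no node in the AST
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = parse_S()
    if peek() != "$":
        raise SyntaxError("trailing input")
    return tree


print(parse([1, "+", "(", 2, "+", 3, ")"]))
```

Each `parse_X` function returns the subtree for its non-terminal, so the AST is assembled bottom-up as the recursive calls return.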
### AST Construction LR