2025-02-Compiler/notes/4.md

# Syntax Analysis 2

## Bottom-Up Parsing

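Bottom-up parsing is more general than top-down (LL) parsing and just as efficient.
It works with LR grammars: the input is scanned **L**eft to right while a **R**ightmost derivation is traced in reverse. Unlike LL parsers, LR parsers handle left-recursive grammars without rewriting.

Bottom-up parsers are usually implemented as **shift-reduce parsers**.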

example:

```c
E -> T | T + E
T -> int | int * T | (E)
```

```c
// each row: current string | production used for the next reduction
int * int + int | T -> int
int * T + int   | T -> int * T
T + int         | T -> int
T + T           | E -> T
T + E           | E -> T + E
E               |
// read from bottom to top, the rows form a rightmost derivation of int * int + int
```
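
To see the reductions mechanically, the sketch below replays them on a token list and checks that each handle has only terminals to its right, which is exactly why the reversed sequence is a rightmost derivation. The handle positions in `REDUCTIONS` are read off the sequence above by hand; they are an assumption of this sketch, not something a parser is given.

```python
NONTERMINALS = {"E", "T"}

# (start index of the handle, LHS, RHS) for each reduction, read off by hand
REDUCTIONS = [
    (2, "T", ["int"]),
    (0, "T", ["int", "*", "T"]),
    (2, "T", ["int"]),
    (2, "E", ["T"]),
    (0, "E", ["T", "+", "E"]),
]


def reduce_sequence(tokens, reductions):
    """Apply each reduction in turn and return every sentential form."""
    forms = [list(tokens)]
    for pos, lhs, rhs in reductions:
        cur = forms[-1]
        assert cur[pos:pos + len(rhs)] == rhs, "handle does not match the RHS"
        # Only terminals may appear to the right of a handle.
        assert not NONTERMINALS & set(cur[pos + len(rhs):])
        # Replace the handle (the RHS) by the LHS nonterminal.
        forms.append(cur[:pos] + [lhs] + cur[pos + len(rhs):])
    return forms


forms = reduce_sequence("int * int + int".split(), REDUCTIONS)
for form in reversed(forms):  # reversed: a rightmost derivation starting from E
    print(" ".join(form))
```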

### Shift-Reduce Parsing

```c
E -> T | T + E
T -> int | int * T | (E)
```

| Stack       | Input             | Applied Production    |
| ----------- | ----------------- | --------------------- |
|             | `int * int + int` | shift                 |
| `int`       | `* int + int`     | shift                 |
| `int *`     | `int + int`       | shift                 |
| `int * int` | `+ int`           | reduce `T -> int`     |
| `int * T`   | `+ int`           | reduce `T -> int * T` |
| `T`         | `+ int`           | shift                 |
| `T +`       | `int`             | shift                 |
| `T + int`   |                   | reduce `T -> int`     |
| `T + T`     |                   | reduce `E -> T`       |
| `T + E`     |                   | reduce `E -> T + E`   |
| `E`         |                   |                       |


**Action Selection Problem**:
at each step, which action should the parser take?
When should it shift, when should it reduce, and if it reduces, by which production?
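
For this one grammar, a hand-written rule based on the top of the stack plus one lookahead token is enough to answer the question. The sketch below (`choose_action` and `shift_reduce` are hypothetical names) reproduces the trace in the table; a real LR parser gets the same decisions from a table generated from the grammar, not from hand-coded patterns.

```python
# Hand-written action choice for E -> T | T + E ; T -> int | int * T | (E).
# Assumption: these patterns only work for this grammar.
def choose_action(stack, lookahead):
    # Reduce when a complete right-hand side (a handle) is on top of the stack.
    if stack[-3:] == ["int", "*", "T"]:
        return ("reduce", "T", 3)
    if stack[-3:] == ["(", "E", ")"]:
        return ("reduce", "T", 3)
    if stack[-3:] == ["T", "+", "E"]:
        return ("reduce", "E", 3)
    top = stack[-1] if stack else None
    if top == "int":  # `int` may still grow into `int * T`
        return ("shift",) if lookahead == "*" else ("reduce", "T", 1)
    if top == "T":    # `T` may still grow into `T + E`
        return ("shift",) if lookahead == "+" else ("reduce", "E", 1)
    if stack == ["E"] and lookahead == "$":
        return ("accept",)
    return ("shift",) if lookahead != "$" else ("error",)


def shift_reduce(tokens):
    """Run the parser, print each step, and return True if the input is accepted."""
    stack, rest = [], list(tokens) + ["$"]
    while True:
        action = choose_action(stack, rest[0])
        print(f"{' '.join(stack):<12}| {' '.join(rest[:-1]):<16}| {action}")
        if action[0] == "shift":
            stack.append(rest.pop(0))
        elif action[0] == "reduce":
            _, lhs, n = action
            del stack[-n:]      # pop the handle
            stack.append(lhs)   # push the LHS nonterminal
        else:
            return action[0] == "accept"


shift_reduce("int * int + int".split())
```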

#### LR-Style Grammars

* LR(k): **L**eft-to-right scanning, **R**ightmost derivation (in reverse), and *k* symbols of lookahead

* LR(0) Grammar

A grammar is LR(0) if the parser can choose every action without lookahead: deciding from **only the symbols on the stack** never leads to a shift-reduce or reduce-reduce conflict.
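
As a sketch of this definition, a set of LR(0) items has a conflict when it contains two complete items (reduce-reduce), or a complete item together with an item whose dot sits before a terminal (shift-reduce). The item encoding `(lhs, rhs, dot)` and the function name are assumptions of this sketch. The state reached after shifting `int` in the grammar `E -> T | T + E`, `T -> int | int * T | (E)` has a shift-reduce conflict, so that grammar is not LR(0).

```python
# An LR(0) item is (lhs, rhs, dot): the dot sits just before rhs[dot].
NONTERMINALS = {"E", "T"}


def lr0_conflicts(state):
    """List the LR(0) conflicts present in a set of items."""
    complete = [item for item in state if item[2] == len(item[1])]
    can_shift = any(dot < len(rhs) and rhs[dot] not in NONTERMINALS
                    for _, rhs, dot in state)
    conflicts = []
    if len(complete) > 1:
        conflicts.append("reduce-reduce")
    if complete and can_shift:
        conflicts.append("shift-reduce")
    return conflicts


# State after shifting `int`: {T -> int. , T -> int.*T}
after_int = {("T", ("int",), 1), ("T", ("int", "*", "T"), 1)}
print(lr0_conflicts(after_int))
```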

We can represent shift-reduce parsing with an **NFA** whose states are *items*: productions with a separator `.` somewhere on the RHS.
For example, the production `T -> (E)` gives four items: `T -> .(E)`, `T -> (.E)`, `T -> (E.)`, and `T -> (E).`.
Symbols before the `.` are already on the stack; the symbol right after it is the one the parser expects next.
The grammar is also augmented with a dummy production `S' -> S$`, which gives the NFA a unique start state (`S' -> .S$`) and marks the end of input.

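There are two types of transitions between the states:
- shift transition: from `A -> α.Xβ` to `A -> αX.β` on the symbol `X` (a terminal or a nonterminal)
- $\epsilon$ transition: from `A -> α.Bβ` to every item `B -> .γ`, since expecting `B` means expecting the start of one of its productions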
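
The two transition kinds can be generated mechanically from a grammar. This sketch uses the augmented list grammar from the example that follows; items are encoded as hypothetical `(production index, dot position)` pairs.

```python
# Augmented list grammar: S' -> S$ ; S -> (L) | id ; L -> S | L,S
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("(", "L", ")")),
    ("S", ("id",)),
    ("L", ("S",)),
    ("L", ("L", ",", "S")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def show(item):
    """Render an item (production index, dot position) as text, e.g. S -> (.L)."""
    p, dot = item
    lhs, rhs = GRAMMAR[p]
    return f"{lhs} -> {''.join(rhs[:dot])}.{''.join(rhs[dot:])}"


def build_nfa():
    items = [(p, d) for p, (_, rhs) in enumerate(GRAMMAR) for d in range(len(rhs) + 1)]
    shift, eps = [], []
    for p, d in items:
        rhs = GRAMMAR[p][1]
        if d == len(rhs):
            continue  # complete item: the parser reduces here, no outgoing edge
        x = rhs[d]
        shift.append(((p, d), x, (p, d + 1)))  # move the dot over X
        if x in NONTERMINALS:                  # expecting X: start any X-production
            eps += [((p, d), (q, 0)) for q, (lhs, _) in enumerate(GRAMMAR) if lhs == x]
    return items, shift, eps


items, shift, eps = build_nfa()
for src, x, dst in shift:
    print(f"{show(src):<12} --{x}--> {show(dst)}")
for src, dst in eps:
    print(f"{show(src):<12} --eps--> {show(dst)}")
```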

example:

```
S -> (L) | id
L -> S | L,S
```

After augmenting it with the dummy start production:

```
S' -> S$
S -> (L) | id
L -> S | L,S
```

Its LR(0) items and the transitions between them form an NFA:

```python {cmd matplotlib hide}
import matplotlib.pyplot as plt
import pymupdf
from PIL import Image

# render slide page 23 (0-based index 22) of the lecture PDF
doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[22].get_pixmap(dpi=360)
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)

plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
```
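
Applying the subset construction to this NFA gives the LR(0) DFA: following $\epsilon$ transitions becomes `closure`, and shift transitions followed by closure become `goto`. A minimal sketch for the same grammar (function names are illustrative). It creates no state for `S' -> S$.`: seeing `$` in `S' -> S.$` is treated as accept.

```python
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("(", "L", ")")),
    ("S", ("id",)),
    ("L", ("S",)),
    ("L", ("L", ",", "S")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add B -> .γ for every item A -> α.Bβ, until nothing changes."""
    result, work = set(items), list(items)
    while work:
        p, d = work.pop()
        rhs = GRAMMAR[p][1]
        if d < len(rhs) and rhs[d] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[d] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(state, x):
    """Move the dot over x in every item that expects x, then close."""
    return closure({(p, d + 1) for p, d in state
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == x})


def lr0_automaton():
    states = [closure({(0, 0)})]  # start state: closure of S' -> .S$
    edges, work = {}, [0]
    while work:
        i = work.pop()
        symbols = {GRAMMAR[p][1][d] for p, d in states[i] if d < len(GRAMMAR[p][1])}
        for x in sorted(symbols):
            if x == "$":
                continue  # S' -> S.$ : accept at end of input
            target = goto(states[i], x)
            if target not in states:
                states.append(target)
                work.append(len(states) - 1)
            edges[(i, x)] = states.index(target)
    return states, edges


states, edges = lr0_automaton()
print(len(states), "states,", len(edges), "transitions")
```

For this grammar the construction finds 9 states, the same item sets that the NFA's $\epsilon$-closures group together.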

* SLR(1) Parsing
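
In brief, SLR(1) keeps the LR(0) automaton unchanged but reduces by `A -> α` only when the next token is in FOLLOW(A). That removes conflicts such as the one in the state `{T -> int., T -> int.*T}` of the grammar `E -> T | T + E`, `T -> int | int * T | (E)`, where LR(0) cannot choose between shifting `*` and reducing. The sketch below computes FOLLOW for that grammar augmented with `S' -> E$` (it assumes there are no ε-productions, so FIRST of a single symbol is enough) and applies the rule in that one state; `action_after_int` is a hypothetical helper, not a general table builder.

```python
# Augmented expression grammar (no epsilon-productions)
GRAMMAR = [
    ("S'", ("E", "$")),
    ("E", ("T",)),
    ("E", ("T", "+", "E")),
    ("T", ("int",)),
    ("T", ("int", "*", "T")),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(sym, first):
    return first[sym] if sym in NONTERMINALS else {sym}


def first_sets():
    # Without epsilon-productions, FIRST(A) is the union of FIRST(rhs[0]).
    first = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first_of(rhs[0], first) - first[lhs]
            if new:
                first[lhs] |= new
                changed = True
    return first


def follow_sets(first):
    follow = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                # What can follow sym: FIRST of the next symbol, or FOLLOW(lhs) at the end.
                new = first_of(rhs[i + 1], first) if i + 1 < len(rhs) else follow[lhs]
                new = new - follow[sym]
                if new:
                    follow[sym] |= new
                    changed = True
    return follow


FOLLOW = follow_sets(first_sets())


def action_after_int(lookahead):
    """SLR(1) choice in the LR(0) state {T -> int. , T -> int.*T}."""
    if lookahead == "*":
        return "shift"               # from T -> int.*T
    if lookahead in FOLLOW["T"]:
        return "reduce T -> int"     # from T -> int.
    return "error"


print(FOLLOW["T"], action_after_int("+"))
```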

* LR(1) Grammar


```python {cmd matplotlib hide}
import matplotlib.pyplot as plt
import pymupdf
from PIL import Image

# render slide page 48 (0-based index 47) of the lecture PDF
doc = pymupdf.open("../pdf/L4.pdf")
pix = doc[47].get_pixmap(dpi=360)
img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)

plt.imshow(img)
plt.axis('off')
plt.tight_layout()
plt.show()
```

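**LR(1) parsing table**

Construction is the same as for the LR(0) parsing table except for reductions: a state containing the LR(1) item `[A -> α., a]` reduces by `A -> α` only when the next input token is `a`.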


* LALR(1) Grammar

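LALR(1) merges LR(1) states whose items are identical apart from their lookaheads, so it has the same number of states as SLR(1), far fewer than canonical LR(1).
For a language like Pascal, SLR/LALR tables have several hundred states, while LR(1) tables have several thousand.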
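
The merge step can be sketched directly: compute each LR(1) state's *core* (its items without lookaheads) and union the states that share a core. The two input states below are hypothetical, written by hand for `E -> T | T + E`, `T -> int | int * T | (E)` (the state after shifting `int` at the top level vs. inside parentheses) rather than produced by an LR(1) construction.

```python
# An LR(1) item: (lhs, rhs, dot, lookahead)
def core(state):
    """The LR(0) part of a state: its items with the lookaheads dropped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)


def merge_by_core(states):
    """LALR step: union all LR(1) states that have the same core."""
    merged = {}
    for state in states:
        merged.setdefault(core(state), set()).update(state)
    return list(merged.values())


# Hand-written LR(1) states (hypothetical): same core, different lookaheads
top_level = ({("T", ("int",), 1, la) for la in ("+", "$")}
             | {("T", ("int", "*", "T"), 1, la) for la in ("+", "$")})
in_parens = ({("T", ("int",), 1, la) for la in ("+", ")")}
             | {("T", ("int", "*", "T"), 1, la) for la in ("+", ")")})

lalr = merge_by_core([top_level, in_parens])
print(len(lalr), "merged state(s)")
```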

#### Ambiguous Grammar

Ambiguity mainly comes from:

* Precedence
  * Productions at higher levels (closer to the start symbol) hold operators with lower precedence, and vice versa.
  * We can introduce one nonterminal per precedence level to enforce precedence.
* Associativity
  * Where the recursion goes depends on the associativity: left recursion (e.g. `E -> E + T`) for left-associative operators, right recursion for right-associative ones.
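
A sketch of a precedence-stratified grammar, evaluated by recursive descent. Left recursion cannot be coded directly in recursive descent, so the left-recursive rules are written as loops that fold to the left, which gives the same left associativity. The `^` operator and its level `P` are hypothetical additions to show right recursion.

```python
import re


def evaluate(text):
    tokens = re.findall(r"\d+|[-+*^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def expr():   # E -> E + T | E - T | T   (loop folds left: left-associative)
        value = term()
        while peek() in ("+", "-"):
            op = take()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term():   # T -> T * P | P   (one level of higher precedence)
        value = power()
        while peek() == "*":
            take()
            value *= power()
        return value

    def power():  # P -> A ^ P | A   (right recursion: right-associative)
        base = atom()
        if peek() == "^":
            take()
            return base ** power()
        return base

    def atom():   # A -> int | ( E )
        tok = take()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        return int(tok)

    result = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return result


print(evaluate("1 - 2 - 3"), evaluate("2 + 3 * 4"), evaluate("2 ^ 3 ^ 2"))
```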


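A classic example is the dangling `else` in `if-then-else`: in `if a then if b then s1 else s2`, the `else` can belong to either `if`.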


**Automatic Disambiguation**

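Instead of rewriting the grammar, we can declare operator precedence and associativity (as yacc/bison do with `%left` and `%right`) and use an ambiguous grammar directly: the declarations resolve its shift-reduce conflicts.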
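
yacc-style resolution of a shift-reduce conflict compares the precedence of the rule with that of the lookahead token: shift if the token binds tighter, reduce if the rule does, and fall back on associativity for a tie (left: reduce, right: shift). When either side has no declared precedence, yacc reports the conflict and shifts by default, which is also how the dangling `else` ends up attached to the nearest `if`. A sketch with hypothetical declarations; `resolve` is an illustrative name, and taking a rule's precedence from its last declared operator is a simplification.

```python
# Hypothetical precedence declarations, lowest first (like %left / %right lines)
PRECEDENCE = [
    ("left", ["+", "-"]),
    ("left", ["*", "/"]),
    ("right", ["^"]),
]
LEVEL = {tok: (i, assoc) for i, (assoc, toks) in enumerate(PRECEDENCE) for tok in toks}


def resolve(rule_rhs, lookahead):
    """Choose between reducing by a rule with RHS `rule_rhs` and shifting `lookahead`."""
    # In this sketch a rule takes the precedence of its last declared operator.
    rule_op = next((s for s in reversed(rule_rhs) if s in LEVEL), None)
    if rule_op is None or lookahead not in LEVEL:
        return "shift"  # no precedence information: default to shift
    rule_level, assoc = LEVEL[rule_op]
    token_level = LEVEL[lookahead][0]
    if token_level != rule_level:
        return "shift" if token_level > rule_level else "reduce"
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]


print(resolve(["E", "+", "E"], "*"), resolve(["if", "E", "then", "S"], "else"))
```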


## AST

### AST Construction LL

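```cpp
// Recursive-descent (LL) AST construction for S -> E S'.
// `token` is the current lookahead; each parse_X consumes X and returns its AST.
expr parse_S() {
    switch (token) {
        case num:
        case '(': {                          // both tokens begin S -> E S'
            expr child1 = parse_E();         // AST of E
            expr child2 = parse_Sp();        // AST of S' (Sp = S-prime)
            return new S(child1, child2);    // combine the children into an S node
        }
        default:
            parseError();                    // no production of S starts with this token
            return nullptr;
    }
}
```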
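
A runnable Python counterpart of `parse_S`, with the other two functions filled in, builds the same kind of tree. It assumes the grammar `S -> E S'`, `S' -> + S | ε`, `E -> num | (S)`, inferred from the calls in `parse_S` rather than stated in these notes; the tuple node shapes are hypothetical.

```python
import re


def parse(text):
    tokens = re.findall(r"\d+|[+()]", text) + ["$"]
    pos = 0

    def advance():
        nonlocal pos
        pos += 1

    def parse_S():                     # S -> E S'
        if tokens[pos].isdigit() or tokens[pos] == "(":
            child1 = parse_E()
            child2 = parse_Sp()
            return ("S", child1, child2)
        raise SyntaxError(f"unexpected {tokens[pos]!r}")

    def parse_Sp():                    # S' -> + S | epsilon
        if tokens[pos] == "+":
            advance()
            return parse_S()
        if tokens[pos] in (")", "$"):  # lookahead in FOLLOW(S'): choose epsilon
            return None
        raise SyntaxError(f"unexpected {tokens[pos]!r}")

    def parse_E():                     # E -> num | ( S )
        if tokens[pos].isdigit():
            value = int(tokens[pos])
            advance()
            return ("num", value)
        if tokens[pos] == "(":
            advance()
            inner = parse_S()
            if tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        raise SyntaxError(f"unexpected {tokens[pos]!r}")

    tree = parse_S()
    if tokens[pos] != "$":
        raise SyntaxError("unexpected trailing input")
    return tree


print(parse("1 + 2"))
```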

### AST Construction LR
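
In an LR parser the AST is built bottom-up: each stack entry carries a semantic value next to its grammar symbol, and each reduction pops the RHS values and pushes a new node for the LHS. A minimal sketch, assuming hand-written semantic actions and tuple nodes (both hypothetical), for `E -> T | T + E`, `T -> int | int * T | (E)`. It replays the action column of the shift-reduce table for `int * int + int` instead of computing actions from a parsing table.

```python
# Semantic action per production: receives the values of the popped RHS symbols.
SEMANTIC = {
    "T -> int": lambda v: ("num", v[0]),
    "T -> int * T": lambda v: ("mul", ("num", v[0]), v[2]),
    "T -> ( E )": lambda v: v[1],
    "E -> T": lambda v: v[0],
    "E -> T + E": lambda v: ("add", v[0], v[2]),
}


def build_ast(tokens, actions):
    stack = []           # (grammar symbol, semantic value) pairs
    rest = list(tokens)  # (token kind, token value) pairs
    for action in actions:
        if action == "shift":
            stack.append(rest.pop(0))
        else:
            lhs, rhs = action.split(" -> ")
            rhs = rhs.split()
            popped = stack[len(stack) - len(rhs):]
            del stack[len(stack) - len(rhs):]
            assert [sym for sym, _ in popped] == rhs, "stack top is not the RHS"
            # Run the semantic action and push the new node under the LHS symbol.
            stack.append((lhs, SEMANTIC[action]([val for _, val in popped])))
    assert not rest and len(stack) == 1
    return stack[0][1]


# 2 * 3 + 4, replayed with the action column of the shift-reduce table
tokens = [("int", 2), ("*", None), ("int", 3), ("+", None), ("int", 4)]
actions = ["shift", "shift", "shift", "T -> int", "T -> int * T",
           "shift", "shift", "T -> int", "E -> T", "E -> T + E"]
ast = build_ast(tokens, actions)
print(ast)
```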