# Machine Level Programming

## History of Intel Processors

* Evolutionary design: **backwards compatible** all the way back to the `8086` (1978)
* **C**omplex **I**nstruction **S**et **C**omputer (CISC)

**RISC vs CISC**

* CISC has variable-length instructions
* RISC has fixed-length instructions

1. Intel x86 (8086)
   1. IA32 to IA64
   2. (after x86-64) EM64T (almost the same as AMD x86-64)
2. AMD x86-64

## C, Assembly, machine code

* ***Architecture***: The parts of a processor design that one needs to understand in order to write assembly/machine code
  * **ISA (Instruction Set Architecture)**
  * e.g., x86, IA32, Itanium, x86-64, ARM
* ***Microarchitecture***: Implementation of the architecture

Forms of code:

* Machine Code: the byte-level programs that a processor executes
* Assembly Code: a text representation of machine code

### Assembly/Machine Code View

Programmer-Visible State (shown by ISAs)

* PC (Program Counter)
  * Address of next instruction
  * `%rip` in x86-64
* Register file
  * Heavily used program data
* Condition codes
  * Store status information about the most recent arithmetic or logical operation
  * Used for conditional branching
* Memory (external)

Compiling Into Assembly

```c {cmd=gcc, args=[-Og -x c -c $input_file -o 3_1.o]}
long plus(long x, long y);

void sumstore(long x, long y, long *dest) {
    long t = plus(x, y);
    *dest = t;
}
```

```sh {cmd hide}
while ! [ -r 3_1.o ]; do sleep .1; done; objdump -d 3_1.o
```

### Integer Registers

* In x86-64
  * `ax` `bx` `cx` `dx` `si` `di` `sp` `bp` (prefix `r` for the 8-byte register, `e` for the 4-byte register)
  * `r8` `r9` `r10` `r11` `r12` `r13` `r14` `r15` (append `d` for the 4-byte register)
* In IA32
  * `eax` (32-bit): 16-bit `ax` (`ah`, `al`); origin: accumulator
  * `ecx` (32-bit): 16-bit `cx` (`ch`, `cl`); origin: counter
  * `edx` (32-bit): 16-bit `dx` (`dh`, `dl`); origin: data
  * `ebx` (32-bit): 16-bit `bx` (`bh`, `bl`); origin: base
  * `esi` (32-bit): 16-bit `si`; origin: source index
  * `edi` (32-bit): 16-bit `di`; origin: destination index
  * `esp` (32-bit): 16-bit `sp`; origin: stack pointer
  * `ebp` (32-bit): 16-bit `bp`; origin: base pointer
  * Only the first four have separately addressable high and low bytes in IA32; the low bytes `sil`, `dil`, `spl`, `bpl` exist only in x86-64.

#### Understanding `movq`

In x86 assembly there are three operand types: **immediate, register, memory**

* immediate: constant integer data such as `$0x400`, `$-533`
* register: one of the 16 integer registers
  * e.g., `%rax`, `%r13`
  * but `%rsp` is reserved for special use
* memory: 8 bytes of memory at the address given by a register
  * e.g., `(%rax)`

**movq usage**

`movq src, dest`

* limit: a single instruction cannot do a memory-to-memory transfer (the value has to be loaded into a register first, then stored)

**memory addressing modes**

* `(R)` means `Mem[Reg[R]]`
* `D(R)` means `Mem[Reg[R]+D]`; the constant displacement `D` specifies the offset

`movq 8(%rbp), %rdx`
```c
void swap(long *xp, long *yp) {
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}
```

```nasm
swap:
    movq (%rdi), %rax   # t0 = *xp
    movq (%rsi), %rdx   # t1 = *yp
    movq %rdx, (%rdi)   # *xp = t1
    movq %rax, (%rsi)   # *yp = t0
    ret
```
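To connect the memory operands `(%rdi)` and `(%rsi)` back to C pointers, here is a minimal sketch of a caller. It assumes the `void` form of `swap`; the helper name `swap_demo` and the sample values are made up for illustration.

```c
#include <assert.h>

/* Same body as swap above; each line maps to one movq. */
void swap(long *xp, long *yp) {
    long t0 = *xp;   /* movq (%rdi), %rax : memory -> register */
    long t1 = *yp;   /* movq (%rsi), %rdx : memory -> register */
    *xp = t1;        /* movq %rdx, (%rdi) : register -> memory */
    *yp = t0;        /* movq %rax, (%rsi) : register -> memory */
}

/* Hypothetical helper: returns 1 if the two values were exchanged. */
int swap_demo(void) {
    long a = 0x400, b = -533;
    swap(&a, &b);    /* &a is passed in %rdi, &b in %rsi */
    assert(a == -533);
    return a == -533 && b == 0x400;
}
```

Note that no line copies memory to memory directly: every transfer goes through `t0` or `t1`, which the compiler keeps in registers.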
Complete form of memory addressing modes:

`D(Rb, Ri, S)` means `Mem[Reg[Rb] + S*Reg[Ri] + D]`

* `D`: Constant "displacement"
* `Rb`: Base register
* `Ri`: Index register
* `S`: Scale factor (1, 2, 4, or 8)

For example:

| `%rdx`   | `%rcx`   |
| -------- | -------- |
| `0xf000` | `0x0100` |

* `0x8(%rdx)` = `0xf008`
* `(%rdx, %rcx)` = `0xf100`
* `(%rdx, %rcx, 4)` = `0xf400`
* `0x80(,%rdx, 2)` = `0x1e080`

#### Arithmetic & Logical Operations

* `leaq src, dst`
  * computes an address without a memory reference, like `p = &x[i]`
  * computes arithmetic expressions of the form `x + k * y`
* `addq src, dst`
* `subq src, dst`
* `imulq src, dst`
* `salq src, dst`
* `sarq src, dst`
* `shrq src, dst`
* `xorq src, dst`
* `andq src, dst`
* `orq src, dst`

The two-operand instructions from `addq` through `orq` operate as `dest = dest # src`, where `#` stands for the operation.

* `incq dest`
* `decq dest`
* `negq dest`
* `notq dest`

## Control

**Processor State (x86-64, Partial)**

* Temporary data (`%rax`, ...)
* Location of runtime stack (`%rsp`)
* Location of current code control point (`%rip`, instruction pointer)
* Status of recent tests (`CF`, `ZF`, `SF`, `OF`)

### Condition Codes

* Single-bit registers
  * `CF` Carry flag (for unsigned)
  * `SF` Sign flag (for signed)
  * `ZF` Zero flag
  * `OF` Overflow flag (for signed)

**Condition Codes (Implicit Setting)**

The codes are set implicitly, as a side effect, by arithmetic operations (`addq`, `subq`, `mulq`).

For example, `addq`: `t = a + b`

* `CF` set if carry out from the most significant bit (unsigned overflow)
* `ZF` set if `t == 0`
* `SF` set if `t < 0` (as signed)
* `OF` set if two's-complement (signed) overflow: `(a > 0 && b > 0 && (a + b) < 0) || (a < 0 && b < 0 && (a + b) >= 0)`

The codes are not implicitly set by `leaq`: it is designed for **address calculation**, not arithmetic, so it does not affect the condition codes.

**Condition Codes (Explicit Setting)**

The codes are set explicitly by the compare instruction. `cmpq b, a` computes `a - b` without storing the result in a destination.
* `CF` set if carry out from the most significant bit (used for unsigned comparisons)
* `ZF` set if `a == b`, that is, `a - b == 0`
* `SF` set if `(a - b) < 0` (as signed)
* `OF` set if two's-complement (signed) overflow: `(a > 0 && b < 0 && (a - b) < 0) || (a < 0 && b > 0 && (a - b) > 0)`

The codes are also set explicitly by the test instruction. `testq b, a` computes `a & b` without storing the result in a destination and sets the condition codes based on the value of `a & b`. It is useful to have one of the operands be a mask.

* `ZF` set when `a & b == 0`
* `SF` set when `a & b < 0`

**Reading Condition Codes**

`setX`: set a single byte based on a combination of condition codes

| setX    | effect           | desc                      |
| ------- | ---------------- | ------------------------- |
| `sete`  | `ZF`             | Equal / Zero              |
| `setne` | `~ZF`            | Not Equal / Not Zero      |
| `sets`  | `SF`             | Negative                  |
| `setns` | `~SF`            | Nonnegative               |
| `setg`  | `~(SF^OF) & ~ZF` | Greater (signed)          |
| `setge` | `~(SF^OF)`       | Greater or Equal (signed) |
| `setl`  | `SF^OF`          | Less (signed)             |
| `setle` | `(SF^OF) \| ZF`  | Less or Equal (signed)    |
| `seta`  | `~CF & ~ZF`      | Above (unsigned)          |
| `setb`  | `CF`             | Below (unsigned)          |

`setX` does not alter the remaining bytes of the register; it writes only a 1-byte register (`%al`, `%bl`, ...).

```nasm
cmpq   %rsi, %rdi   # compare x (%rdi) with y (%rsi)
setg   %al          # %al = 1 if x > y (signed), else 0
movzbl %al, %eax    # zero-extend the byte into %eax
ret
```

### Conditional Branches

#### Jumping

`jX` jumps to a different part of the code depending on the condition codes.
| jX    | condition        | desc                      |
| ----- | ---------------- | ------------------------- |
| `jmp` | 1                | Unconditional             |
| `je`  | `ZF`             | Equal / Zero              |
| `jne` | `~ZF`            | Not Equal / Not Zero      |
| `js`  | `SF`             | Negative                  |
| `jns` | `~SF`            | Nonnegative               |
| `jg`  | `~(SF^OF) & ~ZF` | Greater (signed)          |
| `jge` | `~(SF^OF)`       | Greater or Equal (signed) |
| `jl`  | `SF^OF`          | Less (signed)             |
| `jle` | `(SF^OF) \| ZF`  | Less or Equal (signed)    |
| `ja`  | `~CF & ~ZF`      | Above (unsigned)          |
| `jb`  | `CF`             | Below (unsigned)          |

Old Style Conditional Branch

```c {cmd=gcc args=[-Og -x c -fno-if-conversion -c $input_file -o 3_3.o]}
long absdiff(long x, long y) {
    long result;
    if (x > y)
        result = x - y;
    else
        result = y - x;
    return result;
}
```

```sh { cmd hide }
while ! [ -r 3_3.o ]; do sleep .1; done; objdump -d 3_3.o -Msuffix
```

**expressing with `goto`**

```c {cmd=gcc args=[-Og -x c -fno-if-conversion -c $input_file -o 3_4.o]}
long absdiff_j(long x, long y) {
    long result;
    int ntest = x <= y;
    if (ntest) goto Else;
    result = x-y;
    goto Done;
Else:
    result = y-x;
Done:
    return result;
}
```

#### Conditional Move

Branches like these are very disruptive to instruction flow through pipelines. **Conditional moves** are widely used instead because they do not require a control transfer.

```c {cmd=gcc args=[-O3 -x c -c $input_file -o 3_5.o]}
long absdiff(long x, long y) {
    long result;
    if (x > y)
        result = x - y;
    else
        result = y - x;
    return result;
}
```

```sh {cmd hide}
while ! [ -r 3_5.o ]; do sleep .1; done; objdump -d 3_5.o -Msuffix
```

However, there are several *bad cases* for conditional moves.

* Expensive computations

  ```c
  val = Test(x) ? Hard1(x) : Hard2(x);
  ```

  Both values get computed, so conditional moves only pay off when the computations are very simple.

* Risky computations

  ```c
  val = p ? *p : 0;
  ```

  Both values get computed, which may have undesirable effects (here, dereferencing a null `p`).

* Computations with side effects

  ```c
  val = x > 0 ? (x *= 7) : (x += 3);
  ```

  Each expression has a side effect, so evaluating both would change the result.

### Loop

#### do-while
```c
long pcount_do(unsigned long x) {
    long result = 0;
    do {
        result += x & 0x1;
        x >>= 1;
    } while (x);
    return result;
}
```

```c {cmd=gcc args=[-Og -x c -c $input_file -o 3_6.o]}
long pcount_goto(unsigned long x) {
    long result = 0;
loop:
    result += x & 0x1;
    x >>= 1;
    if (x) goto loop;
    return result;
}
```
```sh {cmd hide}
while ! [ -r 3_6.o ]; do sleep .1; done; objdump -d 3_6.o -Msuffix
```

**general do-while translation**
```c
do {
    Body
} while (Test);
```

```c
loop:
    Body
    if (Test)
        goto loop;
```
#### while

**general while translation #1**

This is called the **jump-to-middle translation**; it is used with the `-O0` (or `-Og`) flag.
```c
while (Test) {
    Body
}
```

```c
    goto test;
loop:
    Body
test:
    if (Test)
        goto loop;
done:
```
```c {cmd=gcc args=[-Og -x c -c $input_file -o 3_7.o]}
long pcount_while(unsigned long x) {
    long result = 0;
    while (x) {
        result += x & 0x1;
        x >>= 1;
    }
    return result;
}
```

```sh {cmd hide}
echo "jmp-to-middle translation"
while ! [ -r 3_7.o ]; do sleep .1; done; objdump -d 3_7.o -Msuffix
```

**general while translation #2**

This is the while-to-do-while conversion, used with the `-O1` flag.
```c
while (Test) {
    Body
}
```

```c
    if (!Test)
        goto done;
    do {
        Body
    } while (Test);
done:
```

```c
    if (!Test)
        goto done;
loop:
    Body
    if (Test)
        goto loop;
done:
```
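Both translations can be written out as compilable C for the `pcount_while` example. This is a sketch of the control flow only, not the exact code gcc emits; the names `pcount_jtm`, `pcount_guarded`, and `pcount_forms_agree` are made up for illustration.

```c
#include <assert.h>

/* Reference: the structured while loop. */
long pcount_while(unsigned long x) {
    long result = 0;
    while (x) {
        result += x & 0x1;
        x >>= 1;
    }
    return result;
}

/* Translation #1 (jump-to-middle): jump straight to the test. */
long pcount_jtm(unsigned long x) {
    long result = 0;
    goto test;
loop:
    result += x & 0x1;
    x >>= 1;
test:
    if (x) goto loop;
    return result;
}

/* Translation #2 (guarded do-while): test once up front, then loop. */
long pcount_guarded(unsigned long x) {
    long result = 0;
    if (!x) goto done;
loop:
    result += x & 0x1;
    x >>= 1;
    if (x) goto loop;
done:
    return result;
}

/* Hypothetical check: returns 1 if all three forms agree for x. */
int pcount_forms_agree(unsigned long x) {
    long r = pcount_while(x);
    assert(r >= 0);
    return pcount_jtm(x) == r && pcount_guarded(x) == r;
}
```

The case `x == 0` is the interesting one: both goto forms must skip the body entirely, which is what the initial `goto test` and the initial `if (!x)` guard are for.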
```c {cmd=gcc args=[-O1 -x c -c $input_file -o 3_8.o]}
long pcount_while(unsigned long x) {
    long result = 0;
    while (x) {
        result += x & 0x1;
        x >>= 1;
    }
    return result;
}
```

```sh {cmd hide}
echo "while to do-while conversion"
while ! [ -r 3_8.o ]; do sleep .1; done; objdump -d 3_8.o -Msuffix
```

#### for loop form
```c
for (Init; Test; Update) {
    Body
}
```
**for-to-while conversion**
```c
for (Init; Test; Update) {
    Body
}
```

```c
Init;
while (Test) {
    Body
    Update;
}
```
```c {cmd=gcc args=[-O3 -x c -c $input_file -o 3_9.o]}
#include <stddef.h>  /* size_t */
#define WSIZE (8 * sizeof(int))

long pcount_for(unsigned long x) {
    size_t i;
    long result = 0;
    for (i = 0; i < WSIZE; i++) {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
    }
    return result;
}
```

```c {cmd=gcc args=[-O3 -x c -c $input_file -o 3_10.o]}
#include <stddef.h>  /* size_t */
#define WSIZE (8 * sizeof(int))

long pcount_for(unsigned long x) {
    size_t i;
    long result = 0;
    i = 0;
    while (i < WSIZE) {
        unsigned bit = (x >> i) & 0x1;
        result += bit;
        i++;
    }
    return result;
}
```
```sh {cmd hide}
while ! [ -r 3_9.o ]; do sleep .1; done; objdump -d 3_9.o -Msuffix
```

```sh {cmd hide}
while ! [ -r 3_10.o ]; do sleep .1; done; objdump -d 3_10.o -Msuffix
```
In the for-to-do-while conversion, the initial test can be optimized away (here `i = 0` always satisfies `i < WSIZE`).

### Switch

#### Jump Table Structure

Switch form
```c {cmd=gcc args=[-Og -fno-asynchronous-unwind-tables -fno-stack-protector -x c -S $input_file -o 3_11.s]}
long switch_eg (long x, long y, long z) {
    long w = 1;
    switch(x) {
    case 1:
        w = y*z;
        break;
    case 2:
        w = y/z;
        /* Fall Through */
    case 3:
        w += z;
        break;
    case 5:
    case 6:
        w -= z;
        break;
    case 7:
        w *= z;
        break;
    default:
        w = 2;
    }
    return w;
}
```

```sh {cmd hide}
while ! [ -r 3_11.s ]; do sleep .1; done; cat 3_11.s
```
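The jump table structure can be imitated in C itself. The sketch below uses GNU C "labels as values" (`&&label` and `goto *ptr`), an extension supported by gcc and clang but not part of ISO C. The function name `switch_eg_jt`, the table name `jtab`, and the label names are made up for illustration; the assembly gcc actually emits may be laid out differently.

```c
#include <assert.h>

/* Jump-table sketch of switch_eg (GNU C extension, not ISO C). */
long switch_eg_jt(long x, long y, long z) {
    /* One entry per value 0..7; values without a case use the default. */
    static void *jtab[8] = {
        &&L_def,   /* x == 0 */
        &&L_1,     /* x == 1 */
        &&L_2,     /* x == 2 */
        &&L_3,     /* x == 3 */
        &&L_def,   /* x == 4 */
        &&L_56,    /* x == 5 */
        &&L_56,    /* x == 6 */
        &&L_7      /* x == 7 */
    };
    long w = 1;

    /* A single unsigned compare rejects both x < 0 and x > 7. */
    if ((unsigned long)x > 7) goto L_def;
    assert(x >= 0 && x <= 7);
    goto *jtab[x];          /* indirect jump through the table */

L_1:   w = y * z;  goto done;
L_2:   w = y / z;           /* falls through into L_3 */
L_3:   w += z;     goto done;
L_56:  w -= z;     goto done;
L_7:   w *= z;     goto done;
L_def: w = 2;
done:
    return w;
}
```

The point of the table is that selecting a case costs one bounds check and one indexed jump, regardless of how many cases there are. Cases 5 and 6 share a target, and case 2 reaches case 3 simply by not jumping away.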
## Procedures

Mechanisms in Procedures

* **Passing control**
  * to beginning of procedure code
  * back to return point
* **Passing data**
  * procedure arguments
  * return value
* **Memory management**
  * allocate during procedure execution
  * deallocate upon return

These mechanisms are all implemented with machine instructions. The **x86-64 implementation** of a procedure uses only those mechanisms that are required.

### Stack Structure

**x86-64 Stack**

Region of memory managed with *stack discipline*. It grows toward lower addresses. `%rsp` contains the lowest stack address (the address of the top element).

`pushq src`

* fetch operand at src
* decrement `%rsp` by 8
* write operand at address given by `%rsp`

`popq dest`

* read value at address given by `%rsp`
* increment `%rsp` by 8
* store value at dest (typically a register)

### Procedure Control Flow

```c {cmd=gcc args=[-Og -x c -c $input_file -o 3_12.o]}
long mult2(long x, long y) {
    long t = x * y;
    return t;
}

void multstore(long x, long y, long *dest) {
    long t = mult2(x, y);
    *dest = t;
}
```

```sh {cmd hide}
while ! [ -r 3_12.o ]; do sleep .1; done; objdump -d 3_12.o -Msuffix
```

Procedure call: `call label`

* push return address on stack
* jump to label

Return address:

* Address of the next instruction right after the call

Procedure return: `ret`

* pop the return address from the stack
* jump to that address

### Procedure Data Flow

* registers
  * first 6 args: `%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`
  * return value: `%rax`
* stack (for any further arguments)

For example, with the code above:

```sh {cmd hide}
while ! [ -r 3_12.o ]; do sleep .1; done; objdump -d 3_12.o -Msuffix
```

* After the call to `mult2`, the variable `t` is already stored in `%rax`
* so `movq %rax,(%rbx)` stores it, where `%rbx` holds `long *dest`

### Managing local data

**Stack-Based Languages**

In languages that support recursion:

* Code must be "reentrant", which means there can be multiple simultaneous instantiations of a single procedure.
* There must be some place to store the ***state*** of each instantiation (**arguments**, **local variables**, **return pointer**).

To provide this, **stack discipline** is used.
The state for a given procedure is needed only for a limited time (from when it is called to when it returns): the callee returns before the caller does. The stack is allocated in **frames**, each holding the state of a single procedure instantiation.

When a function is called, a new stack frame is created on top of the stack. When the function returns, the corresponding frame is popped and execution returns to the previous call state.

#### Stack Frame

A stack frame consists of **return information**, **local storage (if needed)** and **temporary space (if needed)**.

* `%rbp` frame pointer
* `%rsp` stack pointer

Space is allocated when the procedure is entered: this is the "set-up" code, and it includes the push done by `call`. Space is deallocated on return: this is the "finish" code, and it includes the pop done by `ret`.

#### x86-64/Linux Stack Frame

* To compile without a *stack canary*, add the option `-fno-stack-protector`

```c {cmd=gcc args=[-Og -x c -fno-stack-protector -c $input_file -o 3_13.o]}
long incr(long *p, long val) {
    long x = *p;
    long y = x + val;
    *p = y;
    return x;
}

long call_incr() {
    long v1 = 15213;
    long v2 = incr(&v1, 3000);
    return v1 + v2;
}
```

```sh {cmd hide}
while ! [ -r 3_13.o ]; do sleep .1; done; objdump -d 3_13.o -Msuffix
```
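As a rough guide to reading the disassembly, the sketch below traces the values that flow through `call_incr`. The numbers in the comments are worked out by hand from the source above. The remark about `v1` needing a stack slot assumes `incr` is not inlined, which is the usual outcome at `-Og` but is not guaranteed.

```c
#include <assert.h>

long incr(long *p, long val) {
    long x = *p;        /* old value read through the pointer */
    long y = x + val;
    *p = y;             /* writes into the caller's frame */
    assert(*p == y);
    return x;           /* returns the OLD value */
}

long call_incr(void) {
    long v1 = 15213;             /* address is taken below, so v1 lives in memory */
    long v2 = incr(&v1, 3000);   /* v2 = 15213, v1 becomes 18213 */
    return v1 + v2;              /* 18213 + 15213 = 33426 */
}
```

Because `&v1` is passed to another procedure, `v1` cannot stay purely in a register; this is the kind of local storage that a stack frame holds.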