2025-02-SystemProgramming/notes/3.md
2025-10-11 08:39:17 +09:00


Machine Level Programming

History of Intel Processors

  • Evolutionary design: backwards compatible all the way back to the 8086, introduced in 1978

  • Complex Instruction Set Computer (CISC)

RISC vs CISC

  • CISC has variable-length instructions
  • RISC has fixed-length instructions
  1. Intel x86 (8086)
    1. IA32, then IA64 (Itanium, a separate ISA)
    2. After AMD x86-64: EM64T (almost the same as AMD x86-64)
  2. AMD x86-64

C, Assembly, machine code

  • Architecture: The parts of a processor design that one needs to understand in order to read or write assembly/machine code
    • ISA(Instruction Set Architecture)
      • e.g., x86, IA32, Itanium, x86-64, ARM
  • Microarchitecture: Implementation of the architecture

Forms of code:

  • Machine Code: the byte-level programs that a processor executes
  • Assembly Code: A text representation of machine code

Assembly/Machine Code View

Programmer-Visible State (defined by the ISA)

  • PC(Program Counter)
    • Address of next instruction
    • Called %rip in x86-64
  • Register file
    • Heavily used program data
  • Condition codes
    • Store status information about the most recent arithmetic or logical operation
    • Used for conditional branching
  • Memory (external to the CPU)

Compiling Into Assembly

long plus(long x, long y);

void sumstore(long x, long y, long *dest) {
    long t = plus(x, y);
    *dest = t;
}
while ! [ -f 3_1.o ]; do sleep .1; done; objdump -d 3_1.o

Integer Registers

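  • In x86-64 there are 16 integer registers: %rax, %rbx, %rcx, %rdx, %rsi, %rdi, %rsp, %rbp (64-bit names start with r; the low 32 bits use e instead, e.g., %eax) and %r8 through %r15 (low 32 bits: append d, e.g., %r8d)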

  • In IA32 (32-bit registers; each has a 16-bit low half, and the first four also expose two 8-bit halves):
    • %eax: %ax (%ah, %al); origin: accumulate
    • %ecx: %cx (%ch, %cl); origin: counter
    • %edx: %dx (%dh, %dl); origin: data
    • %ebx: %bx (%bh, %bl); origin: base
    • %esi: %si; origin: source index
    • %edi: %di; origin: destination index
    • %esp: %sp; origin: stack pointer
    • %ebp: %bp; origin: base pointer
  • In x86-64, the low bytes of %rsi, %rdi, %rsp, %rbp are also addressable as %sil, %dil, %spl, %bpl
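The way the narrower names alias the low bits of a full register can be sketched in C with casts and shifts. This is a minimal model that only covers reads, assuming a register is just a 64-bit value; low32, low16, low8 and high8 are hypothetical helper names, not real APIs.

```c
#include <stdint.h>

/* Reads of the narrower register names, modelled on a 64-bit value r
   (r plays the role of %rax). */
uint32_t low32(uint64_t r) { return (uint32_t)r; }        /* %eax: bits 0..31 */
uint16_t low16(uint64_t r) { return (uint16_t)r; }        /* %ax:  bits 0..15 */
uint8_t  low8(uint64_t r)  { return (uint8_t)r; }         /* %al:  bits 0..7  */
uint8_t  high8(uint64_t r) { return (uint8_t)(r >> 8); }  /* %ah:  bits 8..15 */
```

With r = 0x1122334455667788, the four views are 0x55667788, 0x7788, 0x88 and 0x77.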

Understanding movq

In x86-64 assembly, there are three operand types: immediate, register, and memory.

  • Immediate: constant integer data, e.g., $0x400, $-533 (like a C constant, but prefixed with $)
  • Register: one of the 16 integer registers
    • e.g., %rax, %r13
    • but %rsp is reserved for special use
  • Memory: 8 consecutive bytes of memory at the address given by a register
    • e.g., (%rax)

movq usage: movq Source, Dest (AT&T syntax: source first, destination second)

  • Limitation: the two operands cannot both be memory, so a memory-to-memory copy takes two instructions that go through a register

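Memory addressing modes

  • Normal: (R) means Mem[Reg[R]]; register R holds the memory address (like dereferencing a pointer in C)
  • Displacement: D(R) means Mem[Reg[R]+D]; the constant displacement D specifies an offset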

movq 8(%rbp), %rdx

void swap(long *xp, long *yp) {
    long t0 = *xp;
    long t1 = *yp;
    *xp = t1;
    *yp = t0;
}
swap:
    movq (%rdi), %rax   # t0 = *xp  (xp in %rdi)
    movq (%rsi), %rdx   # t1 = *yp  (yp in %rsi)
    movq %rdx, (%rdi)   # *xp = t1
    movq %rax, (%rsi)   # *yp = t0
    ret

Complete form of memory addressing modes: D(Rb, Ri, S) means Mem[Reg[Rb] + S*Reg[Ri] + D]

  • D: Constant "displacement"
  • Rb: Base Register
  • Ri: Index Register
  • S: Scale Factor(1, 2, 4, or 8)

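For example, with %rdx = 0xf000 and %rcx = 0x0100, the addresses computed are:

  • 0x8(%rdx) = 0xf000 + 0x8 = 0xf008
  • (%rdx, %rcx) = 0xf000 + 0x100 = 0xf100
  • (%rdx, %rcx, 4) = 0xf000 + 4*0x100 = 0xf400
  • 0x80(, %rdx, 2) = 2*0xf000 + 0x80 = 0x1e080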
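The address arithmetic above can be checked with a small C helper. This is a sketch: effective_address is a hypothetical name, a missing base or index register is passed as 0, and the sum wraps modulo 2^64 the way the hardware does.

```c
#include <stdint.h>

/* Effective address of D(Rb, Ri, S) = Reg[Rb] + S*Reg[Ri] + D.
   A missing base or index register is modelled by passing 0. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, unsigned s) {
    /* Only scale factors 1, 2, 4 and 8 are encodable; this sketch does not check s. */
    return rb + (uint64_t)s * ri + (uint64_t)d;
}
```

For instance, effective_address(0x80, 0, 0xf000, 2) reproduces the last example, 0x1e080.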

Arithmetic & Logical Operations

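  • leaq src, dst
    • Computes an address without a memory reference, e.g., p = &x[i]
    • Computes arithmetic expressions of the form x + k*y (k = 1, 2, 4, or 8)

Two-operand instructions (dst = dst # src, where # is the operator):

  • addq src, dst: dst = dst + src
  • subq src, dst: dst = dst - src
  • imulq src, dst: dst = dst * src
  • salq src, dst: dst = dst << src (also called shlq)
  • sarq src, dst: dst = dst >> src (arithmetic shift)
  • shrq src, dst: dst = dst >> src (logical shift)
  • xorq src, dst: dst = dst ^ src
  • andq src, dst: dst = dst & src
  • orq src, dst: dst = dst | src

One-operand instructions:

  • incq dst: dst = dst + 1
  • decq dst: dst = dst - 1
  • negq dst: dst = -dst
  • notq dst: dst = ~dst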
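Because leaq computes x + k*y without touching memory, compilers often use it for cheap multiplication. For x * 12 a compiler may emit leaq (%rdi,%rdi,2), %rax followed by salq $2, %rax; the exact instructions depend on the compiler and flags. The C sketch below mirrors those two steps; m12 is a hypothetical name.

```c
/* Mirrors a two-instruction sequence a compiler may emit for x * 12:
     leaq (%rdi,%rdi,2), %rax   # t = x + 2*x  (= 3x), no memory access
     salq $2, %rax              # t <<= 2      (= 12x)                  */
long m12(long x) {
    unsigned long t = (unsigned long)x + 2 * (unsigned long)x; /* leaq step */
    t <<= 2;                                                   /* salq step */
    return (long)t;  /* unsigned math wraps modulo 2^64, like the hardware */
}
```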

Control

Processor State (x86-64, Partial)

  • Temporary data (%rax, ...)
  • Location of runtime stack (%rsp)
  • Location of current code control point (%rip, instruction pointer)
  • Status of recent tests (CF, ZF, SF, OF)

Condition Codes

  • Single bit registers
    • CF Carry flag (for unsigned)
    • SF Sign flag (for signed)
    • ZF Zero flag
    • OF Overflow flag (for signed)

Condition Codes (Implicit Setting)

Implicit setting: the codes are set as a side effect of arithmetic operations (e.g., addq, subq, imulq). For example, for addq computing t = a + b:

  • CF set if there is a carry out of the most significant bit (unsigned overflow)
  • ZF set if t == 0
  • SF set if t < 0 (as signed)
  • OF set if there is two's-complement (signed) overflow: (a > 0 && b > 0 && t < 0) || (a < 0 && b < 0 && t >= 0)

The codes are not set by leaq: it is intended for address calculation rather than arithmetic, so it leaves the condition codes unchanged.
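The addq rules above can be expressed directly in C. This is a sketch for illustration only (addq_flags and struct flags are hypothetical names): it does the addition in unsigned arithmetic to get the wrapped 64-bit result, then derives each flag from it.

```c
#include <stdint.h>

/* Condition codes as set by addq for t = a + b. */
struct flags { int cf, zf, sf, of; };

struct flags addq_flags(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t ut = ua + ub;        /* wraps modulo 2^64, like the hardware */
    int64_t t = (int64_t)ut;      /* reinterpret as two's complement */
    struct flags f;
    f.cf = ut < ua;               /* carry out of bit 63 (unsigned overflow) */
    f.zf = (t == 0);
    f.sf = (t < 0);
    f.of = (a > 0 && b > 0 && t < 0) || (a < 0 && b < 0 && t >= 0);
    return f;
}
```

For example, INT64_MAX + 1 sets OF and SF but not CF, while -1 + 1 sets CF and ZF but not OF.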

Condition Codes (Explicit Setting)

The codes can also be set explicitly by the compare instruction.

cmpq b, a computes a - b without storing the result anywhere.

  • CF set if there is a borrow out of the most significant bit, i.e., a < b as unsigned values
  • ZF set if a == b (a - b == 0)
  • SF set if (a - b) < 0 (as signed)
  • OF set if there is two's-complement (signed) overflow: (a >= 0 && b < 0 && (a - b) < 0) || (a < 0 && b > 0 && (a - b) >= 0)
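The cmpq rules can be modelled the same way. This sketch (cmpq_flags and struct cmp_flags are hypothetical names) computes a - b with wrapping unsigned arithmetic. The OF condition for subtraction differs from the one for addition: a - b can only overflow when a and b have opposite signs.

```c
#include <stdint.h>

/* Condition codes as set by cmpq b, a, which computes t = a - b and discards t. */
struct cmp_flags { int cf, zf, sf, of; };

struct cmp_flags cmpq_flags(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    int64_t t = (int64_t)(ua - ub);   /* wrapping subtraction */
    struct cmp_flags f;
    f.cf = ua < ub;                   /* borrow: a < b as unsigned values */
    f.zf = (t == 0);                  /* a == b */
    f.sf = (t < 0);
    /* Signed overflow is only possible when a and b have opposite signs. */
    f.of = (a >= 0 && b < 0 && t < 0) || (a < 0 && b > 0 && t >= 0);
    return f;
}
```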

The condition codes are also set explicitly by the test instruction.

testq b, a computes a & b without storing the result.

It sets the condition codes based on the value of a & b; it is useful when one of the operands is a mask.

  • ZF set when a & b == 0
  • SF set when a & b < 0

Reading Condition Codes

setX: set a single byte to 0 or 1 based on a combination of condition codes

setX    Condition         Description
sete    ZF                Equal / Zero
setne   ~ZF               Not Equal / Not Zero
sets    SF                Negative
setns   ~SF               Nonnegative
setg    ~(SF^OF) & ~ZF    Greater (signed)
setge   ~(SF^OF)          Greater or Equal (signed)
setl    SF^OF             Less (signed)
setle   (SF^OF) | ZF      Less or Equal (signed)
seta    ~CF & ~ZF         Above (unsigned)
setb    CF                Below (unsigned)
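The table entries are Boolean combinations of the flags, which can be checked against ordinary C comparisons. A sketch, assuming the cmpq b, a flag rules from above; cmpq_cc and the set_* helpers are hypothetical names. The signed conditions use SF^OF rather than SF alone because the subtraction may overflow: for a = INT64_MIN and b = 1, a - b wraps to a positive number (SF = 0), but OF = 1, so setl still reports a < b.

```c
#include <stdint.h>

struct cc { int cf, zf, sf, of; };

/* Flags from cmpq b, a (t = a - b), following the cmpq rules above. */
static struct cc cmpq_cc(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    int64_t t = (int64_t)(ua - ub);
    struct cc c = { ua < ub, t == 0, t < 0,
                    (a >= 0 && b < 0 && t < 0) || (a < 0 && b > 0 && t >= 0) };
    return c;
}

/* Each helper returns the byte that the matching setX would write. */
int set_g(struct cc c)  { return !(c.sf ^ c.of) && !c.zf; } /* setg  */
int set_ge(struct cc c) { return !(c.sf ^ c.of); }          /* setge */
int set_l(struct cc c)  { return c.sf ^ c.of; }             /* setl  */
int set_le(struct cc c) { return (c.sf ^ c.of) || c.zf; }   /* setle */
int set_a(struct cc c)  { return !c.cf && !c.zf; }          /* seta  */
int set_b(struct cc c)  { return c.cf; }                    /* setb  */
```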

The destination of setX is a single-byte register (e.g., %al, %bl), and the remaining bytes of that register are not altered, so the result is usually zero-extended afterwards.

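# int gt(long x, long y) { return x > y; }   with x in %rdi, y in %rsi
cmpq %rsi, %rdi  # compare x and y (computes x - y)
setg %al         # %al = 1 if x > y (signed), else 0
movzbl %al, %eax # zero-extend byte to 32 bits; this also clears the upper 32 bits of %rax
ret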

Conditional Branches

Jumping

jX: jump to a different part of the code depending on the condition codes.

jX    Condition         Description
jmp   1                 Unconditional
je    ZF                Equal / Zero
jne   ~ZF               Not Equal / Not Zero
js    SF                Negative
jns   ~SF               Nonnegative
jg    ~(SF^OF) & ~ZF    Greater (signed)
jge   ~(SF^OF)          Greater or Equal (signed)
jl    SF^OF             Less (signed)
jle   (SF^OF) | ZF      Less or Equal (signed)
ja    ~CF & ~ZF         Above (unsigned)
jb    CF                Below (unsigned)

Old Style Conditional Branch

long absdiff(long x, long y) {
  long result;
  if (x > y) result = x - y;
  else result = y - x;
  return result;
}
while ! [ -f 3_3.o ]; do sleep .1; done; objdump -d 3_3.o -Msuffix

Expressing with goto

long absdiff_j(long x, long y) {
  long result;
  int ntest = x <= y;
  if (ntest) goto Else;
  result = x-y;
  goto Done;
Else:
  result = y-x;
Done:
  return result;
}

Conditional Move

However, branches are very disruptive to instruction flow through pipelines. Conditional moves are widely used because they do not require a control transfer: both values are computed, and a cmovX instruction keeps one of them based on the condition codes.

long absdiff(long x, long y) {
  long result;
  if (x > y) result = x - y;
  else result = y - x;
  return result;
}
while ! [ -f 3_5.o ]; do sleep .1; done; objdump -d 3_5.o -Msuffix
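The conditional-move idea can be written back in C in the same style as the goto version: both results are computed unconditionally, and a single selection picks one. This is a sketch (absdiff_cmov is a hypothetical name); whether the selection actually becomes a cmov instruction is up to the compiler.

```c
/* Branch-free shape of absdiff: compute both candidates first, then select. */
long absdiff_cmov(long x, long y) {
    long result = x - y;      /* value if x > y  */
    long eval   = y - x;      /* value otherwise */
    int ntest   = x <= y;     /* in assembly, cmpq sets the condition codes */
    if (ntest) result = eval; /* plays the role of cmovle; the compiler decides */
    return result;
}
```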

However, there are several cases where conditional moves are a bad fit:

  • Expensive computations
val = Test(x) ? Hard1(x) : Hard2(x);

Both values get computed, so conditional moves only pay off when both computations are simple.

  • Risky computations
val = p ? *p : 0;

Computing both values may have undesirable effects: here *p would be evaluated even when p is a null pointer.

  • Computations with side effects
val = x > 0 ? (x *= 7) : (x += 3);

Each branch has a side effect on x, so evaluating both would give the wrong result.

Loop

do-while

long pcount_do(unsigned long x) {
  long result = 0;
  do {
    result += x & 0x1;
    x >>= 1;
  } while (x);
  return result;
}
long pcount_goto(unsigned long x) {
  long result = 0;
loop:
  result += x & 0x1;
  x >>= 1;
  if (x) goto loop;
  return result;
}
while ! [ -f 3_6.o ]; do sleep .1; done; objdump -d 3_6.o -Msuffix

General do-while translation

do {
  Body
} while (Test);
loop:
  Body
  if (Test) goto loop;

while

General while translation #1

This is called the jump-to-middle translation; it is used with the -O0 (or -Og) flag.

while(Test) {
  Body
}
  goto test;
loop:
  Body
test:
  if (Test)
    goto loop;
done:
long pcount_while(unsigned long x) {
  long result = 0;
  while (x) {
    result += x & 0x1;
    x >>= 1;
  }
  return result;
}
echo "jmp-to-middle translation"
while ! [ -f 3_7.o ]; do sleep .1; done; objdump -d 3_7.o -Msuffix
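Applying the jump-to-middle template to pcount_while by hand gives the goto form below. It is a sketch of the translation, not the compiler's literal output; pcount_while_jtm is a hypothetical name.

```c
/* Jump-to-middle form of pcount_while: jump straight to the test,
   so the loop body sits above the test. */
long pcount_while_jtm(unsigned long x) {
    long result = 0;
    goto test;
loop:
    result += x & 0x1;
    x >>= 1;
test:
    if (x) goto loop;
    return result;
}
```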

General while translation #2

While-to-do-while conversion (a "guarded do"), used with the -O1 flag.

while(Test) {
  Body
}
if (!Test) goto done;
do {
  Body
} while (Test);
done:
if (!Test) goto done;
loop:
  Body
  if (Test) goto loop;
done:
long pcount_while(unsigned long x) {
  long result = 0;
  while (x) {
    result += x & 0x1;
    x >>= 1;
  }
  return result;
}
echo "while to do-while conversion"
while ! [ -f 3_8.o ]; do sleep .1; done; objdump -d 3_8.o -Msuffix
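The guarded-do template applied to pcount_while by hand looks like this. Again a sketch of the translation rather than the compiler's literal output; pcount_while_gd is a hypothetical name.

```c
/* Guarded-do form of pcount_while: test once on entry, then run a do-while. */
long pcount_while_gd(unsigned long x) {
    long result = 0;
    if (!x) goto done;   /* guard: skip the loop if the test fails on entry */
loop:
    result += x & 0x1;
    x >>= 1;
    if (x) goto loop;
done:
    return result;
}
```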

For loop form

for (init; test; update) {
  Body
}

For-to-while conversion

for (Init; Test; Update) {
  Body
}
Init;
while(Test) {
  Body
  Update;
}
#include <stddef.h>
#define WSIZE (8 * sizeof(unsigned long)) /* number of bits in x */

long pcount_for(unsigned long x) {
  size_t i;
  long result = 0;
  for (i = 0; i < WSIZE; i++) {
    unsigned bit = (x >> i) & 0x1;
    result += bit;
  }
  return result;
}
#include <stddef.h>
#define WSIZE (8 * sizeof(unsigned long)) /* number of bits in x */
long pcount_for(unsigned long x) {
  size_t i;
  long result = 0;
  i = 0;
  while(i < WSIZE) {
    unsigned bit = (x >> i) & 0x1;
    result += bit;
    i++;
  }
  return result;
}
while ! [ -f 3_9.o ]; do sleep .1; done; objdump -d 3_9.o -Msuffix
while ! [ -f 3_10.o ]; do sleep .1; done; objdump -d 3_10.o -Msuffix

For-to-do-while conversion: because i = 0 < WSIZE is known to hold on entry, the initial test can be optimized away.
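Written as goto code, the for-to-do-while conversion of pcount_for looks like this. A sketch, assuming 8-bit bytes for WSIZE; pcount_for_goto is a hypothetical name, and the body is wrapped in braces because a C label cannot be followed directly by a declaration (before C23).

```c
#include <stddef.h>

#define WSIZE (8 * sizeof(unsigned long)) /* bits in x, assuming 8-bit bytes */

/* pcount_for after for -> do-while conversion. Since i starts at 0 and
   0 < WSIZE is known at compile time, no guard precedes the first iteration. */
long pcount_for_goto(unsigned long x) {
    size_t i = 0;                        /* Init */
    long result = 0;
loop:
    {
        unsigned bit = (x >> i) & 0x1;   /* Body */
        result += bit;
    }
    i++;                                 /* Update */
    if (i < WSIZE) goto loop;            /* Test */
    return result;
}
```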

Switch

Jump Table Structure

Switch form

long switch_eg (long x, long y, long z) {
  long w = 1;
  switch(x) {
  case 1:
    w = y*z;
    break;
  case 2:
    w = y/z;
    /* Fall Through */
  case 3:
    w += z;
    break;
  case 5:
  case 6:
    w -= z;
    break;
  case 7:
    w *= z;
    break;
  default:
    w = 2;
  }
  return w;
}
while ! [ -f 3_11.s ]; do sleep .1; done; cat 3_11.s
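A dense switch like switch_eg is typically lowered to a jump table: an array of code addresses indexed by x, guarded by a single unsigned range check that also sends negative x to the default case. The sketch below imitates that with the GCC/Clang "labels as values" extension (&&label and goto *), which is not standard C; switch_eg_jt is a hypothetical name, and the table layout is an illustration, not the compiler's actual output.

```c
/* Jump-table version of switch_eg (GCC/Clang extension, not ISO C). */
long switch_eg_jt(long x, long y, long z) {
    static void *jt[8] = {
        &&L_def,              /* x == 0: no case, use default */
        &&L_1, &&L_2, &&L_3,
        &&L_def,              /* x == 4: no case, use default */
        &&L_56, &&L_56,       /* cases 5 and 6 share code */
        &&L_7
    };
    long w = 1;
    if ((unsigned long)x > 7) goto L_def;  /* one compare covers x < 0 and x > 7 */
    goto *jt[x];                           /* indirect jump through the table */
L_1:  w = y * z; goto done;
L_2:  w = y / z;                           /* fall through into case 3 */
L_3:  w += z; goto done;
L_56: w -= z; goto done;
L_7:  w *= z; goto done;
L_def: w = 2;
done:
    return w;
}
```

Case 3 reached directly uses the initial w = 1, while case 2 falls into the same code with w = y/z, matching the original fall-through.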
Procedures

Mechanisms in Procedures

  • Passing control
    • to beginning of procedure code
    • back to return point
  • Passing data
    • procedure arguments
    • return value
  • Memory management
    • allocate during procedure execution
    • deallocate upon return

These mechanisms are all implemented with machine instructions. The x86-64 implementation of a procedure uses only those mechanisms it requires.

Stack Structure

x86-64 Stack

A region of memory managed with stack discipline. It grows toward lower addresses, and %rsp contains the lowest stack address (the address of the top element).

pushq Src

  • Fetch the operand at Src
  • Decrement %rsp by 8
  • Write the operand at the address given by %rsp

popq Dest

  • Read the value at the address given by %rsp
  • Increment %rsp by 8
  • Store the value at Dest (must be a register)
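The push/pop steps can be simulated with an array standing in for memory. This is a toy sketch, not how the hardware stores anything: struct machine, STACK_SLOTS, pushq and popq are hypothetical names, each array slot stands for 8 bytes so rsp moves by one slot instead of 8 bytes, and there is no overflow or underflow checking.

```c
#include <stdint.h>

#define STACK_SLOTS 16  /* hypothetical stack size; one slot = 8 bytes */

/* Toy machine: mem[] stands in for memory, rsp is an index into it.
   The stack grows toward lower indices. */
struct machine {
    uint64_t mem[STACK_SLOTS];
    int rsp;                      /* index of the top element (STACK_SLOTS when empty) */
};

void pushq(struct machine *m, uint64_t src) {
    m->rsp -= 1;                  /* decrement %rsp by 8 bytes (one slot) */
    m->mem[m->rsp] = src;         /* write the operand at the new top */
}

uint64_t popq(struct machine *m) {
    uint64_t v = m->mem[m->rsp];  /* read the value at the top */
    m->rsp += 1;                  /* increment %rsp by 8 bytes (one slot) */
    return v;                     /* caller stores it in a "register" */
}
```

Pushing 0x123 and then 0x456 and popping twice returns 0x456 first, and rsp ends back where it started.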

Procedure Control Flow

long mult2(long x, long y) {
  long t = x * y;
  return t;
}

void multstore(long x, long y, long *dest) {
  long t = mult2(x, y);
  *dest = t;
}

while ! [ -f 3_12.o ]; do sleep .1; done; objdump -d 3_12.o -Msuffix

Procedure call: call label

  • Push the return address on the stack
  • Jump to label

Return address:

  • Address of the next instruction right after the call

Procedure return: ret