Machine Level Programming
History of Intel Processors
- Evolutionary design: backward compatible all the way back to the 8086, introduced in 1978
- Complex Instruction Set Computer (CISC)
RISC vs CISC
- CISC has variable length instructions
- RISC has constant length instructions
- Intel x86(8086)
- IA32 to IA64
- (after x86-64) EM64T(almost same as AMD x86-64)
- AMD x86-64
C, Assembly, machine code
- Architecture: the parts of a processor design that one needs to understand in order to write assembly/machine code
- ISA(Instruction Set Architecture)
- e.g., x86, IA32, Itanium, x86-64, ARM
- Microarchitecture: Implementation of the architecture
form of code:
- Machine Code: the byte-level programs that a processor executes
- Assembly Code: A text representation of machine code
Assembly/Machine Code View
Programmer-Visible State (shown by ISAs)
- PC(Program Counter)
- Address of next instruction
- RIP in (x86-64)
- Register file
- Heavily used program data
- Condition codes
- store status information about most recent arithmetic or logical op
- Used for conditional branching
- Memory(external)
Compiling Into Assembly
long plus(long x, long y);
void sumstore(long x, long y, long *dest) {
long t = plus(x, y);
*dest = t;
}
while ! [ -r 3_1.o ]; do sleep .1; done; objdump -d 3_1.o
Integer Registers
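- In x86-64
  - rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp: the r prefix names all 8 bytes; the e prefix names the low 4 bytes (e.g., %rax and %eax)
  - r8, r9, r10, r11, r12, r13, r14, r15: the low 4 bytes take a d suffix (e.g., %r8d)
- In IA32
  - eax (32 bit): 16 bit ax (ah, al); origin: accumulate
  - ecx (32 bit): 16 bit cx (ch, cl); origin: counter
  - edx (32 bit): 16 bit dx (dh, dl); origin: data
  - ebx (32 bit): 16 bit bx (bh, bl); origin: base
  - esi (32 bit): 16 bit si; origin: source index
  - edi (32 bit): 16 bit di; origin: destination index
  - esp (32 bit): 16 bit sp; origin: stack pointer
  - ebp (32 bit): 16 bit bp; origin: base pointer
  - The byte names sil, dil, spl, and bpl exist only in x86-64; IA32 has no 8-bit names for these four registers.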
understanding movq
In x86 assembly, there are three operand types: immediate, register, and memory.
- immediate: constant integer data, e.g., $0x400, $-533
- register: one of the 16 integer registers
  - e.g., %rax, %r13
  - but %rsp is reserved for special use
- memory: 8 consecutive bytes of memory at the address given by a register
  - e.g., (%rax)
movq usage
movq $src, $dest
- limit: a single instruction cannot do a memory-to-memory transfer (a mov instruction can encode at most one memory operand, so the value has to pass through a register)
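To see what the memory-to-memory limit means in practice, here is a minimal sketch of a swap function. The function is a hypothetical example, not part of the notes above, and the registers named in the comments are only what a compiler might plausibly choose.

```c
/* Hypothetical example: swap two longs through pointers.
 * Every value travels through a register, because a single movq
 * cannot copy memory to memory. */
void swap(long *xp, long *yp) {
    long t0 = *xp;   /* load:  e.g. movq (%rdi), %rax */
    long t1 = *yp;   /* load:  e.g. movq (%rsi), %rdx */
    *xp = t1;        /* store: e.g. movq %rdx, (%rdi) */
    *yp = t0;        /* store: e.g. movq %rax, (%rsi) */
}
```

Each C assignment maps to one load or one store, never to a direct copy between the two memory locations.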
memory addressing modes
(R) means Mem[Reg[R]]
D(R) means Mem[Reg[R]+D], constant displacement D specifies offset
movq 8(%rbp), %rdx
Complete form of memory addressing modes:
D(Rb, Ri, S) means Mem[Reg[Rb] + S*Reg[Ri] + D]
- D: Constant "displacement"
- Rb: Base Register
- Ri: Index Register
- S: Scale Factor (1, 2, 4, or 8)
for example:
| %rdx | %rcx |
|---|---|
| 0xf000 | 0x0100 |

- 0x8(%rdx) = 0xf000 + 0x8 = 0xf008
- (%rdx, %rcx) = 0xf000 + 0x100 = 0xf100
- (%rdx, %rcx, 4) = 0xf000 + 4*0x100 = 0xf400
- 0x80(,%rdx, 2) = 2*0xf000 + 0x80 = 0x1e080
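The address arithmetic above can be checked with a small C sketch. The helper name effective_address is hypothetical; it simply evaluates Reg[Rb] + S*Reg[Ri] + D on plain integers, with 0 passed for an omitted register and 1 for an omitted scale.

```c
#include <stdint.h>

/* Effective address for D(Rb, Ri, S): Reg[Rb] + S*Reg[Ri] + D.
 * Pass 0 for a register that is omitted and 1 for an omitted scale. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, uint64_t s) {
    return rb + s * ri + (uint64_t)d;
}
```

For instance, effective_address(0x80, 0, 0xf000, 2) models 0x80(,%rdx, 2) with %rdx = 0xf000.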
Arithmetic & Logical Operations
- leaq $src, $dst
  - computing an address without a memory reference, like p = &x[i]
  - computing arithmetic expressions of the form x + k * y
- addq $src, $dst
- subq $src, $dst
- imulq $src, $dst
- salq $src, $dst
- sarq $src, $dst
- shrq $src, $dst
- xorq $src, $dst
- andq $src, $dst
- orq $src, $dst

All of the operations above behave like dest = dest # src, where # is the operation.

- incq $dest
- decq $dest
- negq $dest
- notq $dest
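As a rough illustration of how leaq and salq combine, the sketch below computes x + 4 + 48*y step by step, the way a compiler might. The function name scale_add and the instructions in the comments are assumptions for illustration; actual compiler output can differ. Unsigned arithmetic is used so that the shift is well defined for negative inputs.

```c
/* Hypothetical example: x + 4 + 48*y built from lea-style steps. */
long scale_add(long x, long y) {
    unsigned long t = (unsigned long)y + 2 * (unsigned long)y; /* leaq (%rsi,%rsi,2), %rdx : 3*y  */
    t <<= 4;                                                   /* salq $4, %rdx            : 48*y */
    t += (unsigned long)x + 4;                                 /* leaq 4(%rdi,%rdx), %rax         */
    return (long)t;  /* conversion back to long assumes two's complement */
}
```

The multiplication by 48 never needs imulq here: 3*y comes from one leaq and the remaining factor of 16 from a shift.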
Control
Processor State(x86-64, Partial)
- Temporary data (%rax, ...)
- Location of runtime stack (%rsp)
- Location of current code control point (%rip, instruction pointer)
- Status of recent tests (CF, ZF, SF, OF)
Condition Codes
- Single-bit registers
  - CF: Carry flag (for unsigned)
  - SF: Sign flag (for signed)
  - ZF: Zero flag
  - OF: Overflow flag (for signed)
Condition Codes (Implicit Setting)
With implicit setting, the codes are set as a side effect of arithmetic operations (addq, subq, mulq).
for example: addq: t = a + b
- CF set if carry out from the most significant bit (unsigned overflow)
- ZF set if t == 0
- SF set if t < 0 (as signed)
- OF set if two's-complement (signed) overflow: (a > 0 && b > 0 && t < 0) || (a < 0 && b < 0 && t >= 0)
The codes are not implicitly set by leaq: it is designed for address calculation rather than arithmetic, so it does not affect the condition codes.
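The flag rules for addq can be modeled in C. This is a minimal sketch, assuming 64-bit two's-complement integers; the type add_flags_t and the function add_flags are hypothetical names, and the model is not how the hardware is implemented.

```c
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } add_flags_t;

/* Model of the condition codes after "addq": t = a + b. */
add_flags_t add_flags(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t ut = ua + ub;        /* wraps modulo 2^64, like the hardware */
    int64_t t = (int64_t)ut;      /* assumes two's-complement conversion */
    add_flags_t f;
    f.cf = ut < ua;               /* carry out of the most significant bit */
    f.zf = (t == 0);
    f.sf = (t < 0);
    f.of = (a > 0 && b > 0 && t < 0) || (a < 0 && b < 0 && t >= 0);
    return f;
}
```

Adding 1 to INT64_MAX sets SF and OF but not CF, while adding 1 to -1 sets CF and ZF but not OF, which shows that carry and overflow are independent conditions.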
Condition Codes (Explicit Setting)
The codes are set explicitly by the compare instruction.
cmpq b, a computes a - b without storing the result in a destination.
- CF set if carry (borrow) out from the most significant bit (used for unsigned comparisons)
- ZF set if a == b, i.e., a - b == 0
- SF set if (a - b) < 0 (as signed)
- OF set if two's-complement (signed) overflow: (a >= 0 && b < 0 && (a - b) < 0) || (a < 0 && b >= 0 && (a - b) >= 0)
They are also set explicitly by the test instruction.
testq b, a computes a & b without storing the result in a destination.
It sets the condition codes based on the value of a & b, which is useful when one of the operands is a mask.
- ZF set when (a & b) == 0
- SF set when (a & b) < 0
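The compare rules can be modeled the same way as addition. The sketch below assumes 64-bit two's-complement integers; cmp_flags_t, cmp_flags, and is_greater are hypothetical names. It also evaluates the signed "greater" condition ~(SF^OF) & ~ZF from the resulting flags, so the result can be checked against a plain a > b.

```c
#include <stdint.h>

typedef struct { int cf, zf, sf, of; } cmp_flags_t;

/* Model of the condition codes after "cmpq b, a" (computes a - b). */
cmp_flags_t cmp_flags(int64_t a, int64_t b) {
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    int64_t t = (int64_t)(ua - ub);  /* wrapped difference; assumes two's complement */
    cmp_flags_t f;
    f.cf = ua < ub;                  /* borrow: a < b as unsigned */
    f.zf = (t == 0);
    f.sf = (t < 0);
    f.of = (a >= 0 && b < 0 && t < 0) || (a < 0 && b >= 0 && t >= 0);
    return f;
}

/* Signed "greater" derived from the flags: ~(SF^OF) & ~ZF. */
int is_greater(cmp_flags_t f) {
    return !(f.sf ^ f.of) && !f.zf;
}
```

The case a = INT64_MAX, b = -1 is the interesting one: the wrapped difference is negative, yet SF and OF are both set, so SF^OF is 0 and the comparison still reports "greater".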
Reading Condition Codes
setX: set single byte based on combination of condition codes
| setX | effect | desc |
|---|---|---|
| sete | ZF | Equal / Zero |
| setne | ~ZF | Not Equal / Not Zero |
| sets | SF | Negative |
| setns | ~SF | Nonnegative |
| setg | ~(SF^OF) & ~ZF | Greater (signed) |
| setge | ~(SF^OF) | Greater or Equal (signed) |
| setl | SF^OF | Less (signed) |
| setle | (SF^OF) \| ZF | Less or Equal (signed) |
| seta | ~CF & ~ZF | Above (unsigned) |
| setb | CF | Below (unsigned) |
setX does not alter the remaining bytes of the register; it writes only a 1-byte register (e.g., %al, %bl).
cmpq %rsi, %rdi # compare x (%rdi) with y (%rsi)
setg %al # set %al to 1 if x > y, else 0
movzbl %al, %eax # zero-extend the byte into %eax
ret
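The assembly above plausibly corresponds to a C function like the following. The name gt is an assumption; the notes do not show the C source for this snippet.

```c
/* Hypothetical C source for the cmpq / setg / movzbl sequence:
 * x arrives in %rdi, y in %rsi, and the 0-or-1 result leaves in %eax. */
int gt(long x, long y) {
    return x > y;
}
```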
Conditional Branches
Jumping
jX jump to different part of code depending on condition codes.
| jX | condition | desc |
|---|---|---|
| jmp | 1 | Unconditional |
| je | ZF | Equal / Zero |
| jne | ~ZF | Not Equal / Not Zero |
| js | SF | Negative |
| jns | ~SF | Nonnegative |
| jg | ~(SF^OF) & ~ZF | Greater (signed) |
| jge | ~(SF^OF) | Greater or Equal (signed) |
| jl | SF^OF | Less (signed) |
| jle | (SF^OF) \| ZF | Less or Equal (signed) |
| ja | ~CF & ~ZF | Above (unsigned) |
| jb | CF | Below (unsigned) |
Old Style Conditional Branch
long absdiff(long x, long y) {
long result;
if (x > y) result = x - y;
else result = y - x;
return result;
}
while ! [ -r 3_3.o ]; do sleep .1; done; objdump -d 3_3.o -Msuffix
expressing with goto
long absdiff_j(long x, long y) {
long result;
int ntest = x <= y;
if (ntest) goto Else;
result = x-y;
goto Done;
Else:
result = y-x;
Done:
return result;
}
Conditional Move
However, branches like these are very disruptive to instruction flow through pipelines. Conditional moves are widely used instead because they do not require a control transfer.
long absdiff(long x, long y) {
long result;
if (x > y) result = x - y;
else result = y - x;
return result;
}
while ! [ -r 3_5.o ]; do sleep .1; done; objdump -d 3_5.o -Msuffix
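In C terms, a conditional-move translation of absdiff behaves roughly like the sketch below: both candidate results are computed first, and the condition only selects between them. The function name absdiff_cmov and the cmovle mentioned in the comment are assumptions about what a compiler might emit.

```c
/* Sketch of the conditional-move form of absdiff. */
long absdiff_cmov(long x, long y) {
    long result = x - y;   /* computed unconditionally */
    long eval = y - x;     /* also computed unconditionally */
    if (x <= y)            /* a compiler may implement this select with cmovle */
        result = eval;
    return result;
}
```

No jump is needed because the final assignment is a data move, not a change of control flow.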
However, there are several cases where a conditional move is a bad choice.
- expensive computations
val = Test(x) ? Hard1(x) : Hard2(x);
Both values get computed, so conditional moves pay off only when the computations are simple.
- risky computations
val = p ? *p : 0;
Both values get computed, which may have undesirable effects (here, dereferencing a null pointer).
- Computations with side effects
val = x > 0 ? (x *= 7) : (x += 3);
Both expressions have side effects, so they must not both be evaluated.
Loop
do-while
while ! [ -r 3_6.o ]; do sleep .1; done; objdump -d 3_6.o -Msuffix
general do-while translation
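A do-while loop and its goto form can be sketched as follows. The bit-counting body mirrors the pcount_while example used later in these notes; the names pcount_do and pcount_goto are assumptions, since the original code for this section is not shown.

```c
/* do-while version: the body runs at least once. */
long pcount_do(unsigned long x) {
    long result = 0;
    do {
        result += x & 0x1;
        x >>= 1;
    } while (x);
    return result;
}

/* goto version: body, then a conditional jump back to the top. */
long pcount_goto(unsigned long x) {
    long result = 0;
loop:
    result += x & 0x1;
    x >>= 1;
    if (x) goto loop;
    return result;
}
```

The goto form is the general pattern: place a label before the body and end with "if (Test) goto loop".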
while
general while translation#1
This is called the jump-to-middle translation; it is used with the -O0 (or -Og) flag.
long pcount_while(unsigned long x) {
long result = 0;
while (x) {
result += x & 0x1;
x >>= 1;
}
return result;
}
echo "jmp-to-middle translation"
while ! [ -r 3_7.o ]; do sleep .1; done; objdump -d 3_7.o -Msuffix
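Written with goto, the jump-to-middle form of pcount_while looks roughly like this. The name pcount_while_jm is hypothetical; the control flow is the point of the sketch.

```c
/* Jump-to-middle: jump straight to the test, which sits after the body. */
long pcount_while_jm(unsigned long x) {
    long result = 0;
    goto test;
loop:
    result += x & 0x1;
    x >>= 1;
test:
    if (x) goto loop;
    return result;
}
```

The initial goto ensures the test runs before the first iteration, so the body is skipped entirely when x is 0.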
general while translation#2
while to do-while conversion, used with -O1 flag.
long pcount_while(unsigned long x) {
long result = 0;
while (x) {
result += x & 0x1;
x >>= 1;
}
return result;
}
echo "while to do-while conversion"
while ! [ -r 3_8.o ]; do sleep .1; done; objdump -d 3_8.o -Msuffix
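The do-while conversion can be sketched with goto as well. The name pcount_while_dw is hypothetical. An initial guard replaces the jump to the middle, and the rest is an ordinary do-while loop.

```c
/* while -> do-while conversion: guard once, then loop with the test at the bottom. */
long pcount_while_dw(unsigned long x) {
    long result = 0;
    if (!x) goto done;      /* initial guard */
loop:
    result += x & 0x1;
    x >>= 1;
    if (x) goto loop;       /* do-while style test */
done:
    return result;
}
```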
for loop form
for-to-while conversion
In the for to do-while conversion, the initial test can often be optimized away, because the compiler can see that it holds on entry.
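A for loop has the shape for (Init; Test; Update) Body, which maps onto Init; while (Test) { Body; Update; }. The sketch below shows both forms for a bit count over a fixed word size. The names pcount_for, pcount_for_while, and WSIZE are assumptions for illustration.

```c
#include <stddef.h>

#define WSIZE (8 * sizeof(unsigned long))

/* for version */
long pcount_for(unsigned long x) {
    size_t i;
    long result = 0;
    for (i = 0; i < WSIZE; i++) {
        unsigned long bit = (x >> i) & 0x1;
        result += bit;
    }
    return result;
}

/* Equivalent while version */
long pcount_for_while(unsigned long x) {
    size_t i;
    long result = 0;
    i = 0;                                   /* Init   */
    while (i < WSIZE) {                      /* Test   */
        unsigned long bit = (x >> i) & 0x1;  /* Body   */
        result += bit;
        i++;                                 /* Update */
    }
    return result;
}
```

Here the first test is 0 < WSIZE, which is always true, so a compiler converting to do-while form can drop the initial guard.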
Switch
Jump Table Structure
Switch form
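The original switch example is not shown here, so the following is only a sketch of the idea, with hypothetical names and case values. A jump table is an array with one code address per case value. Standard C cannot take the address of a label, so the table below holds function pointers instead, and an indexed call stands in for the indirect jump.

```c
typedef long (*handler_t)(long y, long z);

static long op_mul(long y, long z) { return y * z; }
static long op_add(long y, long z) { return y + z; }
static long op_sub(long y, long z) { return y - z; }
static long op_z(long y, long z) { (void)y; return z; }
static long op_default(long y, long z) { (void)y; (void)z; return 2; }

/* One entry per case value 0..6; missing cases (0 and 4) use the default. */
static const handler_t jtab[7] = {
    op_default, op_mul, op_add, op_sub, op_default, op_z, op_z
};

long switch_table(long x, long y, long z) {
    if ((unsigned long)x > 6)      /* one unsigned compare rejects x < 0 and x > 6 */
        return op_default(y, z);
    return jtab[x](y, z);          /* indexed, indirect transfer */
}

/* The same logic as an ordinary switch, for comparison. */
long switch_plain(long x, long y, long z) {
    switch (x) {
    case 1: return y * z;
    case 2: return y + z;
    case 3: return y - z;
    case 5:
    case 6: return z;
    default: return 2;
    }
}
```

The cost of the table lookup does not depend on the number of cases, which is why compilers tend to prefer it for dense case values.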
Mechanisms in Procedures
- Passing control
- to beginning of procedure code
- back to return point
- Passing data
- procedure arguments
- return value
- Memory management
- allocate during procedure execution
- deallocate upon return
These mechanisms are all implemented with machine instructions. The x86-64 implementation of a procedure uses only those mechanisms that are required.
Stack Structure
x86-64 Stack
Region of memory managed with stack discipline. It grows toward lower addresses. %rsp contains lowest stack address(address of top element).
pushq $src
- fetch operand at src
- decrement %rsp by 8
- write operand at address given by %rsp
popq $dest
- read value at address given by %rsp
- increment %rsp by 8
- store value at dest (must be register)
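The push and pop steps can be simulated on a small byte array. This is a minimal sketch with hypothetical names (sim_stack, sim_pushq, sim_popq); rsp is modeled as an offset into the array, and there is no bounds checking.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

typedef struct {
    uint8_t mem[STACK_BYTES];
    size_t rsp;                 /* offset of the top element; STACK_BYTES when empty */
} sim_stack;

void sim_init(sim_stack *s) { s->rsp = STACK_BYTES; }

/* pushq: decrement rsp by 8, then write the operand at the new top. */
void sim_pushq(sim_stack *s, uint64_t src) {
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &src, 8);
}

/* popq: read the value at the top, then increment rsp by 8. */
uint64_t sim_popq(sim_stack *s) {
    uint64_t dest;
    memcpy(&dest, &s->mem[s->rsp], 8);
    s->rsp += 8;
    return dest;
}
```

Pushing makes rsp smaller, which matches a stack that grows toward lower addresses.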
Procedure Control Flow
long mult2(long x, long y) {
long t = x * y;
return t;
}
void multstore(long x, long y, long *dest) {
long t = mult2(x, y);
*dest = t;
}
while ! [ -r 3_12.o ]; do sleep .1; done; objdump -d 3_12.o -Msuffix
Procedure call: call label
- push return address on stack
- jump to label
Return address:
- address of the next instruction right after call
Procedure return: ret
Procedure Data Flow
- registers
  - first 6 args: %rdi, %rsi, %rdx, %rcx, %r8, %r9
  - return value: %rax
- stack
For example, with the code above:
while ! [ -r 3_12.o ]; do sleep .1; done; objdump -d 3_12.o -Msuffix
- in mult2 above, the variable t is already stored in %rax when it returns
- so multstore executes movq %rax,(%rbx), where %rbx holds long *dest
Managing local data
Stack-Based Languages
In languages that support recursion
- Code must be "reentrant": multiple simultaneous instantiations of a single procedure must be able to coexist.
- Each instantiation needs some place to store its state: arguments, local variables, return pointer.
Stack discipline provides this. The state for a given procedure is needed only for a limited time (from the call to the return), and the callee returns before the caller does.
The stack is allocated in frames, each holding the state for a single procedure instantiation. When a function is called, a new stack frame is created at the top of the stack. When the function returns, its frame is popped and execution goes back to the previous call's state.
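A recursive function makes the need for per-instantiation state concrete. In this sketch (pcount_r is a hypothetical example), several calls are active at once, and each needs its own copy of x and low until the deeper calls return.

```c
/* Recursive bit count: every active call has its own x and low. */
long pcount_r(unsigned long x) {
    if (x == 0)
        return 0;
    unsigned long low = x & 1;          /* must survive the recursive call below */
    return (long)low + pcount_r(x >> 1);
}
```

The value of low has to be kept somewhere while the recursive call runs; the stack frame (or a register saved in it) is that place.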
Stack Frame
A stack frame consists of return information, local storage (if needed), and temporary space (if needed).
- %rbp: frame pointer
- %rsp: stack pointer
Space is allocated on entering the procedure by the "set-up" code, and includes the push done by call.
Space is deallocated on return by the "finish" code, and includes the pop done by ret.
x86-64/Linux Stack Frame
- To compile without a stack canary, add the option -fno-stack-protector
long incr(long *p, long val) {
long x = *p;
long y = x + val;
*p = y;
return x;
}
long call_incr() {
long v1 = 15213;
long v2 = incr(&v1, 3000);
return v1 + v2;
}
while ! [ -r 3_13.o ]; do sleep .1; done; objdump -d 3_13.o -Msuffix