Introduction to x64 Assembly

This article is largely a translation of the x64_cheatsheet handout, with some notes of my own added and the errors in its code fixed.

It assumes some familiarity with assembly instructions and with the C function call stack.

1. x64 Registers

8-byte register	Bytes 0-3	Bytes 0-1	Byte 0
%rax	%eax	%ax	%al
%rcx	%ecx	%cx	%cl
%rdx	%edx	%dx	%dl
%rbx	%ebx	%bx	%bl
%rsi	%esi	%si	%sil
%rdi	%edi	%di	%dil
%rsp	%esp	%sp	%spl
%rbp	%ebp	%bp	%bpl
%r8	%r8d	%r8w	%r8b
%r9	%r9d	%r9w	%r9b
%r10	%r10d	%r10w	%r10b
%r11	%r11d	%r11w	%r11b
%r12	%r12d	%r12w	%r12b
%r13	%r13d	%r13w	%r13b
%r14	%r14d	%r14w	%r14b
%r15	%r15d	%r15w	%r15b

2. Operand Specifiers

Imm refers to a constant value, e.g. 0x8048d8e or 48,
E_x refers to a register, e.g. %rax,
R[E_x] refers to the value stored in register E_x, and
M[x] refers to the value stored at memory address x.

3. x64 Instructions

Operand-size suffixes

“byte” refers to a one-byte integer (suffix b),
“word” refers to a two-byte integer (suffix w),
“doubleword” refers to a four-byte integer (suffix l), and
“quadword” refers to an eight-byte value (suffix q).

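Most instructions, such as mov, use a suffix to indicate the size of their operands. For example, moving a quadword from %rax to %rbx is written movq %rax, %rbx. Some instructions, such as ret, take no suffix because none is needed. Others, such as movs and movz, take two suffixes: they convert an operand of the first suffix's type into the second suffix's type. Thus, the assembly to convert the byte in %al to a zero-extended doubleword in %ebx is movzbl %al, %ebx.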

movzbl: move byte to doubleword, zero-extended
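To make the difference between zero extension (movz) and sign extension (movs) concrete, here is a minimal C sketch. The function names are my own and purely illustrative; the code only models what the instructions do to the bits.

```c
#include <assert.h>
#include <stdint.h>

/* Models movzbl: widen a byte to 32 bits, filling the upper 24 bits with zeros. */
uint32_t zero_extend_byte(uint8_t b) {
    return (uint32_t)b;
}

/* Models movsbl: widen a byte to 32 bits, copying bit 7 into the upper 24 bits. */
int32_t sign_extend_byte(uint8_t b) {
    return (b & 0x80) ? (int32_t)b - 256 : (int32_t)b;
}

/* Usage: the same byte 0xF0 widens to two different doublewords. */
void check_extension_examples(void) {
    assert(zero_extend_byte(0xF0) == 0x000000F0u); /* 240 */
    assert(sign_extend_byte(0xF0) == -16);         /* bit pattern 0xFFFFFFF0 */
    assert(sign_extend_byte(0x7F) == 127);         /* sign bit clear: unchanged */
}
```

The two results differ only when bit 7 of the byte is set, which is exactly when the choice between movz and movs matters.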

In the tables below, instructions take a single suffix unless noted otherwise.

3.1 Data Movement

Instruction	Description	Page #
	Instructions with one suffix
mov S, D	Move source to destination	171
push S	Push source onto stack	171
pop D	Pop top of stack into destination	171
	Instructions with two suffixes
movs S, D	Move byte to word (sign extended)	171
movz S, D	Move byte to word (zero extended)	171
	Instructions with no suffixes
cwtl	Convert word in %ax to doubleword in %eax (sign-extended)	182
cltq	Convert doubleword in %eax to quadword in %rax (sign-extended)	182
cqto	Convert quadword in %rax to octoword in %rdx:%rax (sign-extended)	182

3.2 Arithmetic Operations

Unless noted otherwise, arithmetic instructions take a single suffix.

3.2.1 Unary Operations

Instruction	Description	Page #
inc D	Increment by 1	178
dec D	Decrement by 1	178
neg D	Arithmetic negation	178
not D	Bitwise complement	178

3.2.2 Binary Operations

Instruction	Description	Page #
leaq S, D	Load effective address of source into destination	178
add S, D	Add source to destination	178
sub S, D	Subtract source from destination	178
imul S, D	Multiply destination by source	178
xor S, D	Bitwise XOR destination by source	178
or S, D	Bitwise OR destination by source	178
and S, D	Bitwise AND destination by source	178

3.2.3 Shift Operations

Instruction	Description	Page #
sal / shl k, D	Left shift destination by k bits	179
sar k, D	Arithmetic right shift destination by k bits	179
shr k, D	Logical right shift destination by k bits	179
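The difference between sar and shr only shows up when the top bit of the destination is set. Below is a small C model of the three shifts, offered as a sketch. The function names are illustrative, and the code works on unsigned bit patterns because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* shr: logical right shift; the vacated high bits become 0. */
uint64_t shr64(uint64_t d, unsigned k) {
    assert(k < 64);
    return d >> k;
}

/* sar: arithmetic right shift; the vacated high bits copy the sign bit. */
uint64_t sar64(uint64_t d, unsigned k) {
    assert(k < 64);
    uint64_t shifted = d >> k;
    if ((d >> 63) && k > 0) {
        shifted |= ~(UINT64_MAX >> k); /* set the top k bits */
    }
    return shifted;
}

/* sal / shl: left shift; the same operation for signed and unsigned data. */
uint64_t shl64(uint64_t d, unsigned k) {
    assert(k < 64);
    return d << k;
}
```

For example, shifting the bit pattern of -16 right by 2 gives the pattern of -4 with sar64, but a large positive number with shr64.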

3.2.4 Special Arithmetic Operations

Instruction	Description	Page #
imulq S	Signed full multiply of %rax by S. Result stored in %rdx:%rax.	182
mulq S	Unsigned full multiply of %rax by S. Result stored in %rdx:%rax.	182
idivq S	Signed divide of %rdx:%rax by S. Quotient stored in %rax, remainder stored in %rdx.	182
divq S	Unsigned divide of %rdx:%rax by S. Quotient stored in %rax, remainder stored in %rdx.	182
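The %rdx:%rax register pair in the table above can be modeled in portable C. This is only a sketch: the struct and function names are made up for illustration, mulq_model covers the unsigned multiply, and idivq_after_cqto covers only the common case where the dividend is a 64-bit value that cqto has sign-extended into %rdx:%rax.

```c
#include <assert.h>
#include <stdint.h>

/* A 128-bit product split the way mulq leaves it. */
typedef struct { uint64_t rdx; uint64_t rax; } u128_pair;

/* Unsigned 64x64 -> 128 multiply built from 32-bit halves. */
u128_pair mulq_model(uint64_t rax, uint64_t s) {
    uint64_t a_lo = rax & 0xFFFFFFFFu, a_hi = rax >> 32;
    uint64_t b_lo = s & 0xFFFFFFFFu,   b_hi = s >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum the middle column, keeping its carry for the high half. */
    uint64_t mid = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFFu) + (lo_hi & 0xFFFFFFFFu);

    u128_pair r;
    r.rax = (mid << 32) | (lo_lo & 0xFFFFFFFFu);
    r.rdx = hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
    return r;
}

/* Quotient and remainder the way idivq leaves them. */
typedef struct { int64_t rax; int64_t rdx; } divmod_pair;

/* Signed divide of a sign-extended 64-bit dividend (cqto followed by idivq). */
divmod_pair idivq_after_cqto(int64_t rax, int64_t s) {
    assert(s != 0);                         /* the real instruction faults */
    assert(!(rax == INT64_MIN && s == -1)); /* quotient would not fit */
    divmod_pair r = { rax / s, rax % s };   /* both truncate toward zero */
    return r;
}
```

Dividing -7 by 2 this way leaves -3 in the quotient and -1 in the remainder: the quotient is truncated toward zero and the remainder takes the sign of the dividend.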

3.3 Comparison and Test Instructions

Comparison instructions also take a single suffix.

Instruction	Description	Page #
cmp S2, S1	Set condition codes according to S1 - S2	185
test S2, S1	Set condition codes according to S1 & S2	185

3.4 Accessing Condition Codes

Condition Codes: http://web.cse.ohio-state.edu/~reeves.92/CSE2421au12/SlidesDay41.pdf
None of the instructions below take a size suffix.

Condition mnemonics used in the instruction names:

n: not
g: greater
l: less
e: equal
a: above
b: below

setnle: set if not less than or equal; this is the same as setg

3.4.1 Conditional Set Instructions

Instruction	Description	Condition Code	Page #
sete / setz D	Set if equal/zero	ZF	187
setne / setnz D	Set if not equal/nonzero	~ZF	187
sets D	Set if negative	SF	187
setns D	Set if nonnegative	~SF	187
setg / setnle D	Set if greater (signed)	~(SF^OF)&~ZF	187
setge / setnl D	Set if greater or equal (signed)	~(SF^OF)	187
setl / setnge D	Set if less (signed)	SF^OF	187
setle / setng D	Set if less or equal (signed)	(SF^OF)|ZF	187
seta / setnbe D	Set if above (unsigned)	~CF&~ZF	187
setae / setnb D	Set if above or equal (unsigned)	~CF	187
setb / setnae D	Set if below (unsigned)	CF	187
setbe / setna D	Set if below or equal (unsigned)	CF|ZF	187
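The condition-code expressions in the table are easier to trust after seeing them evaluated. The sketch below models the flags that cmpq S2, S1 derives from S1 - S2 and then applies four of the table's formulas. The type and function names are illustrative only, not part of any real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf, sf, cf, of; } cond_flags;

/* Flags set by "cmpq s2, s1", i.e. by computing s1 - s2 and discarding it. */
cond_flags cmpq_flags(uint64_t s2, uint64_t s1) {
    uint64_t diff = s1 - s2;
    cond_flags f;
    f.zf = (diff == 0);
    f.sf = (diff >> 63) != 0;
    f.cf = s1 < s2; /* unsigned borrow */
    /* Signed overflow: operands differ in sign and the result's sign differs from s1. */
    f.of = (((s1 ^ s2) & (s1 ^ diff)) >> 63) != 0;
    return f;
}

bool setg(cond_flags f) { return !(f.sf ^ f.of) && !f.zf; } /* ~(SF^OF)&~ZF */
bool setl(cond_flags f) { return (f.sf ^ f.of) != 0; }      /* SF^OF */
bool seta(cond_flags f) { return !f.cf && !f.zf; }          /* ~CF&~ZF */
bool setb(cond_flags f) { return f.cf; }                    /* CF */
```

Comparing the all-ones pattern with 1 shows why both families exist: read as signed it is -1, so setl holds; read as unsigned it is the largest value, so seta holds.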

3.4.2 Jump Instructions

Instruction	Description	Condition Code	Page #
jmp Label	Jump to label		189
jmp *Operand	Jump to specified location		189
je / jz Label	Jump if equal/zero	ZF	189
jne / jnz Label	Jump if not equal/nonzero	~ZF	189
js Label	Jump if negative	SF	189
jns Label	Jump if nonnegative	~SF	189
jg / jnle Label	Jump if greater (signed)	~(SF^OF)&~ZF	189
jge / jnl Label	Jump if greater or equal (signed)	~(SF^OF)	189
jl / jnge Label	Jump if less (signed)	SF^OF	189
jle / jng Label	Jump if less or equal (signed)	(SF^OF)|ZF	189
ja / jnbe Label	Jump if above (unsigned)	~CF&~ZF	189
jae / jnb Label	Jump if above or equal (unsigned)	~CF	189
jb / jnae Label	Jump if below (unsigned)	CF	189
jbe / jna Label	Jump if below or equal (unsigned)	CF|ZF	189
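A loop written with goto in C is a close analogue of how cmp and conditional jumps build control flow. The function below is a hypothetical example of mine; the instructions named in the comments show one plausible mapping, not what any particular compiler is guaranteed to emit.

```c
#include <assert.h>

/* Sums 0 + 1 + ... + (n - 1) using only compare-and-goto control flow. */
long sum_below(long n) {
    long total = 0;
    long i = 0;
loop:
    if (i >= n) goto done; /* cmpq n, i ; jge done */
    total += i;            /* addq i, total */
    i++;                   /* incq i */
    goto loop;             /* jmp loop */
done:
    return total;
}

/* Usage: spot checks, including the case where the loop body never runs. */
void check_sum_below(void) {
    assert(sum_below(5) == 10);
    assert(sum_below(0) == 0);
}
```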

3.4.3 Conditional Move Instructions

Conditional move instructions take no suffix, but their source and destination operands must have the same size.

Instruction	Description	Condition Code	Page #
cmove / cmovz S, D	Move if equal/zero	ZF	206
cmovne / cmovnz S, D	Move if not equal/nonzero	~ZF	206
cmovs S, D	Move if negative	SF	206
cmovns S, D	Move if nonnegative	~SF	206
cmovg / cmovnle S, D	Move if greater (signed)	~(SF^OF)&~ZF	206
cmovge / cmovnl S, D	Move if greater or equal (signed)	~(SF^OF)	206
cmovl / cmovnge S, D	Move if less (signed)	SF^OF	206
cmovle / cmovng S, D	Move if less or equal (signed)	(SF^OF)|ZF	206
cmova / cmovnbe S, D	Move if above (unsigned)	~CF&~ZF	206
cmovae / cmovnb S, D	Move if above or equal (unsigned)	~CF	206
cmovb / cmovnae S, D	Move if below (unsigned)	CF	206
cmovbe / cmovna S, D	Move if below or equal (unsigned)	CF|ZF	206
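A conditional move overwrites the destination only when the condition holds and leaves it alone otherwise. The C sketch below spells that out for a signed minimum. The function name is illustrative, and whether a compiler actually emits cmovl for this code depends on the compiler and its optimization settings.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the smaller of a and b in the style of cmp followed by cmovl. */
int64_t min_cmov_style(int64_t a, int64_t b) {
    int64_t result = b;         /* movq  b, result */
    int less = (a < b);         /* cmpq  b, a: flags from a - b, "l" is SF^OF */
    result = less ? a : result; /* cmovl a, result: written only if less */
    return result;
}

/* Usage: the destination keeps its old value when the condition fails. */
void check_min_cmov_style(void) {
    assert(min_cmov_style(-5, 3) == -5);
    assert(min_cmov_style(7, 3) == 3);
    assert(min_cmov_style(4, 4) == 4);
}
```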

3.5 Procedure Call Instruction

Procedure call instructions take no suffix.

Instruction	Description	Page #
call Label	Push return address and jump to label	221
call *Operand	Push return address and jump to specified location	221
leave	Set %rsp to %rbp, then pop top of stack into %rbp	221
ret	Pop return address from stack and jump there	221

4. Coding Practices

4.1 Commenting

Add comments wherever they help.

4.2 Arrays

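Arrays are stored in memory as contiguous blocks of data. Typically, an array variable acts as a pointer to the first element of the array in memory. To access a given element, multiply the index by the element size and add the result to the array pointer. For example, if arr is an array of 8-byte integers (such as long), the statement:

arr[i] = 3;

can be expressed in x86-64 assembly as follows (assuming the address of the array is in %rax and the index is in %rcx):

movq $3, (%rax, %rcx, 8)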
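The scaled-index operand (%rax, %rcx, 8) is just base + index * 8. The sketch below writes that arithmetic out by hand in C, using int64_t so that the element size is 8 bytes on every platform. The function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Computes the address that (%rax, %rcx, 8) names: base + index * 8. */
int64_t *element_address(int64_t *base, size_t index) {
    assert(base != NULL);
    uintptr_t addr = (uintptr_t)base + index * sizeof(int64_t);
    return (int64_t *)addr;
}

/* arr[i] = 3, spelled out as explicit address arithmetic plus a store. */
void store_three(int64_t *arr, size_t i) {
    *element_address(arr, i) = 3; /* movq $3, (%rax, %rcx, 8) */
}
```

For an array of 4-byte int elements the scale factor would be 4 and the store would be a movl instead.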

4.3 Register Usage

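x86-64 has sixteen 64-bit registers: %rax, %rbx, %rcx, %rdx, %rdi, %rsi, %rbp, %rsp, and %r8-r15. Of these, %rax, %rcx, %rdx, %rdi, %rsi, and %r8-r11 are caller-save registers, meaning their values are not necessarily preserved across a function call. By convention, %rax holds the function's return value, if there is one and it is no more than 64 bits long. (Larger return types, such as structs, are returned using the stack.) The registers %rbx, %rbp, and %r12-r15 are callee-save registers, meaning they are preserved across function calls. The register %rsp is the stack pointer: it points to the topmost element of the stack, and it must hold its original value again when a function returns.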

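In addition, %rdi, %rsi, %rdx, %rcx, %r8, and %r9 are used to pass the first six integer or pointer arguments to a called function. Additional arguments (or large arguments, such as structs passed by value) are passed on the stack.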
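The register order for integer and pointer arguments is easy to mix up, so here it is as a tiny lookup, sketched in C. The function name is made up for illustration, and positions are 0-based.

```c
#include <assert.h>
#include <string.h>

/* Where the integer or pointer argument at a 0-based position is passed. */
const char *integer_arg_location(unsigned position) {
    static const char *const regs[6] = {
        "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
    };
    return position < 6 ? regs[position] : "stack";
}

/* Usage: the first argument, the last register argument, and the first stack argument. */
void check_integer_arg_location(void) {
    assert(strcmp(integer_arg_location(0), "%rdi") == 0);
    assert(strcmp(integer_arg_location(5), "%r9") == 0);
    assert(strcmp(integer_arg_location(6), "stack") == 0);
}
```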

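In 32-bit x86, the base pointer (formerly %ebp, now %rbp) tracked the base of the current stack frame, and a called function saved its caller's base pointer in its own stack frame before updating it. With the arrival of the 64-bit architecture this has mostly been done away with, except in a few special cases where the compiler cannot determine ahead of time how much stack space a function needs (see Dynamic stack allocation).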

4.4 Stack Organization and Function Calls

4.4.1 Calling a Function

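To call a function, the program places the first six integer or pointer arguments in the registers %rdi, %rsi, %rdx, %rcx, %r8, and %r9. Subsequent arguments (or arguments larger than 64 bits) are pushed onto the stack, with the first argument topmost. The program then executes the call instruction, which pushes the return address onto the stack and jumps to the start of the specified function.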

Note that how arguments are pushed here actually depends on the calling convention in use.

Example:

# Call foo(1, 15)
movq $1, %rdi # Move 1 into %rdi
movq $15, %rsi # Move 15 into %rsi
call foo # Push return address and jump to label foo

If the function has a return value, by convention it is stored in %rax (a value larger than 8 bytes is returned via the stack).

4.4.2 Writing a Function

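x64 programs use a region of memory called the stack to support function calls. As the name suggests, this region is organized as a stack data structure, with the “top” of the stack growing toward lower memory addresses. For each function call, new space is created on the stack to hold local variables and other data; this is called a stack frame. To make this happen, you need to write some code at the beginning and end of each function to create and destroy the stack frame.
Setting Up: When a call instruction is executed, the address of the following instruction is pushed onto the stack as the return address, and control passes to the specified function. If the function is going to use any callee-save registers (%rbx, %rbp, or %r12-r15), it should push the current value of each onto the stack so it can be restored at the end. For example: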

pushq %rbx
pushq %r12
pushq %r13

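Finally, additional space can be allocated on the stack for local variables. Although space can be made on the stack as needed within the function body, it is generally more efficient to allocate it all at once at the start of the function. This is done with the instruction subq $N, %rsp, where N is the size of the callee's stack frame.
For example:

subq $0x18, %rsp # Allocate 24 bytes of space on the stack

This set-up code is called the function prologue.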

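Using the Stack Frame: Once the stack frame is set up, it can be used to store and access local variables:

Arguments that cannot fit in registers (such as structs) are pushed onto the stack before the call instruction and can be accessed relative to %rsp. Keep in mind that you must account for the size of the stack frame when referencing arguments this way.
If the function has more than six integer or pointer arguments, the extra ones are pushed onto the stack as well.
For any stack arguments, lower-numbered arguments are closer to the stack pointer. That is, arguments are pushed in right-to-left order where applicable.
Local variables are stored in the space allocated in the function prologue, when some amount is subtracted from %rsp. How that space is organized is up to the programmer.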

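Cleaning Up: Once the function body has finished and the return value (if any) has been placed in %rax, the function must return control to the caller, restoring the stack to the state it was in at the call. First, the callee frees the stack space it allocated by adding the same amount back to the stack pointer: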

addq $0x18, %rsp # Give back 24 bytes of stack space

# Then, it pops off the registers it saved earlier
popq %r13 # Remember that the stack is FILO!
popq %r12
popq %rbx

# Finally, the program should return to the call site, using the ret instruction:
ret

Summary: Putting it all together, the assembly for a function looks like this:

foo:
pushq %rbx # Save registers, if needed
pushq %r12
pushq %r13
subq $0x18, %rsp # Allocate stack space
# Function body
addq $0x18, %rsp # Deallocate stack space
popq %r13 # Restore registers
popq %r12
popq %rbx
ret # Pop return address and return control to caller
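To check that the prologue and epilogue of foo really mirror each other, it can help to simulate them. The sketch below is a toy model in C, not real machine state: stack_machine, pushq, popq, and foo_model are names I made up, the stack is an array of 8-byte slots, and rsp is an index that decreases as the stack grows.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_SLOTS = 64 };

typedef struct {
    uint64_t mem[STACK_SLOTS]; /* simulated stack, one slot per 8 bytes */
    uint64_t rsp;              /* index of the top slot; grows toward 0 */
    uint64_t rbx, r12, r13;    /* the callee-save registers foo uses */
} stack_machine;

void machine_init(stack_machine *m) {
    m->rsp = STACK_SLOTS; /* empty stack */
    m->rbx = m->r12 = m->r13 = 0;
}

void pushq(stack_machine *m, uint64_t value) {
    assert(m->rsp > 0);
    m->mem[--m->rsp] = value;
}

uint64_t popq(stack_machine *m) {
    assert(m->rsp < STACK_SLOTS);
    return m->mem[m->rsp++];
}

/* Mirrors foo: save, allocate 0x18 bytes (3 slots), clobber, free, restore. */
void foo_model(stack_machine *m) {
    pushq(m, m->rbx);
    pushq(m, m->r12);
    pushq(m, m->r13);
    assert(m->rsp >= 3);
    m->rsp -= 3;                       /* subq $0x18, %rsp */
    m->rbx = m->r12 = m->r13 = 0xDEAD; /* function body overwrites them */
    m->rsp += 3;                       /* addq $0x18, %rsp */
    m->r13 = popq(m);                  /* reverse order of the pushes */
    m->r12 = popq(m);
    m->rbx = popq(m);
}
```

After foo_model returns, rsp and all three registers hold the values they had on entry, which is exactly the contract a callee has to honor.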

4.4.3 Dynamic stack allocation

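You may find that a static amount of stack space does not quite cut it for your function. In that case, we borrow a tradition from 32-bit x86 and save the base of the stack frame in the base pointer register. Since %rbp is a callee-save register, it must be saved before it is changed.
The function prologue should therefore begin like this:

pushq %rbp
movq %rsp, %rbp

Correspondingly, the epilogue should contain the following just before ret:

movq %rbp, %rsp
popq %rbp

This can also be done with a single instruction called leave. The epilogue ensures that no matter what you do to the stack pointer in the function body, it always returns to the right place when you return. Note that this means you no longer need to add to the stack pointer in the epilogue.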

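Here is an example of a function that allocates between 0 and 248 bytes of random stack space during execution:

pushq %rbp # Use base pointer
movq %rsp, %rbp
pushq %rbx # Save registers
pushq %r12
subq $0x18, %rsp # Allocate some stack space
...
call rand # Get random number
andq $0xF8, %rax # Keep bits 3-7: a multiple of 8
# between 0 and 248
subq %rax, %rsp # Allocate space
...
movq -0x10(%rbp), %r12 # Restore registers relative to the base of the frame
movq -0x8(%rbp), %rbx
movq %rbp, %rsp # Reset stack pointer and restore base
# pointer
popq %rbp
ret

The cheatsheet restores %r12 and %rbx from (%rbp) and 0x8(%rbp), which is wrong: those slots hold the saved %rbp and the return address. The saved registers sit below the base pointer, at -0x8(%rbp) and -0x10(%rbp); see the note on dynamic stack frame allocation.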

This behavior can be reached from C code by calling a pseudo-function such as alloca, which allocates stack space according to its argument.

alloca: The alloca() function allocates size bytes of space in the stack frame of the caller. This temporary space is automatically freed when the function that called alloca() returns to its caller.
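As a rough illustration of alloca in use, the sketch below copies a buffer into stack space whose size is only known at run time. It assumes a platform that provides <alloca.h> (glibc does; alloca is not part of ISO C), and the function names are mine. mask_to_stack_size repeats the andq $0xF8 masking from the assembly example.

```c
#include <alloca.h> /* assumption: glibc or similar; not ISO C */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same masking as "andq $0xF8, %rax": keeps bits 3-7, so the result is a
   multiple of 8 between 0 and 248. */
size_t mask_to_stack_size(unsigned long value) {
    return (size_t)(value & 0xF8);
}

/* Copies n bytes into run-time-sized stack scratch space and sums them. */
unsigned sum_via_stack(const unsigned char *src, size_t n) {
    assert(n <= 248);
    unsigned char *scratch = alloca(n + 1); /* +1 so the size is never 0 */
    memcpy(scratch, src, n);

    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += scratch[i];
    }
    return total; /* scratch is released automatically on return */
}
```

Because the space lives in the caller's stack frame, there is no matching free call, and the pointer must not be used after the function returns.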

5. Notes

Dynamic stack frame allocation

C++ inline assembly that works with GCC:

extern "C" int func(int x);
asm(R"(
.globl func
    func:

    # prologue
    pushq   %rbp
    movq    %rsp,%rbp   # save stack pointer
    pushq   %rbx        # registers will be used

    # epilogue
    movq    -0x08(%rbp), %rbx
    movq    %rbp, %rsp
    popq    %rbp
    movl    $0x10, %eax # return 16
    ret

)");

References

https://cs.brown.edu/courses/cs033/docs/guides/x64_cheatsheet.pdf
https://cons.mit.edu/fa17/x86-64-architecture-guide.html