Assembly
Introduction
Assembly is the lowest level of language (just above binary).
It is platform specific (different for each CPU architecture: x86, x86-64, ARM, ...).
It was used to write code close to the hardware, but is now mostly used for reverse engineering (understanding how malwares work).
It is one of fastest language, with C, but also the most difficult to write & maintain. Today, some C compilers can compile C code directly to binary, without using assembly. That's why assembly is less used than before.
Types
Byte
db
1 byte
Word
dw
2 bytes
Double word
dd
4 bytes
Quadruple word
dq
8 bytes
Ten bytes
dt
10 bytes
Example:
choice DB 'y'
number DW 65535
negative DD -2147483648
big DQ 9223372036854775807
very_big DT 18446744073709551615
Note:
Characters are stored as its ASCII value in hexadecimal;
There is no signed & unsigned types differentiation;
Negative values are automatically converted to its 2's complement representation;
There is no floating point & integer types differentiation;
float & double are represented using 32 bits & 64 bits respectively.
Arrays
Manual definition:
name: type value1, value2, ...
Automatic definition:
name: times number_of_elements type value
Note: ALL elements are initialized to value
.
Uninitialized definition:
name: res_type number_of_elements
Example:
array1: db 1, 2, 3, 4, 5 ; Define entire array manually
array2: times 5 db 0 ; 5 elements of type byte initialized to 0
array3: resb 5 ; 5 elements of type byte uninitialized
Format
%d
DWORD
%hd
WORD
%hhd
BYTE
Input
rdi
is used to store the format of the input (e.g.%d
).rsi
is used to store the address of the variable where the input will be stored.rax
is equal to1
if we scan a float,0
otherwise.
extern scanf
global main
section .data
fmt_scan: db '%hd', 0 ; 0 is the null terminator (end of string)
section .bss
user_input: resb 2 ; resb = reserve bytes for uninitialized data
section .text
main:
mov rdi, fmt_scan
mov rsi, user_input
mov rax, 0
call scanf
Output
rdi
is used to store the format of the output (e.g.%d
).rsi
is used to store the value to be printed.rax
is equal to1
if we print a float,0
otherwise.
extern printf
global main
section .data
fmt_print db 'The number is %hhd', 10, 0 ; 10 is the newline character
number db 10
main:
mov rdi, fmt_print
mov rsi, number
mov rax, 0
call printf
Conditions
cmp
is used to compare two values. It subtracts the second value from the first one and sets the flags ZF
, SF
, CF
& OF
. Then we can use j
instructions to jump to a label if a condition is true.
Equal
je
signed or unsigned
Not equal
jne
signed or unsigned
Above
ja
unsigned
Above or equal
jae
unsigned
Below
jb
unsigned
Below or equal
jbe
unsigned
Greater
jg
signed
Greater or equal
jge
signed
Lower
jl
signed
Lower or equal
jle
signed
greater:
mov eax, 1
main:
cmp eax, ebx
jg greater
Compilation
First, we need to assemble the assembly code to get the object file:
nasm -f elf64 -o file.o file.asm
-f elf64
is used to specify the format of the object file.elf64
is used for x86-64.-o file.o
is used to specify the object file.The last argument is the assembly code file.
Then, we need to link the object file to get the executable:
gcc -m64 -no-pie -o file file.o
-m64
is used to specify the architecture of the executable.64
is used for x86-64.-no-pie
is used to specify that we don't want a position independent executable.-o file
is used to specify the executable name we will create.The last argument is the object file.
Notes
If
[rax]
is used but no valid memory address is stored inrax
, there is a segmentation fault.[rax + 5]
means that we take the value stored 5 bytes after the address stored inrax
. Ifrax
is0x1000
, then[rax + 5]
is the value stored at0x1005
. Same for-
.
References
Last updated