Getting started with x86ASM!

A practical, in-depth introduction to x86 assembly: general-purpose registers, flags, memory addressing, binary operations, and XMM register basics explained in a way that is meant to actually stick.

Leunsel · 28.03.2026 · 18 min read

A lot of people learned x86 assembly from a mix of debugger output, forum posts, scattered Stack Overflow answers, and Félix Cloutier's reference pages. That reference is still incredibly useful, but moments like a temporary outage are a good reminder: if your understanding depends on one convenience site, you're borrowing confidence instead of building it.

So this post is meant to be the kind of article I wish more people had when they first touched assembly seriously: not just a list of mnemonics, but a structured mental model for how x86 works in practice. We are going to cover the registers, how memory is actually addressed, binary and arithmetic operations, the role of flags, and the basics of XMM registers for scalar and packed work. The goal is not to turn this into a full ISA manual. The goal is to make the manual readable afterwards.

If you can read a line like mov rax, [rbx+rcx*4+20] and immediately understand what it reads, where it reads it from, and why somebody wrote it that way, you are already far beyond "just memorizing mnemonics".

Start here: what assembly actually is

Assembly is a human-readable representation of machine instructions. Every instruction is still just encoded bytes, but assembly gives those bytes names, operands, and structure. In x86, that structure grew over decades, which is why the architecture feels both powerful and occasionally chaotic. You are not looking at something designed in one clean pass. You are looking at a living fossil that kept evolving and somehow still works extremely well.

x86 code mostly moves data, transforms data, compares data, and changes control flow. That sounds simple, but nearly every "hard" reverse-engineering problem eventually becomes one of these questions:

  • Where does this value come from?
  • Which register or memory location holds it right now?
  • How is it transformed?
  • Which flags or comparisons decide the next branch?
  • Is the value treated as integer bits, signed data, unsigned data, or floating-point data?

Once you learn to ask those questions while reading code, assembly becomes much less mysterious.

The register model: what registers exist?

Registers are tiny storage locations inside the CPU that instructions operate on directly. They are the fastest and most convenient places to keep values, which is why compilers try very hard to keep frequently used data in registers instead of repeatedly touching memory.

General-purpose registers

In 64-bit mode, the core general-purpose register set is:

RAX RBX RCX RDX
RSI RDI RSP RBP
R8  R9  R10 R11 R12 R13 R14 R15

Each of these has smaller sub-register views. For example:

RAX  = 64-bit
EAX  = low 32-bit part of RAX
AX   = low 16-bit part of RAX
AH/AL = high/low 8-bit halves of AX

The same pattern exists for many of the classic registers: RBX → EBX → BX → BH/BL, RCX → ECX → CX → CH/CL, RDX → EDX → DX → DH/DL. The newer registers also have narrower forms, but they follow the modern naming style instead: R8B for the low byte, R8W for the low word, R8D for the low dword, and so on.
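To make the naming concrete, here is the full set of views for one of the newer registers (the same pattern applies to R8 through R15):

R8   = 64-bit
R8D  = low 32-bit part of R8
R8W  = low 16-bit part of R8
R8B  = low 8-bit part of R8

Note there is no high-byte form like AH in the R8-R15 family; only the classic A/B/C/D registers have one.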

One detail that matters a lot

In 64-bit mode, writing to a 32-bit register like EAX zero-extends into the full 64-bit register. So mov eax, 1 leaves RAX = 1, not just the low half changed. That behavior is extremely important in compiler output and optimization.
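A short sketch of the difference, assuming RAX starts out as 0x1122334455667788:

mov eax, 1   ; RAX = 0x0000000000000001 (upper 32 bits zeroed)
mov ax, 1    ; RAX = 0x1122334455660001 (upper 48 bits preserved)
mov al, 1    ; RAX = 0x1122334455667701 (upper 56 bits preserved)

Only the 32-bit write clears the rest of the register; 16-bit and 8-bit writes merge into it.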

Which register is used for what?

Architecturally, most general-purpose registers can do most general-purpose work. But in real code, some patterns show up constantly because of calling conventions, instruction encodings, or long-standing convention.

RAX

Think of RAX as the "default result" register. Function return values commonly come back in RAX. Some instructions also treat it specially. Multiplication and division are the classic examples: mul, imul, div, and idiv often use implicit operands in RAX and RDX.

RBX

RBX is just a normal general-purpose register, but in many calling conventions it is callee-saved, which makes it a common place to keep values that need to survive function calls.

RCX

Historically, RCX is the count register. Shift and rotate instructions can use CL as the variable count. String instructions also traditionally use it as a repeat counter. On Windows x64, it is also the first integer/pointer argument register for function calls.
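The variable-count shift is worth seeing once, because only CL can hold the count:

mov ecx, 3
shl eax, cl   ; shift EAX left by the value in CL (here: 3)

You cannot write shl eax, bl. If the count is not an immediate, it must come through CL.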

RDX

RDX is often paired with RAX for wider results or wider dividends. It is also the second integer/pointer argument register on Windows x64.

RSI and RDI

These are historically source and destination index registers. You will still see that in string instructions like movsb, cmpsb, or stosb. In modern compiler output they are also just normal registers, but their names still hint at their original role.
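A classic use, sketched here, is a memcpy-style copy with rep movsb, assuming the direction flag is clear so both pointers advance upward (source_ptr and dest_ptr are hypothetical labels):

mov rsi, source_ptr   ; source address
mov rdi, dest_ptr     ; destination address
mov rcx, 16           ; number of bytes to copy
rep movsb             ; copy RCX bytes from [RSI] to [RDI]

Each movsb copies one byte and advances both pointers; the rep prefix repeats it RCX times.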

RSP

This is the stack pointer. Treat it as special. It tracks the current top of the stack, and a lot of code depends on it being correct at all times. You can technically do arithmetic on it, and function prologues do exactly that, but random misuse of RSP is one of the fastest ways to break code.

RBP

Traditionally the frame pointer. In optimized code it may be used as a normal register, but in debug builds and hand-written code it often anchors stack-frame access: [rbp-20], [rbp+10], and similar patterns.

R8-R15

These are the extra 64-bit general-purpose registers added with x86-64. They drastically reduced register pressure compared to old 32-bit code. In modern assembly and compiler output, they are everywhere.

Other important register groups

General-purpose registers are only part of the picture. There are also:

  • RIP » the instruction pointer.
  • RFLAGS » the flags register that stores condition bits like zero, carry, sign, and overflow.
  • XMM registers » 128-bit SIMD/floating-point registers.
  • YMM/ZMM registers » wider AVX/AVX-512 extensions layered on top of the XMM family.
  • Segment, control, debug, and system registers » very important in system programming, much less important for ordinary application code.

Flags: the invisible side effects that drive branches

A huge part of x86 is not just what a result is, but which flags got updated while producing it. Many arithmetic and logic instructions write condition bits into RFLAGS, and branch instructions then consume those bits.

The most important flags for everyday reading are:

  • ZF » Zero Flag. Set when a result is zero.
  • CF » Carry Flag. Important for unsigned overflow and carries/borrows.
  • SF » Sign Flag. Mirrors the sign bit of the result.
  • OF » Overflow Flag. Important for signed overflow.
  • PF » Parity Flag. Exists, but is much less central in everyday code reading.

This is why cmp and test matter so much. They usually do not "store a result" anywhere you can see. They update flags, and then the next conditional jump uses those flags.

cmp rax, rbx
je  equal_path
jg  greater_path

Read that as: compare the two values by subtracting conceptually, keep the flags, then branch according to the flag outcome.

Binary operations: working on bits directly

Binary operations are where assembly starts to feel very "close to the metal". Instead of thinking in decimal arithmetic, you think in raw bit patterns. This matters constantly in reverse engineering because flags, masks, permissions, packed fields, and state values are often encoded as bits.

AND, OR, XOR, NOT

These are the fundamental logic operations.

AND

and keeps only the bits that are set in both operands. It is commonly used for masking.

and eax, 0xFF

This keeps only the low 8 bits of EAX and clears the rest. In practice, this is how you extract bit ranges, enforce alignment, or check whether a flag bit is present.

OR

or sets a bit if either operand has it set.

or eax, 0x20

This forces bit 5 on, regardless of its previous state.

XOR

xor sets a bit when the operands differ. Same bit in both operands means zero; different means one.

xor eax, eax

This is the classic zeroing idiom. It sets EAX to zero efficiently, and because it writes a 32-bit register, the full RAX becomes zero in 64-bit mode as well.

xor is also used to toggle flags:

xor eax, 0x4

That flips bit 2. If it was 0, it becomes 1. If it was 1, it becomes 0.

NOT

not inverts every bit.

not eax

Every 0 becomes 1, every 1 becomes 0. This is less common in ordinary application code than and, or, or xor, but it shows up in low-level transformations and bit-twiddling.

TEST: the "check bits without keeping a result" instruction

test behaves like an and for flag-setting purposes, but it discards the result. That makes it perfect for "does this bit pattern contain X?" style checks.

test eax, eax
jz   value_was_zero

This is one of the most common patterns you will ever see. It effectively asks: is EAX zero? If yes, jump.

test eax, 0x8
jnz  flag_is_set

That checks whether bit 3 is set.

Shifts and rotates

Shifts move bits left or right. Rotates move bits around in a circle.

SHL / SAL

Left shift. Bits move left, zeros enter from the right.

shl eax, 1

For unsigned arithmetic, this is equivalent to multiplying by 2, as long as you ignore overflow.

SHR

Logical right shift. Bits move right, zeros enter from the left.

shr eax, 1

For unsigned values, this is like dividing by 2 and rounding down.

SAR

Arithmetic right shift. Bits move right, but the sign bit is replicated from the left. That matters for signed values.

sar eax, 1

This preserves the sign of negative numbers in a way shr does not.
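A concrete contrast, starting each time from EAX = 0xFFFFFFF8, which is -8 as a signed value:

; starting from EAX = 0xFFFFFFF8 (-8 signed)
sar eax, 1   ; EAX = 0xFFFFFFFC, i.e. -4: sign replicated from the left
; starting again from EAX = 0xFFFFFFF8
shr eax, 1   ; EAX = 0x7FFFFFFC: suddenly a huge positive number

Same input bits, completely different meaning afterwards.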

ROL / ROR

Rotate left and rotate right. Bits that fall off one side wrap around to the other side. These show up in hashing, encryption, obfuscation, and some bit-packed state transformations.
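A quick sketch of the wraparound, assuming EAX = 0x12345678:

rol eax, 8   ; EAX = 0x34567812, top byte wrapped around to the bottom
ror eax, 8   ; applied to that result, this rotates it back to 0x12345678

Unlike shifts, no bits are lost: a rotate by the register width is a no-op.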

Important distinction

shr and sar are not interchangeable. If a value is signed, using the logical right shift can completely destroy the meaning of the number.

Bit-focused instructions you should recognize

Beyond the basic logic operators, x86 has many instructions for dealing with specific bits:

  • bt » test a specific bit
  • bts » test and set a bit
  • btr » test and reset a bit
  • btc » test and complement a bit
  • bsf » bit scan forward
  • bsr » bit scan reverse
  • popcnt » count set bits

You do not need to memorize all of them immediately. But when you see them, think: this code is not handling "numbers" in the everyday sense, it is handling structure encoded into bits.
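A small sketch of how bt pairs with the carry flag, since that is the non-obvious part: the tested bit lands in CF, and a carry-based branch then reads it (bit_was_set is a hypothetical label):

bt  eax, 3        ; copy bit 3 of EAX into the carry flag
jc  bit_was_set   ; jump if that bit was 1

bts, btr, and btc do the same test but also set, clear, or flip the bit afterwards.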

Arithmetic operations: the obvious part that still has traps

add, sub, inc, dec, imul, mul, idiv, and div look straightforward at first glance. But arithmetic in assembly always forces one extra question: signed or unsigned?

The same bit pattern can mean very different things depending on interpretation. 0xFFFFFFFF can be 4,294,967,295 as unsigned, or -1 as signed. The CPU does not care about your intent. The instruction does.

ADD, SUB, CMP

add and sub do what you expect and update flags. cmp performs a subtraction for flag-setting purposes without storing the numeric result.

add eax, 5
sub rbx, rcx
cmp rdx, 10

In reverse engineering, cmp is often more interesting than add, because it usually precedes the branch that explains program behavior.

Multiplication and division

This is where implicit operands matter.

A simple one-operand mul or div often does not just use whatever register you feel like. The architecture uses fixed registers for part of the input and output. For example, integer division in many forms uses a dividend spread across RDX:RAX, then writes quotient and remainder back into those same registers.

That is why you often see preparation instructions like:

xor edx, edx
mov eax, 100
mov ecx, 7
div ecx

Here the dividend is formed in EDX:EAX. Clearing EDX makes the high half zero before an unsigned divide. For signed division, code commonly uses sign-extension helpers like cdq or cqo.
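The signed counterpart, sketched with cdq, which sign-extends EAX into EDX:EAX:

mov  eax, -100  ; signed dividend
cdq             ; EDX:EAX = sign-extended EAX (EDX becomes 0xFFFFFFFF here)
mov  ecx, 7
idiv ecx        ; EAX = quotient (-14), EDX = remainder (-2)

For 64-bit operands the same job is done by cqo, which sign-extends RAX into RDX:RAX.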

LEA: not a memory load, but an address arithmetic tool

lea means "load effective address", but beginners often misunderstand it. It does not read memory. It computes the address expression and places the resulting number into a register.

lea rax, [rbx+rcx*4+20]

This means:

rax = rbx + (rcx * 4) + 20

No dereference happens here. That is exactly why compilers love lea: it gives them fast arithmetic using the addressing hardware.

So when you see lea, mentally translate it as "compute this formula", not "load from memory".

Memory operations: where most beginners get lost

Registers are simple because the data is directly named. Memory is harder because the instruction usually tells you how to compute an address, not just which variable it is.

This is the central mental model:

[something]

Brackets mean dereference. They mean: treat the value inside as an address, then access memory at that address.

MOV and basic reads/writes

mov rax, [rbx]
mov [rcx], rdx
mov eax, [rbp-4]
mov [rsp+20], eax

These examples mean:

  • mov rax, [rbx] » read 8 bytes from memory at the address in RBX, store them in RAX.
  • mov [rcx], rdx » write 8 bytes from RDX into memory at the address in RCX.
  • mov eax, [rbp-4] » read 4 bytes from stack-frame memory 4 bytes below RBP.
  • mov [rsp+20], eax » write 4 bytes into memory 20 bytes above the current stack pointer.

The operand size matters. eax implies 4 bytes. rax implies 8 bytes. xmm0 implies a different class of access entirely.

Addressing form: base + index * scale + displacement

One of x86's most important addressing patterns is:

[base + index*scale + displacement]

Example:

mov eax, [rbx+rcx*4+10]

That computes an address like this:

address = rbx + (rcx * 4) + 10

Then it reads 4 bytes from that address into EAX.

This pattern is everywhere because it maps naturally to arrays and structures. If RBX is the start of an array and each element is 4 bytes wide, then RCX*4 selects element RCX. Add a displacement and suddenly you are indexing a field inside a structure inside an array.

Arrays, pointers, and structures

Consider this C-like structure:

struct Player
{
    int hp;        // offset 0x00
    int mana;      // offset 0x04
    float speed;   // offset 0x08
};

If RBX holds a pointer to a Player, then:

mov eax, [rbx]       ; hp
mov ecx, [rbx+4]     ; mana
movss xmm0, [rbx+8]  ; speed

That is how structure access looks in assembly: base pointer plus field offset.

For arrays of structures, you combine element size and field offset. One catch: the hardware scale factor can only be 1, 2, 4, or 8, so an element size like 12 needs the index scaled separately first:

imul esi, esi, 12
mov  eax, [rdi+rsi+4]

If each structure is 12 bytes, the imul turns the index into a byte offset in RSI, RDI is the base address, and +4 is the offset of the field inside each element.

Size extension: MOVZX, MOVSX, MOVSXD

Sometimes memory holds a smaller value than the destination register. Then the CPU must know how to extend it.

  • movzx » move with zero-extension
  • movsx » move with sign-extension
  • movsxd » special sign-extension form commonly used from 32-bit to 64-bit

movzx eax, byte ptr [rbx]
movsx eax, byte ptr [rbx]
movsxd rax, dword ptr [rbx]

This distinction is critical. If the stored byte is 0xFF, then:

  • zero-extended: 0x000000FF
  • sign-extended: 0xFFFFFFFF in 32-bit context, or all high bits set in 64-bit context

Same bits, different meaning.

Why you usually cannot move directly from memory to memory

One very common beginner question is why this is invalid:

mov [rax], [rbx]

In ordinary mov forms, x86 generally allows register-to-register, register-to-memory, or memory-to-register, but not a general memory-to-memory transfer. One side usually needs to be a register.

mov rcx, [rbx]
mov [rax], rcx

That extra step is not a random assembler limitation. It reflects how the instruction forms are defined.

The stack: temporary memory with rules

The stack is just memory, but it is used in a highly structured way. Function calls, return addresses, saved registers, local variables, shadow space, spill slots, and temporary buffers often live there.

push rbx
sub  rsp, 20h
mov  [rsp+18h], rax
...
add  rsp, 20h
pop  rbx
ret

Beginners often see stack accesses and think they are arbitrary. Usually they are not. They are part of the function's local layout.

XMM registers: where floating-point and SIMD start to matter

XMM registers are 128-bit registers introduced with SSE. They are used for scalar floating-point work, packed vector operations, and many data-conversion instructions. If you reverse modern games, multimedia code, physics, math-heavy logic, or compiler-generated floating-point code, you will see them constantly.

What does 128-bit actually mean here?

A 128-bit register is not "one giant number" by default. It is better to think of it as a 128-bit container whose interpretation depends on the instruction:

  • 4 × 32-bit floats
  • 2 × 64-bit doubles
  • 16 × bytes
  • 8 × 16-bit words
  • 4 × 32-bit integers
  • 2 × 64-bit integers
  • or one scalar value in the low lane while the rest is ignored or preserved

This is the first big conceptual shift with XMM registers: the register is just raw bits. The instruction decides how those bits are interpreted.

Scalar vs packed operations

XMM instructions come in two broad styles:

  • Scalar » operate on the low element only, such as addss or addsd.
  • Packed » operate on multiple lanes at once, such as addps or paddd.

addss xmm0, xmm1   ; add low single-precision float
addsd xmm0, xmm1   ; add low double-precision float
addps xmm0, xmm1   ; add four packed single-precision floats
paddd xmm0, xmm1   ; add packed 32-bit integers

The suffixes matter a lot:

  • ss » scalar single-precision float
  • sd » scalar double-precision float
  • ps » packed single-precision floats
  • pd » packed double-precision floats

Common XMM instructions you should understand early

MOVSS / MOVSD / MOVAPS / MOVUPS / MOVDQA / MOVDQU

These move data between XMM registers and memory. The exact mnemonic often tells you whether the access assumes alignment, scalar floating-point intent, or integer-packed intent.

movss xmm0, [rbx+8]
movaps xmm1, xmm2
movups xmm3, [rdi]
movdqu xmm4, [rax]

A useful beginner rule: scalar forms usually mean "the low element matters", while packed forms usually mean "the whole 128-bit register matters".

ADDSS / MULSS / SUBSS / DIVSS

These are common in game code because scalar float math is everywhere.

movss xmm0, [rbx+8]   ; load speed
mulss xmm0, [factor]  ; multiply by factor
movss [rbx+8], xmm0   ; store speed back

If you have ever patched movement speed, cooldown multipliers, or FOV math, you have likely seen this pattern.

CVTSI2SS / CVTTSS2SI and friends

These convert between integers and floating-point values.

cvtsi2ss xmm0, eax
cvttss2si eax, xmm0

Those instructions matter because real programs constantly cross the boundary between "math as float" and "state as integer".

PXOR / XORPS

XMM registers also support bitwise logic. That means SIMD registers are not only for math. Sometimes code uses them for masks, zeroing, sign-bit tricks, or data shuffling.

pxor xmm0, xmm0

This is the XMM equivalent of zeroing a register.

UNPCK, SHUF, BLEND, MIN, MAX, CMP

These instructions rearrange, compare, and combine lanes. They show up a lot in optimized math code and compiler-generated vector operations. When you first encounter them, do not panic. Usually the first thing to ask is: is the code trying to broadcast, rearrange, compare, or merge values?
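One pattern worth recognizing early is the broadcast: shufps with an immediate of 0 replicates the low lane across all four lanes (a sketch):

movss  xmm0, [rbx]    ; load one float into the low lane
shufps xmm0, xmm0, 0  ; copy lane 0 into all four lanes

After this, every 32-bit lane of xmm0 holds the same float, ready for packed math against another register.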

How to think about XMM registers without getting lost

Do not think of xmm0 as "a float register". Think of it as 128 raw bits with a current interpretation.

Example:

movaps xmm0, [rax]
addps  xmm0, xmm1

Here xmm0 is being treated as four floats.

movdqu xmm0, [rax]
pxor   xmm0, xmm1

Here the same physical register is being treated as raw packed data.

Same register family. Different semantic meaning. The instruction tells you how to read it.

XMM registers in function calls

On 64-bit platforms, XMM registers are not just temporary math storage. Calling conventions also use them for floating-point arguments and return values. That is why you may see values appear in xmm0 or xmm1 before a call even when no obvious SIMD math is happening.
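On Windows x64, for example, a call like scale(player, 1.5f) might be set up roughly like this (a hedged sketch; scale and scale_val are hypothetical names):

mov   rcx, rbx          ; first argument is a pointer, so it goes in RCX
movss xmm1, [scale_val] ; second argument is floating-point, so it rides in XMM1
call  scale             ; a float return value, if any, comes back in XMM0

Note that the argument's position picks the register: the second argument uses XMM1 even though nothing was placed in XMM0.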

So when should you use which register or instruction family?

Here is the practical version.

Use a general-purpose register when...

  • you are handling pointers or addresses
  • you are doing ordinary integer arithmetic
  • you are managing counters, indices, loop state, and bitmasks
  • you are accessing structures, arrays, or stack variables
  • you are interacting with call/return mechanics and calling conventions

Use XMM registers when...

  • you are working with floating-point values
  • you need scalar float or double instructions
  • you are doing packed SIMD work
  • you need conversion between integer and float domains
  • you are following a calling convention that passes or returns FP values through XMM registers

Use bitwise instructions when...

  • you are dealing with flags or packed state
  • you want masking instead of arithmetic
  • you need fast set/clear/toggle checks
  • you are reading compiler output that encoded decisions as bit tests

Use memory operands carefully when...

  • you understand what the address expression resolves to
  • you know the access width
  • you know whether the loaded value should be signed, unsigned, integer, or float
  • you know whether you are reading a value, writing a value, or merely computing an address with lea

Common beginner mistakes that cause most confusion

1. Confusing an address with the value at that address

mov rax, rbx    ; copies the number in RBX
mov rax, [rbx]  ; reads memory at address RBX

One bracket pair changes everything.

2. Treating LEA as a dereference

lea computes. It does not load from memory.

3. Forgetting operand size

mov al, [rbx], mov eax, [rbx], and mov rax, [rbx] read different widths. If you ignore width, you will misread the code.

4. Ignoring signedness

jl and jb are not the same kind of comparison. Signed and unsigned branches read the same flags differently.
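A sketch of how the same compare answers two different questions, assuming EAX = -1 (0xFFFFFFFF) and EBX = 1 (signed_less and unsigned_less are hypothetical labels):

cmp eax, ebx
jl  signed_less     ; taken: -1 < 1 as signed values
jb  unsigned_less   ; not taken: 0xFFFFFFFF > 1 as unsigned values

jl reads SF and OF; jb reads CF. Same flags register, different question.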

5. Assuming every XMM instruction is "vectorized game magic"

A lot of XMM usage is just scalar float code. Do not overcomplicate it. If the instruction is mulss, there is a very good chance only one float is being touched.

6. Forgetting that many instructions exist mainly for flags

cmp and test are often the setup for the real decision. The next jump is where the behavior lives.

A few reading patterns that help immediately

Pattern: compare then branch

cmp eax, 64
jl  too_small

Translate mentally to: "if eax < 64, jump".

Pattern: test against itself

test rax, rax
jz   was_zero

Translate to: "if rax == 0, jump".

Pattern: pointer plus field offset

mov ecx, [rbx+1C8h]

Translate to: "read a 32-bit field at offset 0x1C8 from the object pointed to by RBX".

Pattern: scalar float update

movss xmm0, [rbx+20]
addss xmm0, xmm1
movss [rbx+20], xmm0

Translate to: "load one float field, add another float, store it back".

Pattern: register zeroing

xor eax, eax
pxor xmm0, xmm0

Translate to: "clear the register".

Where this leads next

Once these basics feel natural, the next layer becomes much easier: calling conventions, stack frames, prologues/epilogues, conditional set instructions, string instructions, AVX/YMM registers, FPU leftovers in older code, and instruction encoding details.

But none of that helps if the foundation is shaky. The real baseline skill is still this: identify the operands, identify the width, identify whether memory is being dereferenced, identify which flags matter, and identify how the value is being interpreted.

Final thoughts

x86 assembly looks hostile at first because it compresses a lot of meaning into very small lines. A single instruction can tell you about data width, signedness, addressing, side effects on flags, and sometimes even compiler intent.

That is also why it becomes readable with practice. Not because the syntax gets prettier, but because your brain starts unpacking those layers automatically. At that point, assembly stops being "cryptic machine stuff" and starts becoming a very precise description of what the program is doing.

And honestly, that is when it gets fun.

Recommended baseline references

Convenience references are great for quick lookup, but for exact behavior, edge cases, operand rules, and extension details, the Intel and AMD architecture manuals are still the real source of truth.