3 Common ARM64 ADD Opcode Mistakes to Avoid in 2025

Master ARM64 assembly in 2025! Uncover 3 common ADD opcode mistakes, from flag updates (ADD vs ADDS) to immediate value limits, and write more efficient code.

Dr. Kenji Tanaka

Principal CPU Architect with 15+ years of experience in RISC instruction set design.

Introduction: Why Mastering ADD is Crucial for ARM64

The ARM64 (or AArch64) architecture is no longer just for mobile phones. It powers everything from Apple's latest silicon to high-performance computing servers at AWS. As its dominance grows, so does the need for developers to understand its instruction set architecture (ISA). Whether you're a compiler engineer, a security researcher, or a performance optimization specialist, a deep understanding of ARM64 assembly is an invaluable skill.

At the heart of any computation is arithmetic, and the most fundamental arithmetic operation is addition, represented by the ADD opcode. It seems simple—just add two numbers. However, the nuances of the ARM64 `ADD` instruction and its variants are a common source of subtle, hard-to-debug errors. As we head into 2025, with ARM64's feature set stabilizing and its ecosystem maturing, avoiding these foundational mistakes is more critical than ever.

This post will dissect the three most common mistakes developers make with the ADD opcode, providing clear examples and best practices to help you write more robust and efficient ARM64 code.

Mistake 1: Forgetting Flag Updates (ADD vs. ADDS)

Perhaps the most frequent error, especially for those coming from x86, is misunderstanding how ARM64 handles condition flags. This mistake can lead to conditional logic that never executes as intended.

The Silent Bug of Unchecked Conditions

ARM64 records the outcome of flag-setting operations in four condition flags. They are part of the process state (PSTATE) and can be read or written through the NZCV system register. The flags are:

  • N (Negative): Set if the result is negative.
  • Z (Zero): Set if the result is zero.
  • C (Carry): Set if the operation resulted in an unsigned overflow (a carry-out).
  • V (oVerflow): Set if the operation resulted in a signed overflow.

These flags are essential for implementing control flow, such as checking if two numbers are equal (by seeing if their difference is zero) or branching if an addition overflows. The mistake lies in assuming that every arithmetic instruction updates these flags.
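To make the flag definitions concrete, here is a minimal Python sketch that models how a flag-setting addition derives N, Z, C, and V from its operands. The function name `adds` and the dictionary layout are illustrative choices for this post, not part of any toolchain.

```python
def adds(a: int, b: int, width: int = 32):
    """Model a flag-setting add on `width`-bit registers.

    Returns (result, flags), where flags maps N, Z, C, V to 0 or 1.
    """
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    full = a + b                # unbounded sum, used to detect the carry-out
    result = full & mask        # what actually lands in the register
    flags = {
        "N": int(bool(result & sign)),   # top bit of the result
        "Z": int(result == 0),           # result is all zeros
        "C": int(full > mask),           # unsigned overflow (carry-out)
        # signed overflow: operands share a sign, result has the other one
        "V": int(bool(~(a ^ b) & (a ^ result) & sign)),
    }
    return result, flags


print(adds(0xFFFFFFFF, 1))   # -1 + 1 wraps to zero with a carry-out
print(adds(0x7FFFFFFF, 1))   # largest positive + 1 overflows the signed range
```

The two calls show that carry and overflow are independent: the first sets Z and C but not V, while the second sets N and V but not C.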

ADD: The Non-Updating Variant

In ARM64, the standard ADD instruction does not modify the condition flags. It performs the addition and stores the result, but leaves the N, Z, C, and V flags untouched. This is a deliberate design choice: flag-setting is opt-in per instruction, so flags set by an earlier instruction survive intervening arithmetic, and instructions that don't need flags avoid a dependency on them.

Consider this incorrect code snippet, which attempts to add two registers and branch if the result is zero:


; WARNING: This code is buggy!
MOV     W0, #0xFFFFFFFF   ; Load W0 with -1
MOV     W1, #1            ; Load W1 with 1
ADD     W0, W0, W1        ; W0 = -1 + 1 = 0. Flags are NOT updated.
B.EQ    is_zero           ; This branch will NOT be taken based on the ADD result

; ... some other code

is_zero:
; This block may never be reached

The B.EQ (Branch if Equal, which checks if the Z flag is set) instruction's behavior will be determined by whatever instruction *before* the ADD last set the flags. The result of our ADD is ignored by the branch, creating a silent and potentially catastrophic bug.

ADDS: The Solution for Conditionals

To fix this, you must use the "Set Flags" variant of the instruction: ADDS. The 'S' suffix tells the processor to perform the addition and update the NZCV flags based on the result.

Here is the corrected, functional version of the code:


; This is the correct way
MOV     W0, #0xFFFFFFFF
MOV     W1, #1
ADDS    W0, W0, W1        ; W0 = 0. Z flag is set to 1.
B.EQ    is_zero           ; Branch is correctly taken!

; ... some other code

is_zero:
; This code now executes as expected

Rule of thumb for 2025: If your addition is part of a conditional logic sequence, you almost certainly need ADDS, not ADD.
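The difference is easy to see in a toy model. The sketch below is a hypothetical, heavily simplified CPU (only 32-bit W registers and the Z flag, with names invented for illustration) that runs the same sequence with ADD and with ADDS, starting from a "stale" Z value left behind by earlier code.

```python
class TinyCpu:
    """Hypothetical model: W registers plus the Z flag only."""

    def __init__(self, stale_z: int):
        self.w = [0] * 31
        self.z = stale_z        # whatever an earlier instruction left behind

    def add(self, d: int, n: int, m: int) -> None:
        # ADD: writes the result, leaves the flags alone
        self.w[d] = (self.w[n] + self.w[m]) & 0xFFFFFFFF

    def adds(self, d: int, n: int, m: int) -> None:
        # ADDS: same arithmetic, then Z reflects this result
        self.add(d, n, m)
        self.z = int(self.w[d] == 0)

    def b_eq_taken(self) -> bool:
        return self.z == 1


def branch_taken(use_adds: bool, stale_z: int) -> bool:
    cpu = TinyCpu(stale_z)
    cpu.w[0], cpu.w[1] = 0xFFFFFFFF, 1      # -1 and 1, as in the listings
    if use_adds:
        cpu.adds(0, 0, 1)
    else:
        cpu.add(0, 0, 1)
    return cpu.b_eq_taken()


for stale in (0, 1):
    print(f"stale Z={stale}: ADD -> {branch_taken(False, stale)}, "
          f"ADDS -> {branch_taken(True, stale)}")
```

With ADD, the branch simply mirrors the stale flag, so it can even be taken "correctly" by accident. With ADDS, the outcome depends only on the addition itself.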

Mistake 2: Mishandling Immediate Values

Another common tripwire is the limitation on immediate (constant) values that can be used directly within the ADD instruction. The A64 instruction encoding is fixed at 32 bits, which means there's a finite amount of space to encode the opcode, registers, and any immediate value.

The Deceptive Simplicity of Immediate Operands

The ADD (immediate) instruction has a specific format for its constant operand. It can encode:

  • An unsigned 12-bit immediate value (0 to 4095).
  • This same 12-bit value, but shifted left by 12 bits.

This means you can add small numbers like #100 or larger, specific numbers like #8192 (which is #2, LSL #12) in a single instruction. However, you cannot add an arbitrary number like #5000.
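As a quick check of the rule above, the following Python sketch tests whether a non-negative constant fits the ADD (immediate) format. The helper name `encode_add_imm` is made up for this example, and it deliberately ignores negative values, which assemblers may handle by switching to SUB.

```python
def encode_add_imm(value: int):
    """Return (imm12, shift) if `value` fits ADD (immediate), else None."""
    if 0 <= value <= 0xFFF:
        return value, 0                     # plain 12-bit immediate
    if value & 0xFFF == 0 and 0 < (value >> 12) <= 0xFFF:
        return value >> 12, 12              # 12-bit immediate, LSL #12
    return None                             # needs a register instead


for constant in (100, 8192, 5000):
    print(constant, encode_add_imm(constant))
```

Here 100 and 8192 are encodable, while 5000 is not: it exceeds 4095 and its low 12 bits are non-zero, so neither form applies.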

An assembler will catch this and throw an error, but understanding why it's an error is crucial. Attempting to write this will fail:


; This will fail to assemble
ADD     X0, X1, #5000 ; Error: immediate out of range (exact message varies by assembler)

The value 5000 is greater than 4095 and cannot be represented as a 12-bit value shifted by 12. Relying on the assembler to catch this is fine, but not knowing the alternative can halt your development.

The Correct Approach for Large Constants

When you need to add a large or arbitrary constant, you must first construct it in a temporary register. There are two primary methods:

  1. Using MOVZ/MOVK: The usual way to build 32-bit or 64-bit constants in a register. MOVZ (Move wide with Zero) places a 16-bit value at a chosen 16-bit position in a register and zeroes the other bits. MOVK (Move wide with Keep) places a 16-bit value at a chosen position, leaving the other bits untouched.

    ; How to correctly add 5000 (0x1388)
    MOV     X2, #5000       ; Alias; the assembler encodes this as a single MOVZ
    ADD     X0, X1, X2      ; X0 = X1 + 5000. Correct.
    
    ; For a larger constant like 0x123456789ABCDEF0
    MOVZ    X2, #0xDEF0, LSL #0   ; Load bottom 16 bits, zero the rest
    MOVK    X2, #0x9ABC, LSL #16  ; Insert bits 16-31
    MOVK    X2, #0x5678, LSL #32  ; Insert bits 32-47
    MOVK    X2, #0x1234, LSL #48  ; Insert top 16 bits
    ADD     X0, X1, X2            ; Perform the addition
    
  2. Loading from Memory: A constant can also be loaded from a literal pool in memory. This costs a memory access, so it tends to pay off mainly for constants that would otherwise need several MOVZ/MOVK instructions.

    LDR     X2, =5000       ; Pseudo-instruction; the assembler may use a literal pool or substitute a MOV
    ADD     X0, X1, X2      ; X0 = X1 + 5000. Correct.
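The MOVZ/MOVK construction can be sketched as a small generator that splits a 64-bit constant into 16-bit chunks. This is an illustrative Python helper (the name `mov_sequence` is an assumption for this post), not how any particular assembler works: real assemblers also consider MOVN and bitmask immediates, which this sketch ignores.

```python
def mov_sequence(value: int, reg: str = "X2"):
    """Emit one MOVZ followed by MOVKs that build `value` in `reg`."""
    value &= 0xFFFFFFFFFFFFFFFF
    chunks = [(shift, (value >> shift) & 0xFFFF) for shift in (0, 16, 32, 48)]
    # MOVZ zeroes every other chunk, so all-zero chunks can be skipped.
    needed = [(s, c) for s, c in chunks if c != 0] or [(0, 0)]
    lines = []
    for i, (shift, chunk) in enumerate(needed):
        op = "MOVZ" if i == 0 else "MOVK"
        lines.append(f"{op} {reg}, #0x{chunk:X}, LSL #{shift}")
    return lines


for line in mov_sequence(0x123456789ABCDEF0):
    print(line)
print(mov_sequence(5000))
```

For 0x123456789ABCDEF0 this yields one MOVZ plus three MOVKs, while 5000 needs only a single MOVZ because it fits in 16 bits.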
    

Mistake 3: Incorrectly Using the Shifted Register Operand

One of ARM's most powerful features is its ability to shift a register's value as part of another instruction, without a separate shift instruction. The actual latency depends on the core, so don't assume the shift is free. The `ADD (shifted register)` instruction is a prime example, but its syntax can be confusing.

The Power and Peril of In-Instruction Shifting

The instruction `ADD Xd, Xn, Xm, <shift> #<amount>` calculates Xd = Xn + (Xm shifted by amount). This is incredibly efficient for common patterns like array indexing (e.g., `base_address + index * element_size`).

For example, to calculate `X0 = X1 + (X2 * 8)`, you can do it in one instruction instead of two:


; Efficient calculation using a left shift
ADD     X0, X1, X2, LSL #3 ; LSL #3 is equivalent to multiplying by 2^3, or 8

The mistake arises from a misunderstanding of the operation's syntax and limitations.

Common Pitfalls with Shifted Operands

  1. Invalid Shift Amount: The shift amount is not unlimited. For 64-bit registers (X0-X30), the shift amount must be between 0 and 63. For 32-bit registers (W0-W30), it must be between 0 and 31. An assembler will catch an out-of-range static value, but it's a conceptual gap that can cause confusion.
  2. Misinterpreting the Order of Operations: A developer might mistakenly believe `ADD X0, X1, X2, LSL #3` computes `(X1 + X2) << 3`. This is incorrect. The shift operation only applies to the final register operand (Xm), not the result of the addition. The operation is always `Xd = Xn + op2`, where `op2` is the potentially shifted register.
  3. Using the Wrong Shift Type: ADD (shifted register) accepts three shift types, and choosing the wrong one can silently change the result.
    • LSL (Logical Shift Left): Fills with zeros. Used for multiplying by powers of 2.
    • LSR (Logical Shift Right): Fills with zeros. Used for unsigned division by powers of 2.
    • ASR (Arithmetic Shift Right): Fills with the sign bit (bit 63 or 31). Used for signed division by powers of 2, rounding toward negative infinity. Using LSR on a negative number instead turns it into a large positive value.
    • ROR (Rotate Right): The bits shifted out from the right are inserted on the left. ROR is available on logical instructions such as AND, ORR, and EOR, but not on ADD or ADDS, where its encoding is reserved.

Always double-check that you are applying the correct shift type and amount to the intended operand to avoid subtle data corruption bugs.
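The pitfalls above can be checked against a small Python model of the shifted-register operand. The function names are invented for this sketch; it covers only the LSL, LSR, and ASR shift types that ADD accepts, and rejects out-of-range shift amounts the way an assembler would.

```python
def shifted_operand(xm: int, kind: str, amount: int, width: int = 64) -> int:
    """Compute op2 for ADD (shifted register) on `width`-bit registers."""
    if not 0 <= amount < width:
        raise ValueError(f"shift amount must be in 0..{width - 1}")
    mask = (1 << width) - 1
    xm &= mask
    if kind == "LSL":
        return (xm << amount) & mask
    if kind == "LSR":
        return xm >> amount                     # zero-fill from the left
    if kind == "ASR":
        # reinterpret as signed so Python's >> replicates the sign bit
        signed = xm - (1 << width) if xm >> (width - 1) else xm
        return (signed >> amount) & mask
    raise ValueError("ADD (shifted register) accepts only LSL, LSR, ASR")


def add_shifted(xn: int, xm: int, kind: str, amount: int, width: int = 64) -> int:
    """Xd = Xn + op2: the shift applies to Xm only, never to the sum."""
    mask = (1 << width) - 1
    return (xn + shifted_operand(xm, kind, amount, width)) & mask


MASK64 = (1 << 64) - 1
minus_8 = -8 & MASK64

print(hex(add_shifted(0x1000, 5, "LSL", 3)))     # base + index * 8
print(hex(shifted_operand(minus_8, "ASR", 1)))   # still negative: -4
print(hex(shifted_operand(minus_8, "LSR", 1)))   # large positive value
```

Note that `add_shifted(0x1000, 5, "LSL", 3)` gives 0x1028, not the 0x8028 you would get from shifting the sum.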

ARM64 Addition Opcode Comparison
Instruction | Purpose | Updates Flags? | Typical Use Case
----------- | ------- | -------------- | ----------------
ADD | Adds two operands without affecting flags. | No | Simple arithmetic where no conditional logic follows (e.g., calculating a pointer offset).
ADDS | Adds two operands and updates the NZCV flags. | Yes | Arithmetic that is immediately followed by a conditional branch (e.g., ADDS X0, X1, X2 then B.EQ ...).
ADC | Add with Carry. Adds two operands plus the value of the Carry flag. | No | Multi-word arithmetic for numbers larger than 64 bits. Used in a chain after an initial ADDS.
ADCS | Add with Carry and Set Flags. Adds operands + Carry flag and updates flags. | Yes | The intermediate step in a multi-word addition chain that requires flag setting for the next link.
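To illustrate the ADDS/ADC chain from the table, here is a minimal Python sketch of a 128-bit addition built from two 64-bit halves. The function names are illustrative, the inputs are assumed to be unsigned values in the range 0 to 2^64 - 1, and the register names in the comments are just one possible assignment.

```python
MASK64 = (1 << 64) - 1


def adds64(a: int, b: int):
    """ADDS: return the 64-bit result and the carry-out (C flag)."""
    full = a + b
    return full & MASK64, full >> 64


def adc64(a: int, b: int, carry: int) -> int:
    """ADC: add two operands plus the incoming carry, flags untouched."""
    return (a + b + carry) & MASK64


def add128(a_lo: int, a_hi: int, b_lo: int, b_hi: int):
    lo, carry = adds64(a_lo, b_lo)    # ADDS X0, X0, X2  (sets C)
    hi = adc64(a_hi, b_hi, carry)     # ADC  X1, X1, X3  (consumes C)
    return lo, hi


print(add128(MASK64, 0, 1, 0))   # carry ripples from the low half to the high half
```

A wider sum would use ADCS for every link except the last, so that each step passes its carry on to the next.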

Conclusion: Writing Cleaner, More Efficient ARM64 Code

The ADD instruction is a building block of ARM64 assembly, but its apparent simplicity hides important details that can trip up even experienced programmers. By internalizing the distinction between ADD and ADDS, understanding the limitations of immediate operands, and correctly applying shifted registers, you can avoid a significant class of common bugs.

As we move further into 2025, proficiency in low-level ARM64 development will only become more valuable. Mastering these fundamentals is the first step toward writing the clean, efficient, and correct code that modern high-performance systems demand.