mips_o32.jumpto Wrong Return? An Essential 2025 Fix

Debugging MIPS O32 and facing wrong return addresses? The culprit might be the linker-generated `mips_o32.jumpto` thunk. Learn the cause and the essential 2025 fix.

Daniel Chen

Embedded systems architect specializing in MIPS architecture and low-level system optimization.

If you've spent any time in the trenches of low-level MIPS development, you know the feeling. Your code is clean, your logic is sound, but your function returns to a seemingly random address, sending your program spiraling into chaos. You check the stack, you scrutinize your function prologues and epilogues, but the culprit remains elusive. There's a ghost in the machine.

For many developers working with the MIPS O32 ABI, that ghost has a name: mips_o32.jumpto. This mysterious, linker-generated function is supposed to be a helpful bridge, but a subtle, long-standing bug can turn it into a source of maddeningly intermittent crashes. The good news? A definitive fix is rolling out in 2025 toolchains, and understanding it is key to protecting your sanity.

What is mips_o32.jumpto Anyway?

To understand the problem, we first need to appreciate the helper. The MIPS architecture is a classic RISC (Reduced Instruction Set Computer) design. One of its defining features is fixed-length 32-bit instructions. This regularity is great for pipelining and performance, but it comes with a limitation: a single jump or branch instruction can't reach every possible address in a 32-bit memory space.

A standard j (jump) or jal (jump and link) instruction in MIPS encodes only 26 bits of the target. Instructions are word-aligned, so those bits are shifted left by two and combined with the upper four bits of the program counter. The result is that a j or jal can only reach addresses in the same 256MB-aligned region as the instruction in its delay slot. In small programs, that's plenty. But in large, complex applications, or when dealing with shared libraries, a function you want to call might be further away.

This is where the linker steps in and generates a piece of code called a veneer or thunk. On MIPS O32 systems, you'll often see this thunk named mips_o32.jumpto. Its job is simple: load a full 32-bit address into a register and then perform an indirect jump to that address.

So, your code might look like this:

# Your code, calling a function that is far away
jal my_far_function

But the linker silently transforms it into this:

# What the linker actually emits
jal __jumpto_my_far_function # Jump to the linker-generated thunk

# ... somewhere else, the thunk itself ...
__jumpto_my_far_function:
    lui   $t9, %hi(my_far_function)      # Load Upper Immediate (high 16 bits of address)
    addiu $t9, $t9, %lo(my_far_function) # Add Immediate Unsigned (low 16 bits of address)
    jr    $t9                          # Jump Register (jump to the full address in $t9)
    nop                                # Branch delay slot
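Two pieces of arithmetic sit behind this thunk: deciding whether a plain jal can reach the target at all, and how the lui/addiu pair splits a 32-bit address. The sketch below models both in Python. It rests on a few assumptions. It covers classic MIPS32 O32 code only, with no microMIPS or MIPS16 modes and no release 6 compact branches. Addresses are treated as unsigned 32-bit integers. The call-site and target addresses in the example are hypothetical.

```python
# Toy model of MIPS32 O32 jump reach and the %hi/%lo split used by the thunk.
# Addresses are plain Python ints treated as unsigned 32-bit values.

MASK32 = 0xFFFF_FFFF
REGION_MASK = 0xF000_0000  # j/jal keep the top 4 bits of the PC


def jal_can_reach(call_site: int, target: int) -> bool:
    """True if a jal at `call_site` can encode `target` directly.

    The 26-bit field is shifted left by 2, and the top 4 bits come from the
    address of the delay-slot instruction (call_site + 4), so the target must
    be word-aligned and sit in the same 256MB-aligned region.
    """
    if target % 4:
        return False
    delay_slot = (call_site + 4) & MASK32
    return (delay_slot & REGION_MASK) == (target & REGION_MASK)


def hi_lo(addr: int) -> tuple[int, int]:
    """Split addr the way %hi/%lo do for a lui/addiu pair.

    addiu sign-extends its 16-bit immediate, so %hi is rounded up by one
    whenever bit 15 of the address is set.
    """
    hi = ((addr + 0x8000) >> 16) & 0xFFFF
    lo = addr & 0xFFFF
    if lo >= 0x8000:
        lo -= 0x10000  # the value addiu actually adds (sign-extended)
    return hi, lo


def rebuild(hi: int, lo: int) -> int:
    """What `lui $t9, hi` followed by `addiu $t9, $t9, lo` leaves in $t9."""
    return ((hi << 16) + lo) & MASK32


if __name__ == "__main__":
    caller = 0x0040_0100  # hypothetical call site
    print(jal_can_reach(caller, 0x0FFF_FFF0))  # True: same 256MB region
    print(jal_can_reach(caller, 0x1234_5678))  # False: needs a thunk
    for addr in (0x1234_5678, 0x1234_8000):
        hi, lo = hi_lo(addr)
        print(hex(addr), hex(hi), lo, hex(rebuild(hi, lo)))
```

The rounding in hi_lo explains why %hi is not simply the top 16 bits. Because addiu sign-extends its immediate, a low half of 0x8000 or more acts as a negative number. lui must then load one more than the raw top half to compensate.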

This process is usually seamless. It allows you to write code without worrying about addressability, and everything just works. Until it doesn't.

The "Wrong Return": A Tale of the Delay Slot

The root of this infamous bug lies in one of the most notorious features of the MIPS architecture: the branch delay slot. In MIPS, the instruction immediately following a branch or jump is executed before control reaches the jump target. The "branch likely" variants are the exception: they annul that instruction when the branch is not taken. This keeps the instruction pipeline full and avoids a stall.

Compilers, assemblers, and linkers work hard to fill this delay slot with a useful instruction instead of a nop. For our jumpto thunk, an obvious candidate is the addiu instruction. Why wait to do the addition? Start the jump, and do the final address calculation in the delay slot. It saves a precious cycle.

The optimized, but potentially buggy, thunk looks like this:

# Optimized (and buggy) thunk
__jumpto_my_far_function:
    lui   $t9, %hi(my_far_function)      # Load high part of address into $t9
    jr    $t9                          # START the jump to the (incomplete) address
    addiu $t9, $t9, %lo(my_far_function) # FINISH address calculation in the delay slot

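It looks like a free cycle, but the MIPS rules say otherwise. jr takes its target from the value $t9 holds when jr itself executes. The addiu in the delay slot runs afterwards, so it updates $t9 too late to change where the jump goes. The jump therefore lands on the high half of the address alone, and the sequence only happens to work when the low 16 bits of the target are zero. The delay-slot addiu does still leave the full address in $t9, which is why the register can look perfectly correct when you inspect it in a debugger after the fact.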
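A toy model of the delay-slot rule makes this concrete. It is a sketch, not an emulator. It understands only the lui, addiu, jr and nop forms used in these thunks. The address 0x12345678 for my_far_function is hypothetical, chosen so that bit 15 is clear and %hi needs no rounding.

```python
MASK32 = 0xFFFF_FFFF


def run_thunk(program):
    """Run a tiny thunk; return (jump_target, registers).

    jr reads its register when jr itself executes. The following instruction
    (the delay slot) still runs, but cannot change the target already chosen.
    """
    regs = {}
    target = None
    in_delay_slot = False
    for op, *args in program:
        if op == "lui":
            rt, imm = args
            regs[rt] = (imm << 16) & MASK32
        elif op == "addiu":
            rt, rs, imm = args  # imm is the already sign-extended value
            regs[rt] = (regs[rs] + imm) & MASK32
        elif op == "jr":
            target = regs[args[0]]  # latched now, before the delay slot runs
        elif op != "nop":
            raise ValueError(f"unsupported instruction: {op}")
        if in_delay_slot:
            return target, regs  # delay slot done: control goes to target
        in_delay_slot = target is not None
    raise RuntimeError("no jr with a delay slot found")


# Hypothetical address of my_far_function; bit 15 is clear, so %hi needs no
# rounding and %lo is simply the low half.
TARGET = 0x1234_5678
HI, LO = TARGET >> 16, TARGET & 0xFFFF

buggy = [("lui", "$t9", HI), ("jr", "$t9"), ("addiu", "$t9", "$t9", LO)]
original = [("lui", "$t9", HI), ("addiu", "$t9", "$t9", LO), ("jr", "$t9"), ("nop",)]

for name, thunk in (("buggy", buggy), ("original", original)):
    target, regs = run_thunk(thunk)
    print(f"{name:8} jumps to {target:#010x}, $t9 = {regs['$t9']:#010x}")
```

The buggy sequence jumps to 0x12340000, while the original four-instruction thunk reaches 0x12345678. In both cases, $t9 ends up holding 0x12345678; only the jump target differs.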

Where It All Goes Wrong

The situation is especially confusing when the target of the jump (my_far_function) is itself an entry in the Procedure Linkage Table (PLT) of a shared library. In that case, the address loaded by lui isn't the final function address at all; it's the address of a stub that will perform the dynamic lookup.

Here's the sequence of events that causes the bug:

  1. lui $t9, %hi(my_far_function@plt) loads the high bits of the PLT stub's address into register $t9.
  2. jr $t9 executes and takes its jump target from $t9 as it stands at this moment. That address is incomplete: it holds only the high bits (e.g., 0x12340000).
  3. addiu $t9, $t9, %lo(my_far_function@plt) executes in the delay slot. It correctly calculates the full address of the PLT stub (e.g., 0x12345678) and places it in $t9, but the jump target was already fixed in step 2.
  4. Here's the bug: control transfers to the incomplete address from step 2. By the MIPS rules, jr always uses the value it read before the delay slot ran, so the correct address computed in step 3 never influences the jump.

Your program jumps to 0x12340000 instead of 0x12345678. Whatever happens to live at that address runs in place of your function. It may execute garbage or corrupt the stack and the return address register ($ra). When it eventually tries to return, it goes... somewhere else. The result is a crash that seems to have no logical cause.

The Essential 2025 Fix

After years of being a known but tricky issue in certain toolchains, this behavior has been definitively addressed. The fix isn't in your code; it's in the linker.

The Primary Solution: Update Your Toolchain

The most important fix is to update your build environment. Versions of binutils (the collection that includes the ld linker) and GCC from late 2024 and onwards now contain the patched logic. The new linker is smarter. It can detect when a jumpto thunk is targeting a PLT entry or another symbol that requires dynamic resolution.

When it detects this condition, it deliberately avoids the delay slot optimization. It generates the safer, non-optimized thunk:

# The safe and correct thunk generated by new linkers
__jumpto_my_far_function:
    lui   $t9, %hi(my_far_function)      # Load high part
    addiu $t9, $t9, %lo(my_far_function) # Finish calculation *before* the jump
    jr    $t9                          # Jump to the now-complete address
    nop                                # Use a safe, boring nop in the delay slot

This version costs one extra cycle, but it is correct. The full, final address is already in $t9 when jr reads it, so the delay slot has nothing left to get wrong.
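The property that makes this thunk safe is that no jr has a delay-slot instruction overwriting its own target register. That is simple enough to check mechanically. The minimal sketch below assumes a disassembly listing (for example from objdump -d) that has already been reduced to one instruction per line, with the address and encoding columns stripped. It flags any jr or jalr whose delay-slot instruction writes the register being jumped through. The "first operand is the destination" rule is a heuristic, so treat hits as candidates for review, not proof of a bug.

```python
# Hypothetical helper: scan simplified MIPS disassembly for a jr/jalr whose
# delay slot overwrites the jump register.

STORE_OPS = {"sb", "sh", "sw", "swl", "swr", "swc1", "sdc1"}  # first operand is read, not written
REGISTER_JUMPS = {"jr", "jalr"}


def parse(line):
    """Return (text, mnemonic, operands) for one instruction, or None if blank.

    Accepts 'addiu $t9, $t9, 0x5678' or 'addiu t9,t9,0x5678'; a trailing
    '#' comment is ignored.
    """
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    fields = text.split(None, 1)
    mnemonic = fields[0].lower()
    rest = fields[1] if len(fields) > 1 else ""
    operands = [op.strip().lstrip("$") for op in rest.split(",") if op.strip()]
    return text, mnemonic, operands


def writes(mnemonic, operands, reg):
    """Heuristic: most MIPS ALU and load instructions write their first operand."""
    if mnemonic == "nop" or mnemonic in STORE_OPS or not operands:
        return False
    return operands[0] == reg


def find_hazards(lines):
    """List (jump, delay_slot) pairs where the delay slot overwrites the
    register that the preceding jr/jalr jumps through."""
    insns = [p for p in (parse(line) for line in lines) if p is not None]
    hazards = []
    for (text, mnem, ops), (next_text, next_mnem, next_ops) in zip(insns, insns[1:]):
        # jalr may name rd first; the jump register is always the last operand
        if mnem in REGISTER_JUMPS and ops and writes(next_mnem, next_ops, ops[-1]):
            hazards.append((text, next_text))
    return hazards


if __name__ == "__main__":
    listing = ["lui $t9, 0x1234", "jr $t9", "addiu $t9, $t9, 0x5678"]
    print(find_hazards(listing))
```

An ordinary function epilogue such as `jr $ra` followed by `addiu $sp, $sp, 32` is not flagged, because the delay slot writes $sp, not the jump register.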

Workarounds for Legacy Toolchains

What if you're stuck on an older, vendor-supplied toolchain? You might not be able to update. In this case, you can sometimes force the linker's hand.

  • Check for Linker Flags: Look for flags that control linker relaxations or optimizations. A flag like --no-mips-delay-slot-opt (this is a hypothetical example) might exist in your toolchain's documentation.
  • Manual Assembly Thunks: For critical calls that are failing, you can bypass the linker's thunk generation entirely by writing your own wrapper in an assembly file, ensuring you use the safe instruction sequence.
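  • Code Layout: As a last resort, you can use linker scripts to place callers and callees in the same 256MB region, avoiding the need for a long-call thunk in the first place. Function attributes won't do the moving for you: GCC's MIPS __attribute__((near)) (an alias of short_call) only forces a plain jal, so it helps only when the callee is already in range. This approach is often brittle and difficult to maintain.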
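For the manual-thunk route, a small generator can keep hand-written wrappers consistent. This minimal Python sketch writes GNU as source containing the safe sequence. It rests on several assumptions. The safe_jumpto_ prefix is a made-up naming convention. The output targets non-PIC O32 code, where absolute %hi/%lo relocations are allowed. The `.set noreorder` directive keeps the assembler from reordering anything into the delay slot.

```python
def safe_thunk_asm(symbol: str, prefix: str = "safe_jumpto_") -> str:
    """Return GNU as source for a long-call thunk to `symbol`.

    The address is fully built in $t9 before jr, and the delay slot holds only
    a nop. Assumes non-PIC O32 code (absolute %hi/%lo relocations allowed);
    the prefix is a made-up naming convention, not a toolchain standard.
    """
    name = prefix + symbol
    lines = [
        "\t.text",
        "\t.align\t2",
        f"\t.globl\t{name}",
        f"\t.type\t{name}, @function",
        f"{name}:",
        "\t.set\tpush",
        "\t.set\tnoreorder          # keep the assembler out of the delay slot",
        f"\tlui\t$t9, %hi({symbol})",
        f"\taddiu\t$t9, $t9, %lo({symbol})   # address complete before jr",
        "\tjr\t$t9",
        "\tnop                     # harmless delay slot",
        "\t.set\tpop",
        f"\t.size\t{name}, .-{name}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical list of far-away functions that need wrappers.
    for sym in ("my_far_function",):
        print(safe_thunk_asm(sym))
```

Callers would then `jal safe_jumpto_my_far_function` instead of the original symbol. The thunk itself must be linked within jal range of those callers; otherwise the long-call problem simply moves one level up.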

Conclusion: A Cleaner Future for MIPS

The mips_o32.jumpto bug is a perfect example of a low-level issue that can have an outsized impact on development. It's a subtle interaction between linker optimizations, CPU pipeline behavior, and dynamic linking. For years, it has caused silent data corruption and inexplicable crashes that are a nightmare to debug.

With the fix now integrated into modern toolchains, developers can finally put this particular ghost to rest. If you're working with MIPS, especially on systems with shared libraries, updating your toolchain should be a top priority for 2025. It's a simple step that can save you countless hours of debugging and lead to far more stable and reliable software.
