Fix 3 Critical ESP32 Dual Core Bugs with Atomics 2025
Tired of random crashes on your ESP32? Uncover how to fix 3 critical dual-core bugs using atomic operations for rock-solid, thread-safe embedded projects.
Daniel Peterson
Embedded software engineer specializing in RTOS and concurrent programming for IoT devices.
Ever chased a bug on your ESP32 that vanishes the moment you plug in the debugger? Or a random crash that only happens once every few hours? You're not alone. The very feature that makes the ESP32 so powerful—its dual-core processor—is often the hidden source of these maddening, phantom issues.
The ESP32's two cores, running tasks simultaneously via FreeRTOS, can be a dream for performance. You can handle Wi-Fi on one core and sensor logic on the other, achieving a level of responsiveness that single-core microcontrollers can only dream of. But this parallel power comes with a dark side: race conditions. When both cores try to read or write to the same piece of data at the same time, chaos can ensue. The result? Corrupted data, unpredictable behavior, and a lot of late-night debugging sessions.
Many developers reach for mutexes (mutual exclusion locks) to protect shared data, and they are a vital tool. But for simple operations, they can be overkill, introducing performance overhead and the risk of deadlocks. There's a more elegant, lightweight, and often overlooked solution baked right into the C11 standard and fully supported by the ESP-IDF: atomic operations. Today, we'll dive into what they are and how they can squash three of the most common dual-core bugs for good.
What Are Atomic Operations (And Why Aren't You Using Them More?)
In the world of computing, an operation is "atomic" if it is performed as a single, indivisible unit. From the perspective of every other thread or core in the system, it either hasn't happened yet, or it's already complete—there are no intermediate steps. For the small integer, boolean, and pointer types we care about here, this is typically a hardware-level guarantee.
Consider a seemingly simple operation like incrementing a variable:
shared_variable++;
You might think this is one instruction, but for the processor, it's typically three distinct steps:
- Read: Load the current value of shared_variable from memory into a CPU register.
- Modify: Increment the value in the register.
- Write: Store the new value from the register back into memory.
Now, imagine Core 0 and Core 1 both trying to do this at the same time. A race condition can occur:
- Core 0 reads the value (e.g., 5).
- Before Core 0 can write its new value, the operating system switches tasks.
- Core 1 reads the same value (still 5).
- Core 1 increments it to 6 and writes 6 back to memory.
- Core 0's task resumes. It still thinks the value was 5, so it increments its local copy to 6 and writes 6 back to memory.
Even though the variable was incremented twice, its final value is 6, not 7. One of the updates was completely lost. Atomic operations solve this by performing the entire read-modify-write cycle as one indivisible operation, backed by dedicated hardware support rather than by separate instructions that can be interleaved.
The 3 Dual-Core Bugs You Can Fix Right Now
Let's get practical. Here are three real-world bugs that are easily solved with atomics. To use them in ESP-IDF, you just need to include the standard C header: #include <stdatomic.h>.
Bug #1: The Corrupted Counter
This is the classic race condition we just described. Imagine you're counting button presses with an interrupt on one core and counting received MQTT messages in a task on the other. Both need to increment a global event counter.
The Buggy Code
A shared counter is declared, often with volatile in a hopeful attempt to prevent compiler optimizations from breaking things. Unfortunately, volatile only forces the compiler to emit every memory access; it does nothing to protect against race conditions.
// shared_data.h
volatile int g_event_count = 0;

// Task on Core 0
void task_core_0(void *pvParameters) {
    for (;;) {
        // ... some work ...
        g_event_count++; // DANGER!
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

// Task on Core 1
void task_core_1(void *pvParameters) {
    for (;;) {
        // ... other work ...
        g_event_count++; // DANGER!
        vTaskDelay(pdMS_TO_TICKS(13));
    }
}
Over time, g_event_count will be lower than the actual number of events that occurred. The bug might be rare, making it incredibly frustrating to debug.
The Atomic Fix
By changing the type to atomic_int and using an atomic function, you guarantee that every increment is fully completed without interruption.
// shared_data.h
#include <stdatomic.h>
atomic_int g_event_count = 0;

// Task on Core 0
void task_core_0(void *pvParameters) {
    for (;;) {
        // ... some work ...
        atomic_fetch_add(&g_event_count, 1); // SAFE!
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

// Task on Core 1
void task_core_1(void *pvParameters) {
    for (;;) {
        // ... other work ...
        atomic_fetch_add(&g_event_count, 1); // SAFE!
        vTaskDelay(pdMS_TO_TICKS(13));
    }
}
atomic_fetch_add does exactly what it sounds like: it atomically adds a value and returns the value the variable held *before* the addition. Simple, clean, and 100% thread-safe.
Bug #2: The 'Check-Then-Act' Race Condition
This subtle bug happens when you check a condition and then perform an action based on it, but another core changes the condition between your check and your action.
Imagine you have a shared resource, like an I2C bus, and you use a simple boolean flag to prevent two tasks from using it at once.
The Buggy Code
// shared_data.h
volatile bool g_i2c_bus_busy = false;

// Task on Core 0
void use_i2c_resource() {
    // Check if the bus is free
    if (g_i2c_bus_busy == false) { // CHECK
        // Problem: Core 1 could run right here!
        g_i2c_bus_busy = true; // ACT
        // ... use the I2C bus ...
        g_i2c_bus_busy = false; // Release the bus
    }
}
If two tasks call use_i2c_resource() at nearly the same time, both could see g_i2c_bus_busy as false, and both would proceed as if they have exclusive access. This leads to garbled I2C communication.
The Atomic Fix
We can use atomic_exchange to both check the old value and set the new value in a single, indivisible operation. This gives us a simple, non-blocking try-lock: the caller either acquires the flag or learns immediately that it was busy.
// shared_data.h
#include <stdatomic.h>
atomic_bool g_i2c_bus_busy = false;

// Task on either core
void use_i2c_resource() {
    // Atomically set the flag to 'true' and get the old value.
    // If the old value was 'false', we successfully acquired the lock.
    if (atomic_exchange(&g_i2c_bus_busy, true) == false) {
        // SUCCESS! We have exclusive access.
        // ... use the I2C bus ...
        // Atomically release the lock.
        atomic_store(&g_i2c_bus_busy, false);
    } else {
        // The bus was already busy, handle the failure.
        // (e.g., try again later, log an error, etc.)
    }
}
The atomic_exchange function guarantees that no other core can interfere between fetching the old value and storing the new one. The check and the act become one atomic unit.
Bug #3: The Lost Configuration Update
Bitmasks are a fantastic, memory-efficient way to store multiple boolean configuration flags in a single integer. But they are a prime target for race conditions when modified from multiple cores.
Imagine a uint32_t config_flags where one task enables Wi-Fi and another enables Bluetooth.
The Buggy Code
// shared_data.h
#define WIFI_ENABLED_FLAG (1 << 0)
#define BT_ENABLED_FLAG (1 << 1)
volatile uint32_t g_config_flags = 0;

// Task on Core 0
void enable_wifi() {
    g_config_flags |= WIFI_ENABLED_FLAG; // DANGER!
}

// Task on Core 1
void enable_bluetooth() {
    g_config_flags |= BT_ENABLED_FLAG; // DANGER!
}
The |= operator is another read-modify-write operation. If both tasks run concurrently, one update can easily overwrite the other. If Core 0 enables Wi-Fi and Core 1 enables Bluetooth at the same time, you might end up with only Bluetooth enabled, and the Wi-Fi setting is lost.
The Atomic Fix
The C11 atomics library provides bitwise operations for exactly this purpose: atomic_fetch_or and atomic_fetch_and.
// shared_data.h
#include <stdatomic.h>
#define WIFI_ENABLED_FLAG (1 << 0)
#define BT_ENABLED_FLAG (1 << 1)
atomic_uint g_config_flags = 0;

// Task on Core 0
void enable_wifi() {
    atomic_fetch_or(&g_config_flags, WIFI_ENABLED_FLAG); // SAFE!
}

// Task on Core 1
void enable_bluetooth() {
    atomic_fetch_or(&g_config_flags, BT_ENABLED_FLAG); // SAFE!
}

// Clearing a flag works the same way, using an inverted mask
void disable_wifi() {
    atomic_fetch_and(&g_config_flags, ~WIFI_ENABLED_FLAG); // SAFE!
}
atomic_fetch_or performs a bitwise OR, and atomic_fetch_and performs a bitwise AND. Both are guaranteed to be atomic, preserving all concurrent changes to your configuration flags.
Atomics vs. Mutexes: Choosing the Right Tool
So, should you replace all your mutexes with atomics? Not at all. They are different tools for different jobs. Atomics are fast and non-blocking, but they only work on simple data types (integers, booleans, pointers). Mutexes are more versatile but come with more overhead.
Feature | Atomic Operations | Mutexes (FreeRTOS)
---|---|---
Use Case | Simple operations on a single variable (increment, toggle, set flag, bitmask). | Protecting complex data structures or blocks of code with multiple steps.
Performance | Extremely fast. Often a single hardware instruction. | Slower. Involves kernel API calls and potential context switching.
Blocking | Non-blocking. The operation completes immediately. | Blocking. If a mutex is taken, other tasks must wait, potentially going to sleep.
Safety | Guards a single variable. Cannot protect a sequence of operations. | Can guard entire functions or critical sections, ensuring end-to-end consistency.
Rule of thumb: If you're just incrementing a counter, flipping a boolean, or updating a bitmask, an atomic is almost always the better choice. If you need to protect a whole data structure (like a linked list) or a multi-step process (like initializing a peripheral), a mutex is the way to go.
Conclusion: Write Thread-Safe Code with Confidence
The dual-core architecture of the ESP32 is a massive advantage, but it requires a shift in mindset. You must always assume another core could be accessing your shared data at any moment. While this sounds daunting, tools like atomic operations make it manageable.
By understanding and applying atomics for simple, shared data modifications, you can eliminate a whole class of annoying, hard-to-find bugs. You'll build more stable, reliable, and performant applications. So next time you declare a global variable that might be touched by more than one task, take a moment to ask: could this be a race condition? And if so, can a simple, elegant atomic operation save me from a future headache?
Stop debugging race conditions and start building amazing features. Happy coding!