Fix 3 Critical ESP32 Dual Core Bugs with Atomics 2025
Tired of random crashes on your ESP32? Uncover how to fix 3 critical dual-core bugs using atomic operations for rock-solid, thread-safe embedded projects.
Daniel Peterson
Embedded software engineer specializing in RTOS and concurrent programming for IoT devices.
Ever chased a bug on your ESP32 that vanishes the moment you plug in the debugger? Or a random crash that only happens once every few hours? You're not alone. The very feature that makes the ESP32 so powerful—its dual-core processor—is often the hidden source of these maddening, phantom issues.
The ESP32's two cores, running tasks simultaneously via FreeRTOS, can be a dream for performance. You can handle Wi-Fi on one core and sensor logic on the other, achieving a level of responsiveness that single-core microcontrollers can only dream of. But this parallel power comes with a dark side: race conditions. When both cores try to read or write to the same piece of data at the same time, chaos can ensue. The result? Corrupted data, unpredictable behavior, and a lot of late-night debugging sessions.
Many developers reach for mutexes (mutual exclusion locks) to protect shared data, and they are a vital tool. But for simple operations, they can be overkill, introducing performance overhead and the risk of deadlocks. There's a more elegant, lightweight, and often overlooked solution baked right into the C11 standard and fully supported by the ESP-IDF: atomic operations. Today, we'll dive into what they are and how they can squash three of the most common dual-core bugs for good.
What Are Atomic Operations (And Why Aren't You Using Them More?)
In the world of computing, an operation is "atomic" if it is performed as a single, indivisible unit. From the perspective of every other thread or core in the system, it either hasn't happened yet, or it's already complete—there are no intermediate steps. For the small integer, boolean, and pointer types we care about here, this is typically a hardware-level guarantee.
Consider a seemingly simple operation like incrementing a variable:
shared_variable++;
You might think this is one instruction, but for the processor, it's typically three distinct steps:
- Read: Load the current value of shared_variable from memory into a CPU register.
- Modify: Increment the value in the register.
- Write: Store the new value from the register back into memory.
Now, imagine Core 0 and Core 1 both trying to do this at the same time. A race condition can occur:
- Core 0 reads the value (e.g., 5).
- Before Core 0 can write its new value, the operating system switches tasks.
- Core 1 reads the same value (still 5).
- Core 1 increments it to 6 and writes 6 back to memory.
- Core 0's task resumes. It still thinks the value was 5, so it increments its local copy to 6 and writes 6 back to memory.
Even though the variable was incremented twice, its final value is 6, not 7. One of the updates was completely lost. Atomic operations solve this by performing the entire read-modify-write cycle as one indivisible operation, backed by dedicated hardware support rather than by separate instructions that can be interleaved.
The 3 Dual-Core Bugs You Can Fix Right Now
Let's get practical. Here are three real-world bugs that are easily solved with atomics. To use them in ESP-IDF, you just need to include the standard C header: #include <stdatomic.h>.
Bug #1: The Corrupted Counter
This is the classic race condition we just described. Imagine you're counting button presses with an interrupt on one core and counting received MQTT messages in a task on the other. Both need to increment a global event counter.
The Buggy Code
A shared counter is declared, often with volatile in a hopeful attempt to prevent compiler optimizations from breaking things. Unfortunately, volatile only forces the compiler to emit every memory access; it does nothing to protect against race conditions.
// shared_data.h
volatile int g_event_count = 0;

// Task on Core 0
void task_core_0(void *pvParameters) {
    for (;;) {
        // ... some work ...
        g_event_count++; // DANGER!
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

// Task on Core 1
void task_core_1(void *pvParameters) {
    for (;;) {
        // ... other work ...
        g_event_count++; // DANGER!
        vTaskDelay(pdMS_TO_TICKS(13));
    }
}
Over time, g_event_count will be lower than the actual number of events that occurred. The bug might be rare, making it incredibly frustrating to debug.
The Atomic Fix
By changing the type to atomic_int and using an atomic function, you guarantee that every increment is fully completed without interruption.
// shared_data.h
#include <stdatomic.h>
atomic_int g_event_count = 0;

// Task on Core 0
void task_core_0(void *pvParameters) {
    for (;;) {
        // ... some work ...
        atomic_fetch_add(&g_event_count, 1); // SAFE!
        vTaskDelay(pdMS_TO_TICKS(10));
    }
}

// Task on Core 1
void task_core_1(void *pvParameters) {
    for (;;) {
        // ... other work ...
        atomic_fetch_add(&g_event_count, 1); // SAFE!
        vTaskDelay(pdMS_TO_TICKS(13));
    }
}
atomic_fetch_add does exactly what it sounds like: it atomically adds a value and returns the value the variable held *before* the addition. Simple, clean, and 100% thread-safe.
Bug #2: The 'Check-Then-Act' Race Condition
This subtle bug happens when you check a condition and then perform an action based on it, but another core changes the condition between your check and your action.
Imagine you have a shared resource, like an I2C bus, and you use a simple boolean flag to prevent two tasks from using it at once.
The Buggy Code
// shared_data.h
volatile bool g_i2c_bus_busy = false;

// Task on Core 0
void use_i2c_resource() {
    // Check if the bus is free
    if (g_i2c_bus_busy == false) { // CHECK
        // Problem: Core 1 could run right here!
        g_i2c_bus_busy = true; // ACT
        // ... use the I2C bus ...
        g_i2c_bus_busy = false; // Release the bus
    }
}
If two tasks call use_i2c_resource() at nearly the same time, both could see g_i2c_bus_busy as false, and both would proceed as if they have exclusive access. This leads to garbled I2C communication.
The Atomic Fix
We can use atomic_exchange to both check the old value and set the new value in a single, indivisible operation. This gives us a simple, non-blocking try-lock: the caller either acquires the flag or learns immediately that it was busy.
// shared_data.h
#include <stdatomic.h>
atomic_bool g_i2c_bus_busy = false;

// Task on either core
void use_i2c_resource() {
    // Atomically set the flag to 'true' and get the old value.
    // If the old value was 'false', we successfully acquired the lock.
    if (atomic_exchange(&g_i2c_bus_busy, true) == false) {
        // SUCCESS! We have exclusive access.
        // ... use the I2C bus ...
        // Atomically release the lock.
        atomic_store(&g_i2c_bus_busy, false);
    } else {
        // The bus was already busy, handle the failure.
        // (e.g., try again later, log an error, etc.)
    }
}
The atomic_exchange function guarantees that no other core can interfere between fetching the old value and storing the new one. The check and the act become one atomic unit.
Bug #3: The Lost Configuration Update
Bitmasks are a fantastic, memory-efficient way to store multiple boolean configuration flags in a single integer. But they are a prime target for race conditions when modified from multiple cores.
Imagine a uint32_t config_flags where one task enables Wi-Fi and another enables Bluetooth.
The Buggy Code
// shared_data.h
#define WIFI_ENABLED_FLAG (1 << 0)
#define BT_ENABLED_FLAG (1 << 1)
volatile uint32_t g_config_flags = 0;

// Task on Core 0
void enable_wifi() {
    g_config_flags |= WIFI_ENABLED_FLAG; // DANGER!
}

// Task on Core 1
void enable_bluetooth() {
    g_config_flags |= BT_ENABLED_FLAG; // DANGER!
}
The |= operator is another read-modify-write operation. If both tasks run concurrently, one update can easily overwrite the other. If Core 0 enables Wi-Fi and Core 1 enables Bluetooth at the same time, you might end up with only Bluetooth enabled, and the Wi-Fi setting is lost.
The Atomic Fix
The C11 atomics library provides bitwise operations for exactly this purpose: atomic_fetch_or and atomic_fetch_and.
// shared_data.h
#include <stdatomic.h>
#define WIFI_ENABLED_FLAG (1 << 0)
#define BT_ENABLED_FLAG (1 << 1)
atomic_uint g_config_flags = 0;

// Task on Core 0
void enable_wifi() {
    atomic_fetch_or(&g_config_flags, WIFI_ENABLED_FLAG); // SAFE!
}

// Task on Core 1
void enable_bluetooth() {
    atomic_fetch_or(&g_config_flags, BT_ENABLED_FLAG); // SAFE!
}

// Clearing a flag works the same way, using an inverted mask
void disable_wifi() {
    atomic_fetch_and(&g_config_flags, ~WIFI_ENABLED_FLAG); // SAFE!
}
atomic_fetch_or performs a bitwise OR, and atomic_fetch_and performs a bitwise AND. Both are guaranteed to be atomic, preserving all concurrent changes to your configuration flags.
Atomics vs. Mutexes: Choosing the Right Tool
So, should you replace all your mutexes with atomics? Not at all. They are different tools for different jobs. Atomics are fast and non-blocking, but they only work on simple data types (integers, booleans, pointers). Mutexes are more versatile but come with more overhead.
Feature | Atomic Operations | Mutexes (FreeRTOS)
---|---|---
Use Case | Simple operations on a single variable (increment, toggle, set flag, bitmask). | Protecting complex data structures or blocks of code with multiple steps.
Performance | Extremely fast. Often a single hardware instruction. | Slower. Involves kernel API calls and potential context switching.
Blocking | Non-blocking. The operation completes immediately. | Blocking. If a mutex is taken, other tasks must wait, potentially going to sleep.
Safety | Guards a single variable. Cannot protect a sequence of operations. | Can guard entire functions or critical sections, ensuring end-to-end consistency.
Rule of thumb: If you're just incrementing a counter, flipping a boolean, or updating a bitmask, an atomic is almost always the better choice. If you need to protect a whole data structure (like a linked list) or a multi-step process (like initializing a peripheral), a mutex is the way to go.
Conclusion: Write Thread-Safe Code with Confidence
The dual-core architecture of the ESP32 is a massive advantage, but it requires a shift in mindset. You must always assume another core could be accessing your shared data at any moment. While this sounds daunting, tools like atomic operations make it manageable.
By understanding and applying atomics for simple, shared data modifications, you can eliminate a whole class of annoying, hard-to-find bugs. You'll build more stable, reliable, and performant applications. So next time you declare a global variable that might be touched by more than one task, take a moment to ask: could this be a race condition? And if so, can a simple, elegant atomic operation save me from a future headache?
Stop debugging race conditions and start building amazing features. Happy coding!