Embedded · March 12, 2026 · 6 min read

How an E-Paper Display Keeps Time: The RMT Peripheral Story

E-paper displays don't refresh like LCDs. Every pixel change requires a precisely timed electrical pulse — too short and the ink doesn't move, too long and you get ghosting.

I was porting an e-paper display library to a newer ESP32 framework when the display started drawing garbage. The pixels looked like a half-loaded JPEG from 2003. The root cause turned out to be a timing peripheral I'd never thought much about — one originally designed for TV remotes.

This is the story of that peripheral, the physics it serves, and the full API migration that fixed the display.


Ink that moves

If you cracked open an e-ink display and looked at it under a microscope, you'd see millions of tiny capsules, each about the diameter of a human hair. Inside each capsule: black particles and white particles floating in clear oil.

The black particles carry a negative charge. The white ones carry a positive charge.

Apply a positive voltage to the bottom electrode (relative to the transparent top one) and the positively charged white particles are pushed up to the surface. The pixel appears white. Reverse the voltage and the black particles float up instead. The pixel turns dark.

That's the entire trick. No backlight, no liquid crystals, no constantly refreshing framebuffer. Just tiny charged balls of ink, pushed up or down by an electric field.

    Positive voltage          Negative voltage
    ┌─────────────┐           ┌─────────────┐
    │ ○ ○ ○ ○ ○ ○ │ white     │ ● ● ● ● ● ● │ black
    │ ● ● ● ● ● ● │ black     │ ○ ○ ○ ○ ○ ○ │ white
    └─────────────┘           └─────────────┘
       WHITE pixel               BLACK pixel

But here's the catch: the voltage must be applied for a precise duration. Think of it like baking — leave the bread in too short and the middle is raw, leave it too long and it burns. Except here, "too short" means the particles don't fully migrate and you get a washed-out gray. "Too long" means neighboring capsules bleed into each other — that's ghosting, the faint shadow of the previous image that refuses to leave.

The timing window is measured in microseconds. Not milliseconds. A microsecond is one millionth of a second. In the time it takes you to blink, roughly 400,000 microseconds pass.

Getting this timing right isn't optional. It's the difference between a crisp display and an unreadable mess.

The naive approach (and why it fails)


When I first thought about generating precise pulses, the obvious approach was:

c
// The "just use a delay" approach
gpio_set_level(CKV_PIN, 1);   // pin goes HIGH
delayMicroseconds(50);        // wait 50 microseconds
gpio_set_level(CKV_PIN, 0);   // pin goes LOW

Three lines. Beautifully simple. And on a bare-metal microcontroller with nothing else running, this actually works. The CPU sits in a tight loop counting clock cycles, and your 50µs delay is genuinely 50µs.

But the ESP32 isn't bare-metal. It runs FreeRTOS — a real-time operating system juggling multiple tasks simultaneously. And "simultaneously" is where the trouble starts.

Imagine you're halfway through that 50µs delay. You're at microsecond 27. Then:

  • The WiFi radio fires an interrupt because a packet arrived
  • The RTOS scheduler decides another task needs CPU time
  • A timer interrupt fires for the system tick
  • The CPU hits a cache miss and stalls while fetching code from flash

Your 50µs delay just became 73µs. Or 48µs. Or 112µs. You don't control it and you can't predict it.

I measured the jitter on an ESP32-S3 running WiFi and a display task:

Requested delay:  50 µs
Actual range:     41 µs – 89 µs
Average:          54 µs

For blinking an LED, nobody would notice. For driving e-ink pixels through a carefully tuned voltage waveform, this jitter produces visible artifacts — smeared edges, inconsistent gray levels, and occasional lines of corrupted pixels.
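
To make "jitter" concrete, here is a minimal host-side sketch of how a batch of measured delays could be summarized. It is not the code behind the numbers above. The `jitter_stats_t` type and `jitter_stats()` helper are names I made up for illustration, and on the device the samples would have to come from a timestamp source such as `esp_timer_get_time()` taken before and after each delay.

```c
#include <stddef.h>
#include <stdint.h>

// Summary of a set of measured delay durations, in microseconds.
// (Hypothetical helper type, not part of any ESP-IDF API.)
typedef struct {
    uint32_t min_us;
    uint32_t max_us;
    uint32_t mean_us;          // integer mean, rounded down
    uint32_t peak_to_peak_us;  // max - min: the width of the jitter window
} jitter_stats_t;

// Compute min, max and mean over n samples. The caller must pass n > 0.
static jitter_stats_t jitter_stats(const uint32_t *samples_us, size_t n)
{
    jitter_stats_t s = { samples_us[0], samples_us[0], 0, 0 };
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (samples_us[i] < s.min_us) s.min_us = samples_us[i];
        if (samples_us[i] > s.max_us) s.max_us = samples_us[i];
        sum += samples_us[i];
    }

    s.mean_us = (uint32_t)(sum / n);
    s.peak_to_peak_us = s.max_us - s.min_us;
    return s;
}
```

The number to watch is `peak_to_peak_us`. A mean close to the requested delay can hide a window tens of microseconds wide, and the window is what the display sees.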

The fundamental problem: software timing depends on the CPU being available, and on a multitasking system, the CPU is never reliably available.

What you need is a piece of hardware that generates pulses on its own — something that keeps running even when the CPU is busy handling WiFi packets or switching tasks.

A TV remote to the rescue


The ESP32 has a peripheral called RMT — Remote Control Transceiver.

It was designed for a completely different problem: sending and receiving infrared remote control signals. When you press a button on your TV remote, the IR LED blinks in a specific pattern. "Power" might be a 9ms burst followed by a 4.5ms pause followed by a series of short/long pulses encoding the button code. The NEC protocol, used by most TV remotes, looks something like this:

TV Remote IR signal (NEC protocol):

  ┌─────────────────┐     ┌──┐  ┌────┐  ┌──┐  ┌──┐
  │     9ms burst   │     │  │  │    │  │  │  │  │
──┘                 └─────┘  └──┘    └──┘  └──┘  └──
  |     leader      |space| 0  |  1   | 0  | 0  |

The RMT peripheral is essentially a hardware state machine for generating waveforms. You give it a list of instructions — "stay HIGH for N ticks, then LOW for M ticks" — and the hardware executes them with clock-cycle precision. No CPU involvement after setup.

Here's the mental model: think of it as a player piano roll. You punch the pattern into a paper roll (program the RMT buffer), feed it into the piano (start the transmission), and the piano plays the notes with perfect timing while you go make coffee (the CPU handles other tasks).

RMT buffer:
┌───────────────┬───────────────┬───────────────┬───────────────┐
│ HIGH, 50 ticks│ LOW, 100 ticks│ HIGH, 20 ticks│ LOW, 80 ticks │
└───────────────┴───────────────┴───────────────┴───────────────┘
                        │
                        ▼
GPIO output:
        ┌──────┐            ┌──┐
        │      │            │  │
   ─────┘      └────────────┘  └──────────
        | 5µs  |   10µs     |2µs|  8µs
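
The piano-roll model is small enough to simulate on a desktop. The sketch below is plain C, not RMT driver code: `pulse_step_t`, `waveform_duration_ns()` and `waveform_level_at()` are hypothetical names, and the 100 ns tick is an assumption taken from the µs labels in the diagram.

```c
#include <stddef.h>
#include <stdint.h>

// One entry of the "piano roll": hold the pin at `level` for `ticks` ticks.
// (Illustrative type only; the real driver uses its own packed format.)
typedef struct {
    uint8_t  level;   // 0 = LOW, 1 = HIGH
    uint16_t ticks;   // duration in ticks
} pulse_step_t;

// Total duration of a step list in nanoseconds.
static uint64_t waveform_duration_ns(const pulse_step_t *steps, size_t n,
                                     uint32_t ns_per_tick)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += (uint64_t)steps[i].ticks * ns_per_tick;
    }
    return total;
}

// Pin level at t_ns after the start of the list.
// Returns -1 once every step has been played.
static int waveform_level_at(const pulse_step_t *steps, size_t n,
                             uint32_t ns_per_tick, uint64_t t_ns)
{
    uint64_t edge = 0;  // time at which the current step ends
    for (size_t i = 0; i < n; i++) {
        edge += (uint64_t)steps[i].ticks * ns_per_tick;
        if (t_ns < edge) {
            return steps[i].level;
        }
    }
    return -1;
}
```

Feeding it the four steps from the diagram (50, 100, 20 and 80 ticks at 100 ns each) gives a total of 25 µs, with edges at 5 µs, 15 µs and 17 µs. Nothing in the model depends on what the CPU is doing, which is the whole point.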

The timing is deterministic. Not "pretty accurate" — exact. The RMT clock runs off the APB bus at a known frequency, and each tick is a fixed fraction of a microsecond. No interrupts can delay it. No task switch can stretch it. The hardware drives the GPIO pin directly.

For the e-paper display, this is exactly what we need. Define the waveform for CKV (the panel's row clock), hand it to RMT, and let the hardware maintain microsecond-accurate timing while the CPU decompresses the next row of pixel data.

How it worked in IDF 4


ESP-IDF is Espressif's official development framework for the ESP32. Version 4 had a straightforward RMT API that felt very "C-struct, fill in the fields, call the function."

Setting up the RMT channel looked like this:

c
#include "driver/rmt.h"

void rmt_pulse_init(gpio_num_t pin) {
    rmt_config_t config = RMT_DEFAULT_CONFIG_TX(pin, RMT_CHANNEL_1);
    config.clk_div = 8;                        // divide the 80 MHz clock by 8
    rmt_config(&config);                       // apply configuration
    rmt_driver_install(config.channel, 0, 0);  // install driver
}

Let's trace the clock math, because this is where the precision comes from:

Step 1: ESP32 APB clock           = 80,000,000 Hz  (80 MHz)
Step 2: Divide by 8               = 10,000,000 Hz  (10 MHz)
Step 3: Time per tick              = 1 / 10,000,000
                                   = 0.0000001 seconds
                                   = 0.1 microseconds
                                   = 100 nanoseconds

So each RMT tick is exactly 100 nanoseconds. If we want a 5µs pulse, that's 50 ticks. If we want 10µs, that's 100 ticks. Simple multiplication.
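
That multiplication is worth wrapping in a helper so nobody has to do it in their head. This is a sketch under the same assumptions as the trace above (80 MHz source clock, divider of 8). The macro names and `ns_to_ticks()` are mine, not part of ESP-IDF.

```c
#include <stdint.h>

#define APB_CLK_HZ  80000000u  // 80 MHz source clock, as in the trace above
#define RMT_CLK_DIV 8u         // same value as config.clk_div

// Tick frequency after the divider: 80 MHz / 8 = 10 MHz.
static uint32_t rmt_tick_hz(void)
{
    return APB_CLK_HZ / RMT_CLK_DIV;
}

// Convert a duration in nanoseconds to ticks, rounding to the nearest tick.
// The math is done in 64 bits so ns * Hz cannot overflow.
static uint32_t ns_to_ticks(uint32_t ns)
{
    uint64_t hz = rmt_tick_hz();
    return (uint32_t)(((uint64_t)ns * hz + 500000000u) / 1000000000u);
}
```

With these values `ns_to_ticks(5000)` is 50 and `ns_to_ticks(10000)` is 100, matching the figures above. Anything that is not a multiple of 100 ns gets rounded, so the achievable pulse widths move in 0.1 µs steps.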

To actually send a pulse, you'd fill in an rmt_item32_t struct and fire it off:

c
static void pulse_ckv_ticks(uint16_t high_ticks, uint16_t low_ticks) {
    volatile rmt_item32_t item;
    item.level0 = 1;              // first half: HIGH
    item.duration0 = high_ticks;  // for this many ticks
    item.level1 = 0;              // second half: LOW
    item.duration1 = low_ticks;   // for this many ticks

    rmt_write_items(RMT_CHANNEL_1, (rmt_item32_t *)&item,
                    1,      // number of items
                    true);  // wait until done
}

That last argument — true — means "block until the hardware finishes transmitting." The function returns only after the pulse is complete on the wire. This made it easy to reason about: when pulse_ckv_ticks() returns, the pulse has been sent. Period.

Calling pulse_ckv_ticks(50, 100) produced:

        ┌──────┐
        │ 5 µs │
   ─────┘      └──────────────
               │    10 µs     │

Exactly 5µs HIGH, then 10µs LOW. Every time. No jitter.

This API was simple, direct, and it worked reliably across thousands of LilyGo displays for years. Then Espressif shipped IDF 5.

The day the build broke


When Arduino core 3.x shipped (built on ESP-IDF 5.x), users started opening issues on the LilyGo EPD47 repository. The library no longer worked.

The error message was blunt:

rmt: CONFLICT! driver_ng is not allowed to be used with the legacy driver

Not a gentle deprecation warning. Not a "please update at your convenience." A hard stop: the legacy driver logs this at startup and aborts if the new driver is linked into the same firmware.

Espressif had replaced the entire RMT driver with a new one they internally called driver_ng ("next generation"), and they made it impossible to use both in the same firmware. The old header still exists in IDF 5, marked deprecated, but the moment anything else in the build pulls in the new driver, the old one refuses to run. The two can't coexist.

This is unusual. Most framework migrations let old and new code overlap for a version or two behind compatibility shims. Espressif kept the old driver around but ruled out mixing, which for a library that shares a firmware image with new-driver code amounts to ripping the bandaid off.

Why? Because the old API had fundamental limitations:

  • Hardcoded channel numbers — you had to manually pick RMT_CHANNEL_0 through RMT_CHANNEL_7 and hope nothing else was using that channel. On the ESP32-S3, there are only 4 TX channels. Conflicts were common.
  • No DMA support — the old driver copied waveform data one item at a time. For long transmissions (like WS2812 LED strips with hundreds of pixels), this was a bottleneck.
  • Tightly coupled encoding — waveform data and transmission logic were tangled together. You couldn't easily reuse an encoding scheme across different channels.
  • No multi-chip abstraction — the ESP32, ESP32-S2, ESP32-S3, and ESP32-C3 all have subtly different RMT hardware. The old API papered over the differences with #ifdef soup.

The new API fixed all of this. But it meant every library using RMT needed a full rewrite.

Rewriting for IDF 5


Here's what the rewritten initialization looks like. It's more verbose, but each piece has a clear purpose:

c
#include "driver/rmt_tx.h"

static rmt_channel_handle_t rmt_chan = NULL;
static rmt_encoder_handle_t rmt_encoder = NULL;

void rmt_pulse_init(gpio_num_t pin) {
    // Step 1: Configure the TX channel
    rmt_tx_channel_config_t tx_config = {
        .gpio_num = pin,
        .clk_src = RMT_CLK_SRC_DEFAULT,
        .resolution_hz = 10 * 1000 * 1000,  // 10 MHz = 0.1µs/tick
        .mem_block_symbols = 64,
        .trans_queue_depth = 1,
    };
    rmt_new_tx_channel(&tx_config, &rmt_chan);

    // Step 2: Create an encoder
    rmt_copy_encoder_config_t enc_cfg = {};
    rmt_new_copy_encoder(&enc_cfg, &rmt_encoder);

    // Step 3: Enable the channel
    rmt_enable(rmt_chan);
}

Notice the differences. There's no RMT_CHANNEL_1 — you don't pick a channel number anymore. The driver allocates one for you and returns an opaque handle. This eliminates channel conflicts entirely.

The clock is specified as resolution_hz = 10000000 instead of clk_div = 8. Same math, different direction — you tell the driver what resolution you want and it figures out the divider. More portable across chips with different APB clock speeds.
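
The "different direction" is just a division. Here is a rough sketch of that arithmetic, not the driver's actual code. `divider_for_resolution()` is a hypothetical helper, and the 1 to 256 divider range and the exact-division requirement are my assumptions. The real driver's rounding and limits may differ.

```c
#include <stdbool.h>
#include <stdint.h>

// Work out the integer clock divider for a requested tick resolution.
// Assumption: only exact dividers in the range 1..256 are accepted here.
// Returns false (and leaves *div_out untouched) if no such divider exists.
static bool divider_for_resolution(uint32_t src_hz, uint32_t resolution_hz,
                                   uint32_t *div_out)
{
    if (resolution_hz == 0 || src_hz % resolution_hz != 0) {
        return false;  // resolution not reachable with an integer divider
    }

    uint32_t div = src_hz / resolution_hz;
    if (div < 1 || div > 256) {
        return false;  // outside the assumed hardware divider range
    }

    *div_out = div;
    return true;
}
```

With an 80 MHz source, asking for 10 MHz yields a divider of 8, the same number the IDF 4 code hardcoded. With a hypothetical 40 MHz source, the same request yields 4, and the calling code does not have to change.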

The encoder is new. In IDF 4, encoding was implicit — you just passed raw rmt_item32_t structs. In IDF 5, Espressif separated the concept of "how to encode data" from "how to transmit it." A copy_encoder is the simplest type — it passes your waveform symbols through unchanged. More complex encoders exist for protocols like WS2812 where data needs to be translated into pulse patterns.

The pulse function:

c
static void pulse_ckv_ticks(uint16_t high_ticks, uint16_t low_ticks) {
    // Define the waveform symbol
    rmt_symbol_word_t symbol = {
        .level0 = 1,
        .duration0 = high_ticks,
        .level1 = 0,
        .duration1 = low_ticks,
    };

    // Transmit configuration (no looping)
    rmt_transmit_config_t tx_cfg = {
        .loop_count = 0,
    };

    // Send the symbol through the encoder
    rmt_transmit(rmt_chan, rmt_encoder, &symbol, sizeof(symbol), &tx_cfg);

    // Block until the pulse is done
    rmt_tx_wait_all_done(rmt_chan, portMAX_DELAY);
}

The struct changed from rmt_item32_t to rmt_symbol_word_t. The fields inside are identical — level0, duration0, level1, duration1. The blocking call split from one function into two: rmt_transmit() starts the transmission, rmt_tx_wait_all_done() waits for completion.
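
One detail both structs share is easy to miss: each duration is a 15-bit field, even though pulse_ckv_ticks() accepts a uint16_t. The sketch below shows the 32-bit word as I understand the layout (duration0 in the low 15 bits, then level0, then duration1, then level1). `pack_symbol()` and `clamp_ticks()` are illustrative helpers of my own, not driver functions.

```c
#include <stdint.h>

#define RMT_MAX_DURATION_TICKS 0x7FFFu  // 15 bits: at most 32767 ticks

// Pack one symbol into a 32-bit word.
// Assumed layout: bits 0-14 duration0, bit 15 level0,
//                 bits 16-30 duration1, bit 31 level1.
static uint32_t pack_symbol(uint8_t level0, uint16_t duration0,
                            uint8_t level1, uint16_t duration1)
{
    return  (uint32_t)(duration0 & RMT_MAX_DURATION_TICKS)
         | ((uint32_t)(level0 & 1u) << 15)
         | ((uint32_t)(duration1 & RMT_MAX_DURATION_TICKS) << 16)
         | ((uint32_t)(level1 & 1u) << 31);
}

// Clamp a requested tick count to what a 15-bit field can hold,
// instead of letting the top bit be silently dropped.
static uint16_t clamp_ticks(uint32_t ticks)
{
    return (uint16_t)(ticks > RMT_MAX_DURATION_TICKS ? RMT_MAX_DURATION_TICKS
                                                     : ticks);
}
```

At 100 ns per tick, 32767 ticks is roughly 3.28 ms, so a single symbol cannot hold a longer HIGH or LOW phase. The 5 µs and 10 µs pulses used here are nowhere near that limit, but it is the kind of ceiling worth knowing before reusing the function for slower signals.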

More code, more ceremony. But the electrical output is identical: the same 10 MHz tick, the same HIGH/LOW pattern, the same microsecond-accurate pulse on the same GPIO pin.

Old vs. new at a glance


Here's the full migration mapped out side by side:

Setting up the channel:

IDF 4                              IDF 5
─────                              ─────
rmt_config_t struct                rmt_tx_channel_config_t struct
  + clk_div = 8                      + resolution_hz = 10MHz
  + channel = RMT_CHANNEL_1          + (auto-allocated)
rmt_config(&config)                rmt_new_tx_channel(&config, &handle)
rmt_driver_install(ch, 0, 0)       rmt_new_copy_encoder(&enc_cfg, &enc)
(no explicit enable)               rmt_enable(handle)

Sending a pulse:

IDF 4                              IDF 5
─────                              ─────
rmt_item32_t item                  rmt_symbol_word_t symbol
  .level0 = 1                        .level0 = 1
  .duration0 = ticks                 .duration0 = ticks
  .level1 = 0                        .level1 = 0
  .duration1 = ticks                 .duration1 = ticks
rmt_write_items(ch, &item,         rmt_transmit(handle, encoder,
                1, true)                        &symbol, size, &cfg)
                                   rmt_tx_wait_all_done(handle, MAX)

What didn't change at all:

  • 10 MHz tick resolution — each tick is still 0.1µs
  • Same level + duration waveform model
  • Same blocking semantics (wait for hardware to finish)
  • Same electrical output on the pin — an oscilloscope can't tell the difference

The API surface changed completely. The physics didn't change at all. The display kept working because the pulses arriving at the charge pump were bit-for-bit identical.

Supporting everyone with one file


Here's the practical problem: the LilyGo EPD47 library is used by Arduino users. Some are on Arduino core 2.x (which bundles IDF 4). Others have updated to Arduino core 3.x (which bundles IDF 5). They don't choose their IDF version — it's a transitive dependency of their Arduino core.

A library that only supports one IDF version will break for half its users. We need both implementations in the same source file.

The solution is the classic C preprocessor version guard:

c
#include "esp_idf_version.h"

#if ESP_IDF_VERSION_MAJOR >= 5
// ===== IDF 5 implementation =====
#include "driver/rmt_tx.h"

static rmt_channel_handle_t rmt_chan = NULL;
static rmt_encoder_handle_t rmt_encoder = NULL;

void rmt_pulse_init(gpio_num_t pin) {
    // ... IDF 5 channel setup ...
}

static void pulse_ckv_ticks(uint16_t high, uint16_t low) {
    // ... IDF 5 transmit ...
}

#else
// ===== IDF 4 implementation =====
#include "driver/rmt.h"

void rmt_pulse_init(gpio_num_t pin) {
    // ... IDF 4 channel setup ...
}

static void pulse_ckv_ticks(uint16_t high, uint16_t low) {
    // ... IDF 4 write_items ...
}

#endif

The compiler sees only one implementation — whichever matches the target IDF version. The other code doesn't exist as far as the build is concerned. No runtime overhead, no dead code, no conditional branching.
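
If a check on the major version alone ever turns out to be too coarse, esp_idf_version.h also provides ESP_IDF_VERSION and ESP_IDF_VERSION_VAL() for comparing full version numbers. The sketch below is a host-side illustration of that pattern, not what the library does. The stand-in macro definitions exist only so it compiles outside ESP-IDF, and `rmt_backend_is_new()` is a made-up function.

```c
// Host-side stand-ins so this compiles without ESP-IDF. On a real build,
// delete this block and include "esp_idf_version.h" instead.
#ifndef ESP_IDF_VERSION_VAL
#define ESP_IDF_VERSION_MAJOR 5   // hypothetical target: IDF 5.1.0
#define ESP_IDF_VERSION_MINOR 1
#define ESP_IDF_VERSION_PATCH 0
#define ESP_IDF_VERSION_VAL(major, minor, patch) \
    (((major) << 16) | ((minor) << 8) | (patch))
#define ESP_IDF_VERSION \
    ESP_IDF_VERSION_VAL(ESP_IDF_VERSION_MAJOR, \
                        ESP_IDF_VERSION_MINOR, \
                        ESP_IDF_VERSION_PATCH)
#endif

// Only one of these two definitions survives preprocessing.
#if ESP_IDF_VERSION >= ESP_IDF_VERSION_VAL(5, 0, 0)
static int rmt_backend_is_new(void) { return 1; }  // rmt_tx.h path
#else
static int rmt_backend_is_new(void) { return 0; }  // legacy rmt.h path
#endif
```

Because each version packs into a single integer, an ordinary `>=` comparison does the right thing: 4.4.7 sorts below 5.0.0, which sorts below 5.1.0.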

It's not elegant. The file has two copies of every function. But it's the standard pattern in the ESP32 ecosystem, and it works. Libraries like FastLED, NeoPixelBus, and Espressif's own examples all use this approach.

The alternative — maintaining separate git branches for each IDF version — is worse in every way. More merge conflicts, more places for bugs to hide, more confusion for users trying to figure out which branch to use.

This version guard was part of the fix in LilyGo-EPD47 PR #181, alongside four other IDF 5 compatibility fixes.

The pulse that moves ink


Somewhere inside the ESP32 on my desk, a hardware state machine is cycling through a tiny buffer of pulse definitions right now. It toggles a GPIO pin HIGH for exactly 50 ticks, then LOW for exactly 100 ticks, then HIGH again. Each tick is precisely 100 nanoseconds.

The CPU doesn't know this is happening. It's busy decompressing a JPEG, calculating a moon phase, or formatting a Hindi date string.

On the other end of that GPIO pin, a charge pump amplifies the 3.3V logic signal into a 20V swing. That voltage reaches a transparent electrode on the display panel. Inside a microcapsule smaller than a grain of sand, black particles sink through clear oil and white particles rise to the surface.

A pixel turns white.

Multiply by 518,400 pixels (960 × 540, the resolution of the 4.7-inch e-paper panel) and you have a full screen refresh. Each row is driven by a sequence of precisely timed RMT pulses.

The display doesn't care which API generated those pulses. It doesn't care whether the driver used rmt_write_items() or rmt_transmit(). It doesn't know about IDF version numbers, encoder abstractions, or the driver_ng migration that consumed a weekend of my time.

It only cares that the pulse arrived at the right time.

And thanks to a peripheral originally designed for TV remotes, it always does.