UART Protocol: From Basics to Advanced Implementation
The UART (Universal Asynchronous Receiver/Transmitter) is one of the oldest and most widely used serial communication interfaces in electronics. Despite the rise of high-speed serial standards, UART remains indispensable for debug consoles, GPS modules, Bluetooth/Wi-Fi modems, industrial sensors, and inter-processor links. Its defining feature is that it is asynchronous: there is no shared clock line. Instead, transmitter and receiver agree in advance on a common bit rate (baud rate) and rely on start/stop framing to align byte boundaries.
This guide walks from the physical bit-level frame all the way to a production-grade UART IP core, covering baud rate generation, 16x oversampling, hardware flow control, FIFO buffering, and the RS-232/RS-485 physical layers used in industrial deployments.
Quick Summary
| Property | Value |
|---|---|
| Topology | Point-to-point, full-duplex, 2 data wires (TX, RX) plus optional RTS/CTS |
| Clocking | Asynchronous: no shared clock; both ends configured to the same baud rate |
| Frame | 1 start bit, 5–9 data bits, optional parity, 1/1.5/2 stop bits |
| Typical Rates | 9600 bps to 921600 bps (TTL/RS-232); up to 12+ Mbps on RS-485 transceivers |
The UART Frame Format
UART transmits one frame per character. The line idles HIGH (logic 1, also called the mark state). A complete frame consists of the following fields, sent in this order, with the data bits transmitted LSB-first:
Start Bit
A single logic-0 (space) bit signals the beginning of a frame. The receiver detects the HIGH-to-LOW transition on the idle line and uses it to synchronize its internal bit-sampling clock. Because there is no shared clock, this falling edge is the only timing reference the receiver gets for the entire frame.
Data Bits
Between 5 and 9 data bits follow, transmitted least-significant bit first. The 8-bit configuration is by far the most common (one byte per frame). A 9th data bit is sometimes repurposed as an address/data flag in multi-drop RS-485 networks.
Parity Bit (Optional)
An optional parity bit provides single-bit error detection. The transmitter sets it so that the total number of 1s (data + parity) is even (even parity) or odd (odd parity). Mark parity forces it to 1 and space parity to 0. Even or odd parity catches any odd number of flipped bits, including all single-bit errors, but cannot detect an even number of flipped bits.
Stop Bit(s)
One, one-and-a-half, or two stop bits (logic 1) terminate the frame and guarantee a minimum idle period before the next start bit. The stop bit returns the line to the idle state so the receiver can reliably detect the next falling edge.
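The field order described above can be sketched as a small frame builder. This is an illustrative model only: the function name `build_frame` and its parameters are hypothetical, it works in whole bit times (so 1.5 stop bits is not modeled), and it covers only even and odd parity.

```python
def build_frame(data, data_bits=8, parity=None, stop_bits=1):
    """Return the line level for each bit time of one UART frame.

    parity is None, "even", or "odd". Mark/space parity and 1.5 stop
    bits are left out of this sketch.
    """
    if not 0 <= data < (1 << data_bits):
        raise ValueError("data does not fit in data_bits")

    bits = [0]                                             # start bit (space)
    payload = [(data >> i) & 1 for i in range(data_bits)]  # LSB first
    bits += payload

    if parity is not None:
        ones = sum(payload)
        if parity == "even":
            bits.append(ones % 2)        # total number of 1s becomes even
        elif parity == "odd":
            bits.append((ones + 1) % 2)  # total number of 1s becomes odd
        else:
            raise ValueError("parity must be None, 'even', or 'odd'")

    bits += [1] * stop_bits                                # stop bit(s) (mark)
    return bits


# 'A' (0x41) as 8N1: start, eight data bits LSB first, one stop bit
print(build_frame(0x41))
```

With 8N1 framing the list has 10 entries, which is where the common "10 bits per byte" figure comes from.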
Baud Rate Generation
The baud rate is the number of signaling elements (symbols) transmitted per second. Because each UART symbol carries exactly one bit, baud rate equals bit rate. Both ends must be configured to the same value; a mismatch greater than roughly 2–3% will cause the receiver to sample bits at the wrong instants and corrupt the data.
The Baud Rate Divisor
A UART derives its bit timing from a high-frequency reference clock by dividing it down. To support oversampling (see below), the internal baud generator runs at 16x the target baud rate, so the divisor is computed as:
Baud Rate Divisor
Divisor = fclk / (16 × Baud)
Example: For fclk = 50 MHz and a target of 115200 baud:
Divisor = 50,000,000 / (16 × 115200) = 27.13 → 27
Actual baud = 50,000,000 / (16 × 27) = 115,740 bps → error = +0.47% (well within tolerance)
The fractional remainder is the source of baud error. A 16-bit integer divisor handles most cases, but high-quality IP cores add a fractional baud rate generator (an accumulator that periodically inserts an extra clock period) to drive the error close to zero even for awkward clock/baud combinations.
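One common way to build such a fractional generator is a phase accumulator; the sketch below models that approach in software and is not a description of any particular core. The function name `fractional_ticks` is hypothetical. On every reference clock it adds the desired tick rate to an accumulator and emits an oversample tick whenever the accumulator wraps past the clock frequency, so tick spacing alternates between 27 and 28 clocks for the 50 MHz / 115200 baud example.

```python
def fractional_ticks(fclk_hz, tick_hz, n_clocks):
    """Yield the clock indices at which a phase accumulator emits a tick.

    tick_hz is the desired oversample rate (16 x baud). The long-run
    average tick rate is exactly tick_hz even when fclk_hz / tick_hz
    is not an integer.
    """
    acc = 0
    for clk in range(n_clocks):
        acc += tick_hz
        if acc >= fclk_hz:      # accumulator wrapped: emit one tick
            acc -= fclk_hz
            yield clk


ticks = list(fractional_ticks(50_000_000, 16 * 115200, 15625))
gaps = sorted({b - a for a, b in zip(ticks, ticks[1:])})
print(len(ticks), "ticks in 15625 clocks, spacings used:", gaps)
```

Over 15,625 reference clocks the model emits 576 ticks, which corresponds to 50,000,000 × 576 / 15,625 = 1,843,200 ticks per second, exactly 16 × 115200. The price is one clock period of jitter on individual ticks.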
Standard Baud Rates and Divisor Error
The table below shows integer divisors and resulting error for a 50 MHz reference clock with 16x oversampling:
| Target Baud | Ideal Divisor | Rounded | Actual Baud | Error |
|---|---|---|---|---|
| 9600 | 325.52 | 326 | 9,586 | -0.15% |
| 19200 | 162.76 | 163 | 19,172 | -0.15% |
| 38400 | 81.38 | 81 | 38,580 | +0.47% |
| 57600 | 54.25 | 54 | 57,870 | +0.47% |
| 115200 | 27.13 | 27 | 115,740 | +0.47% |
| 921600 | 3.39 | 3 | 1,041,667 | +13.0% (use fractional!) |
Note how the integer divisor collapses at very high baud rates relative to the clock — the 921600 case demonstrates exactly when a fractional divider or a higher reference clock becomes mandatory.
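The table values can be reproduced with a few lines of arithmetic. The helper below is a sketch with a hypothetical name (`integer_divisor`); it assumes round-to-nearest for the divisor, which matches the Rounded column above.

```python
def integer_divisor(fclk_hz, baud, oversample=16):
    """Return (ideal, rounded divisor, actual baud, error in percent)."""
    ideal = fclk_hz / (oversample * baud)
    divisor = max(1, round(ideal))            # a divisor of 0 is not usable
    actual = fclk_hz / (oversample * divisor)
    error_pct = (actual - baud) / baud * 100
    return ideal, divisor, actual, error_pct


for baud in (9600, 19200, 38400, 57600, 115200, 921600):
    ideal, divisor, actual, err = integer_divisor(50_000_000, baud)
    print(f"{baud:>7} {ideal:8.2f} {divisor:4d} {actual:12.0f} {err:+6.2f}%")
```

Running the same loop with a different `fclk_hz` is a quick way to check whether a candidate reference clock keeps every required baud rate inside the error budget.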
16x Oversampling and Bit Sampling
Because the receiver has no clock from the transmitter, it must recover timing from the start-bit edge alone. The classic technique is 16x oversampling: the receiver runs an internal clock at 16 times the baud rate and counts ticks within each bit period.
How Sampling Works
- Edge detection: The receiver waits for the idle-to-start falling edge.
- Start-bit validation: It counts 8 oversample ticks to reach the center of the start bit and re-checks that the line is still LOW — this rejects narrow noise glitches.
- Mid-bit sampling: From there it samples every 16 ticks, landing each sample at the center of each data bit, where the signal is most stable.
- Majority voting: Robust cores sample at ticks 7, 8, and 9 and take a 2-of-3 majority vote to further immunize against transient noise.
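The four steps above can be modeled in software on a list of line levels captured at 16x the baud rate. This is a behavioral sketch, not RTL: the function name `receive_frame` is hypothetical, parity is not handled, and only the first valid frame in the capture is decoded.

```python
def receive_frame(samples, data_bits=8, oversample=16):
    """Decode one frame (no parity) from oversampled line levels.

    Returns (value, framing_error), or None if no valid start bit is found.
    """
    mid = oversample // 2

    def vote(bit_start):
        # 2-of-3 majority around the bit center (ticks 7, 8, 9 at 16x)
        trio = samples[bit_start + mid - 1 : bit_start + mid + 2]
        return 1 if sum(trio) >= 2 else 0

    frame_len = oversample * (data_bits + 2)   # start + data + one stop bit
    for edge in range(1, len(samples) - frame_len + 1):
        if samples[edge - 1] == 1 and samples[edge] == 0:   # falling edge
            if samples[edge + mid] != 0:
                continue        # line not LOW at start-bit center: glitch
            value = 0
            for k in range(data_bits):
                value |= vote(edge + oversample * (k + 1)) << k  # LSB first
            stop = vote(edge + oversample * (data_bits + 1))
            return value, stop == 0            # stop bit read as 0: framing error
    return None


frame = [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]         # 0x41 as 8N1
line = [1] * 16 + [b for b in frame for _ in range(16)] + [1] * 16
print(receive_frame(line))
```

Prepending a three-sample LOW glitch to `line` leaves the result unchanged in this model, because the start-bit check at the center rejects it before any data bits are sampled.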
Why Center Sampling Tolerates Clock Drift
Sampling at the bit center leaves a half-bit (8 ticks) of margin on each side. Timing error accumulates across the frame, so the worst case is the last bit sampled: in a 10-bit frame (8N1) the center of the stop bit lies 9.5 bit times after the start edge, and a half-bit drift at that point corresponds to 0.5 / 9.5 ≈ 5%. The combined transmitter + receiver baud error must therefore stay below roughly ±5% to keep the final sample inside its bit window. Because that budget is shared by both ends, ~2–3% per-end error is the practical design target.
Some low-power designs use 8x oversampling to halve the internal clock; this is acceptable at low baud rates but reduces noise margin and edge-placement resolution.
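The drift budget can be estimated with a short calculation. The sketch below is a simplified model with hypothetical names: it assumes ideal edges and no noise, and the optional `edge_uncertainty_ticks` term (an assumption, not something stated above) accounts for the receiver only knowing the start edge to within one oversample tick.

```python
def max_combined_baud_error(bits_per_frame=10, oversample=16,
                            edge_uncertainty_ticks=0):
    """Upper bound on total TX + RX rate mismatch, as a fraction.

    The last bit's center sits (bits_per_frame - 0.5) bit times after
    the start edge; the sample may drift by at most half a bit, less any
    uncertainty in where the start edge was detected.
    """
    margin_bits = 0.5 - edge_uncertainty_ticks / oversample
    last_center_bits = bits_per_frame - 0.5
    return margin_bits / last_center_bits


for osr in (16, 8):
    ideal = max_combined_baud_error(oversample=osr)
    with_edge = max_combined_baud_error(oversample=osr, edge_uncertainty_ticks=1)
    print(f"{osr}x: ideal {ideal:.2%}, with one tick of edge uncertainty {with_edge:.2%}")
```

Under these assumptions the one-tick uncertainty costs twice as much margin at 8x as at 16x, which is one way to quantify the reduced edge-placement resolution.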
Hardware Flow Control: RTS/CTS
At high baud rates a receiver can be overrun if its buffer fills faster than software drains it. Hardware flow control uses two extra signals to throttle the data stream:
- RTS (Request To Send): Asserted (LOW) by a device to tell the far end "I have buffer space — you may send."
- CTS (Clear To Send): An input; the transmitter only sends when its CTS is asserted by the partner's RTS.
In the modern crossed-wiring convention, one device's RTS connects to the other's CTS. When a receiver's FIFO crosses a high-water mark, it de-asserts RTS; the transmitter sees CTS de-assert and pauses after the current frame, preventing overrun without losing data.
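The watermark behavior can be sketched as a small receive-side model. The class name `RxFifoWithRts` and its parameters are hypothetical, and the low-water mark used to re-assert RTS is an assumption added here for hysteresis; the text above only specifies the high-water mark.

```python
from collections import deque


class RxFifoWithRts:
    """Receive FIFO that drives RTS from high/low water marks (model only)."""

    def __init__(self, depth=16, high_water=12, low_water=4):
        self.depth = depth
        self.high_water = high_water
        self.low_water = low_water
        self.fifo = deque()
        self.rts_asserted = True    # True means "you may send" (pin LOW)
        self.overruns = 0

    def _update_rts(self):
        if len(self.fifo) >= self.high_water:
            self.rts_asserted = False       # ask the far end to pause
        elif len(self.fifo) <= self.low_water:
            self.rts_asserted = True        # enough room again
        # between the two marks RTS keeps its previous state

    def push(self, byte):
        """Called when the line delivers a byte."""
        if len(self.fifo) >= self.depth:
            self.overruns += 1              # FIFO full: the byte is lost
        else:
            self.fifo.append(byte)
        self._update_rts()

    def pop(self):
        """Called when software (or DMA) reads a byte."""
        byte = self.fifo.popleft()
        self._update_rts()
        return byte


rx = RxFifoWithRts()
for b in range(12):
    rx.push(b)
print("RTS asserted after 12 bytes:", rx.rts_asserted)
```

The gap between the high-water mark and the full depth (4 bytes here) is the headroom for bytes already in flight when the far end sees CTS de-assert.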
Software Flow Control (XON/XOFF)
An alternative uses in-band control characters: XOFF (0x13) tells the sender to pause and XON (0x11) resumes it. This needs no extra wires but consumes two byte values from the data stream and reacts more slowly, making it unsuitable for binary protocols or high throughput.
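A minimal sketch of the sender-side handling shows why binary payloads are a problem. The function name `apply_software_flow_control` is hypothetical, and the model does no escaping, so any 0x11 or 0x13 byte is treated as a control character.

```python
XON, XOFF = 0x11, 0x13


def apply_software_flow_control(received, paused=False):
    """Strip XON/XOFF from an incoming stream.

    Returns (payload, paused), where paused tells the local transmitter
    whether it must currently hold off sending.
    """
    payload = bytearray()
    for b in received:
        if b == XOFF:
            paused = True
        elif b == XON:
            paused = False
        else:
            payload.append(b)
    return bytes(payload), paused


print(apply_software_flow_control(b"ab\x13cd"))
print(apply_software_flow_control(bytes([0x10, 0x11, 0x12, 0x13, 0x14])))
```

In the second call the bytes 0x11 and 0x13 were meant as data but are swallowed as control characters, which is the failure mode that rules this scheme out for unescaped binary protocols.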
FIFO Design in UART IP Cores
A bare UART interrupts (or polls) the CPU on every single byte; at 115200 baud with 10-bit (8N1) frames that is 11,520 interrupts per second per direction. FIFO buffers on both TX and RX paths dramatically reduce this overhead and tolerate interrupt latency.
Key FIFO Parameters
- Depth: Common depths are 16, 32, 64, or 128 bytes. The legacy 16550 UART popularized the 16-byte FIFO.
- Trigger / watermark level: The RX FIFO raises an interrupt when it reaches a programmable threshold (e.g. 1/4, 1/2, 3/4 full), batching many bytes per interrupt.
- Timeout interrupt: If the FIFO holds fewer bytes than the trigger level but no new data arrives for ~4 character times, a timeout interrupt flushes the residue so the last few bytes are not stranded.
- Status flags: Empty, Full, Almost-Full, and Almost-Empty flags let software and DMA engines manage flow precisely.
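The interplay between the trigger level and the timeout can be expressed as a small decision function. This is a sketch with hypothetical names; real cores differ in how they count idle time and which interrupt takes priority.

```python
def rx_interrupt_pending(fill_level, trigger_level, idle_char_times,
                         timeout_chars=4):
    """Return which RX interrupt (if any) a FIFO in this state would raise."""
    if fill_level >= trigger_level:
        return "trigger"                    # enough bytes batched up
    if fill_level > 0 and idle_char_times >= timeout_chars:
        return "timeout"                    # residue would otherwise be stranded
    return None


print(rx_interrupt_pending(fill_level=8, trigger_level=8, idle_char_times=0))
print(rx_interrupt_pending(fill_level=3, trigger_level=8, idle_char_times=4))
```

An empty FIFO never raises the timeout in this model, so an idle line produces no interrupts at all.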
Sizing the FIFO Against Interrupt Latency
Minimum RX FIFO Depth
Depth ≥ (Baud / bits_per_frame) × tlatency
Example: 115200 baud, 8N1 (10 bits/frame), worst-case service latency of 1 ms:
Depth ≥ (115200 / 10) × 0.001 = 11.52 → choose a 16-byte FIFO minimum (round up to next power of two for headroom)
For DMA-driven systems, deeper FIFOs (64–128 bytes) plus flow control let the core ride out long bus-arbitration stalls without dropping data — critical in industrial controllers running an RTOS with high interrupt load.
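The sizing rule above is easy to wrap in a helper for comparing configurations. The function name `min_rx_fifo_depth` is hypothetical, and rounding up to a power of two follows the worked example rather than any hard requirement.

```python
import math


def min_rx_fifo_depth(baud, bits_per_frame, latency_s):
    """Return (bytes arriving during the latency window, power-of-two depth)."""
    bytes_needed = math.ceil(baud / bits_per_frame * latency_s)
    depth = 1
    while depth < bytes_needed:
        depth *= 2
    return bytes_needed, depth


print(min_rx_fifo_depth(115200, 10, 0.001))
print(min_rx_fifo_depth(921600, 10, 0.001))
```

The first call reproduces the 115200 baud example; the second shows how quickly the requirement grows at higher rates with the same 1 ms latency.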
Physical Layers: TTL, RS-232, and RS-485
The UART logic block produces simple TTL/CMOS levels. To drive real cables, an external line transceiver translates those levels to a robust electrical standard.
| Parameter | TTL/CMOS UART | RS-232 | RS-485 |
|---|---|---|---|
| Signaling | Single-ended | Single-ended | Differential |
| Logic Levels | 0V / 3.3V (or 5V) | +3 to +15V (0) / -3 to -15V (1) | ±1.5 to ±6V differential |
| Max Distance | ~0.3 m (on-board) | ~15 m | ~1200 m |
| Max Speed | Several Mbps | ~115–230 kbps typical | 10–12 Mbps (short runs) |
| Topology | Point-to-point | Point-to-point | Multi-drop (up to 32+ nodes) |
| Noise Immunity | Low | Moderate | High (common-mode rejection) |
RS-485 is the workhorse of industrial and building automation (Modbus RTU, PROFIBUS DP physical layer). Its differential pair rejects common-mode noise and supports long, multi-drop buses. Because it is typically half-duplex on a single twisted pair, the UART must control a Driver Enable (DE) pin: assert DE before transmitting and release it immediately after the stop bit so other nodes can drive the bus. Mistiming this turnaround is the single most common RS-485 bug, which is why robust IP cores include automatic, baud-aware DE/RE turnaround logic.
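The timing that the turnaround logic has to hit is plain arithmetic on the bit time. The sketch below uses hypothetical names, and the optional lead time before the first start bit (`lead_bits`) is an assumption, since transceiver enable requirements vary by part.

```python
def rs485_de_schedule(baud, n_bytes, bits_per_frame=10, lead_bits=0.0):
    """Return (assert_at_us, release_at_us) relative to the first start-bit edge.

    DE is released at the end of the last stop bit so that other nodes
    can drive the bus.
    """
    bit_time_us = 1e6 / baud
    assert_at_us = -lead_bits * bit_time_us
    release_at_us = n_bytes * bits_per_frame * bit_time_us
    return assert_at_us, release_at_us


assert_at, release_at = rs485_de_schedule(115200, n_bytes=8, lead_bits=1.0)
print(f"assert DE at {assert_at:.2f} us, release at {release_at:.2f} us")
```

At 115200 baud one bit time is under 9 µs, which gives a sense of how little slack software-driven DE control has compared with hardware tied to the baud generator.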
Error Detection and Status Reporting
A well-designed UART receiver flags several distinct error conditions in its status register:
- Parity Error: The received parity bit disagrees with the parity recomputed from the data, which indicates an odd number of corrupted bits (most often a single-bit error).
- Framing Error: The expected stop bit is read as logic 0 instead of 1, usually caused by a baud-rate mismatch or a break condition.
- Overrun Error: A new byte arrives before the previous one (or the FIFO) was read — data is lost. Flow control and FIFOs exist to prevent this.
- Break Detection: The line is held LOW for longer than a full frame time, used to signal line resets or attention conditions.
- Noise/Glitch Error: Raised when the oversample points used for majority voting do not all agree, hinting at marginal signal integrity.
UART itself provides only per-byte parity. Multi-byte integrity for protocols like Modbus RTU is layered on top using a frame CRC-16, while text protocols often append a checksum — UART moves the bytes; the application layer guarantees message integrity.
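As an example of integrity layered on top of the byte stream, the CRC-16 variant used by Modbus RTU (initial value 0xFFFF, reflected polynomial 0xA001) can be computed bit by bit as sketched below. The function name `crc16_modbus` is illustrative; production code usually uses a lookup table instead of the inner loop.

```python
def crc16_modbus(data: bytes) -> int:
    """Bitwise CRC-16/MODBUS: init 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


payload = b"123456789"
crc = crc16_modbus(payload)
frame = payload + crc.to_bytes(2, "little")   # Modbus RTU sends the low byte first
print(hex(crc), crc16_modbus(frame) == 0)
```

Running the CRC over a frame that already includes its own CRC yields zero when nothing was corrupted, which gives the receiver a simple validity check.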
Implementation Best Practices
- Choose a clean reference clock: Select fclk so target baud rates land on near-integer divisors; 1.8432 MHz multiples and 50/100 MHz with fractional dividers are common choices that minimize baud error.
- Always use 16x oversampling with majority voting: It is cheap in logic and buys substantial noise immunity and clock-drift tolerance.
- Validate the start bit at its center: Re-sampling the start bit rejects glitches that would otherwise trigger spurious frames.
- Size FIFOs to interrupt/DMA latency: Compute depth from worst-case service time and round up to a power of two; add hardware flow control above 230400 baud.
- Implement RS-485 DE turnaround in hardware: Tie driver-enable timing to the baud generator so DE is released as soon as the final stop bit has been fully transmitted, independent of software jitter.
- Expose all error flags and sticky status: Latch parity, framing, and overrun errors per byte so software can correlate faults precisely.
- Add a configurable timeout interrupt: Prevents the last few bytes of a burst from being stranded below the FIFO trigger level.
- Clock-domain crossing: Synchronize the asynchronous RX input through at least two flip-flops before edge detection to avoid metastability.
Conclusion
UART endures because it is simple, low-cost, and astonishingly robust when implemented correctly. The subtleties — accurate baud generation, 16x oversampling with center sampling and majority voting, properly sized FIFOs, and disciplined RS-485 turnaround — are exactly what separate a toy UART from one that runs flawlessly in a noisy industrial cabinet for years.
From a 9600-baud debug console to a 12 Mbps multi-drop RS-485 field bus, a well-architected UART core with configurable framing, fractional baud rates, deep FIFOs, and hardware flow control covers the entire spectrum of serial communication needs.
Vcores offers silicon-proven UART IP cores featuring fractional baud rate generation, configurable FIFOs, hardware flow control, full RS-232/RS-485 support with automatic driver-enable timing, and comprehensive verification — ready for seamless integration into your FPGA and ASIC designs for demanding industrial applications.