Introduction to AES Hardware Implementation
AES (Advanced Encryption Standard), a standardized subset of the Rijndael cipher, is the most widely used symmetric block cipher for securing sensitive data. While software implementations are flexible, hardware implementations in FPGAs and ASICs offer significant advantages in throughput, latency, and isolation of key material from software-based attacks.
AES Specifications (FIPS 197)
- Block Size: 128 bits (fixed)
- Key Sizes: 128, 192, or 256 bits
- Rounds: 10 (AES-128), 12 (AES-192), 14 (AES-256)
- Structure: Substitution-Permutation Network (SPN)
AES Algorithm Overview
Encryption begins with an initial AddRoundKey; each subsequent round then applies four operations in order:
1. SubBytes (S-Box Substitution)
Non-linear byte substitution using a lookup table:
- Each byte replaced independently
- Provides confusion (non-linearity)
- Based on multiplicative inverse in GF(2^8), followed by an affine transformation
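The construction above can be sketched as a behavioral SystemVerilog reference: invert the byte in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1, then apply the FIPS 197 affine map with constant 0x63. The package and function names (`aes_sbox_pkg`, `gf_mul`, `gf_inv`, `sbox`) are hypothetical, and the code is meant as a simulation model for checking a table or an optimized circuit, not as an area-efficient implementation.

```systemverilog
package aes_sbox_pkg;

  // Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1
  // using shift-and-add (hypothetical helper name).
  function automatic logic [7:0] gf_mul(input logic [7:0] a,
                                        input logic [7:0] b);
    logic [7:0] p, aa;
    p  = 8'h00;
    aa = a;
    for (int i = 0; i < 8; i++) begin
      if (b[i]) p ^= aa;
      // Multiply aa by x and reduce when the degree-8 term appears.
      aa = {aa[6:0], 1'b0} ^ (aa[7] ? 8'h1b : 8'h00);
    end
    return p;
  endfunction

  // Multiplicative inverse computed as a^254 = a^2 * a^4 * ... * a^128.
  // The input 0 maps to 0, as the S-Box definition requires.
  function automatic logic [7:0] gf_inv(input logic [7:0] a);
    logic [7:0] sq, r;
    sq = a;
    r  = 8'h01;
    for (int i = 0; i < 7; i++) begin
      sq = gf_mul(sq, sq);  // a^(2^(i+1))
      r  = gf_mul(r, sq);
    end
    return r;
  endfunction

  // S-Box: inverse followed by the affine transformation
  // b ^ rotl(b,1) ^ rotl(b,2) ^ rotl(b,3) ^ rotl(b,4) ^ 0x63.
  function automatic logic [7:0] sbox(input logic [7:0] x);
    logic [7:0] b;
    b = gf_inv(x);
    return b ^ {b[6:0], b[7]} ^ {b[5:0], b[7:6]}
             ^ {b[4:0], b[7:5]} ^ {b[3:0], b[7:4]} ^ 8'h63;
  endfunction

endpackage
```

As a sanity check, `sbox(8'h00)` evaluates to `8'h63` and `sbox(8'h01)` to `8'h7c`, the first two entries of the standard S-Box table.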
2. ShiftRows
Cyclic shift of rows in state matrix:
- Row 0: No shift
- Row 1: Shift left by 1
- Row 2: Shift left by 2
- Row 3: Shift left by 3
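In hardware, ShiftRows is pure wiring. A minimal sketch follows, assuming the 128-bit state is packed so that byte i of the FIPS 197 byte sequence sits at bits [127-8*i -: 8] and state element (row r, column c) is byte r + 4*c. The packing convention and the names `aes_shiftrows_pkg`, `get_byte`, and `shift_rows` are assumptions for illustration.

```systemverilog
package aes_shiftrows_pkg;

  // Return byte i (0..15, FIPS 197 order) of the packed state.
  function automatic logic [7:0] get_byte(input logic [127:0] s,
                                          input int i);
    return s[127 - 8*i -: 8];
  endfunction

  // Row r is rotated left by r positions:
  // out(r, c) = in(r, (c + r) mod 4).
  function automatic logic [127:0] shift_rows(input logic [127:0] s);
    logic [127:0] o;
    for (int c = 0; c < 4; c++)
      for (int r = 0; r < 4; r++)
        o[127 - 8*(r + 4*c) -: 8] = get_byte(s, r + 4*((c + r) % 4));
    return o;
  endfunction

endpackage
```

With this packing, the round 1 example from FIPS 197 Appendix B maps `128'hd42711aee0bf98f1b8b45de51e415230` to `128'hd4bf5d30e0b452aeb84111f11e2798e5`.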
3. MixColumns
Linear transformation mixing column bytes:
- Matrix multiplication in GF(2^8)
- Provides diffusion across columns
- Skipped in final round
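Because the MixColumns matrix only contains the constants 1, 2, and 3, each output byte reduces to XORs plus a multiply-by-2 (often called xtime). The sketch below assumes each 32-bit column holds row 0 in its most significant byte; the names `aes_mixcolumns_pkg`, `xtime`, `mix_column`, and `mix_columns` are hypothetical.

```systemverilog
package aes_mixcolumns_pkg;

  // Multiply by {02} in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1.
  function automatic logic [7:0] xtime(input logic [7:0] b);
    return {b[6:0], 1'b0} ^ (b[7] ? 8'h1b : 8'h00);
  endfunction

  // Mix one column. Multiplication by {03} is xtime(a) ^ a.
  function automatic logic [31:0] mix_column(input logic [31:0] col);
    logic [7:0] a0, a1, a2, a3;
    {a0, a1, a2, a3} = col;
    return { xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,    // 2 3 1 1
             a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,    // 1 2 3 1
             a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),    // 1 1 2 3
             (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3) };  // 3 1 1 2
  endfunction

  // Apply the column transform to all four columns of the state.
  function automatic logic [127:0] mix_columns(input logic [127:0] s);
    return { mix_column(s[127:96]), mix_column(s[95:64]),
             mix_column(s[63:32]),  mix_column(s[31:0]) };
  endfunction

endpackage
```

For the well-known FIPS 197 test column, `mix_column(32'hd4bf5d30)` evaluates to `32'h046681e5`.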
4. AddRoundKey
XOR state with round key:
- Simple bitwise XOR operation
- Round keys derived from key expansion
Hardware Implementation Architectures
1. Iterative Architecture (Area-Optimized)
Single round logic reused for all rounds:
- Area: Smallest (~3,000 LUTs)
- Throughput: Lowest (128 bits per 10-14 cycles)
- Latency: 10-14 clock cycles
- Use Case: Resource-constrained designs
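The control side of an iterative design can be sketched as a state register plus a round counter wrapped around a single, externally supplied round datapath. Everything about this interface is an assumption for illustration: the module name `aes_iterative_ctrl`, its ports, and the convention that `block_in` already includes the initial AddRoundKey.

```systemverilog
module aes_iterative_ctrl #(
  parameter int NR = 10              // rounds: 10, 12, or 14
) (
  input  logic         clk,
  input  logic         rst_n,
  input  logic         start,        // one-cycle pulse to load a block
  input  logic [127:0] block_in,     // assumed: initial AddRoundKey applied
  input  logic [127:0] round_out,    // external round logic applied to state
  output logic [127:0] state,        // current state, fed to the round logic
  output logic [3:0]   round,        // 1..NR while busy
  output logic         last_round,   // round logic should skip MixColumns
  output logic         busy,
  output logic         done          // one-cycle pulse; state holds the result
);

  assign last_round = busy && (round == 4'(NR));

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      state <= '0;
      round <= '0;
      busy  <= 1'b0;
      done  <= 1'b0;
    end else begin
      done <= 1'b0;
      if (start && !busy) begin
        state <= block_in;           // load a new block
        round <= 4'd1;
        busy  <= 1'b1;
      end else if (busy) begin
        state <= round_out;          // reuse the same round logic each cycle
        if (last_round) begin
          busy <= 1'b0;
          done <= 1'b1;
        end else begin
          round <= round + 4'd1;
        end
      end
    end
  end

endmodule
```

In this sketch a block occupies the round logic for NR cycles after it is loaded, which is where the 10 to 14 cycle latency comes from.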
2. Pipelined Architecture (Throughput-Optimized)
Each round in separate pipeline stage:
- Area: Largest (~30,000 LUTs)
- Throughput: Highest (128 bits per cycle)
- Latency: 10-14 clock cycles (initial)
- Use Case: High-bandwidth encryption
3. Loop-Unrolled Architecture (Balanced)
Partial unrolling (e.g., 2 or 5 rounds of combinational logic per clock cycle):
- Area: Medium
- Throughput: Medium
- Latency: Fewer clock cycles per block, at the cost of a longer critical path
- Use Case: Balanced performance/area
Performance Comparison (AES-128, Xilinx 7-Series)
| Architecture | LUTs | Fmax | Throughput |
|---|---|---|---|
| Iterative | ~3,000 | 300 MHz | 3.8 Gbps |
| Pipelined | ~30,000 | 400 MHz | 51.2 Gbps |
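The throughput figures follow from the block size, the clock frequency, and the number of cycles each block occupies the datapath. The worked numbers below assume 10 cycles per block for the iterative core and one block per cycle for the fully pipelined core; the symbol names are chosen here for illustration.

```latex
T = \frac{128 \cdot f_{\max}}{N_{\text{cycles}}}
\qquad
T_{\text{iter}} = \frac{128 \times 300 \times 10^{6}}{10} = 3.84\ \text{Gbps}
\qquad
T_{\text{pipe}} = \frac{128 \times 400 \times 10^{6}}{1} = 51.2\ \text{Gbps}
```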
S-Box Implementation Strategies
The S-Box typically dominates both the area and the critical path of the round logic, and there are several ways to implement it:
1. Look-Up Table (LUT)
// 256x8 ROM-based S-Box: one table entry per possible input byte
logic [7:0] sbox_rom [0:255] = '{
  8'h63, 8'h7c, 8'h77, 8'h7b, 8'hf2, 8'h6b, 8'h6f, 8'hc5, ...
};
// Combinational read: the input byte is the table index
assign sbox_out = sbox_rom[sbox_in];
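Pros: Simple, fast. Cons: Consumes memory or LUT resources; unprotected lookups leak through power and electromagnetic side channels. (Cache-timing attacks are a concern for software table lookups, not for a hardware ROM.)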
2. Composite Field Arithmetic
Compute inverse in GF(2^8) using smaller subfields:
- Uses GF(2^4) or GF(2^2) arithmetic
- Reduces memory, increases logic
- Better for ASIC (gate-based)
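To illustrate why subfields help, assume GF(2^8) is represented as a quadratic extension of GF(2^4) with an irreducible polynomial of the form y^2 + y + λ. The polynomial form and the symbols a_h, a_l, d, and λ are assumptions for this sketch; a real design also needs basis-change matrices before and after the inversion. Under that assumption, an 8-bit inversion reduces to a few 4-bit multiplications and a single 4-bit inversion:

```latex
\begin{aligned}
a &= a_h\,y + a_l, \qquad a_h, a_l \in GF(2^4), \qquad y^2 = y + \lambda \\
d &= a_h^{2}\,\lambda + a_h a_l + a_l^{2} \\
a^{-1} &= \left(a_h\, d^{-1}\right) y + \left(a_h + a_l\right) d^{-1}
\end{aligned}
```

Multiplying a by the expression for its inverse and substituting y^2 = y + λ makes the y term cancel and leaves d · d^{-1} = 1, which confirms the formula.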
3. Canright's Compact S-Box
One of the most area-efficient combinational implementations:
- ~100 gates per S-Box
- Uses tower field representation
- Ideal for resource-constrained designs
Key Expansion Implementation
Key expansion derives the round keys (one per round, plus one for the initial AddRoundKey) from the cipher key. Two approaches are common:
On-the-Fly Key Expansion
- Compute round keys during encryption
- Saves memory (no key storage)
- Adds latency to first block
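For AES-128, one on-the-fly step derives the next round key from the previous one with a word rotation, four S-Box lookups, a round constant, and a chain of XORs. The sketch below assumes the S-Box lookups are done outside the package and passed in as `sub`; the names `aes128_keyexp_pkg`, `rot_word`, `next_rcon`, and `next_key` are hypothetical.

```systemverilog
package aes128_keyexp_pkg;

  // RotWord: [a0, a1, a2, a3] -> [a1, a2, a3, a0].
  function automatic logic [31:0] rot_word(input logic [31:0] w);
    return {w[23:0], w[31:24]};
  endfunction

  // Next round constant: multiply by x in GF(2^8).
  // Starting from 8'h01 this yields 01, 02, 04, ..., 80, 1b, 36.
  function automatic logic [7:0] next_rcon(input logic [7:0] rc);
    return {rc[6:0], 1'b0} ^ (rc[7] ? 8'h1b : 8'h00);
  endfunction

  // Derive the next 128-bit round key.
  // Assumption: sub = SubWord(rot_word(prev_key[31:0])), produced by
  // four S-Box instances outside this package.
  function automatic logic [127:0] next_key(input logic [127:0] prev_key,
                                            input logic [31:0]  sub,
                                            input logic [7:0]   rcon);
    logic [31:0] w0, w1, w2, w3;
    w0 = prev_key[127:96] ^ sub ^ {rcon, 24'h000000};
    w1 = prev_key[95:64]  ^ w0;
    w2 = prev_key[63:32]  ^ w1;
    w3 = prev_key[31:0]   ^ w2;
    return {w0, w1, w2, w3};
  endfunction

endpackage
```

Using the FIPS 197 Appendix A.1 key `2b7e1516 28aed2a6 abf71588 09cf4f3c`, the substituted word is `32'h8a84eb01`, and `next_key` with `rcon = 8'h01` returns `a0fafe17 88542cb1 23a33939 2a6c7605`.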
Pre-computed Key Schedule
- Store all round keys in memory
- Faster encryption start
- Simplifies decryption, which consumes the round keys in reverse order
Block Cipher Modes of Operation
AES itself encrypts a single 128-bit block; a mode of operation defines how longer messages are processed:
| Mode | Parallelizable | Authentication | Use Case |
|---|---|---|---|
| ECB | Yes | No | Not recommended |
| CBC | Decrypt only | No | Legacy systems |
| CTR | Yes | No | High-throughput encryption |
| GCM | Yes | Yes (AEAD) | TLS, IPsec, storage |
| XTS | Yes | No | Disk encryption |
AES-GCM is a common choice for hardware implementation because it provides both encryption and authentication, and its counter-based encryption parallelizes well.
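Counter-based modes parallelize because every AES input block is just a nonce plus a block index, independent of earlier ciphertext. A minimal sketch of a counter-block generator follows, assuming a 96-bit nonce concatenated with a 32-bit counter that starts at zero. The module name, ports, and starting value are assumptions; GCM in particular defines its own initial counter value.

```systemverilog
module ctr_block_gen (
  input  logic         clk,
  input  logic         rst_n,
  input  logic         load,       // capture the nonce and restart the counter
  input  logic         advance,    // step to the next counter block
  input  logic [95:0]  nonce,
  output logic [127:0] ctr_block   // input block for the AES core
);

  logic [95:0] nonce_q;
  logic [31:0] count_q;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      nonce_q <= '0;
      count_q <= '0;
    end else if (load) begin
      nonce_q <= nonce;
      count_q <= '0;
    end else if (advance) begin
      count_q <= count_q + 32'd1;
    end
  end

  // Each counter block can be encrypted independently of the others.
  assign ctr_block = {nonce_q, count_q};

endmodule
```

The ciphertext block is then the plaintext block XORed with the AES encryption of `ctr_block`, so a pipelined core can accept a new counter block every cycle.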
Security Considerations
Side-Channel Attack Countermeasures
- Masking: Randomize intermediate values
- Hiding: Constant-time execution, noise injection
- Shuffling: Randomize S-Box access order
Fault Attack Protection
- Redundancy: Duplicate computations, compare results
- Detection: Check intermediate values against expected ranges
- Response: Zeroize keys on fault detection
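The redundancy and response items can be combined in a small checker that compares the outputs of two independent computations and latches a fault flag on mismatch. This is a sketch under assumptions: the module name `dup_compare`, its ports, and the idea that the sticky `fault` output drives key zeroization elsewhere in the design are all hypothetical.

```systemverilog
module dup_compare (
  input  logic         clk,
  input  logic         rst_n,
  input  logic         valid,       // both results are valid this cycle
  input  logic [127:0] result_a,    // output of the first computation
  input  logic [127:0] result_b,    // output of the redundant computation
  output logic [127:0] data_out,
  output logic         data_valid,
  output logic         fault        // sticky until reset
);

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      data_out   <= '0;
      data_valid <= 1'b0;
      fault      <= 1'b0;
    end else begin
      data_valid <= 1'b0;
      if (valid) begin
        if (!fault && (result_a == result_b)) begin
          data_out   <= result_a;   // results agree: release the output
          data_valid <= 1'b1;
        end else begin
          data_out <= '0;           // suppress the possibly faulty result
          fault    <= 1'b1;
        end
      end
    end
  end

endmodule
```

Withholding the output on mismatch matters as much as raising the flag, since differential fault analysis relies on observing faulty ciphertexts.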
Conclusion
Hardware AES implementation offers significant advantages for applications requiring high throughput, low latency, or enhanced security. The choice of architecture depends on your specific requirements for area, performance, and power consumption.
Vcores offers silicon-proven AES IP cores supporting all key sizes, multiple modes of operation (ECB, CBC, CTR, GCM, XTS), and optional side-channel countermeasures. Our cores are available in both high-throughput pipelined and compact iterative versions.