# Bandwidth
Bandwidth is the maximum rate at which data can be transferred across a given path or interface in a computing system, typically measured in bytes per second (B/s) or bits per second (b/s); note that the two differ by a factor of 8.
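Because network links are usually quoted in bits per second while memory bandwidth is quoted in bytes per second, unit conversion is a frequent source of error. A minimal sketch:

```python
# Minimal sketch: converting between bits/s and bytes/s (8 bits per byte).
# Network links are usually quoted in Gb/s; memory bandwidth in GB/s or TB/s.

def gbps_to_gbytes_per_s(gigabits_per_s: float) -> float:
    """Convert Gb/s (gigabits per second) to GB/s (gigabytes per second)."""
    return gigabits_per_s / 8.0

# A 400 Gb/s Ethernet link moves at most 50 GB/s of payload, before
# protocol overheads.
print(gbps_to_gbytes_per_s(400.0))  # 50.0
```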
## Types of Bandwidth
### Memory Bandwidth
- HBM Bandwidth: Rate at which data moves between a chip's high-bandwidth memory stacks (on-package DRAM) and its compute cores
- GDDR Bandwidth: For GPUs using GDDR memory instead of HBM
- DRAM Bandwidth: For CPU memory access
### Network Bandwidth
- ICI (Inter-Chip Interconnect): Direct links between accelerator chips, e.g., between TPUs in the same pod or slice
- DCN (Data Center Network): Communication between nodes in a data center
- PCIe Bandwidth: Communication between CPU and accelerators (GPU/TPU)
## Typical Values (2024-2025)
| Component | Approximate Bandwidth |
|---|---|
| HBM3 (H100) | 3.35 TB/s |
| HBM (TPU v6e) | 1.6 TB/s |
| PCIe Gen5 x16 | ~64 GB/s per direction (~128 GB/s bidirectional) |
| ICI (TPU) | ~40-50 GB/s per link |
| Ethernet NIC | 100-400 Gb/s |
| InfiniBand NIC | 100-400 Gb/s |
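The practical consequence of these figures is easiest to see as transfer times. A sketch, using representative peak values from the table above (real transfers add protocol and software overheads):

```python
# Sketch: time to move 1 GiB over paths at the (peak) bandwidths quoted above.

PAYLOAD_BYTES = 1 << 30  # 1 GiB

bandwidths_bytes_per_s = {
    "HBM (H100)": 3.35e12,       # 3.35 TB/s
    "PCIe Gen5 x16": 128e9,      # ~128 GB/s bidirectional
    "ICI (TPU)": 45e9,           # ~45 GB/s per link
    "400G Ethernet": 400e9 / 8,  # 400 Gb/s -> 50 GB/s
}

for name, bw in bandwidths_bytes_per_s.items():
    t_ms = PAYLOAD_BYTES / bw * 1e3
    print(f"{name:15s} {t_ms:8.3f} ms")
```

The ~100x spread between HBM and network paths is why distributed training and inference go to such lengths to overlap communication with computation.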
## Impact on Performance
- Roofline Analysis: Sets the slope of the memory-bound region
- Communication Time: Time to move data scales inversely with bandwidth (time ≈ bytes transferred / bandwidth)
- Critical Intensity: Higher bandwidth raises the arithmetic intensity required to be compute-bound
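The critical intensity above is just peak compute throughput divided by memory bandwidth. A sketch with assumed H100-like figures (989 TFLOP/s dense BF16, 3.35 TB/s HBM; both illustrative, not exact for any specific SKU):

```python
# Roofline "critical intensity": FLOPs per byte needed to be compute-bound.
# PEAK_FLOPS is an assumed H100-like dense BF16 figure, for illustration only.

PEAK_FLOPS = 989e12   # FLOP/s
HBM_BW = 3.35e12      # bytes/s

critical_intensity = PEAK_FLOPS / HBM_BW  # FLOPs per byte, ~295
print(round(critical_intensity))

def attainable_flops(intensity: float) -> float:
    """Roofline model: attainable FLOP/s at a given arithmetic intensity."""
    return min(PEAK_FLOPS, intensity * HBM_BW)
```

A kernel with arithmetic intensity below ~295 FLOPs/byte on this hardware is memory-bound: its throughput sits on the `intensity * HBM_BW` slope rather than the flat compute roof.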
## Bandwidth Challenges
- Scaling Gap: Compute performance scaling outpaces bandwidth scaling
- Bandwidth Wall: Fundamental limitation on data movement
- Energy Cost: Data movement often costs more energy than computation
- Contention: Shared bandwidth resources can cause performance variability
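The energy point can be made concrete with the often-cited 45 nm estimates from Horowitz (ISSCC 2014); absolute numbers on modern nodes differ, but the ratio between arithmetic and DRAM access persists:

```python
# Ballpark energy comparison. Figures are assumed illustrative values from
# the Horowitz ISSCC 2014 45 nm estimates, in picojoules per operation.

ENERGY_PJ = {
    "fp32 add": 0.9,
    "fp32 multiply": 3.7,
    "32-bit SRAM read (8 KB)": 5.0,
    "32-bit DRAM read": 640.0,
}

# Reading a word from DRAM costs roughly two orders of magnitude more
# energy than computing on it.
ratio = ENERGY_PJ["32-bit DRAM read"] / ENERGY_PJ["fp32 multiply"]
print(f"DRAM read ~{ratio:.0f}x the energy of an fp32 multiply")
```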
## Optimization Strategies
- Data Reuse: Maximize computations per loaded byte
- Caching: Use faster, closer memory hierarchies effectively
- Compression: Reduce bandwidth requirements through data compression
- Quantization: Lower precision formats reduce bandwidth needs
- Sparsity: Skip moving zero values to save bandwidth
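Quantization's bandwidth payoff follows directly from bytes per parameter. A sketch, assuming a hypothetical 7B-parameter model and an H100-like 3.35 TB/s of HBM bandwidth:

```python
# Sketch: time to stream a 7B-parameter model's weights from HBM once,
# at different precisions. Model size and bandwidth are assumed figures.

N_PARAMS = 7e9
HBM_BW = 3.35e12  # bytes/s

load_ms = {}
for fmt, bytes_per_param in [("fp32", 4), ("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    load_ms[fmt] = N_PARAMS * bytes_per_param / HBM_BW * 1e3
    print(f"{fmt}: {load_ms[fmt]:.2f} ms per full pass over the weights")
```

For memory-bound workloads such as small-batch inference, this per-pass load time is a lower bound on step latency, so halving the bytes per parameter roughly halves it.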