# Bandwidth
Bandwidth is the maximum rate at which data can be transferred across a given path or interface in a computing system, typically measured in bytes per second (B/s) or bits per second (b/s); note that the two differ by a factor of 8.
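Because network links are usually quoted in bits per second while memory bandwidth is quoted in bytes per second, unit conversion is a frequent source of error. A minimal sketch:

```python
# Minimal sketch: converting between bits/s and bytes/s (8 bits per byte).
# Network links are usually quoted in Gb/s; memory bandwidth in GB/s or TB/s.

def gbps_to_gbytes_per_s(gigabits_per_s: float) -> float:
    """Convert Gb/s (gigabits per second) to GB/s (gigabytes per second)."""
    return gigabits_per_s / 8.0

# A 400 Gb/s Ethernet link moves at most 50 GB/s of payload, before
# protocol overheads.
print(gbps_to_gbytes_per_s(400.0))  # 50.0
```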
## Types of Bandwidth
### Memory Bandwidth
- HBM Bandwidth: Rate at which data moves between a chip's high-bandwidth memory stacks (on-package DRAM) and its compute cores
- GDDR Bandwidth: For GPUs using GDDR memory instead of HBM
- DRAM Bandwidth: For CPU memory access
### Network Bandwidth
- ICI (Inter-Chip Interconnect): Direct links between accelerator chips, e.g., between TPUs in the same pod or slice
- DCN (Data Center Network): Communication between nodes in a data center
- PCIe Bandwidth: Communication between CPU and accelerators (GPU/TPU)
## Typical Values (2024-2025)
| Component | Approximate Bandwidth |
|---|---|
| HBM3 (H100) | 3.35 TB/s |
| HBM (TPU v6e) | 1.6 TB/s |
| PCIe Gen5 x16 | ~64 GB/s per direction (~128 GB/s bidirectional) |
| ICI (TPU) | ~40-50 GB/s per link |
| Ethernet NIC | 100-400 Gb/s |
| InfiniBand NIC | 100-400 Gb/s |
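The practical consequence of these figures is easiest to see as transfer times. A sketch, using representative peak values from the table above (real transfers add protocol and software overheads):

```python
# Sketch: time to move 1 GiB over paths at the (peak) bandwidths quoted above.

PAYLOAD_BYTES = 1 << 30  # 1 GiB

bandwidths_bytes_per_s = {
    "HBM (H100)": 3.35e12,       # 3.35 TB/s
    "PCIe Gen5 x16": 128e9,      # ~128 GB/s bidirectional
    "ICI (TPU)": 45e9,           # ~45 GB/s per link
    "400G Ethernet": 400e9 / 8,  # 400 Gb/s -> 50 GB/s
}

for name, bw in bandwidths_bytes_per_s.items():
    t_ms = PAYLOAD_BYTES / bw * 1e3
    print(f"{name:15s} {t_ms:8.3f} ms")
```

The ~100x spread between HBM and network paths is why distributed training and inference go to such lengths to overlap communication with computation.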
## Impact on Performance
- Roofline Analysis: Sets the slope of the memory-bound region
- Communication Time: Time to move data scales inversely with bandwidth (time ≈ bytes transferred / bandwidth)
- Critical Intensity: Higher bandwidth raises the arithmetic intensity required to be compute-bound
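The critical intensity above is just peak compute throughput divided by memory bandwidth. A sketch with assumed H100-like figures (989 TFLOP/s dense BF16, 3.35 TB/s HBM; both illustrative, not exact for any specific SKU):

```python
# Roofline "critical intensity": FLOPs per byte needed to be compute-bound.
# PEAK_FLOPS is an assumed H100-like dense BF16 figure, for illustration only.

PEAK_FLOPS = 989e12   # FLOP/s
HBM_BW = 3.35e12      # bytes/s

critical_intensity = PEAK_FLOPS / HBM_BW  # FLOPs per byte, ~295
print(round(critical_intensity))

def attainable_flops(intensity: float) -> float:
    """Roofline model: attainable FLOP/s at a given arithmetic intensity."""
    return min(PEAK_FLOPS, intensity * HBM_BW)
```

A kernel with arithmetic intensity below ~295 FLOPs/byte on this hardware is memory-bound: its throughput sits on the `intensity * HBM_BW` slope rather than the flat compute roof.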
## Bandwidth Challenges
- Scaling Gap: Compute performance scaling outpaces bandwidth scaling
- Bandwidth Wall: Fundamental limitation on data movement
- Energy Cost: Data movement often costs more energy than computation
- Contention: Shared bandwidth resources can cause performance variability
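The energy point can be made concrete with the often-cited 45 nm estimates from Horowitz (ISSCC 2014); absolute numbers on modern nodes differ, but the ratio between arithmetic and DRAM access persists:

```python
# Ballpark energy comparison. Figures are assumed illustrative values from
# the Horowitz ISSCC 2014 45 nm estimates, in picojoules per operation.

ENERGY_PJ = {
    "fp32 add": 0.9,
    "fp32 multiply": 3.7,
    "32-bit SRAM read (8 KB)": 5.0,
    "32-bit DRAM read": 640.0,
}

# Reading a word from DRAM costs roughly two orders of magnitude more
# energy than computing on it.
ratio = ENERGY_PJ["32-bit DRAM read"] / ENERGY_PJ["fp32 multiply"]
print(f"DRAM read ~{ratio:.0f}x the energy of an fp32 multiply")
```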
## Optimization Strategies
- Data Reuse: Maximize computations per loaded byte
- Caching: Use faster, closer memory hierarchies effectively
- Compression: Reduce bandwidth requirements through data compression
- Quantization: Lower precision formats reduce bandwidth needs
- Sparsity: Skip moving zero values to save bandwidth
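Quantization's bandwidth payoff follows directly from bytes per parameter. A sketch, assuming a hypothetical 7B-parameter model and an H100-like 3.35 TB/s of HBM bandwidth:

```python
# Sketch: time to stream a 7B-parameter model's weights from HBM once,
# at different precisions. Model size and bandwidth are assumed figures.

N_PARAMS = 7e9
HBM_BW = 3.35e12  # bytes/s

load_ms = {}
for fmt, bytes_per_param in [("fp32", 4), ("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    load_ms[fmt] = N_PARAMS * bytes_per_param / HBM_BW * 1e3
    print(f"{fmt}: {load_ms[fmt]:.2f} ms per full pass over the weights")
```

For memory-bound workloads such as small-batch inference, this per-pass load time is a lower bound on step latency, so halving the bytes per parameter roughly halves it.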