Bandwidth

Bandwidth refers to the maximum rate of data transfer across a given path or interface in a computing system, typically measured in bytes per second (B/s) or bits per second (b/s).
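
Note that memory systems are usually quoted in bytes per second while network links are quoted in bits per second, so comparing the two requires a factor-of-8 conversion. A minimal sketch in Python (illustrative values only):

```python
def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link rate quoted in Gb/s (bits) to GB/s (bytes)."""
    return gbit_per_s / 8.0

# A 400 Gb/s network link carries at most 50 GB/s of payload,
# well below the multiple TB/s delivered by on-package HBM.
print(gbit_to_gbyte(400))  # 50.0
```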

Types of Bandwidth

Memory Bandwidth

  • HBM Bandwidth: Rate at which data can be transferred between high-bandwidth memory (DRAM stacked on the accelerator package) and the compute cores; a back-of-the-envelope sketch follows this list
  • GDDR Bandwidth: For GPUs using GDDR memory instead of HBM
  • DRAM Bandwidth: For CPU memory access
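
As a concrete illustration of why HBM bandwidth matters, a decode step of a large language model must stream every weight from memory at least once, so HBM bandwidth alone puts a floor on step time. A back-of-the-envelope sketch, assuming an illustrative 70B-parameter model in bf16 and H100-class HBM:

```python
# Rough lower bound on per-step latency when weights are read once from HBM.
# All numbers are illustrative assumptions, not measurements.
params = 70e9                 # model parameters
bytes_per_param = 2           # bf16
hbm_bandwidth = 3.35e12       # bytes/s (H100-class HBM)

weight_bytes = params * bytes_per_param
min_step_time = weight_bytes / hbm_bandwidth
print(f"{min_step_time * 1e3:.1f} ms per decode step at best")  # ~41.8 ms
```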

Network Bandwidth

  • ICI (Inter-Chip Interconnect): Direct chip-to-chip links between accelerators in the same pod or slice
  • DCN (Data Center Network): Communication between nodes in a data center
  • PCIe Bandwidth: Communication between CPU and accelerators (GPU/TPU)

Typical Values (2024-2025)

Component         Bandwidth Range
HBM (H100)        ~3.35 TB/s
HBM (TPU v6e)     ~1.6 TB/s
PCIe Gen5 x16     ~128 GB/s (bidirectional)
ICI (TPU)         ~40-50 GB/s per link
Ethernet          100-400 Gb/s
InfiniBand        100-400 Gb/s
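
The spread in these figures means that moving the same tensor can take orders of magnitude longer depending on which link it crosses. A quick comparison using rounded peak figures from the table (best-case sustained rates, not measurements):

```python
# Best-case time to move a 1 GB tensor over different links,
# using rounded peak figures from the table above (assumptions, not measurements).
links_bytes_per_s = {
    "HBM (H100)":    3.35e12,
    "PCIe Gen5 x16": 128e9,
    "ICI (TPU)":     45e9,
    "Ethernet 400G": 400e9 / 8,   # network links are quoted in bits/s
}

tensor_bytes = 1e9
for name, bw in links_bytes_per_s.items():
    print(f"{name:14s} {tensor_bytes / bw * 1e3:8.2f} ms")
```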

Impact on Performance

  • Roofline Analysis: Memory bandwidth sets the slope of the memory-bound region of the roofline
  • Communication Time: Time to move data is roughly data volume divided by bandwidth, so bandwidth directly bounds communication-heavy steps
  • Critical Intensity: Higher bandwidth lowers the arithmetic intensity an operation needs to be compute-bound; a worked example follows this list
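
As a worked example of the critical-intensity point, the ridge of the roofline sits at peak FLOP/s divided by memory bandwidth; an operation whose arithmetic intensity (FLOPs per byte moved) falls below that value is memory-bound. A sketch with assumed H100-class bf16 numbers:

```python
# Roofline-style check: is an operation compute-bound or memory-bound?
# Peak numbers are assumptions for an H100-class accelerator (dense bf16).
peak_flops = 989e12        # FLOP/s
hbm_bandwidth = 3.35e12    # bytes/s

critical_intensity = peak_flops / hbm_bandwidth   # ~295 FLOPs/byte

def bound(flops: float, bytes_moved: float) -> str:
    intensity = flops / bytes_moved
    return "compute-bound" if intensity >= critical_intensity else "memory-bound"

# A large matmul reuses each loaded byte many times -> high intensity.
print(bound(flops=2 * 8192**3, bytes_moved=3 * 2 * 8192**2))   # compute-bound
# An elementwise op touches each byte roughly once -> low intensity.
print(bound(flops=1e9, bytes_moved=2e9))                        # memory-bound
```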

Bandwidth Challenges

  • Scaling Gap: Compute performance scaling outpaces bandwidth scaling
  • Bandwidth Wall: Performance becomes fundamentally limited by how fast data can be moved rather than by available compute
  • Energy Cost: Data movement often costs more energy than computation
  • Contention: Shared bandwidth resources can cause performance variability

Optimization Strategies

  • Data Reuse: Maximize computations per loaded byte
  • Caching: Use faster, closer memory hierarchies effectively
  • Compression: Reduce bandwidth requirements through data compression
  • Quantization: Lower precision formats reduce bandwidth needs; see the sketch after this list
  • Sparsity: Skip moving zero values to save bandwidth
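
To make the quantization point concrete, shrinking the bytes per parameter directly raises the ceiling that memory bandwidth places on bandwidth-bound decode throughput. A sketch reusing the illustrative 70B-parameter model and HBM figure from above:

```python
# Bandwidth-bound upper limit on decode tokens/s for one accelerator,
# assuming every weight is streamed from HBM once per token (illustrative only).
params = 70e9
hbm_bandwidth = 3.35e12   # bytes/s

for fmt, bytes_per_param in [("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    tokens_per_s = hbm_bandwidth / (params * bytes_per_param)
    print(f"{fmt}: <= {tokens_per_s:.1f} tokens/s")
```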