TPU (Tensor Processing Unit)

A Tensor Processing Unit is an application-specific integrated circuit (ASIC) developed by Google for neural network machine learning workloads.

Key Components

  • MXU (Matrix Multiply Unit): Specialized for matrix multiplication operations
  • VPU (Vector Processing Unit): Handles elementwise operations
  • HBM (High Bandwidth Memory): On-package (off-die) memory that feeds the compute units at high throughput

Performance Characteristics

Different TPU versions have varying specifications:

  • TPU v5e:

    • ~1.97e14 FLOPs/s (bfloat16)
    • ~8.2e11 bytes/s HBM bandwidth
    • Peak arithmetic intensity: ~240 FLOPs/byte
  • TPU v6e:

    • ~9.1e14 FLOPs/s (bfloat16)
    • ~1.6e12 bytes/s HBM bandwidth
    • Peak arithmetic intensity: ~569 FLOPs/byte
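
The peak arithmetic intensity listed above is simply the ratio of peak compute to memory bandwidth; a chip only reaches its peak FLOPs/s when each byte loaded from HBM is reused at least that many times. A minimal sketch, using the spec numbers from the list above (the v6e intensity is just derived from its listed specs, not a quoted figure):

```python
def peak_intensity(flops_per_s, hbm_bytes_per_s):
    """Peak arithmetic intensity: FLOPs the chip can perform
    per byte moved from HBM, at peak rates on both."""
    return flops_per_s / hbm_bytes_per_s

# TPU v5e: ~1.97e14 bf16 FLOPs/s over ~8.2e11 bytes/s HBM
print(peak_intensity(1.97e14, 8.2e11))  # ≈ 240 FLOPs/byte

# TPU v6e: ~9.1e14 FLOPs/s over ~1.6e12 bytes/s HBM
print(peak_intensity(9.1e14, 1.6e12))   # ≈ 569 FLOPs/byte
```

Kernels whose own arithmetic intensity falls below this ratio are memory-bound; above it, compute-bound.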

Communication Types

  • ICI: Inter-chip interconnect for communication between TPU chips
  • DCN: Data center network for communication between TPU slices/hosts not linked by ICI (e.g., across pods); much lower bandwidth than ICI

Efficiency Considerations

  • Matrix multiplications generally become compute-bound only when the batch (token) dimension exceeds ~240, matching the peak arithmetic intensity in FLOPs/byte (v5e, bfloat16)
  • Different operations run on different units (MXU vs. VPU) with different performance characteristics
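
The ~240-token threshold above can be checked directly: for a matmul X[B, D] @ W[D, F] in bfloat16, the arithmetic intensity approaches the batch size B when D and F are large, so the operation only saturates the MXU once B exceeds the chip's peak intensity. A sketch with illustrative (assumed) dimensions D = F = 8192:

```python
def matmul_intensity(B, D, F, bytes_per_elem=2):
    """Arithmetic intensity of X[B, D] @ W[D, F], in FLOPs per byte.
    bytes_per_elem=2 corresponds to bfloat16 inputs and outputs."""
    flops = 2 * B * D * F                               # multiply-adds
    hbm_bytes = bytes_per_elem * (B * D + D * F + B * F)  # load X, W; store XW
    return flops / hbm_bytes

# With large D and F the B*D and B*F terms become negligible,
# so intensity ~ B; B must exceed ~240 to be compute-bound on a v5e.
print(matmul_intensity(240, 8192, 8192))
print(matmul_intensity(1024, 8192, 8192))
```

Note that at B = 240 the measured intensity is still slightly below 240 because the activation terms are not exactly zero, which is why the threshold is a rule of thumb rather than a sharp cutoff.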