BFloat16 (Brain Floating Point)
BFloat16 (Brain Floating Point) is a 16-bit floating point format developed by Google specifically for deep learning applications. It maintains the same exponent range as 32-bit floating point (FP32) but with reduced precision.
Format Specification
- Size: 16 bits total
- Sign: 1 bit
- Exponent: 8 bits (same as FP32)
- Mantissa: 7 bits (reduced from 23 bits in FP32); the bit layout is illustrated in the sketch below
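To make the field layout concrete, the sketch below (plain Python, using only the standard `struct` module) reinterprets an FP32 value, keeps its top 16 bits, and prints the resulting BF16 sign, exponent, and mantissa fields. The helper names are illustrative, not a standard API.

```python
import struct

def fp32_bits(x: float) -> int:
    # Reinterpret a Python float as its IEEE-754 FP32 bit pattern.
    return struct.unpack(">I", struct.pack(">f", x))[0]

def describe_bf16(x: float) -> None:
    bits32 = fp32_bits(x)
    bf16 = bits32 >> 16               # BF16 is the top 16 bits of the FP32 pattern
    sign     = (bf16 >> 15) & 0x1     # 1 bit
    exponent = (bf16 >> 7) & 0xFF     # 8 bits, same width as FP32
    mantissa = bf16 & 0x7F            # 7 bits, down from FP32's 23
    print(f"{x}: sign={sign} exponent={exponent:08b} mantissa={mantissa:07b}")

describe_bf16(3.140625)  # exactly representable in BF16: sign=0 exponent=10000000 mantissa=1001001
```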
Advantages
- Dynamic Range: Same as FP32 (maximum finite value ≈ 3.4 × 10^38) because the exponent field is identical
- Memory Efficiency: Half the storage and memory traffic of FP32
- Computation Speed: Accelerators can typically sustain 2-4× the BF16 throughput of FP32
- Conversion Efficiency: FP32 converts to BF16 by simply truncating the low 16 bits, unlike IEEE FP16, which requires exponent re-biasing and rounding (see the sketch after this list)
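As a rough illustration of the truncation path (a minimal sketch in plain Python, not how any particular library implements it), the round trip below drops the low 16 bits to narrow FP32 to BF16 and pads zeros to widen it back. Widening is always exact, while narrowing by truncation simply discards precision; production converters usually round to nearest even instead.

```python
import struct

def fp32_to_bf16_trunc(x: float) -> int:
    # Narrow FP32 -> BF16 by dropping the 16 low-order mantissa bits (pure truncation).
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16

def bf16_to_fp32(b: int) -> float:
    # Widen BF16 -> FP32 by appending 16 zero bits; this direction is exact.
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

print(bf16_to_fp32(fp32_to_bf16_trunc(1.0)))  # 1.0 -- powers of two survive unchanged
print(bf16_to_fp32(fp32_to_bf16_trunc(0.1)))  # 0.099609375 -- low mantissa bits were discarded
```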
Comparison to Other Formats
| Format | Total Bits | Sign Bits | Exponent Bits | Mantissa Bits | Approx. Max Value |
|---|---|---|---|---|---|
| FP32 | 32 | 1 | 8 | 23 | ±3.4 × 10^38 |
| BF16 | 16 | 1 | 8 | 7 | ±3.4 × 10^38 |
| FP16 | 16 | 1 | 5 | 10 | ±6.5 × 10^4 |
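The practical consequence of the table shows up immediately in a framework that exposes both 16-bit types. The PyTorch snippet below (assuming a recent torch install) casts the same values to FP16 and BF16, showing FP16 overflowing and underflowing where BF16 only loses precision.

```python
import torch

x = torch.tensor([70000.0, 1e-30])

print(x.to(torch.float16))   # tensor([inf, 0.]) -- 70000 overflows, 1e-30 underflows to zero
print(x.to(torch.bfloat16))  # both values stay finite and nonzero, just with reduced precision
```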
Hardware Support
- TPUs: Native support since TPU v2, the first training-oriented generation
- NVIDIA GPUs: Native support in the Ampere architecture and newer (a runtime check is sketched after this list)
- AMD GPUs: Native support in CDNA architecture
- Intel CPUs: Support in Cooper Lake and newer architectures
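A quick way to confirm support on an NVIDIA GPU from Python is sketched below. It assumes a recent PyTorch build, where `torch.cuda.is_bf16_supported()` and the compute-capability query are available.

```python
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    # Ampere corresponds to compute capability 8.x; newer architectures report >= 8 as well.
    print("compute capability:", (major, minor))
    print("native BF16 (capability >= 8.0):", major >= 8)
    print("torch reports BF16 support:", torch.cuda.is_bf16_supported())
else:
    print("No CUDA device visible; BF16 may still run on CPU, typically more slowly.")
```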
Usage in Deep Learning
- Training: Widely used for weights, activations, and gradients in large-model training
- Mixed Precision: Commonly paired with FP32 master weights and FP32 accumulation to maintain accuracy (a training-step sketch follows this list)
- Inference: Provides a good balance between accuracy and throughput
- Arithmetic Intensity: Halving the bytes per value reduces memory-bandwidth pressure, so BF16 kernels can reach higher effective FLOP/s on bandwidth-bound workloads
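A typical BF16 mixed-precision training step in PyTorch might look like the sketch below (module sizes and hyperparameters are placeholders). The forward pass runs under `torch.autocast` with `dtype=torch.bfloat16`, while parameters and optimizer state stay in FP32; unlike FP16 mixed precision, no loss scaling is normally needed because the exponent range matches FP32.

```python
import torch
from torch import nn

model = nn.Linear(1024, 1024).cuda()                      # FP32 master weights
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # FP32 optimizer state

data = torch.randn(32, 1024, device="cuda")
target = torch.randn(32, 1024, device="cuda")

# Matmuls inside this context run in BF16; ops that need full precision are kept in FP32 by autocast.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = nn.functional.mse_loss(model(data), target)

loss.backward()        # gradients rarely underflow in BF16, so no GradScaler is used
optimizer.step()       # FP32 parameter update
optimizer.zero_grad()
```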