BFloat16 (Brain Floating Point)

BFloat16 (Brain Floating Point) is a 16-bit floating-point format developed by Google specifically for deep learning workloads. It maintains the same exponent range as 32-bit floating point (FP32) but with reduced mantissa precision, trading accuracy for memory and throughput.

Format Specification

  • Size: 16 bits total
  • Sign: 1 bit
  • Exponent: 8 bits (same as FP32)
  • Mantissa: 7 bits (reduced from 23 bits in FP32)
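
The bit layout can be made concrete with a short, illustrative sketch in pure Python (standard library only); decode_bf16 is a hypothetical helper name, not part of any library. Because BF16 is simply the upper half of an FP32 bit pattern, widening it by 16 zero bits and reinterpreting it as FP32 recovers the value.

```python
import struct

def decode_bf16(bits: int) -> float:
    """Split a 16-bit bfloat16 pattern into its fields and return its value."""
    sign     = (bits >> 15) & 0x1    # 1 sign bit
    exponent = (bits >> 7)  & 0xFF   # 8 exponent bits, bias 127 (same as FP32)
    mantissa = bits         & 0x7F   # 7 explicit mantissa bits
    print(f"sign={sign} exponent={exponent} mantissa={mantissa:#04x}")
    # Widen to 32 bits and reinterpret as FP32 to obtain the numeric value.
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]

# 0x3FC0: sign 0, exponent 127, mantissa 0b1000000 -> (1 + 64/128) * 2^0 = 1.5
print(decode_bf16(0x3FC0))
```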

Advantages

  • Dynamic Range: Same as FP32 (maximum finite value ≈ ±3.4e38) because the exponent field is identical
  • Memory Efficiency: Half the storage of FP32
  • Computation Speed: Accelerators can typically perform 2-4× more BF16 operations than FP32
  • Conversion Efficiency: FP32 converts to BF16 by simply dropping the low 16 bits of the FP32 pattern, whereas IEEE FP16 requires re-biasing the exponent and handling overflow/underflow (see the sketch after this list)
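
A minimal sketch of the conversion path, again in pure Python with the standard struct module; f32_to_bf16_truncate, f32_to_bf16_rne, and bf16_to_f32 are illustrative names, not library functions. Hardware typically applies round-to-nearest-even rather than plain truncation, which the second helper approximates.

```python
import struct

def f32_to_bf16_truncate(x: float) -> int:
    """FP32 -> BF16 by dropping the low 16 bits of the FP32 pattern."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16            # sign, exponent, and top 7 mantissa bits survive

def f32_to_bf16_rne(x: float) -> int:
    """FP32 -> BF16 with round-to-nearest-even on the discarded bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)   # break ties toward an even mantissa
    return (bits + rounding_bias) >> 16

def bf16_to_f32(bits16: int) -> float:
    """Widen a BF16 pattern back to FP32 for inspection."""
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]

x = 1.01171875                                # exactly halfway between two BF16 values near 1.0
print(bf16_to_f32(f32_to_bf16_truncate(x)))   # 1.0078125 (truncated down)
print(bf16_to_f32(f32_to_bf16_rne(x)))        # 1.015625  (tie rounds to even mantissa)
```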

Comparison to Other Formats

  Format   Total Bits   Sign Bits   Exponent Bits   Mantissa Bits   Approx. Max Value
  FP32     32           1           8               23              ±3.4e38
  BF16     16           1           8               7               ±3.4e38
  FP16     16           1           5               10              ±6.6e4
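
The practical consequence of the table is easiest to see numerically. A quick illustration, assuming PyTorch is installed (it ships native torch.bfloat16 and torch.float16 dtypes); the specific constants are just examples.

```python
import torch

# Range: 1e10 fits comfortably in BF16 (FP32-sized exponent) but overflows FP16.
print(torch.tensor(1e10, dtype=torch.bfloat16).item())    # ~1e10
print(torch.tensor(1e10, dtype=torch.float16).item())     # inf (FP16 max is ~6.6e4)

# Precision: near 1.0 the BF16 step is 2^-7 ~ 0.0078, the FP16 step is 2^-10 ~ 0.001.
print(torch.tensor(1.001, dtype=torch.bfloat16).item())   # rounds to 1.0
print(torch.tensor(1.001, dtype=torch.float16).item())    # ~1.0009765625
```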

Hardware Support

  • TPUs: Native support since TPU v2
  • NVIDIA GPUs: Native support in Ampere architecture and newer
  • AMD GPUs: Native support in CDNA architecture
  • Intel CPUs: Support in Cooper Lake and newer architectures

Usage in Deep Learning

  • Training: Often used for weights and activations
  • Mixed Precision: Commonly paired with FP32 master weights and FP32 accumulation to maintain accuracy (see the training-step sketch after this list)
  • Inference: Provides good balance between accuracy and performance
  • Arithmetic Intensity: 16-bit operands halve memory traffic per value, so kernels can sustain higher effective FLOP/s within the same memory bandwidth
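
A minimal BF16 mixed-precision training step, sketched with PyTorch's autocast API and assuming a CUDA device with native BF16 support; the model, batch shapes, and optimizer are placeholders. Unlike FP16 mixed precision, the FP32-sized exponent range usually makes gradient (loss) scaling unnecessary.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(128, 10).cuda()                   # toy stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
inputs = torch.randn(32, 128, device="cuda")
targets = torch.randint(0, 10, (32,), device="cuda")

# Forward pass: eligible ops run in BF16 while master weights stay in FP32.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    logits = model(inputs)
    loss = F.cross_entropy(logits, targets)

loss.backward()        # gradients flow without a GradScaler
optimizer.step()
optimizer.zero_grad()
```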