BFloat16 (Brain Floating Point)

BFloat16 (Brain Floating Point) is a 16-bit floating-point format developed by Google specifically for deep learning workloads. It maintains the same exponent range as 32-bit floating point (FP32) but with reduced mantissa precision, trading accuracy for memory and throughput.

Format Specification

  • Size: 16 bits total
  • Sign: 1 bit
  • Exponent: 8 bits (same as FP32)
  • Mantissa: 7 bits (reduced from 23 bits in FP32)
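
The bit layout can be made concrete with a short, illustrative sketch in pure Python (standard library only); decode_bf16 is a hypothetical helper name, not part of any library. Because BF16 is simply the upper half of an FP32 bit pattern, widening it by 16 zero bits and reinterpreting it as FP32 recovers the value.

```python
import struct

def decode_bf16(bits: int) -> float:
    """Split a 16-bit bfloat16 pattern into its fields and return its value."""
    sign     = (bits >> 15) & 0x1    # 1 sign bit
    exponent = (bits >> 7)  & 0xFF   # 8 exponent bits, bias 127 (same as FP32)
    mantissa = bits         & 0x7F   # 7 explicit mantissa bits
    print(f"sign={sign} exponent={exponent} mantissa={mantissa:#04x}")
    # Widen to 32 bits and reinterpret as FP32 to obtain the numeric value.
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]

# 0x3FC0: sign 0, exponent 127, mantissa 0b1000000 -> (1 + 64/128) * 2^0 = 1.5
print(decode_bf16(0x3FC0))
```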

Advantages

  • Dynamic Range: Same as FP32 (maximum finite value ≈ ±3.4e38) because the exponent field is identical
  • Memory Efficiency: Half the storage of FP32
  • Computation Speed: Accelerators can typically perform 2-4× more BF16 operations than FP32
  • Conversion Efficiency: FP32 converts to BF16 by simply dropping the low 16 bits of the FP32 pattern, whereas IEEE FP16 requires re-biasing the exponent and handling overflow/underflow (see the sketch after this list)
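
A minimal sketch of the conversion path, again in pure Python with the standard struct module; f32_to_bf16_truncate, f32_to_bf16_rne, and bf16_to_f32 are illustrative names, not library functions. Hardware typically applies round-to-nearest-even rather than plain truncation, which the second helper approximates.

```python
import struct

def f32_to_bf16_truncate(x: float) -> int:
    """FP32 -> BF16 by dropping the low 16 bits of the FP32 pattern."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 16            # sign, exponent, and top 7 mantissa bits survive

def f32_to_bf16_rne(x: float) -> int:
    """FP32 -> BF16 with round-to-nearest-even on the discarded bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)   # break ties toward an even mantissa
    return (bits + rounding_bias) >> 16

def bf16_to_f32(bits16: int) -> float:
    """Widen a BF16 pattern back to FP32 for inspection."""
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]

x = 1.01171875                                # exactly halfway between two BF16 values near 1.0
print(bf16_to_f32(f32_to_bf16_truncate(x)))   # 1.0078125 (truncated down)
print(bf16_to_f32(f32_to_bf16_rne(x)))        # 1.015625  (tie rounds to even mantissa)
```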

Comparison to Other Formats

  Format   Total Bits   Sign Bits   Exponent Bits   Mantissa Bits   Approx. Max Value
  FP32     32           1           8               23              ±3.4e38
  BF16     16           1           8               7               ±3.4e38
  FP16     16           1           5               10              ±6.6e4
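
The practical consequence of the table is easiest to see numerically. A quick illustration, assuming PyTorch is installed (it ships native torch.bfloat16 and torch.float16 dtypes); the specific constants are just examples.

```python
import torch

# Range: 1e10 fits comfortably in BF16 (FP32-sized exponent) but overflows FP16.
print(torch.tensor(1e10, dtype=torch.bfloat16).item())    # ~1e10
print(torch.tensor(1e10, dtype=torch.float16).item())     # inf (FP16 max is ~6.6e4)

# Precision: near 1.0 the BF16 step is 2^-7 ~ 0.0078, the FP16 step is 2^-10 ~ 0.001.
print(torch.tensor(1.001, dtype=torch.bfloat16).item())   # rounds to 1.0
print(torch.tensor(1.001, dtype=torch.float16).item())    # ~1.0009765625
```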

Hardware Support

  • TPUs: Native support since TPU v2
  • NVIDIA GPUs: Native support in Ampere architecture and newer
  • AMD GPUs: Native support in CDNA architecture
  • Intel CPUs: Support in Cooper Lake and newer architectures

Usage in Deep Learning

  • Training: Often used for weights and activations
  • Mixed Precision: Commonly paired with FP32 master weights and FP32 accumulation to maintain accuracy (see the training-step sketch after this list)
  • Inference: Provides good balance between accuracy and performance
  • Arithmetic Intensity: 16-bit operands halve memory traffic per value, so kernels can sustain higher effective FLOP/s within the same memory bandwidth
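
A minimal BF16 mixed-precision training step, sketched with PyTorch's autocast API and assuming a CUDA device with native BF16 support; the model, batch shapes, and optimizer are placeholders. Unlike FP16 mixed precision, the FP32-sized exponent range usually makes gradient (loss) scaling unnecessary.

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(128, 10).cuda()                   # toy stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
inputs = torch.randn(32, 128, device="cuda")
targets = torch.randint(0, 10, (32,), device="cuda")

# Forward pass: eligible ops run in BF16 while master weights stay in FP32.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    logits = model(inputs)
    loss = F.cross_entropy(logits, targets)

loss.backward()        # gradients flow without a GradScaler
optimizer.step()
optimizer.zero_grad()
```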