FLOPs (Floating Point Operations)
FLOPs (floating-point operations) measure computational work by counting the floating-point additions, subtractions, multiplications, and divisions an algorithm performs. Note the convention: lowercase "FLOPs" denotes a count of operations, while "FLOPS" (or FLOPs/second) denotes a rate.
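As a quick worked example, a dot product of two length-n vectors performs n multiplications and n-1 additions, roughly 2n FLOPs. A minimal Python sketch (the function name is illustrative):

```python
def dot_product_flops(n: int) -> int:
    """FLOPs for a dot product of two length-n vectors:
    n multiplications + (n - 1) additions."""
    return n + (n - 1)  # == 2n - 1, commonly rounded to 2n

print(dot_product_flops(1024))  # 2047, roughly 2 * 1024
```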
Significance
- Performance Metric: Hardware performance is often measured in FLOPs/second
- Workload Estimation: Used to quantify computational complexity of algorithms
- Efficiency Analysis: Helps determine how effectively hardware is being utilized
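For instance, utilization can be estimated as achieved FLOPs/s divided by the hardware's theoretical peak. A minimal sketch with made-up placeholder numbers, not real device specs:

```python
def utilization(total_flops: float, seconds: float, peak_flops_per_s: float) -> float:
    """Fraction of theoretical peak achieved by a workload."""
    achieved = total_flops / seconds
    return achieved / peak_flops_per_s

# Hypothetical numbers: 1e12 FLOPs done in 0.05 s on a 100 TFLOPS device
print(f"{utilization(1e12, 0.05, 100e12):.1%}")  # 20.0%
```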
Common Precision Formats
- FP32: 32-bit floating point (single precision)
- FP64: 64-bit floating point (double precision)
- FP16: 16-bit floating point (half precision)
- BF16: 16-bit brain floating point (used in ML)
- INT8: 8-bit integer (used in quantized operations; strictly an integer format, so its throughput is often quoted in OPS/TOPS rather than FLOPS)
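The trade-off among these formats is dynamic range versus precision. A small sketch using NumPy's finfo/iinfo to inspect the formats NumPy ships natively (bfloat16 is not in vanilla NumPy; it comes from packages such as ml_dtypes or from ML frameworks):

```python
import numpy as np

# Inspect the floating-point formats NumPy provides natively
for dtype in (np.float64, np.float32, np.float16):
    info = np.finfo(dtype)
    print(f"{info.dtype}: {info.bits} bits, "
          f"max = {info.max:.3e}, ~{info.precision} decimal digits")

# INT8 is an integer format: fixed range, no exponent
i8 = np.iinfo(np.int8)
print(f"{i8.dtype}: {i8.bits} bits, range [{i8.min}, {i8.max}]")
```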
Measurement Scales
- GFLOPS: Gigaflops (10^9 FLOPs/second)
- TFLOPS: Teraflops (10^12 FLOPs/second)
- PFLOPS: Petaflops (10^15 FLOPs/second)
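Since these scales are plain powers of ten, converting a raw rate is simple arithmetic. A small formatting sketch (the function name is illustrative):

```python
def format_flops_rate(flops_per_s: float) -> str:
    """Render a FLOPs/second rate at the largest fitting scale."""
    for scale, suffix in ((1e15, "PFLOPS"), (1e12, "TFLOPS"), (1e9, "GFLOPS")):
        if flops_per_s >= scale:
            return f"{flops_per_s / scale:.2f} {suffix}"
    return f"{flops_per_s:.0f} FLOPs/s"

print(format_flops_rate(3.5e13))  # "35.00 TFLOPS"
```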
Counting Conventions
- Matrix multiplication of an M×K matrix by a K×N matrix requires approximately 2MKN FLOPs (see the sketch after this list):
  - M×N×K multiplications
  - M×N×(K-1), approximately M×N×K, additions
- Modern hardware often executes a multiply and an add as a single FMA (Fused Multiply-Add) instruction, which is conventionally counted as 2 FLOPs
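A minimal sketch of the 2MKN counting convention, alongside an actual NumPy matmul of the same shape (function and variable names are illustrative):

```python
import numpy as np

def matmul_flops(m: int, k: int, n: int) -> int:
    """2*M*K*N convention: M*N*K multiplies + ~M*N*K adds."""
    return 2 * m * k * n

M, K, N = 512, 256, 128
a = np.random.rand(M, K).astype(np.float32)
b = np.random.rand(K, N).astype(np.float32)
c = a @ b  # performs M*N*K multiplies and M*N*(K-1) adds

print(matmul_flops(M, K, N))  # 33554432 FLOPs
```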
Notes
- Peak theoretical FLOPs/s is rarely achieved in practice because most workloads are limited by memory bandwidth rather than raw compute
- Lower-precision formats deliver higher throughput (e.g., INT8 operations are typically 2-4× faster than FP32 on the same hardware)
- Mixed precision combines formats (e.g., FP16/BF16 arithmetic with FP32 accumulation) to balance performance and numerical accuracy
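One way to see the gap between peak and achieved throughput is to time a matmul and convert its 2n^3 FLOP count into GFLOPS. A rough sketch; the measured number depends on the BLAS library, thread count, and matrix size, and the peak to compare against comes from your hardware's spec sheet:

```python
import time
import numpy as np

n = 2048
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

a @ b  # warm-up so one-time BLAS setup isn't timed
start = time.perf_counter()
a @ b
elapsed = time.perf_counter() - start

achieved = 2 * n**3 / elapsed  # 2MKN with M = K = N = n
print(f"achieved = {achieved / 1e9:.1f} GFLOPS")
# Compare against your hardware's advertised peak to estimate utilization
```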