FLOPs (Floating Point Operations)

FLOPs (Floating Point Operations) are a measure of computational work: the count of floating-point additions, subtractions, multiplications, and divisions performed. By common convention, FLOPs (lowercase s) denotes an operation count, while FLOPS or FLOPs/s denotes a rate in operations per second; the measurement scales below follow the rate convention.

Significance

  • Performance Metric: Hardware performance is often measured in FLOPs/second
  • Workload Estimation: Used to quantify computational complexity of algorithms
  • Efficiency Analysis: Helps determine how effectively hardware is being utilized (see the utilization sketch below)
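
As a worked example of the efficiency point above, utilization is simply achieved throughput divided by peak throughput. A minimal sketch; both figures are made-up placeholders, not real hardware specifications:

```python
# Utilization = achieved FLOPS / peak FLOPS.
# Both figures below are assumed placeholder values, not real specs.
peak_flops_per_s = 19.5e12      # hypothetical accelerator peak (FP32)
achieved_flops_per_s = 4.2e12   # hypothetical measured throughput

utilization = achieved_flops_per_s / peak_flops_per_s
print(f"Utilization: {utilization:.1%}")  # Utilization: 21.5%
```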

Common Precision Formats

  • FP32: 32-bit floating point (single precision)
  • FP64: 64-bit floating point (double precision)
  • FP16: 16-bit floating point (half precision)
  • BF16: 16-bit brain floating point (FP32's exponent range with a shorter mantissa; common in ML)
  • INT8: 8-bit integer (used in quantized operations; counted as OPS rather than FLOPS; see the sketch below)
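
A minimal sketch for comparing the floating-point formats above using NumPy's finfo. Stock NumPy has no bfloat16 dtype, and INT8 is an integer format, so only FP16/FP32/FP64 are inspected here:

```python
import numpy as np

# Bit width, machine epsilon, and largest finite value for each format.
for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{info.dtype}: {info.bits} bits, eps={info.eps:.3e}, max={info.max:.3e}")
```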

Measurement Scales

  • GFLOPS: Gigaflops (10^9 FLOPs/second)
  • TFLOPS: Teraflops (10^12 FLOPs/second)
  • PFLOPS: Petaflops (10^15 FLOPs/second); a conversion helper is sketched below
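
A small helper (the function name is illustrative, not a standard API) that expresses a raw rate in the largest fitting scale:

```python
# Hypothetical convenience helper: format a FLOPs/second figure using
# the largest scale from the list above that keeps the value >= 1.
def format_flops_per_s(flops_per_s: float) -> str:
    for factor, unit in ((1e15, "PFLOPS"), (1e12, "TFLOPS"), (1e9, "GFLOPS")):
        if flops_per_s >= factor:
            return f"{flops_per_s / factor:.2f} {unit}"
    return f"{flops_per_s:.0f} FLOPS"

print(format_flops_per_s(3.12e13))  # 31.20 TFLOPS
```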

Counting Conventions

  • Multiplying an M×K matrix by a K×N matrix requires approximately 2MKN FLOPs (see the sketch after this list)
    • M×K×N multiplications
    • M×N×(K−1) additions, which is ≈ M×K×N for large K
  • Modern hardware often fuses the multiply and add into a single instruction (FMA, Fused Multiply-Add); by convention one FMA still counts as 2 FLOPs, which is how vendor peak figures are typically quoted
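
A minimal sketch applying the 2MKN convention and estimating achieved throughput by timing a NumPy matmul; absolute numbers depend entirely on the machine and BLAS build:

```python
import time
import numpy as np

def matmul_flops(m: int, k: int, n: int) -> int:
    """FLOPs for an (m x k) @ (k x n) product under the 2MKN convention."""
    return 2 * m * k * n

m = k = n = 1024
a = np.random.rand(m, k).astype(np.float32)
b = np.random.rand(k, n).astype(np.float32)

a @ b  # warm-up so one-time setup cost is not timed

start = time.perf_counter()
a @ b
elapsed = time.perf_counter() - start

gflops = matmul_flops(m, k, n) / elapsed / 1e9
print(f"{matmul_flops(m, k, n):,} FLOPs in {elapsed:.4f} s -> {gflops:.1f} GFLOPS")
```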

Notes

  • Peak theoretical FLOPS is rarely achieved in practice; memory bandwidth is the usual bottleneck
  • Lower-precision formats generally offer higher throughput (e.g., INT8 operations are typically 2-4× faster than FP32 on the same hardware)
  • Mixed precision combines formats to balance performance and accuracy, e.g., computing in FP16/BF16 while accumulating in FP32 (demonstrated in the sketch below)
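
To illustrate the accuracy half of that trade-off, the sketch below sums 10,000 copies of 0.0001 stored in FP16. A pure FP16 accumulator stalls once each increment falls below half a rounding step, while an FP32 accumulator (the usual mixed-precision pattern) stays near the true value of 1.0:

```python
import numpy as np

x = np.float16(1e-4)  # value stored in half precision

# Pure FP16 accumulator: once the running sum reaches ~0.25, adding 1e-4
# is below half a unit-in-the-last-place and rounds away, so the sum stalls.
s16 = np.float16(0.0)
for _ in range(10_000):
    s16 += x

# FP32 accumulator over the same FP16 values: low-precision storage,
# higher-precision accumulation.
s32 = np.float32(0.0)
for _ in range(10_000):
    s32 += np.float32(x)

print(f"FP16 accumulator: {float(s16):.4f}")  # about 0.25
print(f"FP32 accumulator: {float(s32):.4f}")  # about 1.0
```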