Courses
- High Performance LLMs in JAX Session 2: Single-Chip Performance & Rooflines
- High Performance LLMs in JAX Session 3: Multi-Chip Performance & Rooflines
Exploratory
Read a bit about fused multiply-add (FMA) operations on ARM SIMD CPUs to patch a PyTorch bug: https://github.com/pytorch/pytorch/issues/149292
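As a reminder of why fused multiply-add matters numerically, here is a minimal sketch of the general behavior. It is not a reproduction of the linked issue. A fused operation computes `a * b + c` with a single rounding at the end, while the unfused version rounds the product first and then the sum. The helper names `unfused_multiply_add` and `fused_multiply_add` are hypothetical, and the fused path is only emulated here with exact rational arithmetic rather than a hardware instruction.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    # Two roundings: the product is rounded to a double, then the sum is rounded.
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    # Emulated FMA (assumption: stands in for a hardware FMA instruction).
    # The product and sum are computed exactly as rationals,
    # then rounded once when converted back to a double.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so the exact product, 1 - 2**-60, is not representable
# as a double and rounds to 1.0 in the unfused path.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = unfused_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)

print("unfused:", unfused)
print("fused:  ", fused)
```

The two results differ because the unfused path loses the low-order bits of the product before the addition. Code that runs a fused path on one backend and an unfused path on another can therefore produce slightly different outputs from the same inputs.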