Books
- CUDA PMPP book (Programming Massively Parallel Processors), ch. 5 exercises
- CUDA PMPP book (Programming Massively Parallel Processors), ch. 6 exercises
Courses
- LLM Systems Course Ex 1
- LLM Systems Course
- Distributed Model Training 1 and 2 (slides)
Exploratory
- Tested a few matrix multiplication kernels (naive, global memory coalescing, 2D block tiling) on tensara.org
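
Note to self on the naive vs. coalesced kernels: a rough CPU-side sketch in plain C++ (not the actual CUDA kernels submitted to tensara.org) of how the two differ only in the way a thread id is mapped to an output element. All names here (`Coord`, `naive_map`, `coalesced_map`, `matmul_emulated`, `neighbour_stride`) are made up for illustration, and square row-major N x N matrices are assumed.

```cpp
#include <cstddef>
#include <vector>

// Output element (row, col) of C that one emulated thread computes.
struct Coord {
    std::size_t row;
    std::size_t col;
};

// Hypothetical mapping type: thread id -> output element, for an n x n matrix.
using Mapping = Coord (*)(std::size_t tid, std::size_t n);

// Naive-style mapping: consecutive thread ids get consecutive rows,
// so neighbouring threads write C addresses n floats apart.
Coord naive_map(std::size_t tid, std::size_t n) {
    return {tid % n, tid / n};
}

// Coalescing-friendly mapping: consecutive thread ids get consecutive
// columns, so neighbouring threads write adjacent C addresses.
Coord coalesced_map(std::size_t tid, std::size_t n) {
    return {tid / n, tid % n};
}

// Emulates "one thread per output element" with a sequential loop over
// thread ids. Matrices are row-major and n x n (an assumption of this sketch).
std::vector<float> matmul_emulated(const std::vector<float>& a,
                                   const std::vector<float>& b,
                                   std::size_t n, Mapping map) {
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t tid = 0; tid < n * n; ++tid) {
        const Coord p = map(tid, n);
        float acc = 0.0f;
        for (std::size_t k = 0; k < n; ++k) {
            acc += a[p.row * n + k] * b[k * n + p.col];
        }
        c[p.row * n + p.col] = acc;
    }
    return c;
}

// Distance, in elements, between the C addresses written by thread 0
// and thread 1 (requires n >= 2).
std::size_t neighbour_stride(Mapping map, std::size_t n) {
    const Coord p0 = map(0, n);
    const Coord p1 = map(1, n);
    return (p1.row * n + p1.col) - (p0.row * n + p0.col);
}
```

Both mappings produce the same C; only the access pattern changes. With `naive_map`, neighbouring thread ids are `n` elements apart in C (and in A), while with `coalesced_map` they are 1 element apart in C and in B, which is the pattern a GPU can typically merge into fewer memory transactions. 2D block tiling is not covered by this sketch.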