Shashank Shekhar

14 April, 2025 Readings

Apr 14, 2025 · 1 min read

Courses

High Performance LLMs in JAX Session 2: Single-Chip Performance & Rooflines
High Performance LLMs in JAX Session 3: Multi-Chip Performance & Rooflines

Exploratory

How CUDA Programming Works https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41487/

Discusses some of the physics behind why memory I/O works the way it does, why coalescing is fast, why 128 threads / 4 warps matter (1024-byte page size), and so on.
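A rough way to see the coalescing point is to count how many 1024-byte pages a group of threads touches under different access patterns. The sketch below is a toy address model only, not a description of real hardware: the helper `pages_touched` is hypothetical, and the 8 bytes read per thread is my assumption, chosen so that 128 threads cover exactly one page.

```python
PAGE_BYTES = 1024  # page size quoted in the note above


def pages_touched(num_threads, elem_bytes, stride_elems, base_addr=0):
    """Count distinct pages hit when thread i reads elem_bytes
    starting at base_addr + i * stride_elems * elem_bytes."""
    pages = set()
    for i in range(num_threads):
        start = base_addr + i * stride_elems * elem_bytes
        end = start + elem_bytes - 1
        # A read may straddle a page boundary, so record both ends.
        pages.add(start // PAGE_BYTES)
        pages.add(end // PAGE_BYTES)
    return len(pages)


# Assumption: each thread reads 8 bytes (not stated in the note).
coalesced = pages_touched(num_threads=128, elem_bytes=8, stride_elems=1)
strided = pages_touched(num_threads=128, elem_bytes=8, stride_elems=128)
print(coalesced, strided)
```

Under these assumptions, the contiguous pattern stays within a single page, while the strided pattern lands every thread on a different page.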
