Exploratory
DeepGEMM library https://github.com/deepseek-ai/DeepGEMM
- Understanding the warp-level primitives it uses
EleutherAI Scalability Reading Group S1: GPU Architecture, CUDA, NCCL https://youtu.be/Cp7g1Ll4v0M?si=-RwH-g6YK2AYqdJR
- Not much new to learn, but the key takeaway was that NCCL implements the collectives (AllReduce, AllToAll, ReduceScatter, AllGather, etc.) that FSDP, ZeRO, etc. rely on
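To pin down what these collectives do, here is a minimal pure-Python sketch of their semantics. It is not NCCL code and does not call any NCCL API: each "rank" is just a list, the function names are made up for illustration, and buffer lengths are assumed to divide evenly by the number of ranks.

```python
# Pure-Python simulation of collective semantics; no NCCL or GPU involved.
# bufs[r] is the buffer held by rank r. All function names are hypothetical.

def all_reduce(bufs):
    """Every rank ends up with the element-wise sum over all ranks."""
    total = [sum(vals) for vals in zip(*bufs)]
    return [list(total) for _ in bufs]

def reduce_scatter(bufs):
    """Rank r ends up with only the r-th chunk of the element-wise sum."""
    n = len(bufs)
    chunk = len(bufs[0]) // n  # assumes length is divisible by rank count
    total = [sum(vals) for vals in zip(*bufs)]
    return [total[r * chunk:(r + 1) * chunk] for r in range(n)]

def all_gather(shards):
    """Every rank ends up with the concatenation of all ranks' shards."""
    full = [x for shard in shards for x in shard]
    return [list(full) for _ in shards]

def all_to_all(bufs):
    """Rank src sends its dst-th chunk to rank dst (a transpose of chunks)."""
    n = len(bufs)
    chunk = len(bufs[0]) // n  # assumes length is divisible by rank count
    return [
        [x for src in range(n) for x in bufs[src][dst * chunk:(dst + 1) * chunk]]
        for dst in range(n)
    ]

grads = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two ranks, four elements each
reduced = all_reduce(grads)
shards = reduce_scatter(grads)
regathered = all_gather(shards)
exchanged = all_to_all(grads)
```

In this toy setup, ReduceScatter followed by AllGather gives the same result as AllReduce. That decomposition is roughly why sharded schemes such as FSDP and ZeRO lean on ReduceScatter (for gradients) and AllGather (for parameters) rather than a single AllReduce.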
The 2024 Technical Interview Guide for AI Researchers
- Rora handbook; went through the researchers' preparation schedules and found some relevant resources