Shashank Shekhar
Atomics Notes
Short Notes about anything and everything.
18 items under this folder.
May 01, 2025
AllGather
Communication
Distributed
Collective
May 01, 2025
AllReduce
Communication
Distributed
Collective
May 01, 2025
AllToAll
Communication
Distributed
Collective
May 01, 2025
FLOPs (Floating Point Operations)
Performance
Computation
Hardware
May 01, 2025
BFloat16 (Brain Floating Point)
Numerical
Format
Machine-Learning
May 01, 2025
Bandwidth
Performance
Hardware
Communication
May 01, 2025
Compute-Bound
Performance
Optimization
Roofline
May 01, 2025
FLOPs (Floating Point Operations)
Performance
Computation
Hardware
May 01, 2025
MXU (Matrix Multiply Unit)
Hardware
TPU
Matrix-Multiplication
May 01, 2025
GPU (Graphics Processing Unit)
Hardware
Accelerator
Parallel
May 01, 2025
Memory-Bound
Performance
Optimization
Roofline
May 01, 2025
ReduceScatter
Communication
Distributed
Collective
May 01, 2025
Roofline Model
Performance
Hardware
Analysis
May 01, 2025
Sharding
Distributed
Parallelism
Tensor
May 01, 2025
Systolic Array
Hardware
TPU
Architecture
Matrix-Multiplication
May 01, 2025
TPU (Tensor Processing Unit)
Hardware
Accelerator
Google
May 01, 2025
VMEM (Vector Memory)
Hardware
TPU
Memory
Mar 13, 2025
Numba
Numba
CUDA
Parallel