Shashank Shekhar
Tag: Optimization
3 items with this tag.
May 01, 2025: Compute-Bound (tags: Performance, Optimization, Roofline)
May 01, 2025: Memory-Bound (tags: Performance, Optimization, Roofline)
Mar 16, 2025: Chapter 6 Performance Considerations Exercise Solutions (tags: CUDA, Performance, Memory-Coalescing, Optimization)