Shashank Shekhar
Tag: CUDA
17 items with this tag.
Mar 16, 2025
Chapter 4 Compute Architecture and Scheduling Exercise Solutions
CUDA
GPGPU
Compute-Architecture
Scheduling
Mar 16, 2025
Chapter 5 Memory Architecture and Data Locality Exercise Solutions
CUDA
Memory
Shared-Memory
Tiling
Mar 16, 2025
Chapter 6 Performance Considerations Exercise Solutions
CUDA
Performance
Memory-Coalescing
Optimization
Mar 16, 2025
Chapter 7 Convolution Exercise Solutions
CUDA
Convolution
Filters
Tiling
Mar 13, 2025
Numba
Numba
CUDA
Parallel
Mar 13, 2025
13 March, 2025 Readings
CUDA
Mar 13, 2025
14 March, 2025 Readings
CUDA
Mar 13, 2025
14 March, 2025 Readings
CUDA
Mar 12, 2025
12 March, 2025 Readings
CUDA
Mar 10, 2025
Chapter 2 Heterogeneous Data Parallel Computing Exercise Solutions
CUDA
Thread
Block
Grid
Mar 10, 2025
Chapter 3 Multidimensional Grids and Data Exercise Solutions
Row-Major
Column-Major
Matrix-Vector-Multiplication
Matrix-Multiplication
CUDA
Mar 10, 2025
CUDA PMPP Book Guide
CUDA
Parallel-Programming
Textbook
Mar 10, 2025
Chapter 1 Introduction Notes
CUDA
Parallel-Programming
PMPP
Notes
Mar 10, 2025
Chapter 2 Heterogeneous Data Parallel Computing Notes
CUDA
Data-Parallel
SIMD
Notes
Mar 10, 2025
Chapter 3 Multidimensional Grids and Data Notes
Indexing
Convolution
Matrix-Multiplication
CUDA
Mar 10, 2025
10 March, 2025 Readings
LORA
PEFT
CUDA
NVCC
Quartz
Mar 10, 2025
Welcome to my Digital Garden
CUDA