Shashank Shekhar

    Tag: CUDA

    17 items with this tag.

    • Mar 16, 2025

      Chapter 4 Compute Architecture and Scheduling Exercise Solutions

      • CUDA
      • GPGPU
      • Compute-Architecture
      • Scheduling
    • Mar 16, 2025

      Chapter 5 Memory Architecture and Data Locality Exercise Solutions

      • CUDA
      • Memory
      • Shared-Memory
      • Tiling
    • Mar 16, 2025

      Chapter 6 Performance Considerations Exercise Solutions

      • CUDA
      • Performance
      • Memory-Coalescing
      • Optimization
    • Mar 16, 2025

      Chapter 7 Convolution Exercise Solutions

      • CUDA
      • Convolution
      • Filters
      • Tiling
    • Mar 13, 2025

      Numba

      • Numba
      • CUDA
      • Parallel
    • Mar 13, 2025

      13 March, 2025 Readings

      • CUDA
    • Mar 13, 2025

      14 March, 2025 Readings

      • CUDA
    • Mar 13, 2025

      14 March, 2025 Readings

      • CUDA
    • Mar 12, 2025

      12 March, 2025 Readings

      • CUDA
    • Mar 10, 2025

      Chapter 2 Heterogeneous Data Parallel Computing Exercise Solutions

      • CUDA
      • Thread
      • Block
      • Grid
    • Mar 10, 2025

      Chapter 3 Multidimensional Grids and Data Exercise Solutions

      • Row-Major
      • Column-Major
      • Matrix-Vector-Multiplication
      • Matrix-Multiplication
      • CUDA
    • Mar 10, 2025

      CUDA PMPP Book Guide

      • CUDA
      • Parallel-Programming
      • Textbook
    • Mar 10, 2025

      Chapter 1 Introduction Notes

      • CUDA
      • Parallel-Programming
      • PMPP
      • Notes
    • Mar 10, 2025

      Chapter 2 Heterogeneous Data Parallel Computing Notes

      • CUDA
      • Data-Parallel
      • SIMD
      • Notes
    • Mar 10, 2025

      Chapter 3 Multidimensional Grids and Data Notes

      • Indexing
      • Convolution
      • Matrix-Multiplication
      • CUDA
    • Mar 10, 2025

      10 March, 2025 Readings

      • LORA
      • PEFT
      • CUDA
      • NVCC
      • Quartz
    • Mar 10, 2025

      Welcome to my Digital Garden

      • CUDA

    Created with Quartz v4.4.0 © 2025

    • GitHub
    • Discord Community