Shashank Shekhar

    14 March, 2025 Readings

    Mar 13, 2025 (1 min read)

    • CUDA

    Courses

    • MiniTorch Module 3 parts 3 and 4

    • LLM System lecture 1a: https://llmsystem.github.io/llmsystem2025spring/assets/files/llmsys-01-intro-494cf8038c6bbd0cbef448899ef34864.pdf

    • LLM System lecture 1b: https://llmsystem.github.io/llmsystem2025spring/assets/files/llmsys-02-gpu-programming-5fba63d213cdb0da3a74246309497470.pdf

    Tutorials

    • Using Shared Memory in CUDA C/C++: https://developer.nvidia.com/blog/using-shared-memory-cuda-cc/

    Exploratory

    • Numba, Writing CUDA kernels: https://numba.readthedocs.io/en/stable/cuda/kernels.html
    • Numba, CUDA Memory Management: https://numba.readthedocs.io/en/stable/cuda/memory.html
    • Numba, CUDA examples for vector addition, sum reduction, and matmul: https://numba.readthedocs.io/en/stable/cuda/examples.html

    Random

    • A guide on links vs. tags in Obsidian: https://forum.obsidian.md/t/a-guide-on-links-vs-tags-in-obsidian/28231/2

