Shashank Shekhar


    29 March, 2025 Readings

    Mar 29, 2025 · 1 min read

    Exploratory

    Did a lot of exploratory reading on GPU architecture, in particular Apple Silicon GPU architecture. The references are included in the article published on the blog:

    https://www.shashankshekhar.com/blog/apple-metal-vs-nvidia-cuda

