Shashank Shekhar


    Atomics Notes

    Short Notes about anything and everything.

    17 items under this folder.

    • AllGather · May 01, 2025 · Communication, Distributed, Collective
    • AllReduce · May 01, 2025 · Communication, Distributed, Collective
    • AllToAll · May 01, 2025 · Communication, Distributed, Collective
    • FLOPs (Floating Point Operations) · May 01, 2025 · Performance, Computation, Hardware
    • BFloat16 (Brain Floating Point) · May 01, 2025 · Numerical, Format, Machine-Learning
    • Bandwidth · May 01, 2025 · Performance, Hardware, Communication
    • Compute-Bound · May 01, 2025 · Performance, Optimization, Roofline
    • MXU (Matrix Multiply Unit) · May 01, 2025 · Hardware, TPU, Matrix-Multiplication
    • GPU (Graphics Processing Unit) · May 01, 2025 · Hardware, Accelerator, Parallel
    • Memory-Bound · May 01, 2025 · Performance, Optimization, Roofline
    • ReduceScatter · May 01, 2025 · Communication, Distributed, Collective
    • Roofline Model · May 01, 2025 · Performance, Hardware, Analysis
    • Sharding · May 01, 2025 · Distributed, Parallelism, Tensor
    • Systolic Array · May 01, 2025 · Hardware, TPU, Architecture, Matrix-Multiplication
    • TPU (Tensor Processing Unit) · May 01, 2025 · Hardware, Accelerator, Google
    • VMEM (Vector Memory) · May 01, 2025 · Hardware, TPU, Memory
    • Numba · Mar 13, 2025 · Numba, CUDA, Parallel
