Notes from the MiniTorch project, which involved implementing a small PyTorch-style library from scratch.

Module 0

Module 1

Module 2

Module 3

  • The first assignment involved parallelizing (on the CPU) the tensor map, zip, and reduce functions implemented in the last module, using Numba
    • This did not require many changes beyond making sure the broadcasting helper functions contain no loops where a variable's value depends on earlier iterations, since Numba can only parallelize loops free of such cross-iteration dependencies. Besides that, I pretty much just replaced range with Numba's prange; a sketch of the result follows this item
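
A minimal sketch of what this looks like, assuming a simplified non-broadcasting signature (flat storage plus shape/stride arrays, roughly how these kernels see tensors); the function and variable names here are illustrative, not the actual assignment interface:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def tensor_map_neg(out, out_shape, out_strides, in_storage, in_strides):
    # Total number of output elements.
    size = 1
    for d in out_shape:
        size *= d
    # prange lets Numba split iterations across threads; this is safe only
    # because no iteration reads a value computed by another iteration.
    for i in prange(size):
        # The index buffer is allocated inside the loop body so each
        # parallel iteration owns its own scratch space.
        idx = np.empty(len(out_shape), np.int64)
        cur = i
        for d in range(len(out_shape) - 1, -1, -1):
            idx[d] = cur % out_shape[d]
            cur //= out_shape[d]
        out_pos, in_pos = 0, 0
        for d in range(len(out_shape)):
            out_pos += idx[d] * out_strides[d]
            in_pos += idx[d] * in_strides[d]
        out[out_pos] = -in_storage[in_pos]

# Negate a contiguously stored 2x3 tensor.
shape = np.array([2, 3], dtype=np.int64)
strides = np.array([3, 1], dtype=np.int64)
a = np.arange(6, dtype=np.float64)
out = np.zeros(6)
tensor_map_neg(out, shape, strides, a, strides)
```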
  • The second assignment was implementing broadcasted matrix multiplication, parallelized on the CPU with Numba
    • The main thing to take care of was the broadcasting of dimensions: only the batch dimensions broadcast, while the last two dimensions of the matrices must be compatible under the usual matmul rules rather than under broadcasting (see the batched sketch below)
    • Besides that, I implemented a simple naive matmul, making sure to follow the Numba restrictions
    • I did not bother optimizing the matmul since I wanted to move on to the CUDA part
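
To make the broadcasting point concrete, here is a sketch assuming 3-D inputs where only the leading batch dimension broadcasts (a size-1 batch is reused); it is a naive triple loop parallelized over batch x rows, with names of my own choosing rather than the assignment's:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def batched_matmul(out, a, b):
    # a: (Ba, N, K), b: (Bb, K, M), out: (B, N, M) with B = max(Ba, Bb).
    # Broadcasting only applies to the batch dimension; K must match.
    B, N, M = out.shape
    K = a.shape[2]
    for p in prange(B * N):
        batch, i = p // N, p % N
        # A batch dimension of size 1 is broadcast by always reading index 0.
        ab = batch if a.shape[0] > 1 else 0
        bb = batch if b.shape[0] > 1 else 0
        for j in range(M):
            acc = 0.0
            for k in range(K):
                acc += a[ab, i, k] * b[bb, k, j]
            out[batch, i, j] = acc

a = np.random.rand(1, 4, 5)   # batch of 1, broadcast against b
b = np.random.rand(3, 5, 2)
out = np.zeros((3, 4, 2))
batched_matmul(out, a, b)
assert np.allclose(out, a @ b)  # NumPy broadcasts batch dims the same way
```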
  • Third assignment: CUDA map, zip, reduce_sum, reduce
    • Got stuck on map and zip much longer than I should have due to a fundamental mistake: not checking whether a thread's global index was out of bounds for the tensor size. The guard appears in the sketch below
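
For reference, the guard in question, in a minimal numba.cuda map kernel (illustrative names; it needs a CUDA device to actually run):

```python
import numpy as np
from numba import cuda

@cuda.jit
def gpu_map_neg(out, inp, size):
    i = cuda.grid(1)  # global thread index across the whole grid
    # The grid is rounded up to whole blocks, so threads at the tail can
    # land past the last element; without this check they write out of
    # bounds (the mistake that cost me so much time on map and zip).
    if i < size:
        out[i] = -inp[i]

a = np.arange(100, dtype=np.float64)
out = np.zeros_like(a)
threads = 32
blocks = (a.size + threads - 1) // threads  # ceil division
gpu_map_neg[blocks, threads](out, a, a.size)
```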
  • Fourth assignment: matmul
    • Implemented a basic tiled version based on the shared-memory approach from PMPP and the Numba examples (sketched after this list)
    • For the broadcasted one, the same constraint as on the CPU carries over: broadcasting applies only to the batch dimensions, while the last two must line up for matmul
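
A sketch of the tiled (non-broadcasted) kernel in the PMPP style, assuming square TPB x TPB shared-memory tiles; the broadcasted version would add a batch index on top of this. Names are hypothetical, and it needs a CUDA device:

```python
import numpy as np
from numba import cuda, float64

TPB = 16  # tile width; must be a compile-time constant for shared arrays

@cuda.jit
def tiled_matmul(out, a, b):
    # Each block computes one TPB x TPB tile of out, staging strips of
    # a and b through shared memory so each global value is loaded once
    # per tile instead of once per output element.
    sa = cuda.shared.array(shape=(TPB, TPB), dtype=float64)
    sb = cuda.shared.array(shape=(TPB, TPB), dtype=float64)
    i, j = cuda.grid(2)
    ti, tj = cuda.threadIdx.x, cuda.threadIdx.y
    acc = 0.0
    for t in range((a.shape[1] + TPB - 1) // TPB):
        k0 = t * TPB
        # Guarded loads: partial tiles at the edges are padded with zeros.
        if i < a.shape[0] and k0 + tj < a.shape[1]:
            sa[ti, tj] = a[i, k0 + tj]
        else:
            sa[ti, tj] = 0.0
        if k0 + ti < b.shape[0] and j < b.shape[1]:
            sb[ti, tj] = b[k0 + ti, j]
        else:
            sb[ti, tj] = 0.0
        cuda.syncthreads()  # tile fully loaded before anyone reads it
        for k in range(TPB):
            acc += sa[ti, k] * sb[k, tj]
        cuda.syncthreads()  # everyone done reading before the next load
    if i < out.shape[0] and j < out.shape[1]:
        out[i, j] = acc

a = np.random.rand(33, 47)
b = np.random.rand(47, 29)
out = np.zeros((33, 29))
grid = ((33 + TPB - 1) // TPB, (29 + TPB - 1) // TPB)
tiled_matmul[grid, (TPB, TPB)](out, a, b)
assert np.allclose(out, a @ b)
```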