Notes from the MiniTorch project, which involved implementing a small PyTorch-style library from scratch.

Module 0

Module 1

Module 2

Module 3

  • The first assignment involved parallelizing (on the CPU) the tensor map, zip, and reduce functions implemented in the last module, using Numba
    • This did not require many changes beyond making sure the broadcasting helper functions contain no loops where a variable's value depends on earlier iterations, since Numba can only parallelize loops free of such cross-iteration dependencies. Besides that, I pretty much just replaced range with Numba's prange; a sketch of the result follows this item
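
A minimal sketch of what this looks like, assuming a simplified non-broadcasting signature (flat storage plus shape/stride arrays, roughly how these kernels see tensors); the function and variable names here are illustrative, not the actual assignment interface:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def tensor_map_neg(out, out_shape, out_strides, in_storage, in_strides):
    # Total number of output elements.
    size = 1
    for d in out_shape:
        size *= d
    # prange lets Numba split iterations across threads; this is safe only
    # because no iteration reads a value computed by another iteration.
    for i in prange(size):
        # The index buffer is allocated inside the loop body so each
        # parallel iteration owns its own scratch space.
        idx = np.empty(len(out_shape), np.int64)
        cur = i
        for d in range(len(out_shape) - 1, -1, -1):
            idx[d] = cur % out_shape[d]
            cur //= out_shape[d]
        out_pos, in_pos = 0, 0
        for d in range(len(out_shape)):
            out_pos += idx[d] * out_strides[d]
            in_pos += idx[d] * in_strides[d]
        out[out_pos] = -in_storage[in_pos]

# Negate a contiguously stored 2x3 tensor.
shape = np.array([2, 3], dtype=np.int64)
strides = np.array([3, 1], dtype=np.int64)
a = np.arange(6, dtype=np.float64)
out = np.zeros(6)
tensor_map_neg(out, shape, strides, a, strides)
```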
  • The second assignment was implementing broadcasted matrix multiplication, parallelized on the CPU with Numba
    • The main thing to take care of was the broadcasting of dimensions: only the batch dimensions broadcast, while the last two dimensions of the matrices must be compatible under the usual matmul rules rather than under broadcasting (see the batched sketch below)
    • Besides that, I implemented a simple naive matmul, making sure to follow the Numba restrictions
    • I did not bother optimizing the matmul since I wanted to move on to the CUDA part
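
To make the broadcasting point concrete, here is a sketch assuming 3-D inputs where only the leading batch dimension broadcasts (a size-1 batch is reused); it is a naive triple loop parallelized over batch x rows, with names of my own choosing rather than the assignment's:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def batched_matmul(out, a, b):
    # a: (Ba, N, K), b: (Bb, K, M), out: (B, N, M) with B = max(Ba, Bb).
    # Broadcasting only applies to the batch dimension; K must match.
    B, N, M = out.shape
    K = a.shape[2]
    for p in prange(B * N):
        batch, i = p // N, p % N
        # A batch dimension of size 1 is broadcast by always reading index 0.
        ab = batch if a.shape[0] > 1 else 0
        bb = batch if b.shape[0] > 1 else 0
        for j in range(M):
            acc = 0.0
            for k in range(K):
                acc += a[ab, i, k] * b[bb, k, j]
            out[batch, i, j] = acc

a = np.random.rand(1, 4, 5)   # batch of 1, broadcast against b
b = np.random.rand(3, 5, 2)
out = np.zeros((3, 4, 2))
batched_matmul(out, a, b)
assert np.allclose(out, a @ b)  # NumPy broadcasts batch dims the same way
```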
  • Third assignment: CUDA map, zip, reduce_sum, reduce
    • Got stuck on map and zip much longer than I should have due to a fundamental mistake: not checking whether a thread's global index was out of bounds for the tensor size. The guard appears in the sketch below
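
For reference, the guard in question, in a minimal numba.cuda map kernel (illustrative names; it needs a CUDA device to actually run):

```python
import numpy as np
from numba import cuda

@cuda.jit
def gpu_map_neg(out, inp, size):
    i = cuda.grid(1)  # global thread index across the whole grid
    # The grid is rounded up to whole blocks, so threads at the tail can
    # land past the last element; without this check they write out of
    # bounds (the mistake that cost me so much time on map and zip).
    if i < size:
        out[i] = -inp[i]

a = np.arange(100, dtype=np.float64)
out = np.zeros_like(a)
threads = 32
blocks = (a.size + threads - 1) // threads  # ceil division
gpu_map_neg[blocks, threads](out, a, a.size)
```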
  • Fourth assignment: matmul
    • Implemented a basic tiled version based on the shared-memory approach from PMPP and the Numba examples (sketched after this list)
    • For the broadcasted one, the same constraint as on the CPU carries over: broadcasting applies only to the batch dimensions, while the last two must line up for matmul
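
A sketch of the tiled (non-broadcasted) kernel in the PMPP style, assuming square TPB x TPB shared-memory tiles; the broadcasted version would add a batch index on top of this. Names are hypothetical, and it needs a CUDA device:

```python
import numpy as np
from numba import cuda, float64

TPB = 16  # tile width; must be a compile-time constant for shared arrays

@cuda.jit
def tiled_matmul(out, a, b):
    # Each block computes one TPB x TPB tile of out, staging strips of
    # a and b through shared memory so each global value is loaded once
    # per tile instead of once per output element.
    sa = cuda.shared.array(shape=(TPB, TPB), dtype=float64)
    sb = cuda.shared.array(shape=(TPB, TPB), dtype=float64)
    i, j = cuda.grid(2)
    ti, tj = cuda.threadIdx.x, cuda.threadIdx.y
    acc = 0.0
    for t in range((a.shape[1] + TPB - 1) // TPB):
        k0 = t * TPB
        # Guarded loads: partial tiles at the edges are padded with zeros.
        if i < a.shape[0] and k0 + tj < a.shape[1]:
            sa[ti, tj] = a[i, k0 + tj]
        else:
            sa[ti, tj] = 0.0
        if k0 + ti < b.shape[0] and j < b.shape[1]:
            sb[ti, tj] = b[k0 + ti, j]
        else:
            sb[ti, tj] = 0.0
        cuda.syncthreads()  # tile fully loaded before anyone reads it
        for k in range(TPB):
            acc += sa[ti, k] * sb[k, tj]
        cuda.syncthreads()  # everyone done reading before the next load
    if i < out.shape[0] and j < out.shape[1]:
        out[i, j] = acc

a = np.random.rand(33, 47)
b = np.random.rand(47, 29)
out = np.zeros((33, 29))
grid = ((33 + TPB - 1) // TPB, (29 + TPB - 1) // TPB)
tiled_matmul[grid, (TPB, TPB)](out, a, b)
assert np.allclose(out, a @ b)
```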