Notes from the MiniTorch project, which involved implementing a small PyTorch-style library from scratch.
Module 0
Module 1
Module 2
Module 3
- The first assignment involved parallelizing (on the CPU) the tensor `map`, `reduce`, and `zip` functions implemented in the last module with Numba
  - This did not involve a lot of change, except making sure the helper functions for broadcasting do not contain loops where the value of variables depends on earlier loop iterations. Besides that, I pretty much just replaced `range` with `prange` from Numba, as in the sketch below
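A minimal sketch of the `range`-to-`prange` change, under assumed simplifications: `tensor_map` and the flat 1-D view are placeholders, not MiniTorch's actual API, and the broadcasting index helpers are omitted.

```python
import numpy as np
from numba import njit, prange

# Hypothetical simplified map: the real MiniTorch maps operate on flat
# storage plus shape/stride metadata; doubling stands in for the
# compiled scalar function being mapped.
@njit(parallel=True)
def tensor_map(out, in_storage):
    # The elementwise loop has no cross-iteration dependencies,
    # so swapping range for prange lets Numba parallelize it.
    for i in prange(len(out)):
        out[i] = in_storage[i] * 2.0

out = np.zeros(1_000_000)
tensor_map(out, np.ones(1_000_000))
```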
- The second assignment was on implementing broadcasted matrix multiplication, parallelized on the CPU with Numba
  - The only thing to really take care of was the broadcasting dimensions (since the last two dimensions of the matrices have to be consistent with `matmul`, not with broadcasting)
  - Besides that, I implemented a simple naive matmul, making sure to follow the Numba restrictions (sketched below)
  - Did not bother optimizing the matmul since I wanted to move on to the CUDA part
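A sketch of that naive version, again under assumptions: plain 3-D arrays with a broadcastable batch axis stand in for MiniTorch's strided storage, and `batched_matmul` is a made-up name.

```python
import numpy as np
from numba import njit, prange

# Assumed layout: first axis is a (possibly size-1) batch dimension that
# broadcasts; the last two axes must already be matmul-compatible.
@njit(parallel=True)
def batched_matmul(out, a, b):
    batch, n, m = out.shape
    for p in prange(batch):
        ai = p if a.shape[0] > 1 else 0  # broadcast a size-1 batch axis
        bi = p if b.shape[0] > 1 else 0
        for i in range(n):
            for j in range(m):
                acc = 0.0  # accumulate locally, write out once
                for k in range(a.shape[2]):
                    acc += a[ai, i, k] * b[bi, k, j]
                out[p, i, j] = acc

a = np.random.rand(1, 4, 5)  # batch axis of size 1 broadcasts against 3
b = np.random.rand(3, 5, 6)
out = np.zeros((3, 4, 6))
batched_matmul(out, a, b)
```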
- Third assignment: CUDA `map`, `zip`, `reduce_sum`, `reduce`
  - Got stuck on `map` and `zip` much longer than I should have, simply due to a fundamental mistake: not checking whether the item's global index was out of bounds for the size (see the sketch below)
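The bug in a nutshell, using a hypothetical `cuda_map` kernel over a flat view (not the real signature): the grid is rounded up to whole blocks, so trailing threads fall outside the data and must return early.

```python
import numpy as np
from numba import cuda

@cuda.jit
def cuda_map(out, inp, size):
    i = cuda.grid(1)  # global thread index
    if i >= size:     # the check I initially forgot
        return
    out[i] = inp[i] * 2.0  # placeholder for the mapped function

size = 1_000_000
inp = cuda.to_device(np.ones(size))
out = cuda.device_array(size)
threads = 256
blocks = (size + threads - 1) // threads  # round up to cover every item
cuda_map[blocks, threads](out, inp, size)
```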
- Fourth assignment: `matmul`
  - Implemented the naive one based on the tiled approach from PMPP/Numba examples (sketched below)
  - For the broadcasted one
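A sketch of the tiled approach, close to the `fast_matmul` example in the Numba CUDA docs rather than my exact code; the batch/broadcast handling is left out, and the guarded loads (an addition here) zero-fill tiles that hang off the matrix edge.

```python
import numpy as np
from numba import cuda, float64

TPB = 16  # tile width = threads per block along each axis

@cuda.jit
def tiled_matmul(out, a, b):
    # Each block cooperatively stages TPB x TPB tiles of a and b into
    # shared memory, then accumulates partial dot products.
    sA = cuda.shared.array(shape=(TPB, TPB), dtype=float64)
    sB = cuda.shared.array(shape=(TPB, TPB), dtype=float64)
    x, y = cuda.grid(2)
    tx = cuda.threadIdx.x
    ty = cuda.threadIdx.y
    acc = 0.0
    for t in range((a.shape[1] + TPB - 1) // TPB):
        # Guarded loads: zero-fill where the tile runs off the matrix edge.
        if x < a.shape[0] and t * TPB + ty < a.shape[1]:
            sA[tx, ty] = a[x, t * TPB + ty]
        else:
            sA[tx, ty] = 0.0
        if t * TPB + tx < b.shape[0] and y < b.shape[1]:
            sB[tx, ty] = b[t * TPB + tx, y]
        else:
            sB[tx, ty] = 0.0
        cuda.syncthreads()  # wait for the whole tile to be staged
        for k in range(TPB):
            acc += sA[tx, k] * sB[k, ty]
        cuda.syncthreads()  # don't overwrite tiles still being read
    if x < out.shape[0] and y < out.shape[1]:
        out[x, y] = acc

n = 64
a = cuda.to_device(np.random.rand(n, n))
b = cuda.to_device(np.random.rand(n, n))
out = cuda.device_array((n, n))
blocks = ((n + TPB - 1) // TPB, (n + TPB - 1) // TPB)
tiled_matmul[blocks, (TPB, TPB)](out, a, b)
```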