
mat-mat-ops

Custom C++ and CUDA operators for Matrix-Matrix Operations in PyTorch. Here I implemented shared-memory cache-blocking and block-tiling for both the forward and backward kernels.
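
For reference, below is a minimal sketch of what a shared-memory cache-blocked (tiled) forward matmul kernel looks like. The tile size, kernel name, and indexing scheme are illustrative assumptions, not the exact code shipped in this repository; the backward kernels apply the same tiling idea to the gradient matmuls.

```cuda
// Minimal sketch of a shared-memory cache-blocked (tiled) matmul kernel.
// TILE, the kernel name, and the launch configuration are assumptions for
// illustration only.
#include <cuda_runtime.h>

#define TILE 16  // assumed tile width

__global__ void matmul_tiled(const float* A, const float* B, float* C,
                             int M, int N, int K) {
    // Each thread block computes one TILE x TILE tile of C, staging the
    // matching tiles of A and B in shared memory.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        // Cooperative load of one tile of A and one tile of B into shared memory.
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();

        // Multiply the two tiles out of fast shared memory.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}
```

Staging TILE x TILE blocks of A and B in shared memory lets each loaded element be reused TILE times, which cuts global-memory traffic roughly by a factor of the tile width compared to a naive one-element-per-thread kernel.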

If you want to learn how to write your own custom kernel, this PyTorch official tutorial is all you need :)

Requirements:

  • CUDA Toolkit 12.4
  • PyTorch 2.4+

Supported operations so far:

  • Mat-Mat Mul
  • Mat-Mat L1

To build:

pip install .

To test:

Use the interactive notebook test/test.ipynb, or run:

python test/test_extension.py

Author

Mehdi Moshtaghi
