Custom C++ and CUDA operators for matrix-matrix operations in PyTorch. Here I implement shared-memory cache-blocking and block-tiling for both the forward and backward kernels.
If you want to know how to write your own custom kernel, this PyTorch official tutorial is all you need :)
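For reference, below is a minimal sketch of the shared-memory cache-blocking idea applied to the forward matmul. The kernel name, tile size, and argument layout are illustrative assumptions, not the exact kernel in this repo, and it leaves out the block-tiling refinement in which each thread accumulates a small sub-tile of the output instead of a single element.

```cuda
// Sketch of a shared-memory cache-blocked matmul kernel: C = A * B,
// with row-major float matrices and square TILE x TILE thread blocks.
// Illustrative only; the names and tile size are assumptions.
#define TILE 16

__global__ void matmul_tiled(const float* __restrict__ A,
                             const float* __restrict__ B,
                             float* __restrict__ C,
                             int M, int N, int K) {
    // Shared-memory tiles of A and B, reused by every thread in the block.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C this thread computes
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C this thread computes
    float acc = 0.0f;

    // Slide the tile window along the shared K dimension.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;

        // Stage one tile of A and one tile of B into shared memory (guard the edges).
        As[threadIdx.y][threadIdx.x] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();

        // Accumulate a partial dot product from the cached tiles.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    if (row < M && col < N)
        C[row * N + col] = acc;
}
```

Staging TILE x TILE tiles of A and B in shared memory lets each element loaded from global memory be reused TILE times by the block, which is what the cache-blocking buys you over a naive one-element-per-load kernel.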
Requirements: CUDA Toolkit 12.4, PyTorch 2.4+

Implemented operators:
- Mat-Mat Mul
- Mat-Mat L1
To build and install the extension:

pip install .
To test, either open the interactive notebook test/test.ipynb or run:

python test/test_extension.py