Factorized linear supports implementation switch and gradient checkpoint #26
Conversation
Thanks @JeremieMelo, this looks great, just made a few suggestions on the API.
n_layers : int, default is 1
    number of linear layers to be parametrized with a single factorized tensor
bias : bool, default is True
with_cp : bool
I think we can just call it checkpointing to be more explicit?
Updated in the newest commit.
device : PyTorch device to use, default is None
dtype : PyTorch dtype, default is None
"""
 def __init__(self, in_tensorized_features, out_tensorized_features, bias=True,
-             factorization='cp', rank='same', n_layers=1, device=None, dtype=None):
+             factorization='cp', rank='same', implementation='factorized', n_layers=1,
+             with_cp=False, device=None, dtype=None):
Instead of with_cp, we can use checkpointing.
Updated in the newest commit.
Sure, could you create a suggested change to replace all 'with_cp' with 'checkpointing', so I can directly add it to the batched commit? Thanks.
Co-authored-by: Jean Kossaifi <jean.kossaifi@gmail.com>
with_cp to checkpointing, move weight out of _inner_forward
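For context, a minimal sketch of the forward pattern that commit message describes: the weight is materialized outside _inner_forward and passed in as an argument, with torch.utils.checkpoint wrapping only the contraction when checkpointing is enabled. The to_matrix() helper and attribute names here are assumptions for illustration, not the code under review.

import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def _inner_forward(self, x, weight):
    # Only the linear contraction sits inside the (optionally checkpointed) region.
    return F.linear(x, weight, self.bias)

def forward(self, x):
    # Materialize the weight once, outside _inner_forward, and pass it in;
    # to_matrix() is an assumed helper for reconstructing the full weight matrix.
    weight = self.weight.to_matrix()
    if self.checkpointing and self.training:
        # Trade compute for memory: recompute the contraction during backward
        # instead of storing its intermediate activations.
        return checkpoint(self._inner_forward, x, weight)
    return self._inner_forward(x, weight)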
Looks good, thanks @JeremieMelo, merging!
Support switching the implementation between factorized and reconstructed.
Gradient checkpointing for a memory-efficient training-mode forward function.
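A rough usage sketch of the two new arguments, assuming the layer under review is tensorly-torch's tltorch.FactorizedLinear and the final keyword follows the rename agreed on above (checkpointing rather than with_cp); the tensorized shapes are illustrative values only.

import torch
import tltorch

layer = tltorch.FactorizedLinear(
    in_tensorized_features=(4, 4),    # 16 input features, tensorized as 4 x 4
    out_tensorized_features=(4, 8),   # 32 output features, tensorized as 4 x 8
    factorization='cp',
    rank='same',
    implementation='reconstructed',   # or 'factorized'
    checkpointing=True,               # gradient checkpointing in training mode
)

x = torch.randn(2, 16)
y = layer(x)                          # expected shape: (2, 32)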