Fixed misc mixed precision issues #805
Conversation
keras_core/layers/layer.py
Outdated
-        if self.autocast and self.compute_dtype != self.variable_dtype:
         # For mixed precision, we automatically cast layer variables
         # (float ones only) to the compute dtype upon access.
+        if self.autocast and backend.is_float_dtype(self.compute_dtype):
I am unsure about this! Let me know what you think.
The issue with the previous logic is that it would not handle nesting well. Say you have a layer with `dtype="mixed_float16"` which contains a dense layer with `dtype="float32"`. The autocast scope is currently never cleared, so the inner layer would see all variables cast to a lower precision than the inner compute dtype, and you would just get an error when trying to run the dense call graph (see the sketch below).
However, this logic will end up stacking more autocast scopes (doesn't seem like a big deal), and more aggressively applying the autocast logic (maybe a big deal?). It seems clean to always autocast if `self.autocast = True`, but I'm not sure of all the implications.
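For illustration, a minimal sketch of that nesting scenario (the layer class and shapes here are hypothetical, just to make the failure mode concrete):

```python
from keras_core import layers

class Outer(layers.Layer):
    def __init__(self):
        # Outer layer uses mixed precision: float32 variables, float16 compute.
        super().__init__(dtype="mixed_float16")
        # Inner layer explicitly stays in full float32.
        self.dense = layers.Dense(4, dtype="float32")

    def call(self, x):
        # With the old logic, the autocast scope opened for the outer layer
        # was never cleared, so the inner Dense saw its float32 kernel cast
        # down to float16 and its call failed with a dtype mismatch.
        return self.dense(x)
```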
We should seek to avoid creating an `AutocastScope` if unnecessary because of the eager performance overhead (which matters specifically for torch).
To handle nesting I guess we can further complicate the logic, something like:
if self.autocast and backend.is_float_dtype(self.compute_dtype):
    autocast_scope = get_autocast_scope()
    if (autocast_scope and autocast_scope.dtype != self.compute_dtype) or (
        not autocast_scope and self.compute_dtype != self.variable_dtype
    ):
        with backend.AutocastScope(self.compute_dtype):
            ...
WDYT? Too much? Is the performance cost of many AutocastScopes low enough we can go with your proposal?
Sounds good. If we want to be fully correct, I think we actually need to add the concept of an empty scope. E.g. if a layer with `autocast = False` is inside a model, we actually need to "clear the scope" which will have been set.
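Roughly, the idea would be something like the following (just a sketch, assuming `AutocastScope` accepts `dtype=None` to mean "no casting"):

```python
# Sketch of the idea inside Layer.__call__: when the layer should not
# autocast, enter a scope with dtype=None to "clear" any autocast scope
# set by an outer layer, so nested variables are read at their true dtype.
if self.autocast and backend.is_float_dtype(self.compute_dtype):
    scope_dtype = self.compute_dtype
else:
    scope_dtype = None  # an "empty" scope that disables inherited casting
with backend.AutocastScope(scope_dtype):
    outputs = self.call(*args, **kwargs)
```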
Interesting, looks like a lot of missing support on torch for float16 that I wasn't seeing locally. Will need to take a look.
Thanks for the PR!
Another question here. Looks like torch + half + cpu just doesn't really work due to missing ops. Should we skip those tests?
Let's skip the broken tests with torch.
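For example, skipping such a test only on the torch backend could look roughly like this (a sketch; the test name is hypothetical):

```python
import pytest
from keras_core import backend

@pytest.mark.skipif(
    backend.backend() == "torch",
    reason="float16 ops are missing for torch on CPU",
)
def test_layer_in_mixed_float16():
    ...
```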
Ready, I think!
LGTM, thank you!
* Add a test harness based on keras-core's `run_layer_test`

  Eventually, I want to add some dtype changes similar to keras-team/keras-core#805. But the nice thing about that PR on keras-core was that I could add a dtype test to a common harness and test all layers. So I think it's finally time to bite the bullet and add these for keras-nlp. This ports and simplifies the `run_layer_test` code from keras-core and applies it to our modeling layers. I am ditching the saved model tests for our individual layers, the idea being that saved model tests are slow, and we now get fairly robust serialization tests without saved model. If this is good enough for keras-core layers, I think we can follow suit here. We still test saving end to end through our modeling tests.

* Fix tests
* Address comments
The main goal of this is to add dtype support for the MHA layer, e.g. to get `MultiHeadAttention(dtype="mixed_float16")` to work correctly (it does not today). But I tried out adding a mixed precision test for all layers, which caught some other bugs.
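For reference, the kind of usage this aims to enable (a minimal sketch; shapes and arguments are illustrative):

```python
import numpy as np
from keras_core import layers

# Mixed precision: variables are kept in float32, compute runs in float16.
mha = layers.MultiHeadAttention(num_heads=2, key_dim=16, dtype="mixed_float16")

x = np.random.random((4, 8, 16)).astype("float32")
y = mha(x, x)  # self-attention
print(y.dtype)  # the compute dtype, i.e. float16
```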