-
Notifications
You must be signed in to change notification settings - Fork 614
Fix bugged gradients when combiner == 'MEAN' #2505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Wait hang on, I'm dumb. Fixing the PR! |
weights = tf.ones_like(indices, dtype=params.dtype) / tf.cast( | ||
tf.shape(indices)[1], params.dtype | ||
) | ||
combiner = "sum" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we have this combiner overriding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a quick workaround, but the output and grads are correct, and performance is fine! Since combiner == "mean" does not support weights, we can get the right results by creating a dummy weight array (we do that anyway), then scaling it so that we get an unweighted mean instead of an unweighted sum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you should call tf.reduce_mean
after custom op instead of integrating with it or combiner overriding.
The GPU test is failing. |
lol. This one is just a simple issue - the tolerances for the equality comparison are too tight for the float16 tests. Give me a sec and I'll fix it. |
Combiner is not used in addons/tensorflow_addons/custom_ops/layers/cc/kernels/embedding_bag_backward_kernels.cu.cc Line 151 in 97eb293
, which is different with cpu custom op. Maybe you should call tf.reduce_mean after custom op instead of integrating with it.
|
@@ -49,8 +49,13 @@ def _embedding_bag( | |||
Returns: | |||
A `Tensor` of the format specified by `data_format`. | |||
""" | |||
if weights is None: | |||
if weights is None and combiner == "sum": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use reduction instead of combiner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The combiner is combining part of the layer and applies to the inputs/weights, not the output/loss! Is reduction still the right approach there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add combiner
document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going to merge this as CI is broken without it. If there are any issues can we follow up in next PR. Cursory glance LGTM.
I can't see ubuntu gpu CI report in the PR, could you check it? |
Yeah landed on master (Shown by badge on central README): https://source.cloud.google.com/results/invocations/5a171fa3-3bad-4859-9a05-414e4c155851 |
How could I check it in the review process? Should I record the url? I could get the info in previous PRs. |
Hi, thanks for merging, and sorry for my slow replies! I've only been able to get to this at the weekends, but I am committed to maintaining it - it's unfortunate that we had to rush because of the broken CI, but I do think this PR should resolve most of the issues, and I see the follow-up PR is already open! |
Appreciate the maintainership @Rocketknight1 ! Its volunteer work so no need to apologize for slow responses :) |
Fixes #2502