-
Notifications
You must be signed in to change notification settings - Fork 405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modify f32-gemm for HVX #7918
Modify f32-gemm for HVX #7918
Conversation
ejparkqc
commented
Feb 28, 2025
•
edited
Loading
edited
- Use portable API in possible places.
- Add new kernel (e.g., try mr=7, nr=32).
- Add all f32-gemm kernels in f32-gemm-minmax-bench.
- Convert scalar tail code to vector version.
src/configs/gemm-config.c
Outdated
#elif XNN_ARCH_HEXAGON && XNN_ENABLE_HVX | ||
const struct xnn_hardware_config* hardware_config = xnn_init_hardware_config(); | ||
assert(hardware_config != NULL); | ||
(void) hardware_config; // May be unused. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its used on the next line :-). remove
src/configs/gemm-config.c
Outdated
(void) hardware_config; // May be unused. | ||
if (hardware_config->use_hvx) { | ||
f32_gemm_config.minmax.gemm[XNN_MR_TO_INDEX(1)] = xnn_init_hmp_gemm_ukernel((xnn_gemm_ukernel_fn) xnn_f32_gemm_minmax_ukernel_1x4__scalar); | ||
f32_gemm_config.minmax.gemm[XNN_MR_TO_INDEX(4)] = xnn_init_hmp_gemm_ukernel((xnn_gemm_ukernel_fn) xnn_f32_gemm_minmax_ukernel_4x4__scalar); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
4x4 scalar? shouldnt this be using hvx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed, I removed this file for now. We will add the configuration back when we know which implementation we want to use.
#include "xnnpack/gemm.h" | ||
#include "xnnpack/intrinsics-polyfill.h" | ||
|
||
$RANGE_MR = list(range(MR)) | ||
void xnn_f32_gemm_minmax_ukernel_${MR}x${NR}__hvx_broadcast( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename? Or bring back min/max below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap it supposed to be min/max, but removed the clamping code by accident. I added back them!
|
||
k -= sizeof(float); | ||
} while (k != 0); | ||
|
||
const HVX_Vector vmin = Q6_V_vsplat_R(params->scalar.min); | ||
XNN_SIMD_CONST_F32(vmin, params->scalar.min); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use xnn_max_f32 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
- Use portable API - Convert scalar tail code to vector tail code - Add benchmark
- Add new kernel into benchmark - Remove generated linear kernels that we are not going to use
We will add this change back when we know the kernel version we want to use.
-- 93a92d7 by EJ Park <eunjpark@quicinc.com>: Modify f32-gemm - Use portable API - Convert scalar tail code to vector tail code - Add benchmark -- ec31037 by EJ Park <eunjpark@quicinc.com>: Add linear kernel test -- 396ed74 by EJ Park <eunjpark@quicinc.com>: f32-gemm minmax with portable API -- 90638a9 by EJ Park <eunjpark@quicinc.com>: More changes - Add new kernel into benchmark - Remove generated linear kernels that we are not going to use -- 28b8d98 by EJ Park <eunjpark@quicinc.com>: Revert changes in benchmark.cc -- 95e8bff by EJ Park <eunjpark@quicinc.com>: Revert changes in gemm-config We will add this change back when we know the kernel version we want to use. -- 52c6c02 by EJ Park <eunjpark@quicinc.com>: Put clamping code back for min/max -- 3117fb7 by EJ Park <eunjpark@quicinc.com>: use min/max from portablt API -- c79744d by EJ Park <eunjpark@quicinc.com>: Change to use xnn_set1_f32 -- bfbab46 by EJ Park <eunjpark@quicinc.com>: Change to use xnn_f32_set1 for vmin and vmax FUTURE_COPYBARA_INTEGRATE_REVIEW=#7918 from ejparkqc:wip-f32attn bfbab46 PiperOrigin-RevId: 733923897