Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify f32-gemm for HVX #7918

Merged
merged 10 commits into from
Mar 11, 2025
Merged

Modify f32-gemm for HVX #7918

merged 10 commits into from
Mar 11, 2025

Conversation

ejparkqc
Copy link
Contributor

@ejparkqc ejparkqc commented Feb 28, 2025

  1. Use portable API in possible places.
  2. Add new kernel (e.g., try mr=7, nr=32).
  3. Add all f32-gemm kernels in f32-gemm-minmax-bench.
  4. Convert scalar tail code to vector version.

#elif XNN_ARCH_HEXAGON && XNN_ENABLE_HVX
const struct xnn_hardware_config* hardware_config = xnn_init_hardware_config();
assert(hardware_config != NULL);
(void) hardware_config; // May be unused.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its used on the next line :-). remove

(void) hardware_config; // May be unused.
if (hardware_config->use_hvx) {
f32_gemm_config.minmax.gemm[XNN_MR_TO_INDEX(1)] = xnn_init_hmp_gemm_ukernel((xnn_gemm_ukernel_fn) xnn_f32_gemm_minmax_ukernel_1x4__scalar);
f32_gemm_config.minmax.gemm[XNN_MR_TO_INDEX(4)] = xnn_init_hmp_gemm_ukernel((xnn_gemm_ukernel_fn) xnn_f32_gemm_minmax_ukernel_4x4__scalar);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4x4 scalar? shouldnt this be using hvx?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed, I removed this file for now. We will add the configuration back when we know which implementation we want to use.

#include "xnnpack/gemm.h"
#include "xnnpack/intrinsics-polyfill.h"

$RANGE_MR = list(range(MR))
void xnn_f32_gemm_minmax_ukernel_${MR}x${NR}__hvx_broadcast(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename? Or bring back min/max below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap it supposed to be min/max, but removed the clamping code by accident. I added back them!


k -= sizeof(float);
} while (k != 0);

const HVX_Vector vmin = Q6_V_vsplat_R(params->scalar.min);
XNN_SIMD_CONST_F32(vmin, params->scalar.min);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use xnn_max_f32 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

ejparkqc added 9 commits March 5, 2025 11:02
- Use portable API
- Convert scalar tail code to vector tail code
- Add benchmark
- Add new kernel into benchmark
- Remove generated linear kernels that we are not going to use
We will add this change back when we know the kernel version we want to use.
copybara-service bot pushed a commit that referenced this pull request Mar 11, 2025
--
93a92d7 by EJ Park <eunjpark@quicinc.com>:

Modify f32-gemm

- Use portable API
- Convert scalar tail code to vector tail code
- Add benchmark

--
ec31037 by EJ Park <eunjpark@quicinc.com>:

Add linear kernel test

--
396ed74 by EJ Park <eunjpark@quicinc.com>:

f32-gemm minmax with portable API

--
90638a9 by EJ Park <eunjpark@quicinc.com>:

More changes

- Add new kernel into benchmark
- Remove generated linear kernels that we are not going to use

--
28b8d98 by EJ Park <eunjpark@quicinc.com>:

Revert changes in benchmark.cc

--
95e8bff by EJ Park <eunjpark@quicinc.com>:

Revert changes in gemm-config

We will add this change back when we know the kernel version we want to use.

--
52c6c02 by EJ Park <eunjpark@quicinc.com>:

Put clamping code back for min/max

--
3117fb7 by EJ Park <eunjpark@quicinc.com>:

use min/max from portablt API

--
c79744d by EJ Park <eunjpark@quicinc.com>:

Change to use xnn_set1_f32

--
bfbab46 by EJ Park <eunjpark@quicinc.com>:

Change to use xnn_f32_set1 for vmin and vmax

FUTURE_COPYBARA_INTEGRATE_REVIEW=#7918 from ejparkqc:wip-f32attn bfbab46
PiperOrigin-RevId: 733923897
@copybara-service copybara-service bot merged commit f7584e8 into google:master Mar 11, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants