From 0fa54119101693a9670972a3c27657d2ee1c59d1 Mon Sep 17 00:00:00 2001
From: DavideGianessi <118054693+DavideGianessi@users.noreply.github.com>
Date: Sat, 12 Nov 2022 12:49:11 +0100
Subject: [PATCH] typo

---
 content/english/hpc/number-theory/montgomery.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/content/english/hpc/number-theory/montgomery.md b/content/english/hpc/number-theory/montgomery.md
index 669e39ba..0eeef0b0 100644
--- a/content/english/hpc/number-theory/montgomery.md
+++ b/content/english/hpc/number-theory/montgomery.md
@@ -1,6 +1,7 @@
 ---
 title: Montgomery Multiplication
 weight: 4
+published: true
 ---
 
 Unsurprisingly, a large fraction of computation in [modular arithmetic](../modular) is often spent on calculating the modulo operation, which is as slow as [general integer division](/hpc/arithmetic/division/) and typically takes 15-20 cycles, depending on the operand size.
@@ -287,6 +288,6 @@ int inverse(int _a) {
 }
 ```
 
-While vanilla binary exponentiation with a compiler-generated fast modulo trick requires ~170ns per `inverse` call, this implementation takes ~166ns, going down to ~158s we omit `transform` and `reduce` (a reasonable use case is for `inverse` to be used as a subprocedure in a bigger modular computation). This is a small improvement, but Montgomery multiplication becomes much more advantageous for SIMD applications and larger data types.
+While vanilla binary exponentiation with a compiler-generated fast modulo trick requires ~170ns per `inverse` call, this implementation takes ~166ns, going down to ~158ns we omit `transform` and `reduce` (a reasonable use case is for `inverse` to be used as a subprocedure in a bigger modular computation). This is a small improvement, but Montgomery multiplication becomes much more advantageous for SIMD applications and larger data types.
 
 **Exercise.** Implement efficient *modular* [matix multiplication](/hpc/algorithms/matmul).