Commit ca8189d

Merge pull request #191 from DavideGianessi/prose-patch-1
typo
2 parents 32486ce + 0fa5411 commit ca8189d

1 file changed (+2 -1 lines changed)

content/english/hpc/number-theory/montgomery.md

@@ -1,6 +1,7 @@
 ---
 title: Montgomery Multiplication
 weight: 4
+published: true
 ---

 Unsurprisingly, a large fraction of computation in [modular arithmetic](../modular) is often spent on calculating the modulo operation, which is as slow as [general integer division](/hpc/arithmetic/division/) and typically takes 15-20 cycles, depending on the operand size.
@@ -287,6 +288,6 @@ int inverse(int _a) {
 }
 ```

-While vanilla binary exponentiation with a compiler-generated fast modulo trick requires ~170ns per `inverse` call, this implementation takes ~166ns, going down to ~158s we omit `transform` and `reduce` (a reasonable use case is for `inverse` to be used as a subprocedure in a bigger modular computation). This is a small improvement, but Montgomery multiplication becomes much more advantageous for SIMD applications and larger data types.
+While vanilla binary exponentiation with a compiler-generated fast modulo trick requires ~170ns per `inverse` call, this implementation takes ~166ns, going down to ~158ns we omit `transform` and `reduce` (a reasonable use case is for `inverse` to be used as a subprocedure in a bigger modular computation). This is a small improvement, but Montgomery multiplication becomes much more advantageous for SIMD applications and larger data types.

 **Exercise.** Implement efficient *modular* [matix multiplication](/hpc/algorithms/matmul).
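
The paragraph touched by this hunk mentions `transform` and `reduce`, but their bodies fall outside the visible context. For orientation only, a 32-bit Montgomery helper might look roughly like the sketch below. The struct layout, the `mulmod` wrapper, and the restriction to an odd modulus below 2^31 are assumptions made for illustration and are not taken from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch only: `Montgomery` and `mulmod` are hypothetical names.
// Assumes an odd modulus n < 2^31 so that intermediate sums fit in 32 bits.
struct Montgomery {
    uint32_t n;   // modulus
    uint32_t nr;  // inverse of n modulo 2^32

    explicit Montgomery(uint32_t mod) : n(mod), nr(1) {
        assert(n % 2 == 1 && n < (1u << 31));
        // Newton's iteration: every step doubles the number of correct
        // low bits of the inverse (1 -> 2 -> 4 -> 8 -> 16 -> 32).
        for (int i = 0; i < 5; i++)
            nr *= 2 - n * nr;
    }

    // Maps x (with x < n * 2^32) to x * 2^(-32) mod n, in [0, n).
    uint32_t reduce(uint64_t x) const {
        uint32_t q = uint32_t(x) * nr;           // q * n == x (mod 2^32)
        uint32_t m = (uint64_t(q) * n) >> 32;    // high half of q * n
        uint32_t t = uint32_t(x >> 32) + n - m;  // in [1, 2n - 1]
        return t >= n ? t - n : t;
    }

    // Maps x into Montgomery space: x * 2^32 mod n (one slow modulo).
    uint32_t transform(uint32_t x) const {
        return (uint64_t(x) << 32) % n;
    }

    // Multiplies two numbers that are already in Montgomery space.
    uint32_t multiply(uint32_t a, uint32_t b) const {
        return reduce(uint64_t(a) * b);
    }
};

// Computes a * b mod n by going through Montgomery space (a, b < n).
uint32_t mulmod(uint32_t a, uint32_t b, uint32_t n) {
    Montgomery space(n);
    uint32_t r = space.multiply(space.transform(a), space.transform(b));
    return space.reduce(r);  // leave Montgomery space
}
```

In a longer modular computation, `transform` and the final `reduce` would be paid once at the boundaries rather than per operation, which is the scenario the changed paragraph has in mind when it quotes the timing without them.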
