Skip to content

Commit b612a4e

Browse files
committed
Add bignum_mont{mul,sqr}_p384_neon, speed improvements/refactoring in tactics
This patch adds `bignum_mont{mul,sqr}_p384_neon` which are slightly faster than `bignum_mont{mul,sqr}_p384`. They use SIMD instructions and better scheduling found with SLOTHY. Their correctness is verified using equivalence check w.r.t. specifications of their scalar ops. The new SUBROUTINE lemmas are added to the specification list using ``` ./tools/collect-specs.sh arm >arm/proofs/specifications.txt ``` Benchmark results on Graviton2: ``` bignum_montsqr_p384 : 58.6 ns each (var 0.3%, corr 0.06) = 17053295 ops/sec bignum_montsqr_p384_neon : 52.6 ns each (var 0.4%, corr -0.04) = 19017192 ops/sec bignum_montmul_p384 : 72.9 ns each (var 0.2%, corr -0.02) = 13726633 ops/sec bignum_montmul_p384_neon : 68.1 ns each (var 0.3%, corr 0.02) = 14680905 ops/sec ``` Test and benchmark were updated to include these & fix incorrect naming bugs in my previous p256_neon patch. Also, some speedups in tactics are made: 1. `ARM_STEPS'_AND_ABBREV_TAC` and `ARM_STEPS'_AND_REWRITE_TAC`. They are tactics for symbolic execution when showing equivalence of two programs after reordering instructions. `ARM_STEPS'_AND_ABBREV_TAC` does symbolic execution of the 'left' program and abbreviates every RHS of new `read comp s = RHS`s, meaning that after the tactic is done there are a bunch of equality assumptions whose number increases linearly to the number of instructions. `ARM_STEPS'_AND_REWRITE_TAC` then does symbolic execution of the 'right' program and rewrites the results using the assumptions. This means the overall complexity of `ARM_STEPS'_AND_REWRITE_TAC` was quadratic to the number of instructions (# assum * # insts = (# insts)^2). This is fixed to be (close to) linear, by separately maintaining the abbreviations as a list of theorems internally rather than assumptions. This doesn’t mean that the therotical time complexity is now linear, but many tactics inside `ARM_STEPS'_AND_REWRITE_TAC` that inspect assumptions now run linearly. 2. `FIND_HOLE_TAC` `FIND_HOLE_TAC` tactic finds the 'hole' in the memory space that can place the machine code that is used in program equivalence. This is done by inspecting `nonoverlapping` assumptions, properly segmenting the memory with fixed-width ranges and doing case analysis. Previously the # splitted cases was something like 2^((# segments)^2), but now it is reduced to (# segments)^(#segments). Comparing these two numbers is easier if logarithm is used. Finally, some lemmas in existing `_neon.ml` proofs are updated so that they do not mix usage of '*_mc' and '*_core_mc'. '*_core_mc' is a machine code that is a sub-list of '*_mc' retrieved by stripping the callee-save register store/loads as well as the ret instruction. If possible, a lemmas is updated to only use '*_core_mc' because this makes the modular usage of the lemma possible in bigger theorems.
1 parent 0a3b3f3 commit b612a4e

26 files changed

+4623
-723
lines changed

arm/Makefile

+2
Original file line numberDiff line numberDiff line change
@@ -269,8 +269,10 @@ BIGNUM_OBJ = curve25519/bignum_add_p25519.o \
269269
p384/bignum_mod_p384_6.o \
270270
p384/bignum_montmul_p384.o \
271271
p384/bignum_montmul_p384_alt.o \
272+
p384/bignum_montmul_p384_neon.o \
272273
p384/bignum_montsqr_p384.o \
273274
p384/bignum_montsqr_p384_alt.o \
275+
p384/bignum_montsqr_p384_neon.o \
274276
p384/bignum_mux_6.o \
275277
p384/bignum_neg_p384.o \
276278
p384/bignum_nonzero_6.o \

arm/allowed_asm

+2
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@
1515
: csel$
1616
: cset$
1717
: csetm$
18+
: dup$
1819
: eor$
1920
: extr$
2021
: ldp$
@@ -38,6 +39,7 @@
3839
: sbc$
3940
: sbcs$
4041
: shl$
42+
: shrn$
4143
: stp$
4244
: str$
4345
: strb$

arm/p384/Makefile

+2
Original file line numberDiff line numberDiff line change
@@ -35,8 +35,10 @@ OBJ = bignum_add_p384.o \
3535
bignum_mod_p384_6.o \
3636
bignum_montmul_p384.o \
3737
bignum_montmul_p384_alt.o \
38+
bignum_montmul_p384_neon.o \
3839
bignum_montsqr_p384.o \
3940
bignum_montsqr_p384_alt.o \
41+
bignum_montsqr_p384_neon.o \
4042
bignum_mux_6.o \
4143
bignum_neg_p384.o \
4244
bignum_nonzero_6.o \

0 commit comments

Comments
 (0)