Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A bit-flip in the ECC-protected register file can go unnoticed when writeback forwarding is enabled #2188

Closed
viniul opened this issue Jul 8, 2024 · 0 comments
Labels

Comments

@viniul
Copy link

viniul commented Jul 8, 2024

Summary

The register file can be protected through ECC. In case this ECC detects an error, an alert signal is raised, and execution is aborted. However, the ECC signal is masked (ignored) in case writeback forwarding is enabled, see ibex_core.sv:919:

    // Calculate errors - qualify with WB forwarding to avoid xprop into the alert signal
    assign rf_ecc_err_a_id = |rf_ecc_err_a & rf_ren_a & ~rf_rd_a_wb_match;

This masking ignores the fact that sometimes, the core uses the value from the register file, even when writeback forwarding is enabled -- namely when encountering data hazards.

As a result, a bit-flip in the register file can propagate to the rest of the core, even with ECC in the register file being enabled.

This issue was found by Konrad Jansen, Niklas Kühnapfel, Peter William Deutsch, and Vincent Ulitzsch.

Issue Details

Consider this waveform, in particular the signals listed in this screenshot:
grafik

Consider these two instructions which are about to be executed in the attached screenshot of the waveform.

168:  fec42703               lw     a4,-20(s0)
16c:  fcf740e3               blt    a4,a5,12c <byteArrayCompare+0x24>

The branch instruction needs to wait for the load instruction to finish -- therefore the core is encountering a data hazard. A fault injected in the register file (in register a4), will propagate and disrupt execution, but remains undetected by the ECC. In the attached waveform, the fault is injected at 553 ps. To see why the fault does not trigger an alert, let us look at the waveform:

Consider the signals debug_pc_id (pc decode) and debug_pc_wb. At 550, the load instruction is in the writeback stage and the branch instruction is in the decode + execute stage. The load instruction will cause a stall in the pipeline, see ibex_decoder, line 1003

    // If instruction is reading register that load will be writing stall in
    // ID until load is complete. No need to stall when reading zero register.
    assign rf_rd_a_hz = rf_rd_a_wb_match & rf_ren_a;
    assign stall_ld_hz = outstanding_load_wb_i & (rf_rd_a_hz | rf_rd_b_hz);

We therefore stall for one cycle to get the load.

This load happens one cycle later (552 ps). The rf_write_wb signal is set to low, therefore setting

assign rf_rdata_a_fwd = rf_rd_a_wb_match & rf_write_wb_i ? rf_wdata_fwd_wb_i : rf_rdata_a_i;

to not use the value from the writeback stage, but from the rf_rdata_a_i value, see ibex_id_stage, line 1006. However, the rf_rd_a_wb_match signal is not set to low, thereby disabling the ECC signal:

assign rf_ecc_err_a_id = |rf_ecc_err_a & rf_ren_a & ~rf_rd_a_wb_match;

The branch instruction from the decode stage therefore receives the value from the register file, although the rf_rd_a_wb_match signal is still set to high. Because the ECC signal is set to low, our fault goes unnoticed. In the example waveform, the injected fault at 553 ps (in the register file holding register a4) disrupts the value of alu_operand_a_ex, thereby changing the result of branch (debug_branch_decision), but because rf_rd_a_wb_match is still enabled, the ECC does not trigger.

Fix

Only mask the ECC signal in case writeback forwarding is enabled:

    // Calculate errors - qualify with WB forwarding to avoid xprop into the alert signal
    assign rf_ecc_err_a_id = |rf_ecc_err_a & rf_ren_a & (~rf_rd_a_wb_match | ~rf_write_wb);
    assign rf_ecc_err_b_id = |rf_ecc_err_b & rf_ren_b & (~rf_rd_b_wb_match | ~rf_write_wb);

@viniul viniul added the Type:Bug Bugs label Jul 8, 2024
GregAC added a commit to GregAC/ibex that referenced this issue Jul 15, 2024
Under certain circumstances Ibex ignored the ECC check from the register
file when it should not have. This fixes the issue.

Fixes lowRISC#2188
GregAC added a commit to GregAC/ibex that referenced this issue Jul 15, 2024
Under certain circumstances Ibex ignored the ECC check from the register
file when it should not have. This fixes the issue.

Fixes lowRISC#2188
@GregAC GregAC closed this as completed in 9e4a950 Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant