Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flash_ctrl,dv] Lots of broken block-level tests in flash_ctrl #22879

Closed
rswarbrick opened this issue Apr 29, 2024 · 9 comments · Fixed by #24079
Closed

[flash_ctrl,dv] Lots of broken block-level tests in flash_ctrl #22879

rswarbrick opened this issue Apr 29, 2024 · 9 comments · Fixed by #24079
Assignees
Labels
Component:DV DV issue: testbench, test case, etc. Component:TestTriage IP:flash_ctrl

Comments

@rswarbrick
Copy link
Contributor

Hierarchy of regression failure

Block level

Failure Description

The nightly DV runs report lots of failing tests in flash_ctrl. They seem to have started breaking on 28th April, including the following tests:

  • flash_ctrl_access_after_disable
  • flash_ctrl_integrity
  • flash_ctrl_intr_wr
  • flash_ctrl_intr_wr_slow_flash
  • flash_ctrl_oversize_error
  • flash_ctrl_rw_derr
  • flash_ctrl_sec_cm
  • flash_ctrl_write_word_sweep
  • flash_ctrl_rw_serr
  • flash_ctrl_prog_reset
  • flash_ctrl_rw_evict
  • flash_ctrl_derr_detect
  • flash_ctrl_serr_address
  • flash_ctrl_serr_counter
  • flash_ctrl_rw_evict_all_en
  • flash_ctrl_fs_sup

@matutem: I think you landed a large flash_ctrl PR a few days ago. Would you mind taking a look at some of these: I think your PR might well be responsible for the failures.

Steps to Reproduce

Tests with similar or related failures

No response

@vogelpi
Copy link
Contributor

vogelpi commented May 21, 2024

Those failures are all due to enabling ECC more widely in block-level DV some time ago. Before doing this, the nightlies all looked good. The last RTL change was already before that.

In the meantime, @matutem has been adjusting more block-level DV sequences to let them deal with ECC errors correctly. There are still some to do but I am not to worried about this.

@vogelpi vogelpi added the Triage Priority Issue to be discussed with priority in the next triage meeting label May 21, 2024
@johngt
Copy link

johngt commented May 21, 2024

@vogelpi to update test list

@vogelpi
Copy link
Contributor

vogelpi commented May 21, 2024

The current status (May 19 nightly regression) is as follows:

  • 100 % pass rate for all tests except for
  • flash_ctrl_rw_evict - 70%
  • flash_ctrl_rw_evict_all_en - 85%
  • flash_ctrl_oversize_error - 80%
  • flash_ctrl_intr_rd - 95%
  • flash_ctrl_intr_wr - 90%
  • flash_ctrl_read_word_sweep_derr - 40%
  • flash_ctrl_rw_derr - 60%
  • flash_ctrl_integrity - 40%
  • flash_ctrl_read_word_sweep_serr - 20%
  • flash_ctrl_rw_serr - 80%
  • flash_ctrl_wo - 85%
  • flash_ctrl_rw - 95%
  • flash_ctrl_fs_sup - 80%
  • flash_ctrl_config_regwen - 80%

@johngt johngt added the Component:DV DV issue: testbench, test case, etc. label May 22, 2024
@andreaskurth
Copy link
Contributor

Discussed in triage: moving to M5 as DV issue because RTL hasn't been changed in a while, many tests passed before enabling ECC in DV, and the remaining DV fails seem to be caused by DV problems rather than RTL problems

@andreaskurth andreaskurth removed the Triage Priority Issue to be discussed with priority in the next triage meeting label May 24, 2024
@moidx
Copy link
Contributor

moidx commented Jun 21, 2024

@matutem is still pending regression results for the following test cases:

  1. flash_ctrl_read_word_sweep_derr
  2. flash_ctrl_integrity

matutem added a commit to matutem/opentitan that referenced this issue Jul 1, 2024
- Improve tracking of words with error injection so only one error of any kind
  is injected per word.
- Improve initialization of data in flash_ctrl_read_word_sweep. There is still
  room for improvements.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 2, 2024
- Improve tracking of words with error injection so only one error of any kind
  is injected per word.
- Improve initialization of data in flash_ctrl_read_word_sweep. There is still
  room for improvements.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 2, 2024
- Improve tracking of words with error injection so only one error of any kind
  is injected per word.
- Improve initialization of data in flash_ctrl_read_word_sweep. There is still
  room for improvements.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit that referenced this issue Jul 2, 2024
- Improve tracking of words with error injection so only one error of any kind
  is injected per word.
- Improve initialization of data in flash_ctrl_read_word_sweep. There is still
  room for improvements.

Addresses #22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
@andreaskurth
Copy link
Contributor

andreaskurth commented Jul 5, 2024

Moving this issue to M7, as just discussed in triage meeting. Those tests fail because ECC checks have recently been enabled, but flash_ctrl was already signed off at V2S without those checks. It is out of scope of M5 to bring flash_ctrl DV to a state in which all tests pass for more than 90% of the seeds with ECC checks enabled.

We think the risk for critical RTL bugs in flash_ctrl is lower than when it was signed off at V2S because DV now covers the ECC feature and most tests pass. Also, flash_ctrl is getting exercised in almost all top-level tests as well as in almost all SiVal tests, and no critical bugs have been found.

@vogelpi
Copy link
Contributor

vogelpi commented Jul 8, 2024

Based on last week's discussion, I decided it was useful to document the history of this and the current state in terms of regression failures and how all this aligns with the V2S signoff, as well as with the RTL and DV changes done for Earlgrey-PROD.

Below, you can see an overview of the regressions results of the last couple of months. Every column corresponds to one regression run, every row corresponds to one test (note that I had to split the many tests into four views, i.e., the stacked images). I've annotated the most important RTL and DV PRs to corresponding regression runs.

flash_ctrl_regression_v2s_1_prs
flash_ctrl_regression_v2s_2_prs
flash_ctrl_regression_v2s_3
flash_ctrl_regression_v2s_4

What we can see:

So to summarize:

  • The only RTL change which had a negative impact on the regression was the sharing of the scrambling module to save area before M2.
  • Follow-up fixes were done to successfully improve the regression results again.
  • Switching on ECC in more tests impacted the regression results massively. Cleaning these test failures up is a major effort which is still ongoing. Thanks to the great work of @matutem we do now have a better understanding of the limitations of flash_ctrl DV and we have higher confidence in the design and DV.

Why ECC was switched off in so many tests is unclear and wasn't documented. We doubt that there was a good reason to disable this core feature in so many tests. Ideally, we would clean up all these failures of course but we shouldn't gate on this. Overall, I am convinced we are in a better shape on all sides compared to ES (DV, RTL, security) and I believe we shouldn't worry about Flash regarding Earlgrey-PROD.

FYI @andreaskurth , @matutem , @johngt, @moidx , @jonmichelson

@johngt
Copy link

johngt commented Jul 9, 2024

Thanks for the summary @vogelpi

@hayleynewton
Copy link

FYI, Look at the flash_ctrl tests based off the recent results.

Tests included in this ticket which fall below 100%:

80:
Flash_ctrl_serr_counter
Flash_ctrl_oversize_error
Flash_ctrl_rw_evict
Flash_ctrl_rw_derr
Flash_ctrl_rw_serr

0%:
flash_ctrl_derr_detect

We also have some outside of the ones flagged here:

Tests which fall below 100%:

80:
Flash_ctrl_ro
Flash_ctrl_rw
Flash_ctrl_phy_ack_consistency

matutem added a commit to matutem/opentitan that referenced this issue Jul 22, 2024
The errors injected need to be tracked per bank, partition, and caller.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 22, 2024
Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 22, 2024
The addresses written need to be tracked per bank and partition.
Add tracking for address ranges written during initialization.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 22, 2024
Data errors for host reads return d_error in TL response, and this must be handled in the
scoreboard. The code used to completely ignore these errors.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 22, 2024
The errors injected need to be tracked per bank, partition, and caller.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 22, 2024
Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 22, 2024
The addresses written need to be tracked per bank and partition.
Add tracking for address ranges written during initialization.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 22, 2024
Data errors for host reads return d_error in TL response, and this must be handled in the
scoreboard. The code used to completely ignore these errors.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 23, 2024
Data errors for host reads return d_error in TL response, and this must be handled in the
scoreboard. The code used to completely ignore these errors.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 23, 2024
Data errors for host reads return d_error in TL response, and this must be handled in the
scoreboard. The code used to completely ignore these errors.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
The errors injected need to be tracked per bank, partition, and caller.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
The addresses written need to be tracked per bank and partition.
Add tracking for address ranges written during initialization.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
Data errors for host reads return d_error in TL response, and this must
be detected in the scoreboard. The code used to completely ignore these
errors.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
The errors injected need to be tracked per bank, partition, and caller.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
The addresses written need to be tracked per bank and partition.
Add tracking for address ranges written during initialization.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
Data errors for host reads return d_error in TL response, and this must
be detected in the scoreboard. The code used to completely ignore these
errors.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
The errors injected need to be tracked per bank, partition, and caller.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
The addresses written need to be tracked per bank and partition.
Add tracking for address ranges written during initialization.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 24, 2024
Data errors for host reads return d_error in TL response, and this must
be detected in the scoreboard. The code used to completely ignore these
errors.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 25, 2024
The errors injected need to be tracked per bank, partition, and caller.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 25, 2024
Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 25, 2024
The addresses written need to be tracked per bank and partition.
Add tracking for address ranges written during initialization.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 25, 2024
Data errors for host reads return d_error in TL response, and this must
be detected in the scoreboard. The code used to completely ignore these
errors.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 26, 2024
The errors injected need to be tracked per bank, partition, and caller.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 26, 2024
Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 26, 2024
The addresses written need to be tracked per bank and partition.
Add tracking for address ranges written during initialization.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit to matutem/opentitan that referenced this issue Jul 26, 2024
Data errors for host reads return d_error in TL response, and this must
be detected in the scoreboard. The code used to completely ignore these
errors.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit that referenced this issue Jul 26, 2024
The errors injected need to be tracked per bank, partition, and caller.

Addresses #22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit that referenced this issue Jul 26, 2024
Addresses #22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit that referenced this issue Jul 26, 2024
The addresses written need to be tracked per bank and partition.
Add tracking for address ranges written during initialization.

Addresses #22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
matutem added a commit that referenced this issue Jul 26, 2024
Data errors for host reads return d_error in TL response, and this must
be detected in the scoreboard. The code used to completely ignore these
errors.

Addresses #22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component:DV DV issue: testbench, test case, etc. Component:TestTriage IP:flash_ctrl
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants