[flash_ctrl,dv] Lots of broken block-level tests in flash_ctrl #22879
Comments
Those failures are all due to enabling ECC more widely in block-level DV some time ago. Before that change, the nightlies all looked good, and the last RTL change landed even earlier. In the meantime, @matutem has been adjusting more block-level DV sequences so that they deal with ECC errors correctly. There are still some left to do, but I am not too worried about this.
@vogelpi to update test list |
The current status (May 19 nightly regression) is as follows:
|
Discussed in triage: moving to M5 as DV issue because RTL hasn't been changed in a while, many tests passed before enabling ECC in DV, and the remaining DV fails seem to be caused by DV problems rather than RTL problems |
@matutem is still pending regression results for the following test cases:
|
- Improve tracking of words with error injection so only one error of any kind is injected per word.
- Improve initialization of data in flash_ctrl_read_word_sweep.

There is still room for improvements.

Addresses lowRISC#22879

Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
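The "only one error of any kind per word" rule from the commit message above can be illustrated with a small Python model. This is only a sketch and not the actual SystemVerilog DV code: the class `ErrorInjectionTracker`, its methods, and the error-kind strings are hypothetical names chosen for illustration.

```python
class ErrorInjectionTracker:
    """Hypothetical model: remembers which word addresses already carry an injected error."""

    def __init__(self):
        # Maps word address -> kind of error already injected there.
        self._injected = {}

    def try_inject(self, word_addr, kind):
        """Record an injection; refuse if the word already has an error of any kind."""
        if word_addr in self._injected:
            return False
        self._injected[word_addr] = kind
        return True

    def kind_at(self, word_addr):
        """Return the injected error kind for a word, or None if the word is clean."""
        return self._injected.get(word_addr)


tracker = ErrorInjectionTracker()
first = tracker.try_inject(0x40, "single_bit")   # hypothetical error kind
second = tracker.try_inject(0x40, "double_bit")  # same word again: rejected
```

Keying on the word address alone, rather than on the (address, kind) pair, is what prevents two different kinds of error from being stacked on the same word.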
Moving this issue to M7, as just discussed in the triage meeting. Those tests fail because ECC checks have recently been enabled, but flash_ctrl was already signed off at V2S without those checks. It is out of scope for M5 to bring flash_ctrl DV to a state in which all tests pass for more than 90% of the seeds with ECC checks enabled. We think the risk of critical RTL bugs in flash_ctrl is lower than when it was signed off at V2S, because DV now covers the ECC feature and most tests pass. Also, flash_ctrl is exercised in almost all top-level tests as well as in almost all SiVal tests, and no critical bugs have been found.
Based on last week's discussion, I decided it would be useful to document the history of this issue and the current state of the regression failures, and how both align with the V2S signoff and with the RTL and DV changes done for Earlgrey-PROD. Below, you can see an overview of the regression results of the last couple of months. Every column corresponds to one regression run and every row corresponds to one test (note that I had to split the many tests into four views, i.e., the stacked images). I've annotated the corresponding regression runs with the most important RTL and DV PRs. What we can see:
So to summarize:
Why ECC was switched off in so many tests is unclear and wasn't documented. We doubt that there was a good reason to disable this core feature in so many tests. Ideally, we would of course clean up all these failures, but we shouldn't gate on this. Overall, I am convinced we are in better shape on all sides compared to ES (DV, RTL, security), and I believe we shouldn't worry about Flash regarding Earlgrey-PROD. FYI @andreaskurth, @matutem, @johngt, @moidx, @jonmichelson
Thanks for the summary @vogelpi |
FYI, Look at the flash_ctrl tests based off the recent results. Tests included in this ticket which fall below 100%:
0%: We also have some outside of the ones flagged here: Tests which fall below 100%:
|
The errors injected need to be tracked per bank, partition, and caller. Addresses lowRISC#22879 Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
Addresses lowRISC#22879 Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
The addresses written need to be tracked per bank and partition. Add tracking for address ranges written during initialization. Addresses lowRISC#22879 Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
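As an illustration of tracking written addresses per bank and partition, here is a minimal Python sketch. It is not the DV implementation: `WrittenRangeTracker`, its methods, the partition names, and the half-open range convention are all assumptions made for this example.

```python
from collections import defaultdict


class WrittenRangeTracker:
    """Hypothetical model: half-open address ranges written, keyed by (bank, partition)."""

    def __init__(self):
        # (bank, partition) -> list of (start, end) ranges, end exclusive.
        self._ranges = defaultdict(list)

    def record_write(self, bank, partition, start, length):
        """Remember that [start, start + length) was written, e.g. during initialization."""
        self._ranges[(bank, partition)].append((start, start + length))

    def was_written(self, bank, partition, addr):
        """True if addr falls inside any range recorded for this bank and partition."""
        ranges = self._ranges.get((bank, partition), [])
        return any(lo <= addr < hi for lo, hi in ranges)


ranges = WrittenRangeTracker()
ranges.record_write(bank=0, partition="data", start=0x100, length=0x40)
```

Because the key includes both bank and partition, a write to one partition does not mark the same offset as written in another.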
Data errors for host reads return d_error in TL response, and this must be handled in the scoreboard. The code used to completely ignore these errors. Addresses lowRISC#22879 Signed-off-by: Guillermo Maturana <maturana@opentitan.org>
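The scoreboard change described above can be pictured with the following Python sketch. It only models the idea of checking `d_error` instead of ignoring it; `TlResponse`, `check_host_read`, and the way expected errors are passed in are hypothetical and do not mirror the actual scoreboard code.

```python
from dataclasses import dataclass


@dataclass
class TlResponse:
    """Hypothetical, simplified TL read response."""
    addr: int
    data: int
    d_error: bool


def check_host_read(rsp, error_addrs, expected_data):
    """Return a list of mismatch messages; an empty list means the response is as expected.

    error_addrs: addresses where a data error is expected (assumption of this sketch).
    expected_data: mapping from address to the data a clean read should return.
    """
    problems = []
    expect_error = rsp.addr in error_addrs
    if rsp.d_error != expect_error:
        problems.append(
            f"addr {rsp.addr:#x}: d_error={rsp.d_error}, expected {expect_error}"
        )
    # Compare data only when no error is expected and none was flagged.
    if not expect_error and not rsp.d_error and rsp.data != expected_data[rsp.addr]:
        problems.append(f"addr {rsp.addr:#x}: data mismatch")
    return problems


clean = check_host_read(TlResponse(0x10, 0xAB, False), set(), {0x10: 0xAB})
missed = check_host_read(TlResponse(0x20, 0x00, False), {0x20}, {})
```

In this sketch, a read from an address with an expected error that comes back without `d_error` is reported, which is the case a scoreboard that ignores `d_error` would let through.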
Hierarchy of regression failure
Block level
Failure Description
The nightly DV runs report lots of failing tests in flash_ctrl. They seem to have started breaking on 28th April, including the following tests:
@matutem: I think you landed a large flash_ctrl PR a few days ago. Would you mind taking a look at some of these: I think your PR might well be responsible for the failures.
Steps to Reproduce
Tests with similar or related failures
No response