Postpone fragile tests and run them one by one #187


Closed
Totktonada opened this issue Aug 26, 2019 · 0 comments · Fixed by #188
Labels
feature A new functionality

Comments

@Totktonada
Member

Just like we do for is_parallel = False test suites. Hopefully this will let us overcome flaky test failures without disabling test cases.

@Totktonada Totktonada added the feature A new functionality label Aug 26, 2019
Totktonada added a commit that referenced this issue Aug 26, 2019
Added "fragile" option to suite.ini file format. The option lists tests
that are not designed to be run in parallel with others: say, those that
rely on timings that are met only when enough system resources are
available.

The option is set in the same way as "disabled" and other test lists:

 | fragile = foo.test.lua ; gh-1234
 |           bar.test.lua ; gh-5678

Fixes #187.
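As a rough illustration of how such a multi-line list option can be read, here is a sketch using Python's standard configparser (the section name and the parsing approach are assumptions; test-run has its own suite.ini parser):

```python
import configparser

# A suite.ini fragment with the multi-line "fragile" option.
# The "; gh-NNNN" issue references are stripped as inline comments.
ini_text = """
[default]
fragile = foo.test.lua ; gh-1234
          bar.test.lua ; gh-5678
"""

cfg = configparser.ConfigParser(inline_comment_prefixes=(';',))
cfg.read_string(ini_text)

# The continuation lines join into one value; split it into test names.
tests = cfg.get('default', 'fragile').split()
print(tests)  # ['foo.test.lua', 'bar.test.lua']
```

Note that inline comments are stripped per line, so the issue references after each test name do not leak into the parsed list.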
Totktonada added a commit to tarantool/tarantool that referenced this issue Sep 24, 2020
Retry a failed test when it is marked as fragile (and several other
conditions are met, see below).

test-run already allows setting a list of fragile tests. They are run
one by one after all parallel ones in order to eliminate possible
resource starvation and to match the timings under which the tests pass.
See [1].

In practice this approach does not help much against our problem with
flaky tests. We decided to retry failed tests when they are known to be
fragile. See [2].

The core idea is to split responsibility: known flaky failures will not
distract a developer, but each fragile test will be marked explicitly,
tracked in the issue tracker, and analyzed by the quality assurance
team.

The default behaviour is not changed: each test from the fragile list
is run once after all parallel ones. But now it is possible to set the
number of retries.

Beware: the implementation does not allow simply setting a retry count;
it also requires providing an md5sum of the failed test output (the
so-called reject file). The idea is to ensure that we retry the test
only on a known failure, not on some other failure within the test.
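The retry-with-checksum rule can be sketched roughly as follows (all names here are hypothetical; the real logic lives in test-run's dispatcher):

```python
import hashlib

def run_with_retries(run_test, known_checksums, retries):
    """Retry a fragile test, but only while its failure is a known one.

    run_test() is assumed to return (passed, output_bytes);
    known_checksums holds md5 hex digests of previously observed
    reject-file contents.
    """
    for _ in range(1 + retries):
        passed, output = run_test()
        if passed:
            return True
        if hashlib.md5(output).hexdigest() not in known_checksums:
            # Unknown failure: report it instead of retrying.
            return False
    return False
```

A test that fails once with a known output and then passes would be reported as passing, while a test failing with an unrecognized output fails immediately without retries.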

This approach has a limitation: on failure, a test may output
information that varies from run to run or depends on the base
directory. We should always verify the output before putting its
checksum into the configuration file.
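Once the reject file has been verified by hand, its checksum for the configuration can be computed like this (the file path is a hypothetical example):

```python
import hashlib

def reject_checksum(path):
    """Compute the md5 hex digest of a test's reject file."""
    with open(path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

# Example (hypothetical path):
# print(reject_checksum('test/box/foo.reject'))
```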

Despite doubts regarding this approach, it is simple, and we decided to
try it and revisit it if the need arises.

See configuration example in [3].

[1]: tarantool/test-run#187
[2]: tarantool/test-run#189
[3]: tarantool/test-run#217

Part of #5050
Totktonada added a commit to tarantool/tarantool that referenced this issue Sep 24, 2020
(cherry picked from commit 43482ee)
Totktonada added a commit to tarantool/tarantool that referenced this issue Sep 24, 2020
(cherry picked from commit 43482ee)
Totktonada added a commit to tarantool/tarantool that referenced this issue Sep 24, 2020
(cherry picked from commit 43482ee)