Retry fragile tests or allow them to fail #189

Closed
Totktonada opened this issue Sep 23, 2019 · 0 comments
Assignees
Labels
feature A new functionality

Comments

@Totktonada
Member

Manually re-running tests to find out whether a failure is persistent wastes my time, which is undesirable.

I see two ways to overcome this. Both are still raw ideas.

Provide a way to set a retry count for fragile tests

It could be an option in suite.ini. If any of the attempts succeeds, the test should be considered passed.

We also need the ability to retry a hung test.
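
To make the idea concrete, here is a minimal sketch of such a retry loop. It is not test-run code: the function name `run_with_retries`, its parameters and the use of `subprocess` are assumptions made for illustration. A test counts as passed as soon as one attempt succeeds, and a hung attempt is killed after a timeout and treated as a failed attempt.

```python
import subprocess
import sys


def run_with_retries(cmd, retries=3, timeout=60.0):
    """Run `cmd` up to `retries` times (hypothetical helper).

    Returns True as soon as one attempt exits with code 0.
    A hung attempt is killed after `timeout` seconds and counts as a failure.
    """
    for attempt in range(1, retries + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising this exception.
            print(f"attempt {attempt}: hung, killed after {timeout}s")
            continue
        if result.returncode == 0:
            print(f"attempt {attempt}: passed")
            return True
        print(f"attempt {attempt}: failed with code {result.returncode}")
    return False


if __name__ == "__main__":
    # A stand-in for a real test command.
    ok = run_with_retries([sys.executable, "-c", "pass"], retries=3, timeout=10)
    print("passed" if ok else "failed")
```

Counting a timeout as an ordinary failed attempt keeps the loop simple. A real runner would probably also want to clean up the state left behind by the killed process.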

I was against retrying for a long time and hoped that we would eventually fix all flaky tests. It seems we are really unable to achieve this goal.

Allow fragile tests to fail

Maybe it is worth doing that under a command line option and using it in CI, but not locally.

@Totktonada Totktonada added the feature (A new functionality) and raw idea labels Sep 23, 2019
@ligurio ligurio self-assigned this Apr 1, 2020
avtikhon added a commit that referenced this issue Aug 12, 2020
Added the ability to set a per-suite 'fragile_retries' option in the
suite.ini configuration file. It sets the number of reruns allowed
for a failed test from the 'fragile' list.

Part of #189.
avtikhon added a commit that referenced this issue Aug 12, 2020
Added the ability to check failed tests against the fragile list to make
sure that the current failure matches the issue mentioned in the list.

Closes #189.
avtikhon added a commit that referenced this issue Aug 13, 2020
Added the ability to check failed tests against the fragile list to make
sure that the current failure matches the issue mentioned in the list.
The fragile list should contain the checksums of the results files
together with the corresponding issues, in the format:

  fragile = <basename of the test> ; gh-<issue> md5sum:<checksum>

Closes #189.
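
As an illustration of this single-line format, the sketch below splits one entry into its three parts. The regular expression and the name `parse_fragile_entry` are assumptions for the example. The commit message only gives the template, so details such as whether several entries may share one line are left open here.

```python
import re

# One entry: "<basename of the test> ; gh-<issue> md5sum:<checksum>"
# The exact grammar is an assumption derived from the template above.
ENTRY_RE = re.compile(
    r"^(?P<test>\S+)\s*;\s*gh-(?P<issue>\d+)\s+md5sum:(?P<checksum>[0-9a-f]{32})$"
)


def parse_fragile_entry(value):
    """Return (test name, issue, md5 checksum) for one fragile entry."""
    match = ENTRY_RE.match(value.strip())
    if match is None:
        raise ValueError(f"malformed fragile entry: {value!r}")
    return (
        match.group("test"),
        "gh-" + match.group("issue"),
        match.group("checksum"),
    )


if __name__ == "__main__":
    entry = "bitset.test.lua ; gh-4095 md5sum:050af3a99561a724013995668a4bc71c"
    print(parse_fragile_entry(entry))
```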
avtikhon added a commit that referenced this issue Sep 15, 2020
Added the ability to set a per-suite 'retries' option in the suite.ini
configuration file. It sets the number of reruns allowed for failed
tests from the 'fragile' list:

  fragile = {
    "retries": 10,
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ]
        }
    }}

Part of #189
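
A minimal sketch of how such a multi-line option could be read from suite.ini, assuming the value is JSON and lives in a `[default]` section. The helper name `load_fragile` and the fallback values are hypothetical, and the real test-run parsing may differ. Strict JSON does not accept a trailing comma after the last element, so the sample value here has none.

```python
import configparser
import json

SUITE_INI = """\
[default]
core = tarantool
fragile = {
    "retries": 10,
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ]
        }
    }}
"""


def load_fragile(ini_text, section="default"):
    """Parse the 'fragile' option as JSON (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_option(section, "fragile"):
        # No fragile list configured: nothing is retried.
        return {"retries": 0, "tests": {}}
    # Indented continuation lines are joined into one value by configparser.
    fragile = json.loads(parser.get(section, "fragile"))
    fragile.setdefault("retries", 0)
    fragile.setdefault("tests", {})
    return fragile


if __name__ == "__main__":
    fragile = load_fragile(SUITE_INI)
    print(fragile["retries"])  # 10
    print(sorted(fragile["tests"]))
```

The continuation lines must be indented deeper than the `fragile =` line, otherwise configparser would not treat them as part of the value.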
avtikhon added a commit that referenced this issue Sep 15, 2020
Added the ability to check failed tests against the fragile list to make
sure that the current failure matches the issue mentioned in the list.
The fragile list should contain the checksums of the results files
together with the corresponding issues, in the format:

  fragile = {
    "retries": 10,
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ],
            "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
        }
    }}

Closes #189
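
The check described in this commit message can be sketched as follows. The function names and the `fragile` dictionary layout mirror the configuration example but are otherwise assumptions, not the actual test-run implementation.

```python
import hashlib


def md5_of_file(path):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_failure(test_name, output_path, fragile):
    """True if the failed test output matches a checksum listed for the test.

    `fragile` is the parsed 'fragile' option; `output_path` points to the
    output produced by the failed run (both names are hypothetical).
    """
    entry = fragile.get("tests", {}).get(test_name)
    if entry is None:
        return False  # the test is not in the fragile list
    return md5_of_file(output_path) in entry.get("checksums", [])
```

A test without a `checksums` key never matches in this sketch, so it would never be retried.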
avtikhon added a commit that referenced this issue Sep 21, 2020
Added the ability to compute the checksum of the results file when a
test fails and compare it with the checksums of the known issues
mentioned in the fragile list. The fragile list should contain the
results file checksums together with their issues, in the format:

  fragile = {
    "retries": 10,
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ],
            "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
        }
    }}

Closes #189
Totktonada pushed a commit that referenced this issue Sep 24, 2020
Added the ability to set a per-suite 'retries' option in the suite.ini
configuration file. It sets the number of reruns allowed for failed
tests from the 'fragile' list:

  fragile = {
    "retries": 10,
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ]
        }
    }}

Part of #189
Totktonada pushed a commit that referenced this issue Sep 24, 2020
Added the ability to compute the checksum of the results file when a
test fails and compare it with the checksums of the known issues
mentioned in the fragile list. The fragile list should contain the
results file checksums together with their issues, in the format:

  fragile = {
    "retries": 10,
    "tests": {
        "bitset.test.lua": {
            "issues": [ "gh-4095" ],
            "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
        }
    }}

Closes #189
@Totktonada Totktonada assigned avtikhon and unassigned ligurio Sep 24, 2020
Totktonada added a commit to tarantool/tarantool that referenced this issue Sep 24, 2020
Retry a failed test when it is marked as fragile (and several other
conditions are met, see below).

test-run already allows setting a list of fragile tests. They are run
one by one after all the parallel ones in order to eliminate possible
resource starvation and bring timings close to those under which the
tests pass. See [1].
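
The scheduling described above can be sketched like this. It only illustrates the ordering: the name `run_suite`, the `run_one` callback and the thread pool are assumptions, and test-run itself organises its workers differently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_suite(tests, fragile_names, run_one, jobs=4):
    """Run non-fragile tests in parallel, then fragile ones one by one.

    `run_one(name)` is a hypothetical callback returning True on success.
    """
    fragile = set(fragile_names)
    parallel = [t for t in tests if t not in fragile]
    sequential = [t for t in tests if t in fragile]

    results = {}
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for name, ok in zip(parallel, pool.map(run_one, parallel)):
            results[name] = ok
    # The pool has finished here, so fragile tests get the machine to themselves.
    for name in sequential:
        results[name] = run_one(name)
    return results
```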

In practice this approach does not help much with our flaky test
problem. We decided to retry failed tests when they are known to be
fragile. See [2].

The core idea is to split responsibility: known flaky failures will not
distract developers, while each fragile test will be marked explicitly,
tracked in the issue tracker and analyzed by the quality assurance
team.

The default behaviour is not changed: each test from the fragile list
will be run once after all the parallel ones. But now it is possible to
set the number of retries.

Beware: the implementation does not allow setting just the retry count.
It also requires providing an md5sum of the failed test output (the
so-called reject file). The idea is to ensure that we retry the test
only in case of a known failure, not some other failure within the test.
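
The rule in this paragraph can be sketched as a small decision loop. The function `run_fragile_test` and the shape of `run_once` are hypothetical. The sketch treats the retry count as the number of reruns allowed after the first failure, which is an interpretation of the commit messages rather than a confirmed detail.

```python
def run_fragile_test(run_once, known_checksums, retries):
    """Retry a fragile test only while it fails in a known way.

    `run_once()` is a hypothetical callback returning a pair
    (passed, reject_md5), where reject_md5 is the md5sum of the reject
    file or None when the test passed.
    """
    reruns_left = retries
    while True:
        passed, reject_md5 = run_once()
        if passed:
            return True
        if reject_md5 not in known_checksums:
            return False  # unknown failure: report it, do not retry
        if reruns_left == 0:
            return False  # known failure, but no reruns left
        reruns_left -= 1
```

With `retries` set to 0 the test runs exactly once, which matches the unchanged default behaviour.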

This approach has a limitation: on failure a test may output
information that varies from run to run or depends on the base
directory. We should always verify the output before putting its
checksum into the configuration file.
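
A small demonstration of this limitation, using a made-up error message and made-up paths: the same failure produces different checksums when the output embeds the base directory.

```python
import hashlib


def md5_text(text):
    """Return the hex md5 digest of a string encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical failure output that embeds the working directory.
TEMPLATE = "error: cannot open {base}/var/bitset.snap\n"

run_a = TEMPLATE.format(base="/home/alice/tarantool/test")
run_b = TEMPLATE.format(base="/builds/ci/tarantool/test")

# Same failure, different base directory, so the checksums differ.
print(md5_text(run_a) == md5_text(run_b))  # False
```

A checksum taken on one machine would therefore never match on the other, and the test would not be retried there.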

Despite doubts regarding this approach, it looks simple, so we decided
to try it and revisit it if the need arises.

See configuration example in [3].

[1]: tarantool/test-run#187
[2]: tarantool/test-run#189
[3]: tarantool/test-run#217

Part of #5050
Totktonada added a commit to tarantool/tarantool that referenced this issue Sep 24, 2020
Retry a failed test when it is marked as fragile (and several other
conditions are met, see below).

test-run already allows setting a list of fragile tests. They are run
one by one after all the parallel ones in order to eliminate possible
resource starvation and bring timings close to those under which the
tests pass. See [1].

In practice this approach does not help much with our flaky test
problem. We decided to retry failed tests when they are known to be
fragile. See [2].

The core idea is to split responsibility: known flaky failures will not
distract developers, while each fragile test will be marked explicitly,
tracked in the issue tracker and analyzed by the quality assurance
team.

The default behaviour is not changed: each test from the fragile list
will be run once after all the parallel ones. But now it is possible to
set the number of retries.

Beware: the implementation does not allow setting just the retry count.
It also requires providing an md5sum of the failed test output (the
so-called reject file). The idea is to ensure that we retry the test
only in case of a known failure, not some other failure within the test.

This approach has a limitation: on failure a test may output
information that varies from run to run or depends on the base
directory. We should always verify the output before putting its
checksum into the configuration file.

Despite doubts regarding this approach, it looks simple, so we decided
to try it and revisit it if the need arises.

See configuration example in [3].

[1]: tarantool/test-run#187
[2]: tarantool/test-run#189
[3]: tarantool/test-run#217

Part of #5050

(cherry picked from commit 43482ee)