MariaDB Server / MDEV-29710

Valgrind tests massively fail due to silently killing server on shutdown timeout

Details

    Description

      A number of tests very often fail on the Valgrind builder in our CI system.

      A prime example is the test innodb.table_flags, which performs a slow shutdown (innodb_fast_shutdown=0) before injecting some corruption using Perl code. On Valgrind, the SELECT statements after server restart would fail to return an error, because the slow shutdown did not run to completion as intended. Instead, the server was silently and forcibly killed after an apparent 2-minute timeout was exceeded. As a result, crash recovery would apparently "heal" the corruption that was injected.
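
      The failure mode described above can be illustrated with a minimal sketch. This is not the actual test harness code; the helper name stop_server and the short timeout are hypothetical and chosen only for illustration. It assumes a harness that waits for the server process to exit and kills it once a timeout elapses. The point is that the caller can tell a clean shutdown apart from a forced kill and report it, rather than proceeding silently.

```python
import subprocess
import sys


def stop_server(proc, timeout_s):
    """Wait for proc to exit on its own; kill it if timeout_s elapses.

    Returns True if the process exited by itself, False if it was killed.
    (Hypothetical helper, not part of any real test framework.)
    """
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()   # forced termination: the shutdown never completed
        proc.wait()   # reap the killed child
        return False


# Stand-in for a slow shutdown: a child process that needs 5 s to finish,
# while the (hypothetical) harness only allows 0.5 s.
slow = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(5)"])
clean = stop_server(slow, timeout_s=0.5)
if not clean:
    # Report the kill instead of continuing as if the shutdown had succeeded.
    print("shutdown timed out; server was killed, expect crash recovery")
```

      If the return value is ignored, a test that relies on the shutdown having completed continues on a false premise, which matches the symptom seen in innodb.table_flags.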

      Some tests, such as main.log_slow, fail because under Valgrind some steps do not complete within a few tenths of a second as expected.

      Some replication tests would occasionally fail due to something related to the STOP SLAVE statement.

      In my experience, AddressSanitizer (ASAN) and MemorySanitizer (MSAN, MDEV-20377) cover whatever Valgrind tests, and cover it better. Because Valgrind employs a single-threaded, JIT-based CPU emulator, ASAN and MSAN are much more likely to find bugs related to race conditions, such as MDEV-23097 or MDEV-25064.

      I think that we should simply skip these tests on Valgrind (mostly via no_valgrind_unless_big.inc) to avoid lots of bogus failures.

            People

              marko Marko Mäkelä