  1. MariaDB Server
  2. MDEV-32896

Unstable XA + binlog tests, with possible MDEV-32830-caused issues

Details

    Description

      The following tests: binlog_xa_recover, binlog_xa_prepared_disconnect, and binlog_empty_xa_prepared have proven to be unstable.

      The issue is that they fail in various different ways, and fail differently on base 10.6 than on the MDEV-32830 patch tree. As such, MTR stress testing of XA + binlog on the MDEV-32830 patch is not possible, and it is possible that MDEV-32830 is causing different/additional issues.

      Given this, these tests will need to be fixed and stabilized before signoff on the MDEV-32830 patch can happen.

      The failures occur even in single-threaded runs (verified), but various issues can be made to show up quickly (< ~1 minute) using:

      rm -Rf /dev/shm/var_auto*; MTR_MEM=/dev/shm ./mysql-test-run --repeat=35 --parallel=30 --mem --force binlog_xa_recover{,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} | tee mtr_output.txt
      rm -Rf /dev/shm/var_auto*; MTR_MEM=/dev/shm ./mysql-test-run --repeat=35 --parallel=30 --mem --force binlog_xa_prepared_disconnect{,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} | tee mtr_output.txt
      rm -Rf /dev/shm/var_auto*; MTR_MEM=/dev/shm ./mysql-test-run --repeat=35 --parallel=30 --mem --force binlog_empty_xa_prepared{,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,} | tee mtr_output.txt
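
The `name{,,,}` suffix in the commands above relies on shell brace expansion to pass the same test name to MTR many times, so that the parallel workers all have copies to run. For shells without brace expansion, or to make the copy count explicit, the same argument list can be built with a loop. This is a minimal sketch: `repeat_test` is a hypothetical helper, not part of MTR, and the count of 3 is only illustrative, not the count used in the commands above. The command is printed rather than executed.

```shell
# Hypothetical helper (not part of MTR): print NAME repeated COPIES times,
# separated by single spaces, on one line.
repeat_test() {
  name=$1
  copies=$2
  out=""
  i=0
  while [ "$i" -lt "$copies" ]; do
    # Append a space before every name except the first.
    out="$out${out:+ }$name"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Illustrative count only; pick the number of copies you actually need.
tests=$(repeat_test binlog_xa_recover 3)

# Print the resulting invocation instead of running it.
echo "./mysql-test-run --repeat=35 --parallel=30 --mem --force $tests"
```

Dropping the `echo` and leaving `$tests` unquoted would run the suite with the generated list, since the unquoted variable is split into one argument per copy.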
      

          Activity

            Roel Van de Paar added a comment (edited)

            Regrettably I am also seeing, more sporadically, issues with binlog_xa_checkpoint, binlog_xa_handling and xa_binlog, though the latter thus far on base only.

            binlog_xa_checkpoint and binlog_xa_handling will thus also need to be checked. For binlog_xa_handling, issues have been seen only on the patch tree thus far.

            Additionally, what may be of interest, xa_binlog is considerably faster (17 seconds versus 70 seconds for 1085 tests) on the patch tree than on base. For this testcase, there seems to be a clear parallelism at work in the patch tree unlike base.

            Andrei Elkin added a comment

            roel, I can't confirm this by running them locally the way you did, on either bb-10.6-MDEV-31949 or vanilla 10.6.
            Crossref is really slow, but I could query from it a list of binlog_xa_recover failures, which may confirm that the test is unstable.

            Could you paste the 10.6 and bb-10.6-MDEV-31949 traces in two separate comments, so that I can try to match them or explain any difference?
            Let's start with binlog_xa_recover and binlog_xa_prepared_disconnect.

            Andrei Elkin added a comment

            > I am also seeing, more sporadically, issues with binlog_xa_checkpoint
            In which branch?
            The test has been altered in 31949 in 9de57a483e7. Previously it must've been non-deterministic.
            So let's do the same as above: since I could not reproduce on 31949, I need traces.

            Please always paste them - even for your own records - as apparently mtr invocation references may not suffice for someone with a different environment.

            Roel Van de Paar added a comment (edited)

            > I can't confirm by running them locally the way you did. On both bb-10.6-MDEV-31949 and the vanilla 10.6.
            On bb-10.6-MDEV-31949, binlog_xa_recover looks stable; testing the others.

            > In which branch?
            bb-10.6-MDEV-32830-qa before, but now testing bb-10.6-MDEV-31949.

            > The test has been altered in 31949 in 9de57a483e7. Previously it must've been non-deterministic.
            Understood, it looks like it.

            > Please always paste them - even for your own records - as apparently mtr invocation references may not suffice for one with different env.
            Agreed, and I generally would. In this case there were no crash/assert stack traces, but various "somewhat random errors" scrolling for many pages.

            Roel Van de Paar added a comment (edited)

            This is waiting for MDEV-32830 for the moment, so I have reversed the blocker direction. Retesting is required once MDEV-32830 and MDEV-31949 are ready for testing.

