MariaDB Server / MDEV-29684

Testing and improving 'cluster conflict resolving' with 10.4 and later

Details

    Description

      There are a number of bug reports of crashes or hangs related to cluster-wide conflict resolution. To troubleshoot these, it would be good to run generic cluster conflict tests, which should make the issue(s) easier to reproduce.

      Run a high-intensity write-write conflict load in the cluster, with varying SQL access patterns.
      Also, some mtr tests exercising high-priority victim aborting appear to be disabled in 10.4. Enable these, and fix any test that turns out to be non-deterministic.

      Analyze and troubleshoot the issues surfacing from the testing.
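
      As a starting point for finding the disabled mtr tests mentioned in the description, here is a minimal sketch that lists the test names from a disabled.def file. It assumes the usual "test_name : reason" line format with '#' comment lines. The function name list_disabled_tests is hypothetical, and the path in the example is an assumption that should be checked against the branch being tested.

```shell
#!/bin/sh
# Print the names of tests listed in an mtr disabled.def file.
# Assumes the usual "test_name : reason" format; '#' starts a comment.
list_disabled_tests() {
    # $1: path to a disabled.def file
    sed -e 's/#.*$//' "$1" \
        | awk -F: 'NF > 1 { gsub(/[ \t]/, "", $1); if ($1 != "") print $1 }'
}

# Example (the path is an assumption; adjust it to your source tree):
# list_disabled_tests mysql-test/suite/galera/disabled.def
```

      A listed test could then be run repeatedly to check for non-determinism, for example with something like ./mtr --suite=galera --enable-disabled --repeat=20 <test_name> from the mysql-test directory. The repeat count is arbitrary, and a Galera provider library must be available for the suite to run.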

      Attachments

        1. all_bt_28102022.txt (116 kB)
        2. bt_all_v2.txt (117 kB)
        3. bt_all.txt (115 kB)
        4. mysql.err (1.02 MB)
        5. new.err (1.21 MB)
        6. oltp_and_ddl_v1.yy (4 kB)
        7. oltp_and_ddl_v2.yy (3 kB)
        8. oltp.zz (2 kB)

          Activity

            ramesh Ramesh Sivaraman added a comment - - edited

            seppo, we can reproduce the problem even without the SET wsrep_OSU_method.. statement. The issue is sporadic, so the RQG run may have to be started multiple times to reproduce it.

            Please find attached the grammar oltp_and_ddl_v2.yy, which reproduces the issue.

            RQG command to reproduce the hang:

            perl runall-new.pl \
               --basedir=/test/mtest/mariadb-10.4.28-linux-x86_64 \
               --vardir=/home/ramesh/galera-rqg-test \
               --mysqld=--wsrep-provider=/test/mtest/mariadb-10.4.28-linux-x86_64/lib/libgalera_smm.so \
               --gendata=conf/mariadb/oltp.zz \
               --grammar=conf/mariadb/oltp_and_ddl_v2.yy \
               --threads=16 \
               --galera=mss \
               --mysqld=--wsrep_sst_method=rsync \
               --mysqld=--core \
               --mysqld=--general-log \
               --mysqld=--general-log-file=queries.log \
               --mysqld=--log-output=file \
               --mysqld=--wsrep-debug=0 \
               --mysqld=--wsrep-sync-wait=15 \
               --mysqld=--wsrep_retry_autocommit=0 \
               --mysqld=--wsrep_slave_threads=12 \
               --mysqld=--wsrep_log_conflicts=1 \
               --mysqld=--wsrep_on=1 \
               --mysqld=--default-storage-engine=innodb \
               --mysqld=--sort_buffer_size=200M \
               --mysqld=--innodb-lock-wait-timeout=1 \
               --mysqld=--gtid_domain_id=10 \
               --mysqld=--wsrep_gtid_domain_id=100 \
               --mysqld=--wsrep_gtid_mode=ON \
               --mysqld=--wsrep_slave_threads=4 \
               --mysqld=--slave_parallel_threads=4 \
               --mysqld=--server_id=11 \
               --mysqld=--gtid_strict_mode=1 \
               --mysqld=--log_slave_updates=ON \
               --mysqld=--log_bin=binlog \
               --mysqld=--binlog_format=ROW \
               --mysqld=--master_info_repository=TABLE \
               --mysqld=--relay_log_info_repository=TABLE 
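
            With an option list this long it may be worth checking whether any server option is passed more than once; here wsrep_slave_threads is given both as 12 and as 4. A minimal sketch for listing such repeats follows. It assumes the command has been saved to a text file, and it treats '-' and '_' in option names as equivalent, as mysqld does. The function name find_duplicate_mysqld_opts and the file name in the example are hypothetical.

```shell
#!/bin/sh
# List server options passed via --mysqld= that occur more than once.
# Option names are normalized so that '-' and '_' compare equal.
find_duplicate_mysqld_opts() {
    # $1: file containing the saved RQG command line
    grep -o -e '--mysqld=--[A-Za-z0-9_-]*' "$1" \
        | sed -e 's/^--mysqld=--//' -e 's/-/_/g' \
        | sort | uniq -d
}

# Example (the file name is hypothetical):
# find_duplicate_mysqld_opts rqg_cmd.txt
```

            mysqld generally uses the last occurrence of a repeated option, so the effective applier thread count is probably 4 rather than 12, but that depends on the order in which RQG forwards the options to the server.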
            

            seppo Seppo Jaakola added a comment -

            This new RQG grammar brings two further scenarios to the surface. I will submit a separate MDEV issue for each, as they are not related to the fixes suggested in the PRs for this MDEV. The surfacing issues are:

            • A KILL command is issued from inside a transaction, and this KILL execution can itself be the target of another KILL command. BF aborting a KILL inside a transaction appears to cause a crash.
            • A SAVEPOINT command is replicated as a query log event. If a transaction containing such a SAVEPOINT event has to be replayed, the replay may fail because the character set information is unassigned.
            jplindst Jan Lindström (Inactive) added a comment - - edited

            bb-10.4-MDEV-29684-galera
            https://github.com/MariaDB/server/commit/c85f18a37feae677a1593fd7bbebd19bb4a5fd94 (sql)
            https://github.com/MariaDB/server/commit/3e466c745ef3a97e1372a23f37a0cf04745ce25f (test case)
            https://github.com/MariaDB/server/commit/c76134bb305e49689380e271edd959f656be4de0 (sql)
            https://github.com/MariaDB/server/commit/83a450464806625e8d77e1aeb93aa0f5b972ec59 (InnoDB)
            https://github.com/MariaDB/server/commit/54abb014a3abe2a2f09e7bdeb799b5614d9d94a4 (sql + InnoDB)

            bb-10.5-MDEV-29684-galera (about same as 10.4)
            https://github.com/MariaDB/server/commit/e0e06ed82a6d75a279a425cd362537c3d3476ee9

            bb-10.6-MDEV-29684-galera
            https://github.com/MariaDB/server/commit/bcf368f58883f7efc8676426c56853014d15857d
            https://github.com/MariaDB/server/commit/31a8b027422ad63e0b1bc91ed982a5ba309d7c51
            https://github.com/MariaDB/server/commit/2a1e41727cb9f4b563fb5bbefd4047d3e5878519
            https://github.com/MariaDB/server/commit/640b5d52a8d48934c980d7cb15c02661ad20e56d
            https://github.com/MariaDB/server/commit/2a92e74dd4611367c3581e1b6542c537863b4675

            marko Marko Mäkelä added a comment -

            seppo, I provided some review comments on the 10.6 version of this: 1, 2, 3, 4. Please address them.

            I have not looked at the other versions yet.

            marko Marko Mäkelä added a comment -

            For the 10.6 version, a fix of MDEV-29860 could be useful to have.

            People

              jplindst Jan Lindström (Inactive)
              seppo Seppo Jaakola
              Votes: 0
              Watchers: 6
