Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31988

galera_partition test: assertion due to unallowed state transition

Details

    Description

      galera_partition test fails intermittently (~30% of the time) when testing on the 10.6-enterprise-rebase branch due to unallowed state transition:

      2023-08-16 16:02:13 2 [Note] WSREP: Server panda synced with group
      2023-08-16 16:02:13 2 [Note] WSREP: server panda state change: joined -> synced
      2023-08-16 16:02:13 2 [Note] WSREP: Server status change joined -> synced
      2023-08-16 16:02:13 2 [Note] WSREP: Synchronized with group, ready for connections
      2023-08-16 16:02:13 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2023-08-16 16:02:13 2 [Note] WSREP: Detected STR version: 1, req_len: 153, req: STRv1
      2023-08-16 16:02:13 2 [Note] WSREP: Cert index preload: 3 -> 3
      2023-08-16 16:02:13 0 [Note] WSREP: 1.1 (panda): State transfer to 3.3 (ns367620) complete.
      2023-08-16 16:02:13 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 3)
      2023-08-16 16:02:13 0 [Note] WSREP: Processing event queue:... -nan% (0/0 events) complete.
      2023-08-16 16:02:13 0 [Note] WSREP: Member 1.1 (ns367620) synced with group.
      2023-08-16 16:02:13 0 [Note] WSREP: Processing event queue:...100.0% (1/1 events) complete.
      2023-08-16 16:02:13 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 3)
      2023-08-16 16:02:13 22 [Note] WSREP: assigned new next trx id: 25
      2023-08-16 16:02:13 22 [Note] WSREP: wsrep_after_statement for 22 client_state exec  client_mode local trans_state aborted
      2023-08-16 16:02:13 22 [Note] WSREP: wsrep_thd_binlog_reset
      2023-08-16 16:02:13 22 [Note] WSREP: assigned new next trx id: 26
      2023-08-16 16:02:13 22 [Note] WSREP: wsrep_commit_empty for 22 client_state exec client_mode local trans_state executing sql SELECT VARIABLE_NAME, VARIABLE_VALUE
                    FROM INFORMATION_SCHEMA.GLOBAL_STATUS
                    WHERE VARIABLE_NAME = 'wsrep_ready'
      2023-08-16 16:02:13 22 [Note] WSREP: wsrep_after_statement for 22 client_state exec  client_mode local trans_state aborted
      2023-08-16 16:02:13 22 [Note] WSREP: wsrep_thd_binlog_reset
      2023-08-16 16:02:13 22 [Note] WSREP: wsrep_after_statement for 22 client_state exec  client_mode local trans_state aborted
      2023-08-16 16:02:13 22 [Note] WSREP: wsrep_thd_binlog_reset
      2023-08-16 16:02:13 22 [Note] WSREP: assigned new next trx id: 27
      2023-08-16 16:02:13 2 [Note] WSREP: server panda state change: synced -> donor
      2023-08-16 16:02:13 2 [Note] WSREP: Server status change synced -> donor
      2023-08-16 16:02:13 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2023-08-16 16:02:13 0 [Note] WSREP: async IST sender starting to serve tcp://127.0.0.1:16014 sending 3-3, preload starts from 3
      2023-08-16 16:02:13 0 [Note] WSREP: Donor monitor thread started to monitor
      2023-08-16 16:02:13 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'donor' --address '127.0.0.1:16015/rsync_sst' --local-port 16001 --socket '/home/panda/maria-10.6/build/mysql-test/var/tmp/mysqld.2.sock' --progress 0 --datadir '/home/panda/maria-10.6/build/mysql-test/var/mysqld.2/data/' --defaults-file '/home/panda/maria-10.6/build/mysql-test/var/my.cnf' --defaults-group-suffix '.2' --gtid '7ac8f618-3c3d-11ee-94fc-13bdcf9cdd3a:3' --gtid-domain-id 0 --binlog-index 'mysqld-bin.index' --mysqld-args --defaults-group-suffix=.2 --defaults-file=/home/panda/maria-10.6/build/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --loose-enable-partition --core-file --loose-debug-sync-timeout=300'
      2023-08-16 16:02:13 0 [Note] WSREP: IST sender 3 -> 3
      2023-08-16 16:02:13 2 [Note] WSREP: sst_donor_thread signaled with 0
      2023-08-16 16:02:13 13 [Note] WSREP: Server panda synced with group
      2023-08-16 16:02:13 13 [Warning] WSREP: server: panda unallowed state transition: donor -> synced
       
      mariadbd: /home/panda/maria-10.6/wsrep-lib/src/server_state.cpp:1368: void wsrep::server_state::state(wsrep::unique_lock<wsrep::mutex>&, state): Assertion `0' failed.
      230816 16:02:13 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.6.15-10-MariaDB-debug-log source revision: a58082bb6a7efb490aafbed5e264168b743c0a9d
      key_buffer_size=1048576
      read_buffer_size=131072
      max_used_connections=1
      max_threads=153
      thread_count=14
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63896 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f1e44000dc8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f1e96b1bb90 thread_stack 0x49000
      WSREP_SST: [INFO] rsync SST started on donor (20230816 16:02:13.195)
      WSREP_SST: [INFO] Using stunnel for SSL encryption: CA: '/home/panda/maria-10.6/mysql-test/std_data/cacert.pem', CAPATH='', ssl-mode='VERIFY_CA' (20230816 16:02:13.312)
      2023-08-16 16:02:13 0 [Note] WSREP: Flushing tables for SST...
      mysys/stacktrace.c:215(my_print_stacktrace)[0x559e8ac50d9f]
      sql/signal_handler.cc:241(handle_fatal_signal)[0x559e8a2f4435]
      libc_sigaction.c:0(__restore_rt)[0x7f1ecb577520]
      nptl/pthread_kill.c:44(__pthread_kill_implementation)[0x7f1ecb5cba7c]
      posix/raise.c:27(__GI_raise)[0x7f1ecb577476]
      stdlib/abort.c:81(__GI_abort)[0x7f1ecb55d7f3]
      intl/loadmsgcat.c:1177(_nl_load_domain)[0x7f1ecb55d71b]
      /lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7f1ecb56ee96]
      src/server_state.cpp:1371(wsrep::server_state::state(std::unique_lock<wsrep::mutex>&, wsrep::server_state::state))[0x559e8ae3c76c]
      src/server_state.cpp:1095(wsrep::server_state::on_sync())[0x559e8ae3af59]
      src/wsrep_provider_v26.cpp:531((anonymous namespace)::synced_cb(void*))[0x559e8ae527d7]
      2023-08-16 16:02:14 0 [Note] WSREP: 0.1 (ns367620): State transfer to 2.2 (ns367620) complete.
      2023-08-16 16:02:14 0 [Note] WSREP: Member 0.1 (ns367620) synced with group.
      src/replicator_smm.cpp:3066(galera::ReplicatorSMM::process_sync(long))[0x7f1ec644c904]
      src/gcs_action_source.cpp:144(galera::GcsActionSource::dispatch(void*, gcs_action const&, bool&))[0x7f1ec649da0e]
      src/gcs_action_source.cpp:192(galera::GcsActionSource::process(void*, bool&))[0x7f1ec649dfa9]
      src/replicator_smm.cpp:402(galera::ReplicatorSMM::async_recv(void*))[0x7f1ec643bbe8]
      src/wsrep_provider.cpp:291(galera_recv)[0x7f1ec6412b06]
      src/wsrep_provider_v26.cpp:858(wsrep::wsrep_provider_v26::run_applier(wsrep::high_priority_service*))[0x559e8ae53b8e]
      sql/wsrep_thd.cc:57(wsrep_replication_process(THD*, void*))[0x559e8a714c20]
      2023-08-16 16:02:15 0 [Note] WSREP: 2.2 (ns367620): State transfer from 0.1 (ns367620) complete.
      2023-08-16 16:02:15 0 [Note] WSREP: Member 2.2 (ns367620) synced with group.
      sql/wsrep_mysqld.cc:4034(start_wsrep_THD(void*))[0x559e8a70007d]
      perfschema/pfs.cc:2203(pfs_spawn_thread)[0x559e8a659630]
      nptl/pthread_create.c:442(start_thread)[0x7f1ecb5c9b43]
      x86_64/clone3.S:83(__clone3)[0x7f1ecb65ba00]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): (null)
      Connection ID (thread ID): 13
      Status: NOT_KILLED

      Attachments

        Issue Links

          Activity

            I could not reproduce this issue:

            jan@jan-HP-ZBook-15u-G5:~/work/mariadb/10.6e/mysql-test$ ./mtr galera.galera_partition --repeat=30 --enable-disabled
            Logging: ./mtr  galera.galera_partition --repeat=30 --enable-disabled
            VS config: 
            vardir: /home/jan/work/mariadb/10.6e/mysql-test/var
            Checking leftover processes...
            Removing old var directory...
            Creating var directory '/home/jan/work/mariadb/10.6e/mysql-test/var'...
            Checking supported features...
            MariaDB Version 10.6.15-10-MariaDB-debug
             - SSL connections supported
             - binaries are debug compiled
             - binaries built with wsrep patch
            Collecting tests...
             - galera.galera_partition will be run although it's been disabled
               due to 'MENT-1922 assertion due to unallowed state transition'
            Installing system database...
             
            ==============================================================================
             
            TEST                                      RESULT   TIME (ms) or COMMENT
            --------------------------------------------------------------------------
             
            worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
            galera.galera_partition 'innodb'         [ pass ]   3729
            galera.galera_partition 'innodb'         [ 2 pass ]   3917
            galera.galera_partition 'innodb'         [ 3 pass ]   3748
            galera.galera_partition 'innodb'         [ 4 pass ]   3914
            galera.galera_partition 'innodb'         [ 5 pass ]   3818
            galera.galera_partition 'innodb'         [ 6 pass ]   3690
            galera.galera_partition 'innodb'         [ 7 pass ]   3791
            galera.galera_partition 'innodb'         [ 8 pass ]   3853
            galera.galera_partition 'innodb'         [ 9 pass ]   3849
            galera.galera_partition 'innodb'         [ 10 pass ]   3865
            galera.galera_partition 'innodb'         [ 11 pass ]   3801
            galera.galera_partition 'innodb'         [ 12 pass ]   3803
            galera.galera_partition 'innodb'         [ 13 pass ]   3832
            galera.galera_partition 'innodb'         [ 14 pass ]   3901
            galera.galera_partition 'innodb'         [ 15 pass ]   3794
            galera.galera_partition 'innodb'         [ 16 pass ]   3825
            galera.galera_partition 'innodb'         [ 17 pass ]   3824
            galera.galera_partition 'innodb'         [ 18 pass ]   3938
            galera.galera_partition 'innodb'         [ 19 pass ]   3776
            galera.galera_partition 'innodb'         [ 20 pass ]   3884
            galera.galera_partition 'innodb'         [ 21 pass ]   3875
            galera.galera_partition 'innodb'         [ 22 pass ]   3975
            galera.galera_partition 'innodb'         [ 23 pass ]   3852
            galera.galera_partition 'innodb'         [ 24 pass ]   3975
            galera.galera_partition 'innodb'         [ 25 pass ]   3811
            galera.galera_partition 'innodb'         [ 26 pass ]   3841
            galera.galera_partition 'innodb'         [ 27 pass ]   3914
            galera.galera_partition 'innodb'         [ 28 pass ]   3924
            galera.galera_partition 'innodb'         [ 29 pass ]   3818
            galera.galera_partition 'innodb'         [ 30 pass ]   3870
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 115.407 of 154 seconds executing testcases
             
            Completed: All 30 tests were successful.
            

            janlindstrom Jan Lindström added a comment - I could not reproduce this issue: jan@jan-HP-ZBook-15u-G5:~/work/mariadb/10.6e/mysql-test$ ./mtr galera.galera_partition --repeat=30 --enable-disabled Logging: ./mtr galera.galera_partition --repeat=30 --enable-disabled VS config: vardir: /home/jan/work/mariadb/10.6e/mysql-test/var Checking leftover processes... Removing old var directory... Creating var directory '/home/jan/work/mariadb/10.6e/mysql-test/var'... Checking supported features... MariaDB Version 10.6.15-10-MariaDB-debug - SSL connections supported - binaries are debug compiled - binaries built with wsrep patch Collecting tests... - galera.galera_partition will be run although it's been disabled due to 'MENT-1922 assertion due to unallowed state transition' Installing system database...   ==============================================================================   TEST RESULT TIME (ms) or COMMENT --------------------------------------------------------------------------   worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019 galera.galera_partition 'innodb' [ pass ] 3729 galera.galera_partition 'innodb' [ 2 pass ] 3917 galera.galera_partition 'innodb' [ 3 pass ] 3748 galera.galera_partition 'innodb' [ 4 pass ] 3914 galera.galera_partition 'innodb' [ 5 pass ] 3818 galera.galera_partition 'innodb' [ 6 pass ] 3690 galera.galera_partition 'innodb' [ 7 pass ] 3791 galera.galera_partition 'innodb' [ 8 pass ] 3853 galera.galera_partition 'innodb' [ 9 pass ] 3849 galera.galera_partition 'innodb' [ 10 pass ] 3865 galera.galera_partition 'innodb' [ 11 pass ] 3801 galera.galera_partition 'innodb' [ 12 pass ] 3803 galera.galera_partition 'innodb' [ 13 pass ] 3832 galera.galera_partition 'innodb' [ 14 pass ] 3901 galera.galera_partition 'innodb' [ 15 pass ] 3794 galera.galera_partition 'innodb' [ 16 pass ] 3825 galera.galera_partition 'innodb' [ 17 pass ] 3824 galera.galera_partition 'innodb' [ 18 pass ] 3938 galera.galera_partition 'innodb' [ 19 pass ] 3776 galera.galera_partition 'innodb' [ 20 pass ] 3884 galera.galera_partition 'innodb' [ 21 pass ] 3875 galera.galera_partition 'innodb' [ 22 pass ] 3975 galera.galera_partition 'innodb' [ 23 pass ] 3852 galera.galera_partition 'innodb' [ 24 pass ] 3975 galera.galera_partition 'innodb' [ 25 pass ] 3811 galera.galera_partition 'innodb' [ 26 pass ] 3841 galera.galera_partition 'innodb' [ 27 pass ] 3914 galera.galera_partition 'innodb' [ 28 pass ] 3924 galera.galera_partition 'innodb' [ 29 pass ] 3818 galera.galera_partition 'innodb' [ 30 pass ] 3870 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 115.407 of 154 seconds executing testcases   Completed: All 30 tests were successful.

            I do not think this is related to MDEV-27847 because test does not change wsrep_cluster_address.

            janlindstrom Jan Lindström added a comment - I do not think this is related to MDEV-27847 because test does not change wsrep_cluster_address.
            janlindstrom Jan Lindström added a comment - - edited

            I propose a fix but if this does not help I need more information i.e. error logs and test stdout. Again do git am < file_name to apply. This fix might be good to put 10.4+ (CS/ES)

            janlindstrom Jan Lindström added a comment - - edited I propose a fix but if this does not help I need more information i.e. error logs and test stdout. Again do git am < file_name to apply. This fix might be good to put 10.4+ (CS/ES)
            janlindstrom Jan Lindström added a comment - https://github.com/MariaDB/server/pull/2734
            sysprg Julius Goryavsky added a comment - Fixed, https://github.com/MariaDB/server/commit/fee138a1233d7a74658eaf2a830a3c11da654253

            People

              sysprg Julius Goryavsky
              sysprg Julius Goryavsky
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.