MDEV-32024 (MariaDB Server)

Galera library 26.4.16 fails with every server version

Details

    Description

      MariaDB tests in CI mainly run with the latest Galera library from the mariadb-4.x branch, as demanded by the Galera developers.

      The recently pushed 26.4.16 library fails everywhere, with a variety of errors, and thus makes the server CI unusable.

      errors

      galera_sr.GCF-1060 'innodb'              w2 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2023-08-23 03:55:04
      line
      2023-08-23  3:54:54 25 [Warning] WSREP: Failed to insert streaming client 25
      2023-08-23  3:54:54 25 [Warning] WSREP: Failed to insert streaming client 25
      2023-08-23  3:54:54 25 [Warning] WSREP: Failed to insert streaming client 25
      2023-08-23  3:54:54 25 [Warning] WSREP: Failed to insert streaming client 25
      

      crashes

      galera.galera_sequences 'innodb'         w2 [ retry-fail ]
              Test ended at 2023-08-24 07:27:36
       
      CURRENT_TEST: galera.galera_sequences
      mysqltest: At line 178: query 'INSERT INTO t1(b) values (2)' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 0...
      

      wrong results

      wsrep.wsrep_provider_plugin_defaults 'innodb' w2 [ retry-fail ]
              Test ended at 2023-08-24 14:25:01
       
      CURRENT_TEST: wsrep.wsrep_provider_plugin_defaults
      --- /usr/share/mariadb-test/suite/wsrep/r/wsrep_provider_plugin_defaults.result	2023-08-24 07:55:52.000000000 +0000
      +++ /dev/shm/var/2/log/wsrep_provider_plugin_defaults.reject	2023-08-24 14:25:01.342887420 +0000
      @@ -10,7 +10,7 @@
       'wsrep_provider_signal',
       'wsrep_provider_gmcast_listen_addr');
       COUNT(*)
      -83
      +84
       SELECT * FROM INFORMATION_SCHEMA.SYSTEM_VARIABLES
       WHERE VARIABLE_NAME LIKE 'wsrep_provider_%' AND VARIABLE_NAME NOT IN (
       'wsrep_provider',
      @@ -998,6 +998,21 @@
       READ_ONLY	NO
       COMMAND_LINE_ARGUMENT	REQUIRED
       GLOBAL_VALUE_PATH	NULL
      +VARIABLE_NAME	WSREP_PROVIDER_PROTONET_BACKEND
      +SESSION_VALUE	NULL
      +GLOBAL_VALUE	asio
      +GLOBAL_VALUE_ORIGIN	COMPILE-TIME
      +DEFAULT_VALUE	asio
      +VARIABLE_SCOPE	GLOBAL
      +VARIABLE_TYPE	VARCHAR
      +VARIABLE_COMMENT	Wsrep provider option
      +NUMERIC_MIN_VALUE	NULL
      +NUMERIC_MAX_VALUE	NULL
      +NUMERIC_BLOCK_SIZE	NULL
      +ENUM_VALUE_LIST	NULL
      +READ_ONLY	YES
      +COMMAND_LINE_ARGUMENT	REQUIRED
      +GLOBAL_VALUE_PATH	NULL
       VARIABLE_NAME	WSREP_PROVIDER_PROTONET_VERSION
       SESSION_VALUE	NULL
       GLOBAL_VALUE	0
       
      mysqltest: Result length mismatch
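
      The count change from 83 to 84 in the diff above corresponds to one provider variable that appears only in the new output. As a rough sketch (the helper name `added_variables` is hypothetical and not part of mtr), the added variable names could be pulled out of a `.reject` diff like this:

```python
def added_variables(diff_text):
    """Return variable names that appear only on '+' lines of a result diff."""
    names = []
    for line in diff_text.splitlines():
        stripped = line.strip()
        # Added rows in the unified diff start with '+', e.g. "+VARIABLE_NAME<TAB>NAME".
        if stripped.startswith("+VARIABLE_NAME"):
            names.append(stripped.split()[-1])
    return names


# Sample lines copied from the diff quoted in this report.
sample = (
    "+VARIABLE_NAME\tWSREP_PROVIDER_PROTONET_BACKEND\n"
    "+SESSION_VALUE\tNULL\n"
    " VARIABLE_NAME\tWSREP_PROVIDER_PROTONET_VERSION\n"
)
print(added_variables(sample))
```

      On the quoted diff this reports only WSREP_PROVIDER_PROTONET_BACKEND, which matches the single extra row in COUNT(*).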
      

      And so on; this is not a full list.

      Please check and fix it, and please at least run Galera tests locally before pushing something to main.
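
      For triage, the result lines quoted above can be grouped by status. This is a minimal sketch that assumes the mtr result lines keep the layout shown in this report (test name, quoted combination, worker, bracketed status). The regular expression and the name `group_failures` are illustrative assumptions, not part of mtr.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   galera.galera_sequences 'innodb'         w2 [ retry-fail ]
RESULT_LINE = re.compile(
    r"^\s*(?P<test>\S+)\s+'(?P<combo>[^']+)'\s+w(?P<worker>\d+)\s+"
    r"\[\s*(?P<status>[\w-]+)\s*\]\s*(?P<detail>.*)$"
)


def group_failures(log_text):
    """Group failing test names by their bracketed status."""
    by_status = defaultdict(list)
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match and "fail" in match.group("status"):
            by_status[match.group("status")].append(match.group("test"))
    return dict(by_status)


# Sample lines copied from this report.
sample = """\
galera_sr.GCF-1060 'innodb'              w2 [ fail ]  Found warnings/errors in server log file!
galera.galera_sequences 'innodb'         w2 [ retry-fail ]
wsrep.wsrep_provider_plugin_defaults 'innodb' w2 [ retry-fail ]
"""
print(group_failures(sample))
```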

          Activity

            serg Sergei Golubchik added a comment - FYI, I've fixed galera.galera_as_slave_gtid_myisam in 10.10 and wsrep.wsrep_provider_plugin_defaults in 11.0

            serg Sergei Golubchik added a comment - janlindstrom, I don't understand how you can fix a crash with changes in the test

            And, please, when fixing sequences, apply the following patch too (I cannot do it while the test is crashing):

            --- a/sql/sql_table.cc
            +++ b/sql/sql_table.cc
            @@ -5324,7 +5324,7 @@ bool wsrep_check_sequence(THD* thd, const sequence_defini>
                 if (db_type != DB_TYPE_INNODB)
                 {
                   my_error(ER_NOT_SUPPORTED_YET, MYF(0),
            -               "Galera cluster does support only InnoDB sequences");
            +               "non-InnoDB sequences in Galera cluster");
                   return(true);
                 }
             
            @@ -5335,8 +5335,7 @@ bool wsrep_check_sequence(THD* thd, const sequence_defini>
                     seq->cache)
                 {
                   my_error(ER_NOT_SUPPORTED_YET, MYF(0),
            -               "In Galera if you use CACHE you should set INCREMENT BY 0"
            -              " to behave correctly in a cluster");
            +               "CACHE without INCREMENT BY 0 in Galera cluster");
                   return(true);
                 }
             
            

            Currently it prints

            ERROR 42000: This version of MariaDB doesn't yet support 'Galera cluster does support only InnoDB sequences'
            

            which looks rather silly

            serg Sergei Golubchik added the comment above.
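
            The quoted error reads oddly because the argument passed with ER_NOT_SUPPORTED_YET is dropped into a fixed sentence. The sketch below assumes the template wording from the ERROR 42000 output quoted in this comment. `format_not_supported` is a hypothetical helper used only for illustration, not a server function.

```python
# Assumption: template text inferred from the ERROR 42000 output quoted above.
NOT_SUPPORTED_TEMPLATE = "This version of MariaDB doesn't yet support '%s'"


def format_not_supported(feature):
    """Substitute the feature description into the fixed error sentence."""
    return NOT_SUPPORTED_TEMPLATE % feature


# Arguments before the proposed patch: full sentences nested inside a sentence.
old_args = [
    "Galera cluster does support only InnoDB sequences",
    "In Galera if you use CACHE you should set INCREMENT BY 0"
    " to behave correctly in a cluster",
]
# Arguments after the proposed patch: noun phrases naming the unsupported feature.
new_args = [
    "non-InnoDB sequences in Galera cluster",
    "CACHE without INCREMENT BY 0 in Galera cluster",
]

for feature in old_args + new_args:
    print(format_not_supported(feature))
```

            With the patched arguments, each message names the unsupported feature directly instead of quoting a whole sentence inside another.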

            janlindstrom, at the moment we have a server crash and a lost connection, not just a hang in the test:

            CURRENT_TEST: galera.galera_sequences
            mysqltest: At line 236: query 'INSERT INTO t1(b) values (1),(2),(3),(4),(5),(6),(7),(8),(9)' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 0...
             
            The result from queries just before the failure was:
            < snip >
            @@auto_increment_offset
            2
            SET SESSION wsrep_sync_wait=0;
            connection node_1;
            connection node_2;
            connection node_1;
            DROP SEQUENCE t;
            DROP TABLE t1;
            CREATE SEQUENCE t INCREMENT BY 0 NOCACHE ENGINE=INNODB;
            DROP SEQUENCE t;
            CREATE SEQUENCE t INCREMENT BY 1 CACHE=20 ENGINE=INNODB;
            ERROR 42000: This version of MariaDB doesn't yet support 'In Galera if you use CACHE you should set INCREMENT BY 0 to behave correctly in a cluster'
            CREATE SEQUENCE t INCREMENT BY 0 CACHE=20 ENGINE=INNODB;
            CREATE TABLE t1(a int not null primary key default nextval(t), b int) engine=innodb;
            connection node_2;
            # Wait DDL to replicate
            connection node_1;
            SET SESSION wsrep_sync_wait=0;
            connection node_2;
            SET SESSION wsrep_sync_wait=0;
             
            More results from queries before failure can be found in /dev/shm/var/1/log/galera_sequences.log

            and:

            WSREP_SST: [INFO] rsync SST completed on donor (20230925 06:53:55.214)
            2023-09-25  6:53:55 0 [Note] WSREP: Donor monitor thread ended with total time 2 sec
            2023-09-25  6:53:55 0 [Note] WSREP: (492dcd42-89d1, 'tcp://0.0.0.0:16002') turning message relay requesting off
            2023-09-25  6:53:56 0 [Note] WSREP: async IST sender served
            2023-09-25  6:53:56 0 [Note] WSREP: 1.0 (centos74-amd64): State transfer from 0.0 (centos74-amd64) complete.
            2023-09-25  6:53:56 0 [Note] WSREP: Member 1.0 (centos74-amd64) synced with group.
            2023-09-25  6:53:58 0 [Note] WSREP: Member 1.0 (centos74-amd64) desyncs itself from group
            2023-09-25  6:53:58 0 [Note] WSREP: Member 1.0 (centos74-amd64) resyncs itself to group.
            2023-09-25  6:53:58 0 [Note] WSREP: Member 1.0 (centos74-amd64) synced with group.
            2023-09-25  6:53:58 0 [Note] WSREP: Member 1.0 (centos74-amd64) desyncs itself from group
            2023-09-25  6:53:58 0 [Note] WSREP: Member 1.0 (centos74-amd64) resyncs itself to group.
            2023-09-25  6:53:58 0 [Note] WSREP: Member 1.0 (centos74-amd64) synced with group.
            2023-09-25  6:53:58 1 [ERROR] Slave SQL: Error 'Unknown table 'test.t1'' on query. Default database: 'test'. Query: 'DROP TABLE t1', Internal MariaDB error code: 1051
            2023-09-25  6:53:58 1 [Warning] WSREP: Ignoring error 'Unknown table 'test.t1'' on query. Default database: 'test'. Query: 'DROP TABLE t1', Error_code: 1051
            2023-09-25  6:53:58 1 [ERROR] Slave SQL: Error 'Unknown SEQUENCE: 'test.sq2'' on query. Default database: 'test'. Query: 'DROP SEQUENCE sq2', Internal MariaDB error code: 4091
            2023-09-25  6:53:58 1 [Warning] WSREP: Ignoring error 'Unknown SEQUENCE: 'test.sq2'' on query. Default database: 'test'. Query: 'DROP SEQUENCE sq2', Error_code: 4091
            2023-09-25  6:53:58 17 [ERROR] WSREP: FSM: no such a transition REPLICATING -> COMMITTED
            230925  6:53:58 [ERROR] mysqld got signal 6 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
             
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
             
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed, 
            something is definitely wrong and this may fail.
             
            Server version: 10.4.32-MariaDB-log source revision: 3ac25b480055e7e99e46a958c04f9ffb7a6d68cf
            key_buffer_size=1048576
            read_buffer_size=131072
            max_used_connections=2
            max_threads=153
            thread_count=10
            It is possible that mysqld could use up to 
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63557 K  bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
             
            Thread pointer: 0x563c9cbbc808
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7f6d85e8dc40 thread_stack 0x49000
            mysys/stacktrace.c:175(my_print_stacktrace)[0x563c9301d4de]
            sql/signal_handler.cc:238(handle_fatal_signal)[0x563c92a6d687]
            sigaction.c:0(__restore_rt)[0x7f6d8d4165e0]
            /lib64/libc.so.6(gsignal+0x37)[0x7f6d8c86b1f7]
            /lib64/libc.so.6(abort+0x148)[0x7f6d8c86c8e8]
            src/fsm.hpp:56(galera::FSM<galera::TrxHandle::State, galera::TrxHandle::Transition>::shift_to(galera::TrxHandle::State, int))[0x7f6d892e4cda]
            src/replicator_smm.cpp:1423(galera::ReplicatorSMM::commit_order_leave(galera::TrxHandleSlave&, wsrep_buf const*))[0x7f6d892f44bb]
            detail/shared_count.hpp:371(galera_commit_order_leave)[0x7f6d892e0468]
            /usr/sbin/mysqld(_ZN5wsrep18wsrep_provider_v2618commit_order_leaveERKNS_9ws_handleERKNS_7ws_metaERKNS_14mutable_bufferE+0x91)[0x563c930ab001]
            src/wsrep_provider_v26.cpp:969(wsrep::wsrep_provider_v26::commit_order_leave(wsrep::ws_handle const&, wsrep::ws_meta const&, wsrep::mutable_buffer const&))[0x563c930a4ee0]
            src/transaction.cpp:579(wsrep::transaction::ordered_commit())[0x563c92b5aae9]
            sql/log.cc:7822(MYSQL_BIN_LOG::queue_for_group_commit(MYSQL_BIN_LOG::group_commit_entry*))[0x563c92b6001c]
            sql/log.cc:7480(MYSQL_BIN_LOG::write_transaction_to_binlog(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x563c92b604b0]
            sql/log.cc:516(binlog_cache_mngr::reset(bool, bool))[0x563c92b6066d]
            sql/log.cc:1814(binlog_commit_flush_stmt_cache(THD*, bool, binlog_cache_mngr*))[0x563c92b60894]
            sql/log.cc:2091(binlog_rollback(handlerton*, THD*, bool))[0x563c92b60a7f]
            sql/handler.cc:1956(ha_rollback_trans(THD*, bool))[0x563c92a70f6b]
            sql/handler.cc:1747(ha_commit_trans(THD*, bool))[0x563c92a71c94]
            sql/transaction.cc:438(trans_commit_stmt(THD*))[0x563c9297121f]
            sql/sql_class.h:4028(THD::get_stmt_da())[0x563c92871242]
            sql/sql_parse.cc:8013(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x563c9287903b]
            sql/sql_class.h:4028(THD::get_stmt_da())[0x563c928798a6]
            sql/sql_parse.cc:1843(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x563c9287c77e]
            sql/sql_parse.cc:1379(do_command(THD*))[0x563c9287ce22]
            sql/sql_connect.cc:1420(do_handle_one_connection(CONNECT*))[0x563c92962512]
            sql/sql_connect.cc:1326(handle_one_connection)[0x563c929625fd]
            perfschema/pfs.cc:1872(pfs_spawn_thread)[0x563c92cef3ed]
            pthread_create.c:0(start_thread)[0x7f6d8d40ee25]
            /lib64/libc.so.6(clone+0x6d)[0x7f6d8c92e34d]
             
            Trying to get some variables.
            Some pointers may be invalid and cause the dump to abort.
            Query (0x563c9ccdc020): INSERT INTO t1(b) values (1),(2),(3),(4),(5),(6),(7),(8),(9)
             
            Connection ID (thread ID): 17
            Status: KILL_QUERY

            sysprg Julius Goryavsky added the comment above.
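
            The two lines that matter most in this log are the rejected FSM transition and the fatal signal. As a hedged sketch, they could be extracted from an error log as follows. The regular expressions are written against the exact wording quoted in this comment, and `summarize_crash` is a hypothetical helper name.

```python
import re

# Patterns follow the wording of the log lines quoted in this report.
FSM_ERROR = re.compile(r"WSREP: FSM: no such a transition (\w+) -> (\w+)")
SIGNAL = re.compile(r"mysqld got signal (\d+)")


def summarize_crash(log_text):
    """Return the rejected state transition and the fatal signal, if present."""
    fsm = FSM_ERROR.search(log_text)
    sig = SIGNAL.search(log_text)
    return {
        "from_state": fsm.group(1) if fsm else None,
        "to_state": fsm.group(2) if fsm else None,
        "signal": int(sig.group(1)) if sig else None,
    }


# Sample lines copied from the error log in this comment.
sample = """\
2023-09-25  6:53:58 17 [ERROR] WSREP: FSM: no such a transition REPLICATING -> COMMITTED
230925  6:53:58 [ERROR] mysqld got signal 6 ;
"""
print(summarize_crash(sample))
```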
            janlindstrom Jan Lindström added a comment - https://github.com/MariaDB/server/pull/2793
            sysprg Julius Goryavsky added a comment (edited) - Fix merged with head revision: https://github.com/MariaDB/server/commit/e913f4e11e1e519196f276d7c5689f653e724547
            Remaining part moved to a separate task: https://jira.mariadb.org/browse/MDEV-32561

            People

              sysprg Julius Goryavsky
              elenst Elena Stepanova