MariaDB Server / MDEV-35610

Server abort, assertion failure, assorted WSREP desync/resync errors upon leaving the cluster

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5, 10.6, 10.11, 11.4, 11.7
    • Fix Version/s: 10.5, 10.6, 10.11, 11.4, 11.7
    • Component/s: wsrep
    • Labels: None

    Description

      Place the test case below under mysql-test/suite/galera_3nodes/t (so that it inherits the suite's config) and run as usual. Don't add a cleanup while debugging, as it changes the outcome.

      The test case is non-deterministic, so please don't push it into the regression suite; create a deterministic one instead.

      By simply re-running it, I get at least the errors quoted below, possibly more. It also hangs sometimes, so I recommend running it with --testcase-timeout=1 (the value is in minutes), or interrupting it manually if it doesn't finish within a minute or so. The hang is probably just another manifestation of the failure. The choice of Galera library doesn't seem to matter: I got failures with at least our 26.4.20 and with 26.4.14 from the Debian repositories.

      --source include/galera_cluster.inc
       
      CREATE TABLE t (pk INT AUTO_INCREMENT PRIMARY KEY, f INT) ENGINE=InnoDB;
      INSERT INTO t (pk) VALUES (1);
      SET SESSION WSREP_ON= 0;
      INSERT INTO t (pk) VALUES (NULL),(NULL);
      SET SESSION WSREP_ON= 1;
      UPDATE t SET f = 10 WHERE pk = 7;
      BACKUP STAGE START;
      BACKUP STAGE BLOCK_COMMIT;
      

      10.6 f5aed7457348022fb96295b8ca9ecae91f54797d

      2024-12-09 15:16:17 2 [Note] WSREP: Applier thread exiting ret: 0 thd: 2
      2024-12-09 15:16:17 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
      2024-12-09 15:16:17 11 [Note] WSREP: Desyncing and pausing the provider
      2024-12-09 15:16:17 11 [ERROR] WSREP: Node desync failed.: 77 (File descriptor in bad state)
               at ./galera/src/replicator_smm.cpp:desync():3158
      2024-12-09 15:16:17 11 [Note] WSREP: pause
      2024-12-09 15:16:17 11 [Note] WSREP: Provider paused at 00000000-0000-0000-0000-000000000000:-1 (18)
      2024-12-09 15:16:17 11 [Warning] WSREP: Failed to pause provider
      mariadbd: /data/bld/10.6-debug/wsrep-lib/src/server_state.cpp:1326: void wsrep::server_state::resync(wsrep::unique_lock<wsrep::mutex>&): Assertion `desync_count_ > 0' failed.
      241209 15:16:17 [ERROR] mysqld got signal 6 ;
       
      #9  0x00007face5253e32 in __GI___assert_fail (assertion=0x55e8c03f4524 "desync_count_ > 0", file=0x55e8c03f3428 "/data/bld/10.6-debug/wsrep-lib/src/server_state.cpp", line=1326, function=0x55e8c03f44e0 "void wsrep::server_state::resync(wsrep::unique_lock<wsrep::mutex>&)") at ./assert/assert.c:101
      #10 0x000055e8bfd1be9b in wsrep::server_state::resync (this=0x55e8c305b940, lock=...) at /data/bld/10.6-debug/wsrep-lib/src/server_state.cpp:1326
      #11 0x000055e8bfd0eb37 in wsrep::server_state::resync (this=0x55e8c305b940) at /data/bld/10.6-debug/wsrep-lib/include/wsrep/server_state.hpp:448
      #12 0x000055e8bfd181b3 in wsrep::server_state::desync_and_pause (this=0x55e8c305b940) at /data/bld/10.6-debug/wsrep-lib/src/server_state.cpp:623
      #13 0x000055e8bf1b8916 in backup_block_ddl (thd=0x7fac94002098) at /data/bld/10.6-debug/sql/backup.cc:311
      #14 0x000055e8bf1b81f4 in run_backup_stage (thd=0x7fac94002098, stage=BACKUP_LOCK_COMMIT) at /data/bld/10.6-debug/sql/backup.cc:125
      #15 0x000055e8beec020e in mysql_execute_command (thd=0x7fac94002098, is_called_from_prepared_stmt=false) at /data/bld/10.6-debug/sql/sql_parse.cc:5253
      #16 0x000055e8beec9d64 in mysql_parse (thd=0x7fac94002098, rawbuf=0x7fac940157e0 "BACKUP STAGE BLOCK_COMMIT", length=25, parser_state=0x7facc2bb1380) at /data/bld/10.6-debug/sql/sql_parse.cc:8194
      #17 0x000055e8beec9408 in wsrep_mysql_parse (thd=0x7fac94002098, rawbuf=0x7fac940157e0 "BACKUP STAGE BLOCK_COMMIT", length=25, parser_state=0x7facc2bb1380) at /data/bld/10.6-debug/sql/sql_parse.cc:8005
      #18 0x000055e8beeb5140 in dispatch_command (command=COM_QUERY, thd=0x7fac94002098, packet=0x7fac9400cdd9 "BACKUP STAGE BLOCK_COMMIT", packet_length=25, blocking=true) at /data/bld/10.6-debug/sql/sql_parse.cc:1895
      #19 0x000055e8beeb3b91 in do_command (thd=0x7fac94002098, blocking=true) at /data/bld/10.6-debug/sql/sql_parse.cc:1421
      #20 0x000055e8bf08f508 in do_handle_one_connection (connect=0x55e8c37db5d8, put_in_cache=true) at /data/bld/10.6-debug/sql/sql_connect.cc:1407
      #21 0x000055e8bf08f289 in handle_one_connection (arg=0x55e8c37db5d8) at /data/bld/10.6-debug/sql/sql_connect.cc:1319
      #22 0x000055e8bf5f4aea in pfs_spawn_thread (arg=0x55e8c3097ef8) at /data/bld/10.6-debug/storage/perfschema/pfs.cc:2201
      #23 0x00007face52a8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #24 0x00007face532861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
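
A hedged reading of this first trace: the log shows the desync failing ("Node desync failed"), then the pause failing ("Failed to pause provider"), and the backtrace shows wsrep::server_state::desync_and_pause() calling resync(), which asserts `desync_count_ > 0`. The Python sketch that follows is a hypothetical, simplified model of that bookkeeping, not the wsrep-lib source. The class name, the failure flags, and the assumption that a failed desync leaves the counter unchanged are all illustrative. It only covers this first trace; it does not explain the later "Failed to resync" exception, where, in the same debug build, the assertion evidently passed.

```python
class ModelServerState:
    """Hypothetical model of the desync/pause/resync bookkeeping.

    Not the wsrep-lib implementation: names, flags and the counter
    handling are assumptions inferred from the log and backtrace.
    """

    def __init__(self, desync_fails: bool, pause_fails: bool) -> None:
        self.desync_count = 0             # stands in for desync_count_
        self.desync_fails = desync_fails  # "Node desync failed.: 77"
        self.pause_fails = pause_fails    # "Failed to pause provider"
        self.assertion_tripped = False

    def desync(self) -> bool:
        # Assumption: the counter stays incremented only if desync succeeds.
        self.desync_count += 1
        if self.desync_fails:
            self.desync_count -= 1
            return False
        return True

    def pause(self) -> bool:
        return not self.pause_fails

    def resync(self) -> None:
        # Stands in for the `desync_count_ > 0` check in resync();
        # records the violation instead of aborting the process.
        if self.desync_count <= 0:
            self.assertion_tripped = True
            return
        self.desync_count -= 1

    def desync_and_pause(self) -> bool:
        # Call order taken from the backtrace: desync, pause, and on a
        # failed pause a compensating resync, whatever desync returned.
        self.desync()  # modelled as non-fatal: the log continues with "pause"
        if not self.pause():
            self.resync()
            return False
        return True


if __name__ == "__main__":
    for desync_fails in (False, True):
        state = ModelServerState(desync_fails=desync_fails, pause_fails=True)
        state.desync_and_pause()
        print(f"desync_fails={desync_fails}: "
              f"assertion tripped={state.assertion_tripped}")
```

Under these assumptions, a failed pause alone is balanced by the compensating resync; only the combination of a failed desync and a failed pause drives the modelled counter check to fail, which is the order of messages seen in this log.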
      

      2024-12-09 15:27:57 2 [Note] WSREP: Applier thread exiting ret: 0 thd: 2
      2024-12-09 15:27:57 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
      2024-12-09 15:27:57 11 [Note] WSREP: Resuming and resyncing the provider
      2024-12-09 15:27:57 11 [Note] WSREP: resume
      2024-12-09 15:27:57 11 [Warning] WSREP: tried to resume unpaused provider
      2024-12-09 15:27:57 11 [ERROR] WSREP: ./gcs/src/gcs.cpp:s_join():960: Sending JOIN failed: -103 (Software caused connection abort).
      2024-12-09 15:27:57 11 [ERROR] WSREP: gcs_join(00000000-0000-0000-0000-000000000000:-1) failed: 103 (Software caused connection abort)
               at ./galera/src/galera_gcs.hpp:join():231
      2024-12-09 15:27:57 11 [Warning] WSREP: Resume and resync failed, server may have to be restarted
      

      2024-12-09 15:31:07 11 [Note] WSREP: Desyncing and pausing the provider
      2024-12-09 15:31:07 11 [ERROR] WSREP: Node desync failed.: 77 (File descriptor in bad state)
               at ./galera/src/replicator_smm.cpp:desync():3158
      2024-12-09 15:31:07 11 [Note] WSREP: pause
      2024-12-09 15:31:07 2 [Note] WSREP: view(view_id(NON_PRIM,d3843508-9269,3) memb {
              d3843508-9269,0
      } joined {
      } left {
      } partitioned {
              d39c62ef-ad59,0
              d39f6041-adf6,0
      })
      

      2024-12-09 15:43:07 11 [Warning] WSREP: Failed to pause provider
      2024-12-09 15:43:07 11 [ERROR] WSREP: ./gcs/src/gcs.cpp:s_join():960: Sending JOIN failed: -103 (Software caused connection abort).
      2024-12-09 15:43:07 0 [Note] WSREP: Service thread queue flushed.
      2024-12-09 15:43:07 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
      2024-12-09 15:43:07 11 [ERROR] WSREP: gcs_join(00000000-0000-0000-0000-000000000000:-1) failed: 103 (Software caused connection abort)
               at ./galera/src/galera_gcs.hpp:join():231
      2024-12-09 15:43:07 2 [Note] WSREP: Applier thread exiting ret: 0 thd: 2
      terminate called after throwing an instance of 'wsrep::runtime_error'
        what():  Failed to resync
      241209 15:43:07 [ERROR] mysqld got signal 6 ;
       
      #3  <signal handler called>
      #4  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
      #5  0x00007f8969aa9d9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
      #6  0x00007f8969a5af32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
      #7  0x00007f8969a45472 in __GI_abort () at ./stdlib/abort.c:79
      #8  0x00007f8969c9d919 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #9  0x00007f8969ca8e1a in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #10 0x00007f8969ca8e85 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #11 0x00007f8969ca90d8 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #12 0x000055c3a876ff46 in wsrep::server_state::resync (this=0x55c3ab76a940, lock=...) at /data/bld/10.6-debug/wsrep-lib/src/server_state.cpp:1332
      #13 0x000055c3a8762b37 in wsrep::server_state::resync (this=0x55c3ab76a940) at /data/bld/10.6-debug/wsrep-lib/include/wsrep/server_state.hpp:448
      #14 0x000055c3a876c1b3 in wsrep::server_state::desync_and_pause (this=0x55c3ab76a940) at /data/bld/10.6-debug/wsrep-lib/src/server_state.cpp:623
      #15 0x000055c3a7c0c916 in backup_block_ddl (thd=0x7f8914000dc8) at /data/bld/10.6-debug/sql/backup.cc:311
      #16 0x000055c3a7c0c1f4 in run_backup_stage (thd=0x7f8914000dc8, stage=BACKUP_LOCK_COMMIT) at /data/bld/10.6-debug/sql/backup.cc:125
      #17 0x000055c3a791420e in mysql_execute_command (thd=0x7f8914000dc8, is_called_from_prepared_stmt=false) at /data/bld/10.6-debug/sql/sql_parse.cc:5253
      #18 0x000055c3a791dd64 in mysql_parse (thd=0x7f8914000dc8, rawbuf=0x7f891401e690 "BACKUP STAGE BLOCK_COMMIT", length=25, parser_state=0x7f895c442380) at /data/bld/10.6-debug/sql/sql_parse.cc:8194
      #19 0x000055c3a791d408 in wsrep_mysql_parse (thd=0x7f8914000dc8, rawbuf=0x7f891401e690 "BACKUP STAGE BLOCK_COMMIT", length=25, parser_state=0x7f895c442380) at /data/bld/10.6-debug/sql/sql_parse.cc:8005
      #20 0x000055c3a7909140 in dispatch_command (command=COM_QUERY, thd=0x7f8914000dc8, packet=0x7f8914016539 "BACKUP STAGE BLOCK_COMMIT", packet_length=25, blocking=true) at /data/bld/10.6-debug/sql/sql_parse.cc:1895
      #21 0x000055c3a7907b91 in do_command (thd=0x7f8914000dc8, blocking=true) at /data/bld/10.6-debug/sql/sql_parse.cc:1421
      #22 0x000055c3a7ae3508 in do_handle_one_connection (connect=0x55c3abeea4b8, put_in_cache=true) at /data/bld/10.6-debug/sql/sql_connect.cc:1407
      #23 0x000055c3a7ae3289 in handle_one_connection (arg=0x55c3abeea4b8) at /data/bld/10.6-debug/sql/sql_connect.cc:1319
      #24 0x000055c3a8048aea in pfs_spawn_thread (arg=0x55c3ab7a6ef8) at /data/bld/10.6-debug/storage/perfschema/pfs.cc:2201
      #25 0x00007f8969aa8044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
      #26 0x00007f8969b2861c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
      


          People

            Assignee: Julius Goryavsky (sysprg)
            Reporter: Elena Stepanova (elenst)
            Votes: 0
            Watchers: 1
