  1. MariaDB Server
  2. MDEV-18497

Node dropped from the cluster after setting up replication between two 3-node Galera clusters

Details

    Description

      One of our customers is facing an issue after setting up replication between two 3-node Galera clusters.

      [Test Environment]

      • MariaDB 10.1.30
      • OP Node (Cluster) : MDBD0, MDBD1, MDBD2
      • DR Node (Cluster) : MDBDG0, MDBDG1, MDBDG2
      • Replication (Dual) : MDBD2 -> MDBDG2, MDBDG2 -> MDBD2

      1. The DB service was switched from OP3 (MDBD2) to DR3 (MDBDG2), and the OP3 data was backed up.
      2. The OP3 data backup file was sent to DR3.
      3. The DR3 data files were deleted and the OP3 data backup was restored on DR3.
      4. Replication sync completed.
      5. A new table was created on the DR3 DB.
         • Replication to OP3 completed.
         • Replication to DR1, DR2, OP1 and OP2 completed through the Galera cluster.
      6. But the OP3 (MDBD2) DB went down.

      When a CREATE TABLE AS SELECT (CTAS) statement was executed on OP-MDBD2 or DR-MDBDG2, we found that an error occurred if the table read by the SELECT statement contained no data.

      [Test Scenarios]
      Case#1
      Execute CTAS on a table (sbtest1) with no data on OP-MDBD2. (The same result occurs on DR-MDBDG2.)
      CREATE TABLE IF NOT EXISTS temp1 AS (SELECT * FROM sbtest1);
      Result : The following error occurs

      DR-MDBDG2 Error Log
      2019-01-31 16:45:06 140115764099840 [Warning] WSREP: SQL statement was ineffective, THD: 8, buf: 129
      schema: (null)
      QUERY: (null)
      => Skipping replication
      2019-01-31 16:45:06 140115764099840 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 0-202-5957873, Internal MariaDB error code: 1047
      2019-01-31 16:45:06 140115764099840 [Note] Slave SQL thread exiting, replication stopped in log 'maria-bin.000004' at position 790493
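
      When comparing failures across nodes, it can help to pull the GTID and error code out of the `Slave SQL` error line. The following is a minimal Python sketch, assuming the log line has the layout shown above; the function name `parse_slave_error` and the field names are illustrative only and are not part of MariaDB. The GTID is split into its domain ID, server ID and sequence number.

```python
import re

# Layout assumed from the "Slave SQL" error line quoted in this report.
SLAVE_ERROR_RE = re.compile(
    r"\[ERROR\] Slave SQL: (?P<message>.+?), "
    r"Gtid (?P<domain>\d+)-(?P<server_id>\d+)-(?P<seq_no>\d+), "
    r"Internal MariaDB error code: (?P<code>\d+)"
)


def parse_slave_error(line):
    """Return the fields of a Slave SQL error line as a dict, or None."""
    m = SLAVE_ERROR_RE.search(line)
    if m is None:
        return None
    return {
        "message": m.group("message"),
        "domain_id": int(m.group("domain")),
        "server_id": int(m.group("server_id")),
        "seq_no": int(m.group("seq_no")),
        "error_code": int(m.group("code")),
    }


line = (
    "2019-01-31 16:45:06 140115764099840 [ERROR] Slave SQL: "
    "Node has dropped from cluster, Gtid 0-202-5957873, "
    "Internal MariaDB error code: 1047"
)
print(parse_slave_error(line))
```

      Lines that do not match (for example the `[Warning] WSREP` line) return `None`, so the function can be applied to a whole error log line by line.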

      Case#2
      Execute CTAS on a table (sbtest2) that contains data on OP-MDBD2. (The same result occurs on DR-MDBDG2.)
      Result : Both OP-MDBD2 and DR-MDBDG2 are normal

      Case#3
      Execute CTAS on a table (sbtest3) with no data on OP-MDBD0 or OP-MDBD1. (The same result occurs on DR-MDBDG0 or DR-MDBDG1.)
      Result : All normal

      Case#4
      Execute 'create table sbtest4 (id int(10), primary key (id));' statement on all nodes instead of CTAS
      Result : All normal
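
      The four cases above can be summarised as a small matrix. The Python sketch below only restates the observed results; it is not a root-cause analysis. The names `Case`, `REPLICATION_ENDPOINTS` and `predicted_failure` are hypothetical, and the rule it encodes (CTAS, empty source table, executed on a node that is also a replication endpoint) is an inference from these four cases only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Nodes that are Galera members and also endpoints of the
# MDBD2 <-> MDBDG2 replication described in the test environment.
REPLICATION_ENDPOINTS = {"MDBD2", "MDBDG2"}


@dataclass(frozen=True)
class Case:
    name: str
    statement: str                 # "CTAS" or "CREATE TABLE"
    source_empty: Optional[bool]   # None when there is no source table
    nodes: Tuple[str, ...]         # nodes the statement was executed on
    failed: bool                   # result observed in the report


CASES = [
    Case("Case#1", "CTAS", True, ("MDBD2", "MDBDG2"), True),
    Case("Case#2", "CTAS", False, ("MDBD2", "MDBDG2"), False),
    Case("Case#3", "CTAS", True, ("MDBD0", "MDBD1", "MDBDG0", "MDBDG1"), False),
    Case("Case#4", "CREATE TABLE", None,
         ("MDBD0", "MDBD1", "MDBD2", "MDBDG0", "MDBDG1", "MDBDG2"), False),
]


def predicted_failure(case):
    """Rule inferred from the four cases; an assumption, not a confirmed cause."""
    return (
        case.statement == "CTAS"
        and case.source_empty is True
        and any(node in REPLICATION_ENDPOINTS for node in case.nodes)
    )


for case in CASES:
    print(case.name, "observed:", case.failed, "rule:", predicted_failure(case))
```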

      Error log details:

      2019-01-17 18:42:05 139854293756672 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 0-106-24925607, Internal MariaDB error code: 1047
      2019-01-17 18:42:05 139854293756672 [Note] Slave SQL thread exiting, replication stopped in log 'maria-bin.000004' at position 12399630
      2019-01-17 18:42:05 139854293756672 [Note] WSREP: Slave error due to node temporarily non-primarySQL slave will continue
      2019-01-17 18:42:05 139854293756672 [Note] WSREP: slave restart: 7
      2019-01-17 18:42:05 139854293756672 [Note] WSREP: ready state reached
      2019-01-17 18:42:05 139854293756672 [Note] Slave SQL thread initialized, starting replication in log 'maria-bin.000004' at position 12399630, relay log './relay-log.000002' position: 537
      2019-01-17 18:42:05 139854293756672 [Warning] WSREP: SQL statement was ineffective, THD: 459, buf: 458
      schema: (null) 
      QUERY: (null)
       => Skipping replication
      2019-01-17 18:42:05 139854293756672 [ERROR] WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK
      190117 18:42:05 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.1.30-MariaDB
      key_buffer_size=33554432
      read_buffer_size=1048576
      max_used_connections=100
      max_threads=302
      thread_count=101
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1585270 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f325bf63008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f325d7fe298 thread_stack 0x48400
      /db/mariadb/app/bin/mysqld(my_print_stacktrace+0x2e)[0xc192be]
      /db/mariadb/app/bin/mysqld(handle_fatal_signal+0x4bf)[0x77177f]
      /lib64/libpthread.so.0(+0xf680)[0x7f3420d8a680]
      /lib64/libc.so.6(gsignal+0x37)[0x7f341fb96207]
      /lib64/libc.so.6(abort+0x148)[0x7f341fb978f8]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2_+0x17c)[0x7f341d9925cc]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM13post_rollbackEPNS_9TrxHandleE+0x26)[0x7f341d9883b6]
      /usr/lib64/galera/libgalera_smm.so(galera_post_rollback+0x48)[0x7f341d9997d8]
      /db/mariadb/app/bin/mysqld[0x6fc960]
      /db/mariadb/app/bin/mysqld(_Z17ha_rollback_transP3THDb+0x12e)[0x774ece]
      /db/mariadb/app/bin/mysqld(_Z15ha_commit_transP3THDb+0x32a)[0x77704a]
      /db/mariadb/app/bin/mysqld(_Z12trans_commitP3THD+0x4c)[0x6a721c]
      /db/mariadb/app/bin/mysqld(_ZN13Xid_log_event14do_apply_eventEP14rpl_group_info+0xcd)[0x85dd6d]
      /db/mariadb/app/bin/mysqld[0x537583]
      /db/mariadb/app/bin/mysqld[0x54152d]
      /db/mariadb/app/bin/mysqld(handle_slave_sql+0x150b)[0x54315b]
      /lib64/libpthread.so.0(+0x7dd5)[0x7f3420d82dd5]
      /lib64/libc.so.6(clone+0x6d)[0x7f341fc5eb3d]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): is an invalid pointer
      Connection ID (thread ID): 459
      Status: NOT_KILLED
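
      To read the call path in the backtrace more easily, the frames can be reduced to library and symbol names. The following is a rough Python sketch, assuming each frame has the `binary(symbol+offset)[address]` or `binary[address]` layout seen above; `parse_frames` is a hypothetical helper, and the sample holds four frames copied from the trace.

```python
import re

# Frame layout assumed from the backtrace in this report.
FRAME_RE = re.compile(
    r"^(?P<binary>\S+?)"
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-fA-F]+)\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frames(trace_text):
    """Return (library, symbol) pairs; symbol is None when it is missing."""
    frames = []
    for raw in trace_text.splitlines():
        m = FRAME_RE.match(raw.strip())
        if m is None:
            continue  # not a stack frame line
        library = m.group("binary").rsplit("/", 1)[-1]
        frames.append((library, m.group("symbol") or None))
    return frames


sample = """
/lib64/libpthread.so.0(+0xf680)[0x7f3420d8a680]
/usr/lib64/galera/libgalera_smm.so(galera_post_rollback+0x48)[0x7f341d9997d8]
/db/mariadb/app/bin/mysqld[0x6fc960]
/db/mariadb/app/bin/mysqld(_Z17ha_rollback_transP3THDb+0x12e)[0x774ece]
"""

for library, symbol in parse_frames(sample):
    print(library, symbol)
```

      Mangled C++ names such as `_Z17ha_rollback_transP3THDb` can then be passed to a demangler such as `c++filt` to obtain readable signatures.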
      

          People

            jplindst Jan Lindström (Inactive)
            niljoshi Nilnandan Joshi