Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21114

Galera: RQG: Nodes disconnect from the cluster after HA_ERR_KEY_NOT_FOUND and HA_ERR_FOUND_DUPP_KEY

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.4.11
    • N/A
    • Galera
    • OS: CentOS Linux release 7.6.1810 (Core).

    Description

      When RQG is used with --galera=mms, master node 1 and slave node 2 disconnect from the cluster after HA_ERR_KEY_NOT_FOUND and HA_ERR_FOUND_DUPP_KEY accordingly.

      MariaDB Version 10.4.11-MariaDB-debug: Repository: MariaDB/server; branch 10.4; Revision ae72205e31e7665238c1757ede352d9ca53d5327.

      Galera lib 26.4.3(r4548) debug: Repository: MariaDB/galera; branch mariadb-4.x; Revision a5431753a3f6bdd348adcbca00e3450ba0ef9045.

      RQG: Repository: MariaDB/randgen; master branch; Revision bea5b6b07d08ba6349d8a6ff4356b01678822727.

      Run:

      perl ./runall-new.pl --grammar=conf/galera/galera_stress.yy --gendata=conf/galera/galera_stress.zz --duration=600 --queries=200M --threads=1 --galera=mms --basedir=/home/stepan/mariadb/10.4/git --vardir=/home/stepan/rqg/github/var "--mysqld=--wsrep-provider=/usr/lib/libgalera_smm_4.so" "--mysqld=--wsrep_sst_method=rsync" "--mysqld=--core" "--mysqld=--general-log" "--mysqld=--general-log-file=queries.log" "--mysqld=--log-output=file" "--mysqld=--wsrep-debug=server" "--mysqld=--wsrep-sync-wait=15" "--mysqld=--wsrep_retry_autocommit=0" "--mysqld=--wsrep_log_conflicts=1" "--mysqld=--wsrep_on=ON"
      

      Master node 1:
      2019-11-20 16:22:09 2 [ERROR] mysqld: Can't find record in 'table100_innodb_int'
      2019-11-20 16:22:09 2 [ERROR] Slave SQL: Could not execute Update_rows_v1 event on table test.table100_innodb_int; Can't find record in 'table100_innodb_int', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 321, Internal MariaDB error code: 1032
      2019-11-20 16:22:09 2 [ERROR] WSREP: Failed to apply write set: gtid: 09f8e15a-0ba1-11ea-bed3-56e4b108bea5:130 server_id: 09f7e5f7-0ba1-11ea-a944-d7f74653ae06 client_id: 17 trx_id: 1748 flags: 3 (start_transaction | commit)
      2019-11-20 16:22:09 2 [Note] [Debug] WSREP: server cnt7glr11.localdomain state change: connected -> disconnected
      2019-11-20 16:22:09 2 [Note] WSREP: Server status change connected -> disconnected
      2019-11-20 16:22:09 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)

      2019-11-20 16:22:09 2 [Note] WSREP: assigned new next trx id: 1384
      2019-11-20 16:22:09 2 [ERROR] mysqld: Can't find record in 'table100_innodb_int'
      2019-11-20 16:22:09 2 [ERROR] Slave SQL: Could not execute Update_rows_v1 event on table test.table100_innodb_int; Can't find record in 'table100_innodb_int', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 321, Internal MariaDB error code: 1032
      2019-11-20 16:22:09 2 [Note] WSREP: ha_rollback_trans(2, FALSE) rolled back: REPLACE INTO `table100_innodb_int` ( `col_char_12`, `col_char_12_key` ) VALUES ( 'u', 406257664 ), ( 4083875840, 5 ) /* QNO 1280 CON_ID 17 */L՝: Can't find record in 'table100_innodb_int'; is_real 0
      2019-11-20 16:22:09 2 [Warning] WSREP: Event 3 Update_rows_v1 apply failed: 120, seqno 130
      2019-11-20 16:22:09 2 [Note] WSREP: ha_rollback_trans(2, TRUE) rolled back: REPLACE INTO `table100_innodb_int` ( `col_char_12`, `col_char_12_key` ) VALUES ( 'u', 406257664 ), ( 4083875840, 5 ) /* QNO 1280 CON_ID 17 */L՝: Can't find record in 'table100_innodb_int'; is_real 1
      2019-11-20 16:22:09 2 [Note] WSREP: wsrep_after_apply 2
      2019-11-20 16:22:09 2 [Note] WSREP: wsrep_thd_binlog_reset
      2019-11-20 16:22:09 2 [Note] WSREP: Log dummy write set 130
      2019-11-20 16:22:09 2 [Note] WSREP: wsrep_thd_binlog_reset
      2019-11-20 16:22:09 2 [ERROR] WSREP: Failed to apply write set: gtid: 09f8e15a-0ba1-11ea-bed3-56e4b108bea5:130 server_id: 09f7e5f7-0ba1-11ea-a944-d7f74653ae06 client_id: 17 trx_id: 1748 flags: 3 (start_transaction | commit)
      2019-11-20 16:22:09 2 [Note] WSREP: Closing send monitor...
      2019-11-20 16:22:09 2 [Note] WSREP: Closed send monitor.
      2019-11-20 16:22:09 2 [Note] WSREP: gcomm: terminating thread
      2019-11-20 16:22:09 2 [Note] WSREP: gcomm: joining thread
      2019-11-20 16:22:09 2 [Note] WSREP: gcomm: closing backend
      2019-11-20 16:22:09 17 [Note] WSREP: assigned new next trx id: 1385
      2019-11-20 16:22:09 2 [Note] WSREP: view(view_id(NON_PRIM,09f7e5f7,3) memb {
      	0a54a855,0
      } joined {
      } left {
      } partitioned {
      	09f7e5f7,0
      	0d528ca2,0
      })
      2019-11-20 16:22:09 2 [Note] WSREP: PC protocol downgrade 1 -> 0
      2019-11-20 16:22:09 17 [Note] WSREP: wsrep_sync_wait: thd->variables.wsrep_sync_wait= 15, mask= 4, thd->variables.wsrep_on= 1
      2019-11-20 16:22:09 2 [Note] WSREP: view((empty))
      2019-11-20 16:22:09 2 [Note] WSREP: gcomm: closed
      2019-11-20 16:22:09 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2019-11-20 16:22:09 0 [Note] WSREP: Flow-control interval: [16, 16]
      2019-11-20 16:22:09 0 [Note] WSREP: Received NON-PRIMARY.
      2019-11-20 16:22:09 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 130)
      2019-11-20 16:22:09 0 [Note] WSREP: New SELF-LEAVE.
      2019-11-20 16:22:09 0 [Note] WSREP: Flow-control interval: [0, 0]
      2019-11-20 16:22:09 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2019-11-20 16:22:09 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: -1)
      2019-11-20 16:22:09 0 [Note] WSREP: RECV thread exiting 0: Success
      2019-11-20 16:22:09 17 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
      2019-11-20 16:22:09 2 [Note] WSREP: recv_thread() joined.
      2019-11-20 16:22:09 2 [Note] WSREP: Closing replication queue.
      2019-11-20 16:22:09 2 [Note] WSREP: Closing slave action queue.
      2019-11-20 16:22:09 2 [Note] WSREP: ####### processing CC -1, local, ordered
      2019-11-20 16:22:09 2 [Note] WSREP: ####### My UUID: 0a54a855-0ba1-11ea-93a9-d724df0d647a
      2019-11-20 16:22:09 2 [Note] WSREP: ####### ST not required
      2019-11-20 16:22:09 2 [Note] WSREP: ================================================
      View:
        id: 09f8e15a-0ba1-11ea-bed3-56e4b108bea5:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(1):
      	0: 0a54a855-0ba1-11ea-93a9-d724df0d647a, cnt7glr11.localdomain
      =================================================
      2019-11-20 16:22:09 2 [Note] WSREP: Non-primary view
      2019-11-20 16:22:09 2 [Note] [Debug] WSREP: server cnt7glr11.localdomain state change: synced -> connected
      2019-11-20 16:22:09 2 [Note] WSREP: Server status change synced -> connected
      2019-11-20 16:22:09 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-11-20 16:22:09 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-11-20 16:22:09 2 [Note] WSREP: ####### processing CC -1, local, ordered
      2019-11-20 16:22:09 2 [Note] WSREP: ####### My UUID: 0a54a855-0ba1-11ea-93a9-d724df0d647a
      2019-11-20 16:22:09 2 [Note] WSREP: ####### ST not required
      2019-11-20 16:22:09 2 [Note] WSREP: ================================================
      View:
        id: 09f8e15a-0ba1-11ea-bed3-56e4b108bea5:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: yes
        own_index: -1
        members(0):
      =================================================
      2019-11-20 16:22:09 2 [Note] WSREP: Non-primary view
      2019-11-20 16:22:09 2 [Note] [Debug] WSREP: server cnt7glr11.localdomain state change: connected -> disconnected
      2019-11-20 16:22:09 2 [Note] WSREP: Server status change connected -> disconnected
      2019-11-20 16:22:09 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-11-20 16:22:09 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-11-20 16:22:09 17 [Note] WSREP: wsrep_commit_empty(17)
      2019-11-20 16:22:09 17 [Note] WSREP: wsrep_thd_binlog_reset
      2019-11-20 16:22:09 17 [Note] WSREP: wsrep_thd_binlog_reset
      2019-11-20 16:22:09 17 [Note] WSREP: dispatch_command leave
          thd: 17 thd_ptr: 0x7f4eb0000af0 client_mode: local client_state: result trx_state: aborted
          next_trx_id: 1385 trx_id: -1 seqno: -1
          is_streaming: 0 fragments: 0
          sql_errno: 1205 message: Lock wait timeout exceeded; try restarting transaction
          command: 40 query: REPLACE INTO `table100_innodb_int` ( `col_char_12`, `col_char_12_key` ) 
      2019-11-20 16:22:09 0 [Note] WSREP: Service thread queue flushed.
      2019-11-20 16:22:09 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
      2019-11-20 16:22:09 2 [Note] WSREP: Applier thread exiting ret: 0 thd: 2
      2019-11-20 16:22:09 2 [Note] WSREP: closing applier 2
      2019-11-20 16:22:09 2 [Note] WSREP: wsrep system thread: 2, 0x7f4f34000af0 closing
      2019-11-20 16:22:09 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
      2019-11-20 16:22:09 2 [Note] WSREP: wsrep running threads now: 1
      2019-11-20 16:22:09 2 [Note] WSREP: ha_rollback_trans(2, TRUE) rolled back: (null): Can't find record in 'table100_innodb_int'; is_real 1
      

      Slave node 2:
      2019-11-20 16:22:16 2 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table10000_innodb_int; Duplicate entry '0' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 8406, Internal MariaDB error code: 1062
      2019-11-20 16:22:16 2 [ERROR] WSREP: Failed to apply write set: gtid: 09f8e15a-0ba1-11ea-bed3-56e4b108bea5:147 server_id: 09f7e5f7-0ba1-11ea-a944-d7f74653ae06 client_id: 17 trx_id: 1940 flags: 3 (start_transaction | commit)
      2019-11-20 16:22:17 2 [Note] [Debug] WSREP: server cnt7glr11.localdomain state change: connected -> disconnected
      2019-11-20 16:22:17 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)

      2019-11-20 16:22:16 2 [Note] WSREP: assigned new next trx id: 986
      2019-11-20 16:22:16 2 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table10000_innodb_int; Duplicate entry '0' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 8406, Internal MariaDB error code: 1062
      2019-11-20 16:22:16 2 [Note] WSREP: ha_rollback_trans(2, FALSE) rolled back: INSERT IGNORE INTO `table10000_innodb_int` ( `col_int_key`, `col_char_12` ) VALUES ( 'b', 'w' ), ( 5, 'b' ) /* QNO 1682 CON_ID 17 */LÕ]: Duplicate entry '0' for key 'PRIMARY'; is_real 0
      2019-11-20 16:22:16 2 [Warning] WSREP: Event 64 Write_rows_v1 apply failed: 121, seqno 147
      2019-11-20 16:22:16 2 [Note] WSREP: ha_rollback_trans(2, TRUE) rolled back: INSERT IGNORE INTO `table10000_innodb_int` ( `col_int_key`, `col_char_12` ) VALUES ( 'b', 'w' ), ( 5, 'b' ) /* QNO 1682 CON_ID 17 */LÕ]: Duplicate entry '0' for key 'PRIMARY'; is_real 1
      2019-11-20 16:22:16 2 [Note] WSREP: wsrep_after_apply 2
      2019-11-20 16:22:16 2 [Note] WSREP: wsrep_thd_binlog_reset
      2019-11-20 16:22:16 2 [Note] WSREP: Log dummy write set 147
      2019-11-20 16:22:16 2 [Note] WSREP: wsrep_thd_binlog_reset
      2019-11-20 16:22:16 2 [ERROR] WSREP: Failed to apply write set: gtid: 09f8e15a-0ba1-11ea-bed3-56e4b108bea5:147 server_id: 09f7e5f7-0ba1-11ea-a944-d7f74653ae06 client_id: 17 trx_id: 1940 flags: 3 (start_transaction | commit)
      2019-11-20 16:22:16 2 [Note] WSREP: Closing send monitor...
      2019-11-20 16:22:16 2 [Note] WSREP: Closed send monitor.
      2019-11-20 16:22:16 2 [Note] WSREP: gcomm: terminating thread
      2019-11-20 16:22:16 2 [Note] WSREP: gcomm: joining thread
      2019-11-20 16:22:16 2 [Note] WSREP: gcomm: closing backend
      2019-11-20 16:22:17 2 [Note] WSREP: view(view_id(NON_PRIM,09f7e5f7,4) memb {
      	0d528ca2,0
      } joined {
      } left {
      } partitioned {
      	09f7e5f7,0
      })
      2019-11-20 16:22:17 2 [Note] WSREP: PC protocol downgrade 1 -> 0
      2019-11-20 16:22:17 2 [Note] WSREP: view((empty))
      2019-11-20 16:22:17 2 [Note] WSREP: gcomm: closed
      2019-11-20 16:22:17 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2019-11-20 16:22:17 0 [Note] WSREP: Flow-control interval: [16, 16]
      2019-11-20 16:22:17 0 [Note] WSREP: Received NON-PRIMARY.
      2019-11-20 16:22:17 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 147)
      2019-11-20 16:22:17 0 [Note] WSREP: New SELF-LEAVE.
      2019-11-20 16:22:17 0 [Note] WSREP: Flow-control interval: [0, 0]
      2019-11-20 16:22:17 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2019-11-20 16:22:17 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: -1)
      2019-11-20 16:22:17 0 [Note] WSREP: RECV thread exiting 0: Success
      2019-11-20 16:22:17 2 [Note] WSREP: recv_thread() joined.
      2019-11-20 16:22:17 2 [Note] WSREP: Closing replication queue.
      2019-11-20 16:22:17 2 [Note] WSREP: Closing slave action queue.
      2019-11-20 16:22:17 2 [Note] WSREP: ####### processing CC -1, local, ordered
      2019-11-20 16:22:17 2 [Note] WSREP: ####### My UUID: 0d528ca2-0ba1-11ea-a7e1-5a09a105dcfd
      2019-11-20 16:22:17 2 [Note] WSREP: ####### ST not required
      2019-11-20 16:22:17 2 [Note] WSREP: ================================================
      View:
        id: 09f8e15a-0ba1-11ea-bed3-56e4b108bea5:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(1):
      	0: 0d528ca2-0ba1-11ea-a7e1-5a09a105dcfd, cnt7glr11.localdomain
      =================================================
      2019-11-20 16:22:17 2 [Note] WSREP: Non-primary view
      2019-11-20 16:22:17 2 [Note] [Debug] WSREP: server cnt7glr11.localdomain state change: synced -> connected
      2019-11-20 16:22:17 2 [Note] WSREP: Server status change synced -> connected
      2019-11-20 16:22:17 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-11-20 16:22:17 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-11-20 16:22:17 2 [Note] WSREP: ####### processing CC -1, local, ordered
      2019-11-20 16:22:17 2 [Note] WSREP: ####### My UUID: 0d528ca2-0ba1-11ea-a7e1-5a09a105dcfd
      2019-11-20 16:22:17 2 [Note] WSREP: ####### ST not required
      2019-11-20 16:22:17 2 [Note] WSREP: ================================================
      View:
        id: 09f8e15a-0ba1-11ea-bed3-56e4b108bea5:-1
        status: non-primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: yes
        own_index: -1
        members(0):
      =================================================
      2019-11-20 16:22:17 2 [Note] WSREP: Non-primary view
      2019-11-20 16:22:17 2 [Note] [Debug] WSREP: server cnt7glr11.localdomain state change: connected -> disconnected
      2019-11-20 16:22:17 2 [Note] WSREP: Server status change connected -> disconnected
      2019-11-20 16:22:17 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-11-20 16:22:17 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-11-20 16:22:17 0 [Note] WSREP: Service thread queue flushed.
      2019-11-20 16:22:17 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: 5
      2019-11-20 16:22:17 2 [Note] WSREP: Applier thread exiting ret: 0 thd: 2
      2019-11-20 16:22:17 2 [Note] WSREP: closing applier 2
      2019-11-20 16:22:17 2 [Note] WSREP: wsrep system thread: 2, 0x7f36d4000af0 closing
      2019-11-20 16:22:17 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
      2019-11-20 16:22:17 2 [Note] WSREP: wsrep running threads now: 1
      2019-11-20 16:22:17 2 [Note] WSREP: ha_rollback_trans(2, TRUE) rolled back: (null): Duplicate entry '0' for key 'PRIMARY'; is_real 1
      2019-11-20 16:32:02 13 [Note] WSREP: assigned new next trx id: 987
      2019-11-20 16:32:02 13 [Note] WSREP: assigned new next trx id: 987
      2019-11-20 16:32:02 0 [Note] /home/stepan/mariadb/10.4/git/sql/mysqld (initiated by: [stepan] @ localhost [127.0.0.1]): Normal shutdown
      2019-11-20 16:32:02 0 [Note] WSREP: Shutdown replication
      

      RQG output:

      # 2019-11-20T16:31:58 [14601] No differences were found between servers 0 and 1.
      # 2019-11-20T16:31:58 [14601] No differences were found between servers 1 and 2.
      [...]
      # 2019-11-20T16:32:04 [14601] [14601] ./runall-new.pl will exit with exit status STATUS_OK (0)
      

      Other logs and xx/yy files.

      Attachments

        1. 191120_RQG_Disconnect_on_err.zip
          1.44 MB
        2. mysql1.err
          2.36 MB
        3. mysql2.err
          132 kB
        4. rqg.log
          4.19 MB

        Activity

          People

            janlindstrom Jan Lindström
            stepan.patryshev Stepan Patryshev (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.