[MDEV-15443] Invalid wsrep XID or binlog position read from the rollback segment

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.3.5
Fix Version/s: 10.3.6
Component/s: Galera, Storage Engine - InnoDB
Labels: None

Description

The XID may not be read correctly from the rollback segment header if the rollback segment containing the highest trx id was not written by a wsrep thread.

For example, running the following MTR test in the galera test suite demonstrates the problem:

--source include/have_innodb.inc
--source include/galera_cluster.inc

# Initialize table on node_1
CREATE TABLE t1 (f1 INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);

# Go to node_2, verify that the previous INSERT completed.
# Take node_2 out of the cluster, insert and delete a record
# on a table with wsrep_on disabled.
--connection node_2
SELECT * FROM t1;
SET GLOBAL wsrep_cluster_address='';
SET SESSION wsrep_on=0;
INSERT INTO t1 VALUES (2);
DELETE FROM t1 WHERE f1 = 2;

# Shutdown node_2
--source include/shutdown_mysqld.inc

# On node_1, verify that the node has left the cluster.
--connection node_1
--let $wait_condition = SELECT VARIABLE_VALUE = 1 FROM INFORMATION_SCHEMA.GLOBAL_STATUS WHERE VARIABLE_NAME = 'wsrep_cluster_size';
--source include/wait_condition.inc

# Insert into t1 to enforce IST on node_2 when it is restarted.
INSERT INTO t1 VALUES (2);

# Restart node_2
--connection node_2
--source include/start_mysqld.inc

--connection node_1
DROP TABLE t1;

When node_2 is started at the end of the test, the wsrep seqnos stored in the rollback segments look like the following (a seqno of zero means an invalid wsrep XID):

rseg_id: 0 trx_id: 40 wsrep seqno: 1
rseg_id: 1 trx_id: 5 wsrep seqno: 0
rseg_id: 2 trx_id: 40 wsrep seqno: 0
rseg_id: 3 trx_id: 42 wsrep seqno: 2
rseg_id: 4 trx_id: 44 wsrep seqno: 0
rseg_id: 5 trx_id: 46 wsrep seqno: 0
rseg_id: 6 trx_id: 15 wsrep seqno: 0
rseg_id: 7 trx_id: 17 wsrep seqno: 0
rseg_id: 8 trx_id: 19 wsrep seqno: 0
rseg_id: 9 trx_id: 0 wsrep seqno: 0
rseg_id: 10 trx_id: 22 wsrep seqno: 0
rseg_id: 11 trx_id: 24 wsrep seqno: 0
rseg_id: 12 trx_id: 26 wsrep seqno: 0
rseg_id: 13 trx_id: 32 wsrep seqno: 0
rseg_id: 14 trx_id: 29 wsrep seqno: 0
rseg_id: 15 trx_id: 31 wsrep seqno: 0
rseg_id: 16 trx_id: 0 wsrep seqno: 0

The rest of the rsegs haven't been written into (have trx_id: 0).
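
As a rough illustration of how such a dump can be checked, the following C++ sketch parses lines in the format shown above and picks the entry with the highest trx id. The struct RsegDumpLine and both function names are hypothetical helpers written for this example, not InnoDB code. Ties go to the later entry, which is an assumption chosen to mirror the `id < max_id` comparison quoted in this report.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record for one line of the dump; not an InnoDB type.
struct RsegDumpLine {
    unsigned long rseg_id;
    unsigned long long trx_id;
    long long wsrep_seqno;  // 0 means "no valid wsrep XID" in the dump
};

// Parses "rseg_id: N trx_id: N wsrep seqno: N".
// Returns false if the line does not match that shape.
bool parse_dump_line(const std::string& line, RsegDumpLine& out) {
    return std::sscanf(line.c_str(),
                       "rseg_id: %lu trx_id: %llu wsrep seqno: %lld",
                       &out.rseg_id, &out.trx_id, &out.wsrep_seqno) == 3;
}

// Returns the index of the entry with the highest trx_id, or -1 if the
// input is empty. On a tie the later entry wins.
int highest_trx_id_index(const std::vector<RsegDumpLine>& rows) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(rows.size()); ++i) {
        if (best < 0 || rows[i].trx_id >= rows[best].trx_id) {
            best = i;
        }
    }
    return best;
}
```

Fed the 17 dump lines above, highest_trx_id_index selects the rseg_id 5 entry (trx_id 46), whose wsrep seqno is 0.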

Now, the function

trx_rseg_read_wsrep_checkpoint(XID& xid)

reads the XID from the rseg with highest trx id:

                trx_id_t id = mach_read_from_8(rseg_header
                                               + TRX_RSEG_MAX_TRX_ID);

                if (id < max_id) {
                        continue;
                }

                max_id = id;
                found = trx_rseg_read_wsrep_checkpoint(rseg_header, xid)
                        || found;

In the example dump above the highest trx id is in rseg 5, which does not contain a valid wsrep XID. As a result,

trx_rseg_read_wsrep_checkpoint(rseg_header, xid)

overwrites the previously found XID with zeroes, and an XID of all zeroes is returned from this call. This leads to the following error:

2018-03-01  4:35:43 0 [Note] WSREP: Read WSREPXid from InnoDB:  00000000-0000-0000-0000-000000000000:-1
2018-03-01  4:35:43 0 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:2
2018-03-01  4:35:43 1 [ERROR] WSREP: Application received wrong state:
	Received: 00000000-0000-0000-0000-000000000000
	Required: 2cba85c0-1cf9-11e8-b0ce-17781a6132b8
2018-03-01  4:35:43 1 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
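
The selection logic described in this report can be modelled outside the server with a small C++ sketch. RsegModel, XidModel and all function names below are hypothetical stand-ins, and only the seqno part of the XID is modelled. select_as_reported mirrors the loop quoted in this report; select_guarded shows one possible way to avoid the overwrite (read into a temporary and keep the valid XID with the highest seqno). That variant is an assumption made for illustration and is not necessarily the change that went into 10.3.6.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a rollback segment header.
struct RsegModel {
    std::uint64_t max_trx_id;  // models the TRX_RSEG_MAX_TRX_ID field
    long long wsrep_seqno;     // 0 models "no valid wsrep XID stored"
};

// Hypothetical stand-in for the XID; -1 models the zeroed, invalid state.
struct XidModel {
    long long seqno = -1;
};

// Models the per-rseg read: an invalid XID resets the output and
// returns false, as the report describes.
bool read_checkpoint(const RsegModel& rseg, XidModel& xid) {
    if (rseg.wsrep_seqno == 0) {
        xid.seqno = -1;
        return false;
    }
    xid.seqno = rseg.wsrep_seqno;
    return true;
}

// Mirrors the reported loop: the last rseg whose trx id is not below
// the running maximum decides the final contents of xid.
bool select_as_reported(const std::vector<RsegModel>& rsegs, XidModel& xid) {
    std::uint64_t max_id = 0;
    bool found = false;
    for (const RsegModel& rseg : rsegs) {
        if (rseg.max_trx_id < max_id) {
            continue;
        }
        max_id = rseg.max_trx_id;
        found = read_checkpoint(rseg, xid) || found;
    }
    return found;
}

// Assumed alternative: never let an invalid read touch the result.
bool select_guarded(const std::vector<RsegModel>& rsegs, XidModel& xid) {
    long long max_seqno = -1;
    bool found = false;
    for (const RsegModel& rseg : rsegs) {
        XidModel tmp;
        if (read_checkpoint(rseg, tmp) && tmp.seqno > max_seqno) {
            max_seqno = tmp.seqno;
            xid = tmp;
            found = true;
        }
    }
    return found;
}

// The {trx_id, wsrep seqno} pairs from the dump in this report.
std::vector<RsegModel> rsegs_from_report() {
    return {{40, 1}, {5, 0},  {40, 0}, {42, 2}, {44, 0}, {46, 0},
            {15, 0}, {17, 0}, {19, 0}, {0, 0},  {22, 0}, {24, 0},
            {26, 0}, {32, 0}, {29, 0}, {31, 0}, {0, 0}};
}
```

With the dump data, select_as_reported returns true but leaves the modelled XID invalid (seqno -1), which matches the "Read WSREPXid from InnoDB" line in the log, while select_guarded yields seqno 2.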

Expected result:
node_2 restarts and rejoins the cluster via IST.

Affects only 10.3; the test passes with 10.2.

Issue Links

is caused by: MDEV-15158 On commit, do not write to the TRX_SYS page (Closed)

People

Assignee: Marko Mäkelä

Reporter: Teemu Ollakka

Votes: 0

Watchers: 4

Dates

Created: 2018-03-01 02:49

Updated: 2018-03-07 13:41

Resolved: 2018-03-07 11:44

MariaDB Server