[MDEV-15740] Galera does not recover prepared XA-transactions correctly

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.2.8, 10.3.1, 10.1(EOL)
Fix Version/s: 10.4.2, 10.1.38, 10.2.22, 10.3.13
Component/s: Galera
Labels: None

Description

A stripped down version of galera.galera_gcache_recover:

mysql-test/suite/galera/t/foo.test (no need for a .result file):

--source include/galera_cluster.inc

--connection node_1
CREATE TABLE t1 (f1 INTEGER PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1);

--connection node_2
--let $wait_condition = SELECT COUNT(*) > 0 FROM t1;
--source include/wait_condition.inc

# If we don't sleep here, the tail command below prints:
#   WSREP: Recovered position: ...:1
# If we do sleep then we recover to seqno=2:
#   WSREP: Recovered position: ...:2
#--sleep 1
--source include/kill_galera.inc

--connection node_2
--let $galera_wsrep_recover_server_id=2
--source suite/galera/include/galera_wsrep_recover.inc

--echo Recovered node2
--exec tail -1 $MYSQL_TMP_DIR/galera_wsrep_recover.log

--echo sleep 1000...
--sleep 1000

The problem with that test is that (without the sleep) the contents of the t1 table on node2 go out of sync with the recorded wsrep xid. The wsrep xid with seqno=2 is lost, so when the node later recovers to seqno=1 it would try to re-apply the changes behind seqno=2 from another node. That would cause a duplicate key error, because the row is already in table t1.
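The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not Galera or InnoDB code: Node, WriteSet, apply() and rejoin() are hypothetical names, and it assumes the write set behind seqno=2 is the INSERT of row 1.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

// Toy model of one node: the primary keys stored in t1 plus the
// replication position (seqno) the node believes it has applied.
struct Node {
    std::set<int> t1;      // primary keys present in t1
    long long seqno = 0;   // recorded wsrep position
};

// Hypothetical write set: "INSERT INTO t1 VALUES (key)" tagged with a seqno.
struct WriteSet {
    long long seqno;
    int key;
};

// Apply a write set. Throws on a duplicate primary key, standing in for
// the duplicate key error the storage engine would raise.
void apply(Node& node, const WriteSet& ws) {
    assert(ws.seqno > node.seqno);  // only newer write sets are applied
    if (!node.t1.insert(ws.key).second) {
        throw std::runtime_error("duplicate key");
    }
    node.seqno = ws.seqno;
}

// Catch up from the recorded position. Returns false if applying fails.
bool rejoin(Node& node, const WriteSet& ws) {
    if (ws.seqno <= node.seqno) {
        return true;  // already applied according to the recorded position
    }
    try {
        apply(node, ws);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

A node holding row 1 with a recorded position of 2 rejoins cleanly. A node holding the same row but a recorded position of 1 is handed seqno=2 again and fails on the duplicate key.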

The reason is that the mini-transaction that writes the wsrep xid commits, but the redo log is never flushed up to that LSN before the node is killed. The bug is not present in MySQL because there the log is flushed properly during commit:

trx_sys_update_wsrep_checkpoint() writes seqno=2 using mtr=0x7fffdf9b4ff8
mtr_commit(mtr=0x7fffdf9b4ff8) end_lsn=1631956
innobase_commit()
  trx_commit_complete_for_mysql()
    trx_flush_log_if_needed()  // this is innodb_flush_log_at_trx_commit handling
      trx_flush_log_if_needed_low()
        log_write_up_to(lsn=1631956)
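Why the missing log_write_up_to() call matters can be sketched with a simplified redo log model. This is an illustration only, not InnoDB's implementation: RedoLog, RedoRecord and their members are hypothetical, and the LSN used for seqno=1 in the usage description is made up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One redo record. In this model it only carries the wsrep seqno
// written by the checkpoint update.
struct RedoRecord {
    std::uint64_t end_lsn;
    long long wsrep_seqno;
};

struct RedoLog {
    std::vector<RedoRecord> buffer;  // records of committed mini-transactions
    std::uint64_t flushed_lsn = 0;   // log is durable up to this LSN

    // Mini-transaction commit: the record is appended to the log,
    // but nothing is made durable yet.
    void mtr_commit(std::uint64_t end_lsn, long long seqno) {
        assert(buffer.empty() || end_lsn > buffer.back().end_lsn);
        buffer.push_back({end_lsn, seqno});
    }

    // Stand-in for log_write_up_to(): make the log durable up to lsn.
    void write_up_to(std::uint64_t lsn) {
        if (lsn > flushed_lsn) {
            flushed_lsn = lsn;
        }
    }

    // Position seen after a crash: only records at or below flushed_lsn
    // survive. Returns -1 if no position was recovered.
    long long recovered_seqno() const {
        long long seqno = -1;
        for (const RedoRecord& r : buffer) {
            if (r.end_lsn <= flushed_lsn) {
                seqno = r.wsrep_seqno;
            }
        }
        return seqno;
    }
};
```

If seqno=1 is committed and flushed at some earlier LSN, and seqno=2 is then committed with end_lsn=1631956 but never flushed, recovered_seqno() yields 1. Calling write_up_to(1631956) before the crash makes it yield 2.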

In MariaDB this code path is cut short in trx_commit_complete_for_mysql():

1d0f70c2f894 (Michael Widenius 2243) trx_commit_complete_for_mysql(
1d0f70c2f894 (Michael Widenius 2244) /*==========================*/
068c61978e3a (Michael Widenius 2245)     trx_t*    trx)    /*!< in/out: transaction */
1d0f70c2f894 (Michael Widenius 2246) {
2e814d4702d7 (Jan Lindström    2247)     if (trx->id != 0
2e814d4702d7 (Jan Lindström    2248)         || !trx->must_flush_log_later
36e81a23c567 (Kristian Nielsen 2249)         || (srv_flush_log_at_trx_commit == 1 && trx->active_commit_ordered)) {
1d0f70c2f894 (Michael Widenius 2250)
068c61978e3a (Michael Widenius 2251)         return;
1d0f70c2f894 (Michael Widenius 2252)     }
1d0f70c2f894 (Michael Widenius 2253)
068c61978e3a (Michael Widenius 2254)     trx_flush_log_if_needed(trx->commit_lsn, trx);

The return on line 2251 is executed, so trx_flush_log_if_needed() is never called. This is because the condition on line 2249 evaluates to true: both srv_flush_log_at_trx_commit and trx->active_commit_ordered are 1.

commit 36e81a23c56
Parent: 5ae598390aa
Author:     Kristian Nielsen <knielsen@knielsen-hq.org>
AuthorDate: Mon Aug 7 12:38:47 2017 +0200
Commit:     Kristian Nielsen <knielsen@knielsen-hq.org>
CommitDate: Mon Aug 7 18:23:55 2017 +0200

    MDEV-11937: InnoDB flushes redo log too often

    Problem was introduced with the InnoDB 5.7 merge, the code related to
    avoiding extra fsync at the end of commit when binlog is enabled. The
    MariaDB method for this was removed, but the replacement MySQL method
    based on thd_get_durability_property() is not functional in MariaDB.

    This commit reverts the offending parts of the merge and adds a test
    case, to fix the problem for InnoDB. But other storage engines are
    likely to have a similar problem.
...
-           || thd_requested_durability(trx->mysql_thd)
-              == HA_IGNORE_DURABILITY) {
+           || (srv_flush_log_at_trx_commit == 1 && trx->active_commit_ordered)) {

This looks particularly strange because changing innodb_flush_log_at_trx_commit from 1 to 2 "fixes" the bug: the condition srv_flush_log_at_trx_commit == 1 is no longer true. But 1 is supposed to be more durable, and slower, than 2. With the current code, 2 is more durable!
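The early-return condition on lines 2247-2249 can be restated as a standalone predicate to make the inversion easy to check. This is a sketch for illustration: TrxModel and skips_log_flush() are hypothetical names, and only the three fields read by the condition are modeled.

```cpp
#include <cassert>

// Simplified stand-in for the trx_t fields that the condition reads.
struct TrxModel {
    unsigned long long id;
    bool must_flush_log_later;
    bool active_commit_ordered;
};

// Mirrors the condition in trx_commit_complete_for_mysql().
// Returns true when the function returns early, which means
// trx_flush_log_if_needed() is skipped.
bool skips_log_flush(const TrxModel& trx, int flush_log_at_trx_commit) {
    return trx.id != 0
        || !trx.must_flush_log_later
        || (flush_log_at_trx_commit == 1 && trx.active_commit_ordered);
}
```

For a transaction with id 0, must_flush_log_later set and active_commit_ordered set, skips_log_flush() returns true when the setting is 1 and false when it is 2. In this model only the nominally less durable setting reaches the flush call.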

Issue Links

blocks: MDEV-15540 Galera suite MTR tests issuing wsrep_recover fail (Closed)

causes: MDEV-16571 backup tests fail with missing data after restore (Open)

is caused by: MDEV-11937 InnoDB flushes redo log too often (Closed)

relates to: MDEV-18009 Missing redo log flush in innodb.instant_alter_crash (Closed)

relates to: MDEV-14188 mariabackup.incremental_encrypted failed in buildbot with wrong result (Closed)

People

Assignee: Teemu Ollakka
Reporter: Vasil (Inactive)
Votes: 2
Watchers: 11

Dates

Created: 2018-03-30 07:59
Updated: 2019-01-28 10:02
Resolved: 2019-01-28 10:02
