[MDEV-21117] refine the server binlog-based recovery for semisync - Jira

Details

Type: Task
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Fix Version/s: 10.6.2
Component/s: Replication
Labels:
None

Description

When run after master server crash --tc-heuristic-recover=rollback produces inconsistent server state with binlog still containing transactions that were rolled back by the option.
Such way recovered server may not be used for replication. E.g when such way recovered
ex-master is demoted into slave its binlog state needs further correction to subtract
the rolled back transactions from its binlog status. Otherwise the "new" slave might claim
those transactions as locally present at the (gtid) master-slave connection protocol. At the same time the actual "new" master may never have seen those transactions (because they never arrived at it when it was formerly slave, due to the crash).

This issue should be fixed with refining the recovery procedure with truncating binlog to cut off the prepared rolled back transactions. The method is also known as pioneered by FB
https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/

Once a transaction reaches the binary logs it should roll forward.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

recovery_design.txt
6 kB
2021-01-28 14:23

Issue Links

blocks

MDEV-11855 Make semisync crash safe with the cluster

Open

causes

MDEV-33465 an option to enable semisync recovery

Closed

includes

MDEV-26652 xa transactions binlogged in wrong order

Closed

is duplicated by

MDEV-20996 Maxscale auto-failover with semi-sync replication is not providing a true HA solution

Closed

relates to

MDEV-21168 Active XA transactions stop slave from working after backup was restored.

Closed

MDEV-25395 server recovery hits replication event checksum error

Stalled

MDEV-33424 when both rpl_semi_sync_MASTER,SLAVE_enabled set the server should recover as master

Closed

MDEV-18959 Engine transaction recovery through persistent binlog

Stalled

links to

MySQL WL#5493: Binlog crash-safe when master crashed

(3 relates to, 1 links to)

Activity

Ascending order - Click to sort in descending order

Sujatha Sivakumar (Inactive) added a comment - 2019-11-22 14:05

Observed reported issue on latest 10.1.44

MariaDB [(none)]> use test;

Database changed

MariaDB [test]> create table t ( f int ) engine=innodb;

Query OK, 0 rows affected (0.02 sec)

MariaDB [test]> insert into t values (10);

Query OK, 1 row affected (0.01 sec)

MariaDB [test]> insert into t values (20);

ERROR 2013 (HY000): Lost connection to MySQL server during query

Crash the server in the middle of commit.
Restart the server with "--tc-heuristic-recover=ROLLBACK"

Observed that the transaction which tried to insert '20' is rolled back in engine.

MariaDB [test]> select * from t;

+------+

| f    |

+------+

|   10 |

+------+

But the binarylog retains the transaction which was rolled back in the engine.
The binlog 'master-bin.000001' is present post recovery

MariaDB [test]> show binlog events;

+-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+

| Log_name          | Pos | Event_type        | Server_id | End_log_pos | Info                                              |

+-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+

| master-bin.000001 |   4 | Format_desc       |         1 |         249 | Server ver: 10.1.44-MariaDB-debug, Binlog ver: 4  |

| master-bin.000001 | 249 | Gtid_list         |         1 |         274 | []                                                |

| master-bin.000001 | 274 | Binlog_checkpoint |         1 |         314 | master-bin.000001                                 |

| master-bin.000001 | 314 | Gtid              |         1 |         352 | GTID 0-1-1                                        |

| master-bin.000001 | 352 | Query             |         1 |         457 | use `test`; create table t (f int ) engine=innodb |

| master-bin.000001 | 457 | Gtid              |         1 |         495 | BEGIN GTID 0-1-2                                  |

| master-bin.000001 | 495 | Query             |         1 |         587 | use `test`; insert into t values(10)              |

| master-bin.000001 | 587 | Xid               |         1 |         614 | COMMIT /* xid=7 */                                |

| master-bin.000001 | 614 | Gtid              |         1 |         652 | BEGIN GTID 0-1-3                                  |

| master-bin.000001 | 652 | Query             |         1 |         744 | use `test`; insert into t values(20)              |

| master-bin.000001 | 744 | Xid               |         1 |         771 | COMMIT /* xid=8 */                                |

+-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+

11 rows in set (0.00 sec)

GTID state is also preserved within a newly created binary log.

MariaDB [test]> show binlog events in 'master-bin.000002';

+-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+

| Log_name          | Pos | Event_type        | Server_id | End_log_pos | Info                                             |

+-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+

| master-bin.000002 |   4 | Format_desc       |         1 |         249 | Server ver: 10.1.44-MariaDB-debug, Binlog ver: 4 |

| master-bin.000002 | 249 | Gtid_list         |         1 |         288 | [0-1-3]                                          |

| master-bin.000002 | 288 | Binlog_checkpoint |         1 |         328 | master-bin.000002                                |

| master-bin.000002 | 328 | Stop              |         1 |         347 |                                                  |

+-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+

4 rows in set (0.00 sec)

1 row in set (0.00 sec)

This results in inconsistent state.

Sujatha Sivakumar (Inactive) added a comment - 2019-11-22 14:05 Observed reported issue on latest 10.1.44 MariaDB [(none)]> use test; Database changed MariaDB [test]> create table t ( f int ) engine=innodb; Query OK, 0 rows affected (0.02 sec) MariaDB [test]> insert into t values (10); Query OK, 1 row affected (0.01 sec) MariaDB [test]> insert into t values (20); ERROR 2013 (HY000): Lost connection to MySQL server during query Crash the server in the middle of commit. Restart the server with "--tc-heuristic-recover=ROLLBACK" Observed that the transaction which tried to insert '20' is rolled back in engine. MariaDB [test]> select * from t; +------+ | f | +------+ | 10 | +------+ But the binarylog retains the transaction which was rolled back in the engine. The binlog 'master-bin.000001' is present post recovery MariaDB [test]> show binlog events; +-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info | +-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+ | master-bin.000001 | 4 | Format_desc | 1 | 249 | Server ver: 10.1.44-MariaDB-debug, Binlog ver: 4 | | master-bin.000001 | 249 | Gtid_list | 1 | 274 | [] | | master-bin.000001 | 274 | Binlog_checkpoint | 1 | 314 | master-bin.000001 | | master-bin.000001 | 314 | Gtid | 1 | 352 | GTID 0-1-1 | | master-bin.000001 | 352 | Query | 1 | 457 | use `test`; create table t (f int ) engine=innodb | | master-bin.000001 | 457 | Gtid | 1 | 495 | BEGIN GTID 0-1-2 | | master-bin.000001 | 495 | Query | 1 | 587 | use `test`; insert into t values(10) | | master-bin.000001 | 587 | Xid | 1 | 614 | COMMIT /* xid=7 */ | | master-bin.000001 | 614 | Gtid | 1 | 652 | BEGIN GTID 0-1-3 | | master-bin.000001 | 652 | Query | 1 | 744 | use `test`; insert into t values(20) | | master-bin.000001 | 744 | Xid | 1 | 771 | COMMIT /* xid=8 */ | +-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+ 11 rows in set (0.00 sec) GTID state is also preserved within a newly created binary log. MariaDB [test]> show binlog events in 'master-bin.000002'; +-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info | +-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+ | master-bin.000002 | 4 | Format_desc | 1 | 249 | Server ver: 10.1.44-MariaDB-debug, Binlog ver: 4 | | master-bin.000002 | 249 | Gtid_list | 1 | 288 | [0-1-3] | | master-bin.000002 | 288 | Binlog_checkpoint | 1 | 328 | master-bin.000002 | | master-bin.000002 | 328 | Stop | 1 | 347 | | +-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+ 4 rows in set (0.00 sec) 1 row in set (0.00 sec) This results in inconsistent state.

Sujatha Sivakumar (Inactive) added a comment - 2020-04-01 17:51

Hello Andrei,

Can you please review the fix for ~~MDEV-21117~~.

Build Bot: http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.1-sujatha
Patch: https://github.com/MariaDB/server/commit/b62432fc2459a312aaa0e27c0950de3420a34d53

Thank you.

Sujatha Sivakumar (Inactive) added a comment - 2020-04-01 17:51 Hello Andrei, Can you please review the fix for MDEV-21117 . Build Bot: http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.1-sujatha Patch: https://github.com/MariaDB/server/commit/b62432fc2459a312aaa0e27c0950de3420a34d53 Thank you.

Andrei Elkin added a comment - 2020-04-14 10:12

Part of the review is done. More work is requested.

Andrei Elkin added a comment - 2020-04-14 10:12 Part of the review is done. More work is requested.

Marko Mäkelä added a comment - 2020-04-17 15:23

I helped debug the "loss" of binlog position, which boiled down to this extraneous condition at InnoDB startup:

void

recv_recovery_from_checkpoint_finish(void)

/*======================================*/

	if (recv_needed_recovery) {

		trx_sys_print_mysql_master_log_pos();

		trx_sys_print_mysql_binlog_offset();

This code would only be invoked if any log records had been written after the latest log checkpoint. But binlog recovery requires that trx_sys_mysql_bin_log_pos be unconditionally initialized.

Marko Mäkelä added a comment - 2020-04-17 15:23 I helped debug the "loss" of binlog position, which boiled down to this extraneous condition at InnoDB startup: void recv_recovery_from_checkpoint_finish( void ) /*======================================*/ { if (recv_needed_recovery) { trx_sys_print_mysql_master_log_pos(); trx_sys_print_mysql_binlog_offset(); } This code would only be invoked if any log records had been written after the latest log checkpoint. But binlog recovery requires that trx_sys_mysql_bin_log_pos be unconditionally initialized.

Marko Mäkelä added a comment - 2020-04-23 10:08

I think that a merge of this to 10.3 is blocked by ~~MDEV-22351~~.

Marko Mäkelä added a comment - 2020-04-23 10:08 I think that a merge of this to 10.3 is blocked by MDEV-22351 .

Sujatha Sivakumar (Inactive) added a comment - 2020-04-23 15:34

Hello Sergei,

Please review the changes for ~~MDEV-21117~~.

https://github.com/MariaDB/server/commit/1dd06f633315759b359d1772b8b95bc6c8911c19
http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.1-sujatha

Note: binlog_truncate_retry_success.test fails on "kvm_fulltest2" in 10.3+ versions, because of ~~MDEV-22351~~. It will be address by ~~MDEV-22351~~ fix.

Thank you

Sujatha Sivakumar (Inactive) added a comment - 2020-04-23 15:34 Hello Sergei, Please review the changes for MDEV-21117 . https://github.com/MariaDB/server/commit/1dd06f633315759b359d1772b8b95bc6c8911c19 http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.1-sujatha Note: binlog_truncate_retry_success.test fails on "kvm_fulltest2" in 10.3+ versions, because of MDEV-22351 . It will be address by MDEV-22351 fix. Thank you

Andrei Elkin added a comment - 2020-10-23 11:50

The 10.1 patch is ready for review, 534ec31c9b6 commit on the HEAD of bb-10.1-andrei.

Andrei Elkin added a comment - 2020-10-23 11:50 The 10.1 patch is ready for review, 534ec31c9b6 commit on the HEAD of bb-10.1-andrei.

Andrei Elkin added a comment - 2020-10-23 15:07

serg: bc00529d14a - the new HEAD of bb-10.1-andrei corrects a silly logic flaw

--- a/sql/handler.cc

+++ b/sql/handler.cc

@@ -2052,7 +2052,7 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,

 #endif

-        else if (tc_heuristic_recover == TC_HEURISTIC_RECOVER_ROLLBACK)

+        else if (tc_heuristic_recover != TC_RECOVER_BINLOG_TRUNCATE)

 #ifndef DBUG_OFF

           int rc=

@@ -2070,8 +2070,6 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin,

 #ifdef HAVE_REPLICATION

         else

-          DBUG_ASSERT(tc_heuristic_recover == TC_RECOVER_BINLOG_TRUNCATE);

           if (xid_member_replace(info->commit_list, x, info->mem_root))

             info->error= true;

Andrei Elkin added a comment - 2020-10-23 15:07 serg : bc00529d14a - the new HEAD of bb-10.1-andrei corrects a silly logic flaw --- a/sql/handler.cc +++ b/sql/handler.cc @@ - 2052 , 7 + 2052 , 7 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin, } #endif } - else if (tc_heuristic_recover == TC_HEURISTIC_RECOVER_ROLLBACK) + else if (tc_heuristic_recover != TC_RECOVER_BINLOG_TRUNCATE) { #ifndef DBUG_OFF int rc= @@ - 2070 , 8 + 2070 , 6 @@ static my_bool xarecover_handlerton(THD *unused, plugin_ref plugin, #ifdef HAVE_REPLICATION else { - DBUG_ASSERT(tc_heuristic_recover == TC_RECOVER_BINLOG_TRUNCATE); - if (xid_member_replace(info->commit_list, x, info->mem_root)) { info->error= true ;

Andrei Elkin added a comment - 2020-11-26 17:23

serg: I did not changed the state back to In-progress, but it was the case for the past time.
Today `b1f9ce4a133` commit is finally BB-approved, and bb-10.2-andrei contains the branch.
Hope to hear from you about the patch rather sooner than just before next release .

Andrei Elkin added a comment - 2020-11-26 17:23 serg : I did not changed the state back to In-progress, but it was the case for the past time. Today `b1f9ce4a133` commit is finally BB-approved, and bb-10.2-andrei contains the branch. Hope to hear from you about the patch rather sooner than just before next release .

Andrei Elkin added a comment - 2021-01-22 16:53

Howdy, Sergei,

the latest patch abd7557d402 HEAD -> bb-10.3-andrei is ready for review.

Andrei Elkin added a comment - 2021-01-22 16:53 Howdy, Sergei, the latest patch abd7557d402 HEAD -> bb-10.3-andrei is ready for review.

Andrei Elkin added a comment - 2021-01-27 20:59

serg: As said on slack MDEV-24654 is untied from the semisync. A squashed 159a38ec388 is ready for review.

Andrei Elkin added a comment - 2021-01-27 20:59 serg : As said on slack MDEV-24654 is untied from the semisync. A squashed 159a38ec388 is ready for review.

Andrei Elkin added a comment - 2021-02-08 16:07 - edited

serg At testing of the former patches few glitches were eliminated.
ee538938345 HEAD -> bb-10.3-MDEV-21117 is the final commit that sustained aggressive manual testing with multi-engine & unsafe combinations.
It is ready for review.

Andrei Elkin added a comment - 2021-02-08 16:07 - edited serg At testing of the former patches few glitches were eliminated. ee538938345 HEAD -> bb-10.3- MDEV-21117 is the final commit that sustained aggressive manual testing with multi-engine & unsafe combinations. It is ready for review.

Andrei Elkin added a comment - 2021-04-13 09:43 - edited

Howdy, Sergei.

The patch is ready. It's in
bb-10.3-MDEV-21117 contains a series of commits, which in the git-log order are:

2b4000e3005 (HEAD -> 10.3-~~MDEV-21117~~-no_24654, origin/bb-10.3-~~MDEV-21117~~) MDEV-25395 server recovery hits replication event checksum error
f9213f3ed95 Discard FD checksum bug related hunks. To recommit separately as MDEV-25395 fixes.
9165d66dc94 Optimization of binlog scan rounds.
b4a09b20da2 Review notes for binlog_commit away of hton.
4bc83b27494 General character review notes addressed. This excludes binlog_commit away of hton.
dc39bd5f3ec Fixed test specific review comments.
e92037989f7 ~~MDEV-21117~~: refine the server binlog-based recovery for semisync

`e92037989f7` is the base that you reviewed with notes which are addressed in other commits.
The tests, changes related to binlog_commit and optimization of binlog scan, other changes and MDEV-25395 that motivates reading FD for finding out its binlog checksum.

Andrei Elkin added a comment - 2021-04-13 09:43 - edited Howdy, Sergei. The patch is ready. It's in bb-10.3- MDEV-21117 contains a series of commits, which in the git-log order are: 2b4000e3005 (HEAD -> 10.3- MDEV-21117 -no_24654, origin/bb-10.3- MDEV-21117 ) MDEV-25395 server recovery hits replication event checksum error f9213f3ed95 Discard FD checksum bug related hunks. To recommit separately as MDEV-25395 fixes. 9165d66dc94 Optimization of binlog scan rounds. b4a09b20da2 Review notes for binlog_commit away of hton. 4bc83b27494 General character review notes addressed. This excludes binlog_commit away of hton. dc39bd5f3ec Fixed test specific review comments. e92037989f7 MDEV-21117 : refine the server binlog-based recovery for semisync `e92037989f7` is the base that you reviewed with notes which are addressed in other commits. The tests, changes related to binlog_commit and optimization of binlog scan, other changes and MDEV-25395 that motivates reading FD for finding out its binlog checksum.

Andrei Elkin added a comment - 2021-04-15 14:27

serg: Fyi, I keep bb-10.3-~~MDEV-21117~~ updated with minor fixes to ~~MDEV-21117~~ and refinements to MDEV-25395 (today's commit to satisfy existing tests). No functional changes are done to neither of the two.

Andrei Elkin added a comment - 2021-04-15 14:27 serg : Fyi, I keep bb-10.3- MDEV-21117 updated with minor fixes to MDEV-21117 and refinements to MDEV-25395 (today's commit to satisfy existing tests). No functional changes are done to neither of the two.

Andrei Elkin added a comment - 2021-05-26 14:43 - edited

The patch is ported to 10.6 and is present as two commits
72a59e4b068 - the basic part, at4cc5a3074da - the user XA part (due to ~~MDEV-742~~).
See bb-10.6_~~MDEV-21117~~.

Andrei Elkin added a comment - 2021-05-26 14:43 - edited The patch is ported to 10.6 and is present as two commits 72a59e4b068 - the basic part, at4cc5a3074da - the user XA part (due to MDEV-742 ). See bb-10.6_ MDEV-21117 .

Andrei Elkin added a comment - 2021-05-26 14:57

serg, there were two replies from me that I've not seen your ack on.
Just in case, the 1st one is from

Subject: Re: [Maria-developers] e92037989f7: ~~MDEV-21117~~: refine the server binlog-based recovery for semisync
To: Sergei Golubchik <serg@mariadb.org>
Cc: Andrei Elkin <andrei.elkin@mariadb.com>, maria-developers@lists.launchpad.net, alice.sherepa@mariadb.com, sujatha.sivakumar@mariadb.com
Date: Tue, 27 Apr 2021 18:03:47 +0300 (4 weeks, 23 hours, 46 minutes ago)

to clarify on a simulation in this part

> in the first review I wrote
>
> this seems to be a rather crude way of faking a partially committed
> transaction. better to crash after the first engine has committed,
> that'd be much more natural.
>
> and you replied
>
> This simulation aimed at (allows for) more complicated recovery time
> event sequences.
> In this case, indeed, crashing by demand is about of the same efforts.
> I can convert to that.
>
> [x]

I had a rather sobering followup, slipped communicating to you though, sorry.

It's actually turns out not to be easy. A sequence of execution events

E1.prepare(trx), E2.prepare(trx), E1.commit(trx), crash

does not make E1 such as Innodb committed persistently in file. At
recovery in E1 trx may be found in prepared state, unless before the
crash I'd do something like Binlog CheckPoint (BCP) notification request
and wait of it.
In this just a selected engine would be requested, unlike in BCP event
case that is generated when both (in this case) have persistently committed.
(That is we even can't involve BCP event to wait for, and the involvement
would require some simulation as well).

So I propose to keep the current method leaving the "honest" crash to
RQG that Alice has been training the patch with.

and the 2nd for ALTERing the slave table
Subject: Re: [Maria-developers] e92037989f7: ~~MDEV-21117~~: refine the server binlog-based recovery for semisync

>> Actually not need to. There's no crash in the middle of a slave
>> transaction.
>> So it must be a copy-paste leftover.

actually the test crashes both servers (somehow I managed to forget that).

>>
>> [x]
>
> My bad.
>
> [x]

so I am reverting to keep ALTER.

Andrei Elkin added a comment - 2021-05-26 14:57 serg , there were two replies from me that I've not seen your ack on. Just in case, the 1st one is from Subject: Re: [Maria-developers] e92037989f7: MDEV-21117 : refine the server binlog-based recovery for semisync To: Sergei Golubchik <serg@mariadb.org> Cc: Andrei Elkin <andrei.elkin@mariadb.com>, maria-developers@lists.launchpad.net, alice.sherepa@mariadb.com, sujatha.sivakumar@mariadb.com Date: Tue, 27 Apr 2021 18:03:47 +0300 (4 weeks, 23 hours, 46 minutes ago) to clarify on a simulation in this part > in the first review I wrote > > this seems to be a rather crude way of faking a partially committed > transaction. better to crash after the first engine has committed, > that'd be much more natural. > > and you replied > > This simulation aimed at (allows for) more complicated recovery time > event sequences. > In this case, indeed, crashing by demand is about of the same efforts. > I can convert to that. > > [x] I had a rather sobering followup, slipped communicating to you though, sorry. It's actually turns out not to be easy. A sequence of execution events E1.prepare(trx), E2.prepare(trx), E1.commit(trx), crash does not make E1 such as Innodb committed persistently in file. At recovery in E1 trx may be found in prepared state, unless before the crash I'd do something like Binlog CheckPoint (BCP) notification request and wait of it. In this just a selected engine would be requested, unlike in BCP event case that is generated when both (in this case) have persistently committed. (That is we even can't involve BCP event to wait for, and the involvement would require some simulation as well). So I propose to keep the current method leaving the "honest" crash to RQG that Alice has been training the patch with. and the 2nd for ALTERing the slave table Subject: Re: [Maria-developers] e92037989f7: MDEV-21117 : refine the server binlog-based recovery for semisync >> Actually not need to. There's no crash in the middle of a slave >> transaction. >> So it must be a copy-paste leftover. actually the test crashes both servers (somehow I managed to forget that). >> >> [x] > > My bad. > > [x] so I am reverting to keep ALTER.

Andrei Elkin added a comment - 2021-05-27 14:40 - edited

serg, fyi commit `557ab2992d` "increments" `origin/bb-10.6_~~MDEV-21117~~` to address today's review notes (but the cleaner partial commit simulation and the std::pair related).

Andrei Elkin added a comment - 2021-05-27 14:40 - edited serg , fyi commit `557ab2992d` "increments" `origin/bb-10.6_ MDEV-21117 ` to address today's review notes (but the cleaner partial commit simulation and the std::pair related).

Andrei Elkin added a comment - 2021-06-03 12:16

The latest notes are addressed. The branch updated:
8e8c80a187c..5319a893b15 HEAD -> bb-10.6_~~MDEV-21117~~

Notice 8e8c80a187c also follows the one by Sujatha
8e8c80a187c (origin/bb-10.6_~~MDEV-21117~~) ~~MDEV-21117~~: binlog_truncate_multi_engine.test improvements which covers
the proper partial commit simulation.

Andrei Elkin added a comment - 2021-06-03 12:16 The latest notes are addressed. The branch updated: 8e8c80a187c..5319a893b15 HEAD -> bb-10.6_ MDEV-21117 Notice 8e8c80a187c also follows the one by Sujatha 8e8c80a187c (origin/bb-10.6_ MDEV-21117 ) MDEV-21117 : binlog_truncate_multi_engine.test improvements which covers the proper partial commit simulation.

Andrei Elkin added a comment - 2021-06-07 14:38

The branch is just forcibly updated
5319a893b15...8bd36afdb40 HEAD -> bb-10.6_~~MDEV-21117~~

with kill -SIGUSR2 pid commit and collapsed one for simulating partial commits in
mtr tests; additionally to the collapsing the former (at recovery) simulation's debug_dbug var
is finally gone.

No more changes has been done.

Andrei Elkin added a comment - 2021-06-07 14:38 The branch is just forcibly updated 5319a893b15...8bd36afdb40 HEAD -> bb-10.6_ MDEV-21117 with kill -SIGUSR2 pid commit and collapsed one for simulating partial commits in mtr tests; additionally to the collapsing the former (at recovery) simulation's debug_dbug var is finally gone. No more changes has been done.

Andrei Elkin added a comment - 2021-06-08 18:50

Merged with the atomic DDL patch.
8bd36afdb40...2b3d1b977dd HEAD -> bb-10.6_~~MDEV-21117~~ (forced update)

Andrei Elkin added a comment - 2021-06-08 18:50 Merged with the atomic DDL patch. 8bd36afdb40...2b3d1b977dd HEAD -> bb-10.6_ MDEV-21117 (forced update)

Andrei Elkin added a comment - 2021-06-11 17:32

Pushed as 6c39eaeb126.

Andrei Elkin added a comment - 2021-06-11 17:32 Pushed as 6c39eaeb126 .

Andrei Elkin added a comment - 2021-06-11 18:24

Documentation is updated with the following diff.

Andrei Elkin added a comment - 2021-06-11 18:24 Documentation is updated with the following diff .

Michael Widenius added a comment - 2024-02-07 16:05

Currently we have a problems with the binary logs:

We truncate transactions from the binary log that was already fully written there (and thus may be on the slave)
When we truncate a half-written-transaction we don't take into account that a slave may have already got part of it and will reuse the GTID id if reconnecting to a master that died in the middle of writing the transaction to the binary log.

The fix should be:

All fully written transactions in the binary log should roll forwards. The half or not written one should do rollback.
In case of the master-slave connection closing before the slave gets a full transaction, it should reconnect to the master and ask for the 'transaction after the last fully received GTID.
Alternatively it could use the last binlog file + position (which should work fine as the master binlog was truncated).

Michael Widenius added a comment - 2024-02-07 16:05 Currently we have a problems with the binary logs: We truncate transactions from the binary log that was already fully written there (and thus may be on the slave) When we truncate a half-written-transaction we don't take into account that a slave may have already got part of it and will reuse the GTID id if reconnecting to a master that died in the middle of writing the transaction to the binary log. The fix should be: All fully written transactions in the binary log should roll forwards. The half or not written one should do rollback. In case of the master-slave connection closing before the slave gets a full transaction, it should reconnect to the master and ask for the 'transaction after the last fully received GTID. Alternatively it could use the last binlog file + position (which should work fine as the master binlog was truncated).

Andrei Elkin added a comment - 2024-02-07 16:56

monty, to the fully written ones, when a server restarts having both rpl_semi_sync_MASTER,SLAVE_enabled, the semisync slave recovery does not check the first of the two. Hence a server intended to be a master, that is having rpl_semi_sync_MASTER_enabled = true rolls back transactions in doubt unnecessarily and possibly harmfully.

That's an issue. It could be trivially fixed with adding the check of rpl_semi_sync_MASTER_enabled and when it's true it would override.

The half-written is an issue only because the slave is reconnecting to a master that passed through the semisync recovery with the intent to become slave (such master may discard from its binlog a transaction being received by slave.
It will disappear when the new rpl_semi_sync_MASTER_enabled-based decision will be implemented.

Andrei Elkin added a comment - 2024-02-07 16:56 monty , to the fully written ones, when a server restarts having both rpl_semi_sync_MASTER,SLAVE_enabled , the semisync slave recovery does not check the first of the two. Hence a server intended to be a master, that is having rpl_semi_sync_MASTER_enabled = true rolls back transactions in doubt unnecessarily and possibly harmfully. That's an issue. It could be trivially fixed with adding the check of rpl_semi_sync_MASTER_enabled and when it's true it would override. The half-written is an issue only because the slave is reconnecting to a master that passed through the semisync recovery with the intent to become slave (such master may discard from its binlog a transaction being received by slave. It will disappear when the new rpl_semi_sync_MASTER_enabled -based decision will be implemented.

Andrei Elkin added a comment - 2024-02-07 17:17

monty, I filed ~~MDEV-33424~~ for this plan which is about 1 line of code.

Andrei Elkin added a comment - 2024-02-07 17:17 monty , I filed MDEV-33424 for this plan which is about 1 line of code.

Michael Widenius added a comment - 2024-02-11 13:00 - edited

~~MDEV-33424~~ is not acceptable as it does not solve the issue at hand. Please fix this issue the way I have described!
To make things clear, this is how to fix this issue:

rpl_semi_sync_slave_enable should not have anything to do with recovery.
We should only delete things from the binary log that was found to be half written during recovery. This could be a new option that could default to always one.
I personally don't think we need an option for this as this should always be safe to do (and should save disk space)
We should that if a slave re-connects it should continue from the next transaction after the last full transaction it has read.
In other words. if the slave reads GTID-1 and master crashes while reading GTID-2 then the slave should ask for 'next transaction after GTD-1'
Alternatively it can ask the transaction based on the last binlog position it was using.

Michael Widenius added a comment - 2024-02-11 13:00 - edited MDEV-33424 is not acceptable as it does not solve the issue at hand. Please fix this issue the way I have described! To make things clear, this is how to fix this issue: rpl_semi_sync_slave_enable should not have anything to do with recovery. We should only delete things from the binary log that was found to be half written during recovery. This could be a new option that could default to always one. I personally don't think we need an option for this as this should always be safe to do (and should save disk space) We should that if a slave re-connects it should continue from the next transaction after the last full transaction it has read. In other words. if the slave reads GTID-1 and master crashes while reading GTID-2 then the slave should ask for 'next transaction after GTD-1' Alternatively it can ask the transaction based on the last binlog position it was using.

People

Assignee:: Brandon Nesterenko

Reporter:: Andrei Elkin

Votes:: 3 Vote for this issue

Watchers:: 24 Start watching this issue

Dates

Created:: 2019-11-21 16:28

Updated:: 2025-03-13 11:40

Resolved:: 2024-03-27 20:38

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration