Details
Task
Status: Closed
Critical
Resolution: Fixed
None
Description
When run after a master server crash, --tc-heuristic-recover=rollback leaves the server in an inconsistent state: the binlog still contains transactions that the option rolled back in the storage engine.
A server recovered this way cannot safely be used for replication. For example, when the recovered ex-master is demoted to a slave, its binlog state needs further correction to remove the rolled-back transactions. Otherwise the "new" slave may claim those transactions as locally present during the GTID master-slave connection protocol, even though the actual "new" master may never have seen them (they never reached it while it was a slave, because of the crash).
This should be fixed by refining the recovery procedure so that it truncates the binlog, cutting off the prepared transactions that were rolled back. The method was pioneered by Facebook; see:
https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/
Once a transaction reaches the binary logs it should roll forward.
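The truncation idea described above can be illustrated with a minimal Python sketch. It is not the server's recovery code: the `Event` type, the `truncate_offset` helper, and the assumption that the rolled-back transactions form the tail of the last binlog are all hypothetical simplifications. The event positions are taken from the `master-bin.000001` listing in the reproduction in this report, where the transaction with xid=8 was rolled back in the engine.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Event:
    pos: int                   # start offset of the event in the binlog file
    end_pos: int               # offset just past the event
    kind: str                  # "Gtid", "Query", "Xid", ...
    xid: Optional[int] = None  # set only for Xid (commit) events


def truncate_offset(events: List[Event], rolled_back_xids: Set[int]) -> Optional[int]:
    """Return the offset at which to cut the binlog, or None if no cut is needed.

    Assumption (hypothetical simplification): every event group starts with a
    Gtid event, and everything from the first rolled-back group onward is cut.
    """
    group_start = None
    for ev in events:
        if ev.kind == "Gtid":
            group_start = ev.pos  # remember where the current group begins
        elif ev.kind == "Xid" and ev.xid in rolled_back_xids:
            return group_start    # cut at the start of the rolled-back group
    return None


# Events of master-bin.000001 as shown in the reproduction in this report.
EVENTS = [
    Event(4, 249, "Format_desc"),
    Event(249, 274, "Gtid_list"),
    Event(274, 314, "Binlog_checkpoint"),
    Event(314, 352, "Gtid"),
    Event(352, 457, "Query"),
    Event(457, 495, "Gtid"),
    Event(495, 587, "Query"),
    Event(587, 614, "Xid", xid=7),
    Event(614, 652, "Gtid"),
    Event(652, 744, "Query"),
    Event(744, 771, "Xid", xid=8),
]

if __name__ == "__main__":
    # xid=8 is the transaction the engine rolled back during recovery.
    print(truncate_offset(EVENTS, {8}))  # 614
```

Cutting at offset 614 would leave the binlog ending with the committed insert of 10 (GTID 0-1-2), which matches the engine state after the rollback.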
Issue Links
- blocks: MDEV-11855 Make semisync crash safe with the cluster (Open)
- causes: MDEV-33465 an option to enable semisync recovery (Closed)
- includes: MDEV-26652 xa transactions binlogged in wrong order (Closed)
- is duplicated by: MDEV-20996 Maxscale auto-failover with semi-sync replication is not providing a true HA solution (Closed)
- relates to: MDEV-21168 Active XA transactions stop slave from working after backup was restored. (Closed)
- relates to: MDEV-25395 server recovery hits replication event checksum error (Stalled)
- relates to: MDEV-33424 when both rpl_semi_sync_MASTER,SLAVE_enabled set the server should recover as master (Closed)
- relates to: MDEV-18959 Engine transaction recovery through persistent binlog (Stalled)
Observed the reported issue on the latest 10.1.44.
MariaDB [(none)]> use test;
Database changed
MariaDB [test]> create table t ( f int ) engine=innodb;
Query OK, 0 rows affected (0.02 sec)
MariaDB [test]> insert into t values (10);
Query OK, 1 row affected (0.01 sec)
MariaDB [test]> insert into t values (20);
ERROR 2013 (HY000): Lost connection to MySQL server during query
Crash the server in the middle of the commit (the second insert above).
Restart the server with "--tc-heuristic-recover=ROLLBACK".
Observe that the transaction that tried to insert '20' is rolled back in the engine.
MariaDB [test]> select * from t;
+------+
| f |
+------+
| 10 |
+------+
But the binary log retains the transaction that was rolled back in the engine.
The binlog 'master-bin.000001' is still present after recovery:
MariaDB [test]> show binlog events;
+-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+
| master-bin.000001 | 4 | Format_desc | 1 | 249 | Server ver: 10.1.44-MariaDB-debug, Binlog ver: 4 |
| master-bin.000001 | 249 | Gtid_list | 1 | 274 | [] |
| master-bin.000001 | 274 | Binlog_checkpoint | 1 | 314 | master-bin.000001 |
| master-bin.000001 | 314 | Gtid | 1 | 352 | GTID 0-1-1 |
| master-bin.000001 | 352 | Query | 1 | 457 | use `test`; create table t (f int ) engine=innodb |
| master-bin.000001 | 457 | Gtid | 1 | 495 | BEGIN GTID 0-1-2 |
| master-bin.000001 | 495 | Query | 1 | 587 | use `test`; insert into t values(10) |
| master-bin.000001 | 587 | Xid | 1 | 614 | COMMIT /* xid=7 */ |
| master-bin.000001 | 614 | Gtid | 1 | 652 | BEGIN GTID 0-1-3 |
| master-bin.000001 | 652 | Query | 1 | 744 | use `test`; insert into t values(20) |
| master-bin.000001 | 744 | Xid | 1 | 771 | COMMIT /* xid=8 */ |
+-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+
11 rows in set (0.00 sec)
The GTID state, including the rolled-back 0-1-3, is also preserved in the newly created binary log.
MariaDB [test]> show binlog events in 'master-bin.000002';
+-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+
| master-bin.000002 | 4 | Format_desc | 1 | 249 | Server ver: 10.1.44-MariaDB-debug, Binlog ver: 4 |
| master-bin.000002 | 249 | Gtid_list | 1 | 288 | [0-1-3] |
| master-bin.000002 | 288 | Binlog_checkpoint | 1 | 328 | master-bin.000002 |
| master-bin.000002 | 328 | Stop | 1 | 347 | |
+-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+
4 rows in set (0.00 sec)
1 row in set (0.00 sec)
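This results in an inconsistent state: the engine has rolled back the insert of 20, but the binlog still contains it as GTID 0-1-3 and the Gtid_list of the new binlog records [0-1-3].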
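As a rough illustration of why this matters for a demoted ex-master, the following Python sketch compares two GTID positions and reports the domains in which a would-be slave claims more than the master has. It is a simplified assumption of the check, not the actual GTID connection protocol: the helpers `parse_gtid_pos` and `domains_ahead` are hypothetical, only sequence numbers are compared, and the master position `0-1-2` is an assumed value (this report does not show the new master's state). In practice the positions might be read from a variable such as `gtid_binlog_pos` on each server.

```python
from typing import Dict, Tuple


def parse_gtid_pos(pos: str) -> Dict[int, Tuple[int, int]]:
    """Parse a position like "0-1-3,1-2-10" into {domain: (server_id, seq_no)}."""
    result: Dict[int, Tuple[int, int]] = {}
    for gtid in filter(None, (part.strip() for part in pos.split(","))):
        domain, server, seq = (int(x) for x in gtid.split("-"))
        result[domain] = (server, seq)
    return result


def domains_ahead(slave_pos: str, master_pos: str) -> Dict[int, Tuple[int, int]]:
    """Return {domain: (master_seq, slave_seq)} where the slave is ahead.

    Simplification (assumption): only sequence numbers are compared per domain;
    a domain missing on the master is treated as sequence number 0.
    """
    slave = parse_gtid_pos(slave_pos)
    master = parse_gtid_pos(master_pos)
    ahead: Dict[int, Tuple[int, int]] = {}
    for domain, (_server, seq) in slave.items():
        master_seq = master.get(domain, (0, 0))[1]
        if seq > master_seq:
            ahead[domain] = (master_seq, seq)
    return ahead


if __name__ == "__main__":
    # Ex-master binlog state from this report vs. an assumed new-master state.
    print(domains_ahead("0-1-3", "0-1-2"))  # {0: (2, 3)}
```

A non-empty result means the ex-master's binlog claims transactions the new master never received, which is the situation the binlog truncation is meant to prevent.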