Details
Type: Task
Status: Closed
Priority: Critical
Resolution: Fixed
None
Description
When run after a master server crash, --tc-heuristic-recover=rollback leaves the server in an inconsistent state: the binlog still contains the transactions that the option rolled back.
A server recovered this way cannot safely be used for replication. For example, when the recovered ex-master is demoted to a slave, its binlog state needs a further correction that subtracts the rolled-back transactions. Otherwise the "new" slave may claim those transactions as locally present when it connects through the GTID master-slave connection protocol, while the actual "new" master may never have seen them (they never reached it while it was still a slave, because of the crash).
This should be fixed by refining the recovery procedure so that it truncates the binlog, cutting off the prepared transactions that are rolled back. The method is known to have been pioneered by FB:
https://percona.community/blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/
Once a transaction reaches the binary logs it should roll forward.
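The truncation idea can be illustrated with a small sketch. This is a toy model under stated assumptions, not the server's recovery code: `Txn`, `truncate_for_recovery` and the in-memory list are hypothetical names, the offsets and xids are borrowed from the 10.1.44 reproduction in the comment further down, and the premise that xid 8 is found in the prepared state in the engine is assumed for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Txn:
    """One transaction group in a toy binlog (not the real on-disk format)."""
    gtid: str       # e.g. "0-1-3"
    start_pos: int  # offset of the group's Gtid event
    end_pos: int    # offset just past the group's Xid event
    xid: int        # engine transaction id recorded in the Xid event


def truncate_for_recovery(
    binlog: List[Txn], prepared_xids: Set[int]
) -> Tuple[List[Txn], Optional[int]]:
    """Return the surviving transactions and the truncation offset.

    The first transaction that the engine still holds as prepared marks the
    cut point: it and everything after it is dropped from the binlog, so the
    binlog no longer advertises work that the engine rolls back.
    The offset is None when nothing has to be cut.
    """
    for i, txn in enumerate(binlog):
        if txn.xid in prepared_xids:
            return binlog[:i], txn.start_pos
    return binlog, None


# Hypothetical input mirroring the two INSERT transactions of the reproduction.
binlog = [Txn("0-1-2", 457, 614, 7), Txn("0-1-3", 614, 771, 8)]
kept, cut_at = truncate_for_recovery(binlog, prepared_xids={8})
```

In this model the binlog would be cut at the start of the first prepared transaction, so the GTID state derived from it no longer includes the rolled-back work.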
Attachments
Issue Links
- blocks: MDEV-11855 Make semisync crash safe with the cluster (Open)
- causes: MDEV-33465 an option to enable semisync recovery (Closed)
- includes: MDEV-26652 xa transactions binlogged in wrong order (Closed)
- is duplicated by: MDEV-20996 Maxscale auto-failover with semi-sync replication is not providing a true HA solution (Closed)
- relates to: MDEV-21168 Active XA transactions stop slave from working after backup was restored. (Closed)
- relates to: MDEV-25395 server recovery hits replication event checksum error (Stalled)
- relates to: MDEV-33424 when both rpl_semi_sync_MASTER,SLAVE_enabled set the server should recover as master (Closed)
- relates to: MDEV-18959 Engine transaction recovery through persistent binlog (Stalled)
- links to
Activity
Field | Original Value | New Value |
---|---|---|
Link |
This issue relates to |
Link | This issue relates to MENT-203 [ MENT-203 ] |
Remote Link | This issue links to "MySQL WL#5493: Binlog crash-safe when master crashed (Web Link)" [ 29310 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Link |
This issue relates to |
Priority | Major [ 3 ] | Critical [ 2 ] |
Priority | Critical [ 2 ] | Blocker [ 1 ] |
Labels | need_feedback |
Labels | need_feedback |
Fix Version/s | 10.5 [ 23123 ] |
Affects Version/s | 10.5 [ 23123 ] |
Assignee | Sujatha Sivakumar [ sujatha.sivakumar ] | Andrei Elkin [ elkin ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Andrei Elkin [ elkin ] | Sujatha Sivakumar [ sujatha.sivakumar ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Link |
This issue is blocked by |
Assignee | Sujatha Sivakumar [ sujatha.sivakumar ] | Sergei Golubchik [ serg ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
Priority | Blocker [ 1 ] | Critical [ 2 ] |
Link |
This issue is blocked by |
Assignee | Sergei Golubchik [ serg ] | Andrei Elkin [ elkin ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Andrei Elkin [ elkin ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Fix Version/s | 10.1 [ 16100 ] |
Fix Version/s | 10.5 [ 23123 ] |
Link | This issue blocks MENT-203 [ MENT-203 ] |
Link | This issue relates to MENT-203 [ MENT-203 ] |
Assignee | Sergei Golubchik [ serg ] | Andrei Elkin [ elkin ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Link | This issue relates to MDEV-24654 [ MDEV-24654 ] |
Assignee | Andrei Elkin [ elkin ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Link | This issue is blocked by MDEV-24654 [ MDEV-24654 ] |
Link | This issue relates to MDEV-24654 [ MDEV-24654 ] |
Attachment | recovery_design.txt [ 55817 ] |
Summary | --tc-heuristic-recover=rollback is not replication safe | recovery for --rpl-semi-sync-slave-enabled server |
Attachment | recovery_design.txt [ 55817 ] |
Attachment | recovery_design.txt [ 55820 ] |
Link | This issue is blocked by MDEV-24654 [ MDEV-24654 ] |
Attachment | recovery_design.txt [ 55892 ] |
Attachment | recovery_design.txt [ 55820 ] |
Assignee | Sergei Golubchik [ serg ] | Sujatha Sivakumar [ sujatha.sivakumar ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Assignee | Sujatha Sivakumar [ sujatha.sivakumar ] | Andrei Elkin [ elkin ] |
Link |
This issue is duplicated by |
Link |
This issue relates to |
Link | This issue relates to MDEV-25395 [ MDEV-25395 ] |
Assignee | Andrei Elkin [ elkin ] | Sergei Golubchik [ serg ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
Summary | recovery for --rpl-semi-sync-slave-enabled server | refine the server binlog-based recovery for semisync |
Assignee | Sergei Golubchik [ serg ] | Andrei Elkin [ elkin ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Link | This issue blocks MDEV-11855 [ MDEV-11855 ] |
Affects Version/s | 10.2 [ 14601 ] | |
Affects Version/s | 10.1 [ 16100 ] | |
Affects Version/s | 10.3 [ 22126 ] | |
Affects Version/s | 10.4 [ 22408 ] | |
Issue Type | Bug [ 1 ] | Task [ 3 ] |
Issue Type | Task [ 3 ] | Bug [ 1 ] |
Affects Version/s | 10.1 [ 16100 ] | |
Affects Version/s | 10.2 [ 14601 ] | |
Affects Version/s | 10.3 [ 22126 ] | |
Affects Version/s | 10.4 [ 22408 ] | |
Affects Version/s | 10.5 [ 23123 ] |
Affects Version/s | 10.2 [ 14601 ] | |
Affects Version/s | 10.1 [ 16100 ] | |
Affects Version/s | 10.3 [ 22126 ] | |
Affects Version/s | 10.4 [ 22408 ] | |
Affects Version/s | 10.5 [ 23123 ] | |
Issue Type | Bug [ 1 ] | Task [ 3 ] |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] |
Link | This issue blocks MENT-1187 [ MENT-1187 ] |
Comment | [ A comment with security level 'Developers' was removed. ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Andrei Elkin [ elkin ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Sergei Golubchik [ serg ] | Andrei Elkin [ elkin ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Assignee | Andrei Elkin [ elkin ] | Sergei Golubchik [ serg ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
Assignee | Sergei Golubchik [ serg ] | Andrei Elkin [ elkin ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Fix Version/s | 10.6.2 [ 25800 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Assignee | Andrei Elkin [ elkin ] | Ian Gilfillan [ greenman ] |
Labels | need_feedback |
Labels | need_feedback |
Link |
This issue relates to |
Link |
This issue includes |
Link |
This issue relates to |
Workflow | MariaDB v3 [ 101333 ] | MariaDB v4 [ 134141 ] |
Assignee | Ian Gilfillan [ greenman ] | Andrei Elkin [ elkin ] |
Resolution | Fixed [ 1 ] | |
Status | Closed [ 6 ] | Stalled [ 10000 ] |
Assignee | Andrei Elkin [ elkin ] | Brandon Nesterenko [ JIRAUSER48702 ] |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Link |
This issue relates to |
Resolution | Fixed [ 1 ] | |
Status | Closed [ 6 ] | Stalled [ 10000 ] |
Link |
This issue causes |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Zendesk Related Tickets | 125800 172110 134539 |
Link | This issue relates to MDEV-18959 [ MDEV-18959 ] |
The reported issue is reproducible on the latest 10.1.44:
MariaDB [(none)]> use test;
Database changed
MariaDB [test]> create table t ( f int ) engine=innodb;
Query OK, 0 rows affected (0.02 sec)
MariaDB [test]> insert into t values (10);
Query OK, 1 row affected (0.01 sec)
MariaDB [test]> insert into t values (20);
ERROR 2013 (HY000): Lost connection to MySQL server during query
The server was crashed in the middle of the commit of the second insert and then restarted with "--tc-heuristic-recover=ROLLBACK".
After the restart, the transaction that tried to insert '20' is rolled back in the engine:
MariaDB [test]> select * from t;
+------+
| f    |
+------+
|   10 |
+------+
However, the binary log retains the transaction that was rolled back in the engine.
The binlog 'master-bin.000001' is still present after recovery:
MariaDB [test]> show binlog events;
+-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+
| Log_name          | Pos | Event_type        | Server_id | End_log_pos | Info                                              |
+-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+
| master-bin.000001 |   4 | Format_desc       |         1 |         249 | Server ver: 10.1.44-MariaDB-debug, Binlog ver: 4  |
| master-bin.000001 | 249 | Gtid_list         |         1 |         274 | []                                                |
| master-bin.000001 | 274 | Binlog_checkpoint |         1 |         314 | master-bin.000001                                 |
| master-bin.000001 | 314 | Gtid              |         1 |         352 | GTID 0-1-1                                        |
| master-bin.000001 | 352 | Query             |         1 |         457 | use `test`; create table t (f int ) engine=innodb |
| master-bin.000001 | 457 | Gtid              |         1 |         495 | BEGIN GTID 0-1-2                                  |
| master-bin.000001 | 495 | Query             |         1 |         587 | use `test`; insert into t values(10)              |
| master-bin.000001 | 587 | Xid               |         1 |         614 | COMMIT /* xid=7 */                                |
| master-bin.000001 | 614 | Gtid              |         1 |         652 | BEGIN GTID 0-1-3                                  |
| master-bin.000001 | 652 | Query             |         1 |         744 | use `test`; insert into t values(20)              |
| master-bin.000001 | 744 | Xid               |         1 |         771 | COMMIT /* xid=8 */                                |
+-------------------+-----+-------------------+-----------+-------------+---------------------------------------------------+
11 rows in set (0.00 sec)
The GTID state is also carried over into the newly created binary log, whose Gtid_list event still includes 0-1-3:
MariaDB [test]> show binlog events in 'master-bin.000002';
+-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+
| Log_name          | Pos | Event_type        | Server_id | End_log_pos | Info                                             |
+-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+
| master-bin.000002 |   4 | Format_desc       |         1 |         249 | Server ver: 10.1.44-MariaDB-debug, Binlog ver: 4 |
| master-bin.000002 | 249 | Gtid_list         |         1 |         288 | [0-1-3]                                          |
| master-bin.000002 | 288 | Binlog_checkpoint |         1 |         328 | master-bin.000002                                |
| master-bin.000002 | 328 | Stop              |         1 |         347 |                                                  |
+-------------------+-----+-------------------+-----------+-------------+--------------------------------------------------+
4 rows in set (0.00 sec)
This results in an inconsistent state: the engine has rolled back the transaction with GTID 0-1-3, yet the binlog and the GTID state still record it.
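To make the divergence concrete, here is a rough sketch that compares two GTID positions by sequence number per domain. It is a simplified illustration only and does not reproduce the real connection protocol: the function names are hypothetical, and the new master's position "0-1-2" is an assumption (it presumes the former slave received everything up to the first insert and nothing more).

```python
from typing import Dict, Tuple


def parse_gtid_pos(pos: str) -> Dict[int, int]:
    """Map domain id -> sequence number for a "domain-server-seq[,...]" string."""
    result: Dict[int, int] = {}
    for gtid in filter(None, (part.strip() for part in pos.split(","))):
        domain, _server, seq = (int(x) for x in gtid.split("-"))
        result[domain] = seq
    return result


def domains_ahead(replica_pos: str, primary_pos: str) -> Dict[int, Tuple[int, int]]:
    """Return the domains in which the replica claims a higher sequence number
    than the primary has, as {domain: (replica_seq, primary_seq)}."""
    replica = parse_gtid_pos(replica_pos)
    primary = parse_gtid_pos(primary_pos)
    return {
        domain: (seq, primary.get(domain, 0))
        for domain, seq in replica.items()
        if seq > primary.get(domain, 0)
    }


# The recovered ex-master still reports 0-1-3 (see the Gtid_list above);
# the new master is assumed to have stopped at 0-1-2.
diverged = domains_ahead("0-1-3", "0-1-2")
```

A non-empty result means the demoted server claims a transaction that the new master never received, which is the situation the description warns about.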