Details
-
New Feature
-
Status: Stalled
-
Major
-
Resolution: Unresolved
-
None
Description
The current de facto recovery requirement that the Engine call fsync() twice per
transaction, once at prepare and once at commit,
can be relaxed by replacing the first fsync() with a group fsync
of the Binlog. Since transactions are group-committed/prepared when the Binlog is
turned ON, a single fsync() per group resolves
optimization requests such as MDEV-11376.
Once a trx is deposited into an fsynced binlog file, its image,
consisting of xid and payload, suffices for its recovery. Specifically, the
payload part can be used to replay the transaction should
it have missed the Engine write to disk.
As long as the Engine durably tracks its last committed transaction in binlog order,
all transactions found in the binlog after that one upon a
crash can be regarded as lost in the Engine and restored by re-applying
their payload, that is, their binlogged replication events.
The existing binlog checkpoint mechanism will continue to serve to
limit binlog files for recovery.
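The selection of transactions to restore can be sketched as follows. This is only an illustration under stated assumptions: `BinlogPos`, `BinlogTrx`, `transactions_to_replay` and the numeric file sequence are hypothetical names, not server code, and the binlog checkpoint is modeled simply as the oldest file that still has to be scanned.

```python
from typing import List, NamedTuple


class BinlogPos(NamedTuple):
    """Position in binlog order: file sequence number first, then byte offset.

    Tuples compare field by field, so a position in a later file is always
    greater than any position in an earlier file.
    """
    file_seq: int  # hypothetical numeric suffix of the binlog file name
    offset: int


class BinlogTrx(NamedTuple):
    xid: int
    start: BinlogPos  # where the transaction starts in the binlog
    payload: bytes    # its binlogged replication events


def transactions_to_replay(binlog: List[BinlogTrx],
                           engine_last_committed: BinlogPos,
                           checkpoint_file_seq: int) -> List[BinlogTrx]:
    """Return the transactions the Engine may have lost, in binlog order.

    Files older than the binlog checkpoint are skipped (assumption: the
    checkpoint names the oldest file that can still need recovery).
    """
    return [
        trx for trx in binlog
        if trx.start.file_seq >= checkpoint_file_seq
        and trx.start > engine_last_committed
    ]
```

With the Engine remembering position (3, 100), a transaction starting at (3, 400) or at (4, 4) is selected for replay, while the one at (3, 100) itself is not.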
In light of MDEV-16589 (sync_binlog = 1 as the default), performance becomes more of a concern.
MDEV-24386 shows up to 3 times higher latency and halved throughput with the new default value
combined with the unchanged default of innodb_flush_log_at_trx_commit = 1.
At the same time innodb_flush_log_at_trx_commit = 0 still allows for recovery (though the
recovery has to be extended), and
further benchmarking (sysbench4.pdf of MDEV-24386) suggests that the latency and performance of
(B = 1, I = 0) may be even better than those of (B = 0, I = 1), the current (10.5) default.
Here B stands for sync_binlog, I for innodb_flush_log_at_trx_commit.
For the refined recovery, the server needs to know which engines are involved in an in-doubt
transaction, specifically whether all of those engines maintain the last committed
transaction's binlog offset in their persistent metadata.
For instance, Innodb does so. This piece of info is crucial because at recovery
the engine may have the transaction or its branch
either a) already committed or b) not even prepared, and which of the two is the case can be resolved only
with "external" help such as the tracking facility: when the transaction starts in the binlog
at an offset greater than the one the engine remembers for its last committed transaction,
this transaction is evidently not yet committed.
Unlike in all other cases, for a transaction that involves only the single Innodb engine
there is no need to specify the engine explicitly in the transaction's
binlog events.
The recovery procedure follows most of the steps of the conventional one and adds
the following rule, simplified here to a single engine:
when a transaction updates an engine that tracks the binlog offset of its commits, and
the transaction's binlog offset is greater than that of the last committed trx in the engine,
then the transaction is to be re-executed (unless it is already prepared, in which case it is
committed by the regular rules).
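The single-engine rule above can be expressed as a small decision function. This is a sketch only: `Action` and `recovery_action` are hypothetical names, binlog offsets are reduced to plain integers within one file, and the fallback branch stands in for whatever the conventional recovery rules decide.

```python
from enum import Enum


class Action(Enum):
    COMMIT = "commit"        # prepared in the engine: commit by the regular rules
    REEXECUTE = "reexecute"  # lost in the engine: replay its binlogged events
    NONE = "none"            # not covered by the added rule


def recovery_action(trx_offset: int,
                    last_committed_offset: int,
                    is_prepared: bool,
                    engine_tracks_offset: bool = True) -> Action:
    """Decide what to do with one in-doubt transaction found in the binlog."""
    if is_prepared:
        # The exception in the rule: an already prepared trx is committed as usual.
        return Action.COMMIT
    if engine_tracks_offset and trx_offset > last_committed_offset:
        # The engine never saw this trx as committed, so it is re-executed.
        return Action.REEXECUTE
    # Either the trx is at or before the last committed one (already in the
    # engine), or the engine does not track offsets and the rule cannot apply.
    return Action.NONE
```

For example, a transaction at offset 500 with the engine remembering 300 is re-executed when it is not prepared, and committed when it is.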
For the multiple-engine and non-Innodb cases, the set of involved engines can be
specified through an extended Gtid_log_event. Consider a bitmap with the bits mapped to engines
on that local server.
The mapping is local to the server, so it only needs to be stable across crashes.
Gtid_log_event remembers the engines involved (except when there is only
Innodb), and at recovery the engines will be found and asked for their last committed binlog offset.
When any involved engine does not track this offset, the transaction can't be re-executed; otherwise
the branches of the in-doubt multi-engine transaction are considered individually, taking into account
what each engine remembers of its last committed transaction and the transaction's binlog offset.
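A minimal sketch of the engine bitmap and of the per-branch decision follows. The helper names, the bit assignments and the engine names other than Innodb are hypothetical; the real Gtid_log_event layout is not defined here, and the prepared-branch exception of the single-engine rule is left out for brevity.

```python
from typing import Dict, List, Optional


def encode_engines(names: List[str], bit_of: Dict[str, int]) -> int:
    """Build the bitmap that the extended Gtid_log_event would carry."""
    bitmap = 0
    for name in names:
        bitmap |= 1 << bit_of[name]
    return bitmap


def decode_engines(bitmap: int, bit_of: Dict[str, int]) -> List[str]:
    """Map set bits back to engine names using the server-local mapping."""
    by_bit = sorted(bit_of.items(), key=lambda item: item[1])
    return [name for name, bit in by_bit if (bitmap >> bit) & 1]


def branches_to_reexecute(bitmap: int,
                          trx_offset: int,
                          bit_of: Dict[str, int],
                          last_committed: Dict[str, Optional[int]]
                          ) -> Optional[List[str]]:
    """Return the engines whose branch of the transaction must be re-executed.

    last_committed maps an engine name to its last committed binlog offset,
    or to None when the engine does not track it. Returns None when some
    involved engine does not track the offset, since then the transaction
    cannot be re-executed at all.
    """
    engines = decode_engines(bitmap, bit_of)
    if any(last_committed.get(engine) is None for engine in engines):
        return None
    return [e for e in engines if trx_offset > last_committed[e]]
```

With the mapping `{"innodb": 0, "engine_x": 1, "engine_y": 2}`, a transaction touching innodb and engine_y is recorded as `0b101`. If it sits at offset 500 while innodb remembers 700 and engine_y remembers 300, only the engine_y branch is re-executed.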
For re-execution, consider MDEV-21469 as a template. MIXED binlog format guarantees that
re-execution reproduces the original changes.
Attachments
Issue Links
- blocks
  - MDEV-16589 default value for sync_binlog should be the safer value 1 instead of 0 (Stalled)
- duplicates
  - MDEV-11376 AliSQL: [Feature] Issue#11 REDO LOG GROUP COMMIT AT SERVER LAYER (Stalled)
- is blocked by
  - MDEV-22351 InnoDB may report incorrect binlog position information after RESET MASTER (Closed)
- relates to
  - MDEV-11937 InnoDB flushes redo log too often (Closed)
  - MDEV-21117 refine the server binlog-based recovery for semisync (Closed)
  - MDEV-21469 Implement crash-safe logging of the user XA (Stalled)
  - MDEV-24341 Innodb - do not block in foreground thread in log_write_up_to() (Closed)
  - MDEV-25611 RESET MASTER still causes the server to hang (Closed)
  - MDEV-26603 asynchronous redo log write (Closed)
  - MDEV-34705 Storing binlog in InnoDB (In Progress)
- links to
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue duplicates MDEV-11376 [ MDEV-11376 ] |
Link |
This issue relates to |
Link |
This issue relates to |
NRE Projects | RM_105_CANDIDATE |
Link | This issue relates to MDEV-16589 [ MDEV-16589 ] |
Link | This issue relates to MDEV-16589 [ MDEV-16589 ] |
Assignee | Sujatha Sivakumar [ sujatha.sivakumar ] |
Link | This issue relates to MDEV-16589 [ MDEV-16589 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.5 [ 23123 ] |
Link |
This issue is blocked by |
Remote Link | This issue links to "(AliSQL) Binlog in Redo (Web Link)" [ 29904 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Link |
This issue relates to |
Assignee | Sujatha Sivakumar [ sujatha.sivakumar ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Link | This issue relates to MDEV-21469 [ MDEV-21469 ] |
Assignee | Sergei Golubchik [ serg ] | Andrei Elkin [ elkin ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Link | This issue blocks MDEV-16589 [ MDEV-16589 ] |
Link | This issue relates to MDEV-16589 [ MDEV-16589 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Fix Version/s | 10.7 [ 24805 ] | |
Fix Version/s | 10.6 [ 24028 ] |
Priority | Critical [ 2 ] | Major [ 3 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Fix Version/s | 10.8 [ 26121 ] | |
Fix Version/s | 10.7 [ 24805 ] |
Workflow | MariaDB v3 [ 93351 ] | MariaDB v4 [ 131717 ] |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.8 [ 26121 ] |
Fix Version/s | 10.10 [ 27530 ] | |
Fix Version/s | 10.9 [ 26905 ] |
Fix Version/s | 10.11 [ 27614 ] | |
Fix Version/s | 10.10 [ 27530 ] |
Fix Version/s | 10.12 [ 28320 ] | |
Fix Version/s | 10.11 [ 27614 ] |
Fix Version/s | 10.11 [ 27614 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Fix Version/s | 10.11 [ 27614 ] |
Fix Version/s | 10.13 [ 28501 ] | |
Fix Version/s | 10.12 [ 28320 ] |
Link | This issue relates to MDEV-21469 [ MDEV-21469 ] |
Link | This issue is blocked by MDEV-21469 [ MDEV-21469 ] |
Fix Version/s | 11.1 [ 28549 ] |
Fix Version/s | 10.13 [ 28501 ] |
Fix Version/s | 11.2 [ 28603 ] | |
Fix Version/s | 11.1 [ 28549 ] |
Assignee | Andrei Elkin [ elkin ] | Brandon Nesterenko [ JIRAUSER48702 ] |
Link | This issue is blocked by MDEV-21469 [ MDEV-21469 ] |
Link | This issue relates to MDEV-21469 [ MDEV-21469 ] |
Fix Version/s | 11.3 [ 28565 ] | |
Fix Version/s | 11.2 [ 28603 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Fix Version/s | 11.4 [ 29301 ] | |
Fix Version/s | 11.3 [ 28565 ] |
Fix Version/s | 11.5 [ 29506 ] | |
Fix Version/s | 11.4 [ 29301 ] |
Issue Type | Task [ 3 ] | New Feature [ 2 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Description | ... For re-execution consider MDEV-21465 as a template. ... | ... For re-execution consider MDEV-21469 as a template. ... |
Fix Version/s | 11.6 [ 29515 ] | |
Fix Version/s | 11.5 [ 29506 ] |
Fix Version/s | 11.7 [ 29815 ] | |
Fix Version/s | 11.6 [ 29515 ] |
Link | This issue relates to MDEV-21469 [ MDEV-21469 ] |
Link | This issue relates to MDEV-21469 [ MDEV-21469 ] |
Link | This issue relates to MDEV-34705 [ MDEV-34705 ] |
Fix Version/s | 11.8 [ 29921 ] | |
Fix Version/s | 11.7 [ 29815 ] |
Priority | Critical [ 2 ] | Major [ 3 ] |
Fix Version/s | 11.8 [ 29921 ] |
As far as I understand, if sync_binlog=1, at transaction commit we could skip not only the fsync() call for the InnoDB redo log files, but also the call log_write_up_to(mtr.commit_lsn()). That is, we could group all writes from the log_sys buffer to the InnoDB redo log files in bigger batches.
Furthermore, my understanding is that the internal use of 2-phase commit (XA distributed transactions) can be removed in this case. That mechanism would only be needed when XA START/END/PREPARE/COMMIT/ROLLBACK statements are being issued from SQL.
The fsync() in InnoDB would still be needed for preventing harmful reordering of writes (to stick to write-ahead logging). The primary mechanisms for driving that should be redo log checkpoints and dirty page replacement in the buffer pool.