[CONJ-803] failover re-execution using "redo transaction" Created: 2020-06-29  Updated: 2021-05-07  Resolved: 2021-04-22

Status: Closed
Project: MariaDB Connector/J
Component/s: Failover
Affects Version/s: None
Fix Version/s: 3.0.0

Type: New Feature Priority: Major
Reporter: Diego Dupin Assignee: Diego Dupin
Resolution: Fixed Votes: 0
Labels: None


 Description   

The driver has failover implementation, but there is some limitation that can be solved using a redo transaction implementation. There are many benefits to using this approach.

Current state

On a failover occurring on a slave connection, reconnection is done to another slave if possible and the query is re-executed on that new slave or on the master connection if no slave connection is reestablished without interruption.

Problem is when failover occurs on a master connection: the only case when that is handled transparently is when a query was not in a transaction and was a SELECT command.
There is no other possibility because when this failover occurs, the driver has no way to know that interruption occurs after the server received and handles command or not. Then, connection is reestablished and error is changed from SQLNonTransientConnectionException to SQLTransientConnectionException.

Proposed implementation

Jdbc default with autocommit enable. When a failover occurs on a primary connection with auto-commit enable, the driver can still not know more and will just reconnect connection and throw an exception like current implementation. (An exception can be done for PING command)

Redo transaction approach is to save commands (COM_EXECUTE / COM_STMT_EXECUTE, COM_STMT_LONG_DATA) in transaction. When a failover occurs during a transaction and failing command is not a COMMIT/ROLLBACK command, the connector can automatically reconnect and replay transaction, making failover completely transparent.

Auto-commit and transaction state already exist in protocol using SERVER_STATUS_IN_TRANS and SERVER_STATUS_AUTOCOMMIT flag in server status flag in all server version.

The drawback of redo transaction implementation is saving transaction in buffer until completion, but this can be avoided setting a maximum buffer length (if transaction is too big, clearing saving buffer, then throwing a exception on failover).

Most of the time, queries occurs in transaction (ORM for example doesn't permit using auto-commit), so redo transaction implementation will solve most of failover cases transparently for user point of view.

Typology consideration

This solution is particulary adapted for

  • galera: driver can reconnect to another master with more transparent failover
  • maxscale : Maxscale already implement redo transaction, but that concerns connections between maxscale to servers. This will complete MariaDB solution adding that failover layer for connection failure between client and maxscale (like reconnect to another maxscale transparently for example).


 Comments   
Comment by Diego Dupin [ 2021-04-22 ]

option `transactionReplay` permit to enable this functionnality, disabled by default.

Comment by Diego Dupin [ 2021-05-07 ]

Some explaination of implementation :
This replay interrupted transactions when option `transactionReplay` is enabled (disabled by default).

Enabling this parameter, all queries in transaction are cached until commit/rollback.
If connection fails, driver will recreate a new connection and replay cache.

There is a few limitations:

  • When last command was a "commit" send to server, an exception will be thrown, driver cannot know current state.
  • application must have idempotent queries only (queries can be "replayable")
  • There is actually no cache size limit (this feature might be implemented soon), but for now, very big transactions might result in having lot of memory use for cache.

explaination of non-idempotent issues :
tables

create table myCar (id int not null primary key auto_increment, car_name varchar(256));
create table myCarDetail (
    id int not null auto_increment, 
    id_car int,
    car_detail varchar(256),
    foreign key (id_car) references myCar(id),
    PRIMARY KEY (id));

setting ids, no problem

START TRANSACTION;
INSERT INTO myCar(id,car_name) VALUE (1,'car1');
INSERT INTO myCarDetail(id, id_car, car_detail) VALUE (10, 1, 'motor');
INSERT INTO myCarDetail(id, id_car, car_detail) VALUE (11, 1, 'wheel');
COMMIT;

other possibility:

START TRANSACTION;
INSERT INTO myCar(car_name) VALUE ('car1');
SET @a = LAST_INSERT_ID();
INSERT INTO myCarDetail(id_car, car_detail) VALUE (@a, 'motor');
INSERT INTO myCarDetail(id_car, car_detail) VALUE (@a, 'wheel');
COMMIT;

But here is an example of non-replayable queries

START TRANSACTION;
INSERT INTO myCar(car_name) VALUE ('car1'); // generated id is 1
INSERT INTO myCarDetail(id_car, car_detail) VALUE (1, 'motor');
INSERT INTO myCarDetail(id_car, car_detail) VALUE (1, 'wheel');
COMMIT;

problem here is that generated id has been retrieved from first command, and used to create new query. Replaying that transaction, generated id will surely be different, and other inserts foreign key will fails.

Generated at Thu Feb 08 03:18:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.