[MDEV-27862] Galera should replicate nextval()-related changes in sequences with INCREMENT <> 0, at least NOCACHE ones with engine=InnoDB Created: 2022-02-16  Updated: 2023-12-22  Resolved: 2022-05-30

Status: Closed
Project: MariaDB Server
Component/s: Galera, Sequences
Affects Version/s: 10.5.14
Fix Version/s: 10.4.26, 10.5.17, 10.6.9, 10.7.5, 10.8.4

Type: Bug Priority: Critical
Reporter: Valerii Kravchuk Assignee: Mario Karuza (Inactive)
Resolution: Fixed Votes: 0
Labels: None


 Description   

Consider the following text case (on a local 2-node Galera cluster):

...
MariaDB [test]> create sequence s3 increment 1 nocache engine=InnoDB;
Query OK, 0 rows affected (0.375 sec)
 
MariaDB [test]> select nextval(s3), @@wsrep_node_name;
+-------------+-------------------+
| nextval(s3) | @@wsrep_node_name |
+-------------+-------------------+
|           1 | node2             |
+-------------+-------------------+
1 row in set (0.035 sec)
 
MariaDB [test]> select nextval(s3), @@wsrep_node_name;
+-------------+-------------------+
| nextval(s3) | @@wsrep_node_name |
+-------------+-------------------+
|           2 | node2             |
+-------------+-------------------+
1 row in set (0.002 sec)
 
MariaDB [test]> exit
Bye
openxs@ao756:~/dbs/maria10.5$ bin/mysql --socket=/tmp/mysql-node1.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 10.5.14-MariaDB-log MariaDB Server
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [test]> select nextval(s3), @@wsrep_node_name;
+-------------+-------------------+
| nextval(s3) | @@wsrep_node_name |
+-------------+-------------------+
|           1 | node1             |
+-------------+-------------------+
1 row in set (0.002 sec)
 
MariaDB [test]> select nextval(s3), @@wsrep_node_name;
+-------------+-------------------+
| nextval(s3) | @@wsrep_node_name |
+-------------+-------------------+
|           2 | node1             |
+-------------+-------------------+
1 row in set (0.002 sec)
 
MariaDB [test]> create table t1(id int);
Query OK, 0 rows affected (0.340 sec)
 
MariaDB [test]> insert into t1 select nextval(s3);
Query OK, 1 row affected (0.015 sec)
Records: 1  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select * from t1;
+------+
| id   |
+------+
|    3 |
+------+
1 row in set (0.000 sec)
 
MariaDB [test]> exit
Bye
openxs@ao756:~/dbs/maria10.5$ bin/mysql --socket=/tmp/mysql-node2.sock test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 10.5.14-MariaDB-log MariaDB Server
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [test]> select * from t1;
+------+
| id   |
+------+
|    3 |
+------+
1 row in set (0.001 sec)
 
MariaDB [test]> insert into t1 select nextval(s3);
Query OK, 1 row affected (0.037 sec)
Records: 1  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select * from t1;
+------+
| id   |
+------+
|    3 |
|    3 |
+------+
2 rows in set (0.001 sec)
 
MariaDB [test]> show create sequence s3\G
*************************** 1. row ***************************
       Table: s3
Create Table: CREATE SEQUENCE `s3` start with 1 minvalue 1 maxvalue 9223372036854775806 increment by 1 nocache nocycle ENGINE=InnoDB
1 row in set (0.000 sec)
 
MariaDB [test]> show create table s3\G
*************************** 1. row ***************************
       Table: s3
Create Table: CREATE TABLE `s3` (
  `next_not_cached_value` bigint(21) NOT NULL,
  `minimum_value` bigint(21) NOT NULL,
  `maximum_value` bigint(21) NOT NULL,
  `start_value` bigint(21) NOT NULL COMMENT 'start value when sequences is created or value if RESTART is used',
  `increment` bigint(21) NOT NULL COMMENT 'increment value',
  `cache_size` bigint(21) unsigned NOT NULL,
  `cycle_option` tinyint(1) unsigned NOT NULL COMMENT '0 if no cycles are allowed, 1 if the sequence should begin a new cycle when maximum_value is passed',
  `cycle_count` bigint(21) NOT NULL COMMENT 'How many cycles have been done'
) ENGINE=InnoDB SEQUENCE=1
1 row in set (0.001 sec)

Normally Galera replicates changes in InnoDB table, that must happen for NOCACHE sequence with engine=InnoDB. In the test above we see that changes to back end InnoDB table seem NOT to be replicated neither for SELEXT (kind of expected) nor for INSERT (which must produce a write set to replicate).

As a result we can not use sequences with INCREMENT<>0 in Galera the same way as on standalone node. As I fail to find where is it documented as a limitation I'd say it's a bug actually.



 Comments   
Comment by Jan Lindström (Inactive) [ 2022-04-13 ]

ralf.gebhardt@mariadb.com Because sequence handlerton is DB_TYPE_SEQUENCE and it has not defined HTON_WSREP_REPLICATION i.e. that it should be replicated by Galera. Adding this is not enough to make it happen and that reason is most likely because somewhere we should do binlogging when WSREP_EMULATE_BINLOG(thd) is set. However, I could not figure out where that binlogging is skipped.

2022-04-13 10:07:43 12 [Note] WSREP: empty rbr buffer, query: select nextval(s3), @@wsrep_node_name
2022-04-13 10:07:43 12 [Note] WSREP: wsrep_after_commit: 1, 1, 5, 1
2022-04-13 10:07:43 12 [Note] WSREP: wsrep_after_statement for 12 client_state exec  client_mode local trans_state committed
2022-04-13 10:07:43 12 [Note] WSREP: wsrep_thd_binlog_reset
2022-04-13 10:07:43 12 [Note] WSREP: pending events in stmt cache, sql: select nextval(s3), @@wsrep_node_name
2022-04-13 10:07:43 2 [Note] WSREP: empty rbr buffer to apply: 6
2022-04-13 10:07:43 2 [Note] WSREP: wsrep_commit_empty(2)
2022-04-13 10:07:43 2 [Note] WSREP: Commit not finished for applier 2
2022-04-13 10:07:43 2 [Note] WSREP: wsrep_after_apply 2
2022-04-13 10:07:43 2 [Note] WSREP: wsrep_thd_binlog_reset

Message empty rbr buffer means that no binlog events where created. If I start servers with --log-bin=1 I can see from binlog that:

START TRANSACTION
/*!*/;
# at 734
# at 794
#220413 10:17:44 server id 1  end_log_pos 794 CRC32 0xa6af7955 	Annotate_rows:
#Q> select nextval(s3), @@wsrep_node_name
#220413 10:17:44 server id 1  end_log_pos 846 CRC32 0x94a146b1 	Table_map: `test`.`s3` mapped to number 57
# at 846
#220413 10:17:44 server id 1  end_log_pos 896 CRC32 0x0750ac56 	Write_rows: table id 57 flags: STMT_END_F
 
BINLOG '
GHlWYhMBAAAANAAAAE4DAAAAADkAAAAAAAEABHRlc3QAAnMzAAgICAgICAgBCAAAsUahlA==
GHlWYhcBAAAAMgAAAIADAAAAADkAAAAAAAEACIH8AgAAAAAAAAAAAAAAAAAAAFasUAc=
'/*!*/;
# Number of rows: 1
# at 896
#220413 10:17:44 server id 1  end_log_pos 974 CRC32 0x898aa339 	Query	thread_id=13	exec_time=0	error_code=0

This means that when binlog is enabled actual binlog event is written to binlog i.e. there is one Write_rows event for select nextval(s3) but this is still not replicated by Galera.

Comment by Jan Lindström (Inactive) [ 2022-04-13 ]

If I correctly understand this necessary binlog events are in stmt cache but binlog_commit is newer called to flush them. Instead binlog reset is called that notices that there is still events in stmt cache.

Comment by Jan Lindström (Inactive) [ 2022-04-14 ]

This is not enough to get nextval(s3) to replicate in Galera:

diff --git a/sql/ha_sequence.cc b/sql/ha_sequence.cc
index 2447242593c..8948553ba4d 100644
--- a/sql/ha_sequence.cc
+++ b/sql/ha_sequence.cc
@@ -452,7 +452,8 @@ static int sequence_initialize(void *p)
                                HTON_HIDDEN |
                                HTON_TEMPORARY_NOT_SUPPORTED |
                                HTON_ALTER_NOT_SUPPORTED |
-                               HTON_NO_PARTITION);
+                               HTON_NO_PARTITION |
+                               HTON_WSREP_REPLICATION);
   DBUG_RETURN(0);
 }

Note that I have already fixed the issue that CREATE|ALTER SEQUENCE did not replicate by Galera .

Comment by Jan Lindström (Inactive) [ 2022-04-14 ]

seppo I have now found the reason. We do write binlog events to stmt cache but not in wsrep_client_server cache. However, I do not know how to get those binlog events there.

Comment by Seppo Jaakola [ 2022-04-14 ]

jplindst take a look at wsrep_get_trans_cache(), wsrep patch uses this to get access to binlog cache where to read binlog events from. I suppose the binlog events related to sequence usage go only to statement cache, and not to transactional cache, which wsrep patch is polling.

Generated at Thu Feb 08 09:56:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.