Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27862

Galera should replicate nextval()-related changes in sequences with INCREMENT <> 0, at least NOCACHE ones with engine=InnoDB

Details

    Description

      Consider the following text case (on a local 2-node Galera cluster):

      ...
      MariaDB [test]> create sequence s3 increment 1 nocache engine=InnoDB;
      Query OK, 0 rows affected (0.375 sec)
       
      MariaDB [test]> select nextval(s3), @@wsrep_node_name;
      +-------------+-------------------+
      | nextval(s3) | @@wsrep_node_name |
      +-------------+-------------------+
      |           1 | node2             |
      +-------------+-------------------+
      1 row in set (0.035 sec)
       
      MariaDB [test]> select nextval(s3), @@wsrep_node_name;
      +-------------+-------------------+
      | nextval(s3) | @@wsrep_node_name |
      +-------------+-------------------+
      |           2 | node2             |
      +-------------+-------------------+
      1 row in set (0.002 sec)
       
      MariaDB [test]> exit
      Bye
      openxs@ao756:~/dbs/maria10.5$ bin/mysql --socket=/tmp/mysql-node1.sock test
      Reading table information for completion of table and column names
      You can turn off this feature to get a quicker startup with -A
       
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 11
      Server version: 10.5.14-MariaDB-log MariaDB Server
       
      Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      MariaDB [test]> select nextval(s3), @@wsrep_node_name;
      +-------------+-------------------+
      | nextval(s3) | @@wsrep_node_name |
      +-------------+-------------------+
      |           1 | node1             |
      +-------------+-------------------+
      1 row in set (0.002 sec)
       
      MariaDB [test]> select nextval(s3), @@wsrep_node_name;
      +-------------+-------------------+
      | nextval(s3) | @@wsrep_node_name |
      +-------------+-------------------+
      |           2 | node1             |
      +-------------+-------------------+
      1 row in set (0.002 sec)
       
      MariaDB [test]> create table t1(id int);
      Query OK, 0 rows affected (0.340 sec)
       
      MariaDB [test]> insert into t1 select nextval(s3);
      Query OK, 1 row affected (0.015 sec)
      Records: 1  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> select * from t1;
      +------+
      | id   |
      +------+
      |    3 |
      +------+
      1 row in set (0.000 sec)
       
      MariaDB [test]> exit
      Bye
      openxs@ao756:~/dbs/maria10.5$ bin/mysql --socket=/tmp/mysql-node2.sock test
      Reading table information for completion of table and column names
      You can turn off this feature to get a quicker startup with -A
       
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 11
      Server version: 10.5.14-MariaDB-log MariaDB Server
       
      Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      MariaDB [test]> select * from t1;
      +------+
      | id   |
      +------+
      |    3 |
      +------+
      1 row in set (0.001 sec)
       
      MariaDB [test]> insert into t1 select nextval(s3);
      Query OK, 1 row affected (0.037 sec)
      Records: 1  Duplicates: 0  Warnings: 0
       
      MariaDB [test]> select * from t1;
      +------+
      | id   |
      +------+
      |    3 |
      |    3 |
      +------+
      2 rows in set (0.001 sec)
       
      MariaDB [test]> show create sequence s3\G
      *************************** 1. row ***************************
             Table: s3
      Create Table: CREATE SEQUENCE `s3` start with 1 minvalue 1 maxvalue 9223372036854775806 increment by 1 nocache nocycle ENGINE=InnoDB
      1 row in set (0.000 sec)
       
      MariaDB [test]> show create table s3\G
      *************************** 1. row ***************************
             Table: s3
      Create Table: CREATE TABLE `s3` (
        `next_not_cached_value` bigint(21) NOT NULL,
        `minimum_value` bigint(21) NOT NULL,
        `maximum_value` bigint(21) NOT NULL,
        `start_value` bigint(21) NOT NULL COMMENT 'start value when sequences is created or value if RESTART is used',
        `increment` bigint(21) NOT NULL COMMENT 'increment value',
        `cache_size` bigint(21) unsigned NOT NULL,
        `cycle_option` tinyint(1) unsigned NOT NULL COMMENT '0 if no cycles are allowed, 1 if the sequence should begin a new cycle when maximum_value is passed',
        `cycle_count` bigint(21) NOT NULL COMMENT 'How many cycles have been done'
      ) ENGINE=InnoDB SEQUENCE=1
      1 row in set (0.001 sec)
      

      Normally Galera replicates changes in InnoDB table, that must happen for NOCACHE sequence with engine=InnoDB. In the test above we see that changes to back end InnoDB table seem NOT to be replicated neither for SELEXT (kind of expected) nor for INSERT (which must produce a write set to replicate).

      As a result we can not use sequences with INCREMENT<>0 in Galera the same way as on standalone node. As I fail to find where is it documented as a limitation I'd say it's a bug actually.

      Attachments

        Activity

          ralf.gebhardt@mariadb.com Because sequence handlerton is DB_TYPE_SEQUENCE and it has not defined HTON_WSREP_REPLICATION i.e. that it should be replicated by Galera. Adding this is not enough to make it happen and that reason is most likely because somewhere we should do binlogging when WSREP_EMULATE_BINLOG(thd) is set. However, I could not figure out where that binlogging is skipped.

          2022-04-13 10:07:43 12 [Note] WSREP: empty rbr buffer, query: select nextval(s3), @@wsrep_node_name
          2022-04-13 10:07:43 12 [Note] WSREP: wsrep_after_commit: 1, 1, 5, 1
          2022-04-13 10:07:43 12 [Note] WSREP: wsrep_after_statement for 12 client_state exec  client_mode local trans_state committed
          2022-04-13 10:07:43 12 [Note] WSREP: wsrep_thd_binlog_reset
          2022-04-13 10:07:43 12 [Note] WSREP: pending events in stmt cache, sql: select nextval(s3), @@wsrep_node_name
          2022-04-13 10:07:43 2 [Note] WSREP: empty rbr buffer to apply: 6
          2022-04-13 10:07:43 2 [Note] WSREP: wsrep_commit_empty(2)
          2022-04-13 10:07:43 2 [Note] WSREP: Commit not finished for applier 2
          2022-04-13 10:07:43 2 [Note] WSREP: wsrep_after_apply 2
          2022-04-13 10:07:43 2 [Note] WSREP: wsrep_thd_binlog_reset
          

          Message empty rbr buffer means that no binlog events where created. If I start servers with --log-bin=1 I can see from binlog that:

          START TRANSACTION
          /*!*/;
          # at 734
          # at 794
          #220413 10:17:44 server id 1  end_log_pos 794 CRC32 0xa6af7955 	Annotate_rows:
          #Q> select nextval(s3), @@wsrep_node_name
          #220413 10:17:44 server id 1  end_log_pos 846 CRC32 0x94a146b1 	Table_map: `test`.`s3` mapped to number 57
          # at 846
          #220413 10:17:44 server id 1  end_log_pos 896 CRC32 0x0750ac56 	Write_rows: table id 57 flags: STMT_END_F
           
          BINLOG '
          GHlWYhMBAAAANAAAAE4DAAAAADkAAAAAAAEABHRlc3QAAnMzAAgICAgICAgBCAAAsUahlA==
          GHlWYhcBAAAAMgAAAIADAAAAADkAAAAAAAEACIH8AgAAAAAAAAAAAAAAAAAAAFasUAc=
          '/*!*/;
          # Number of rows: 1
          # at 896
          #220413 10:17:44 server id 1  end_log_pos 974 CRC32 0x898aa339 	Query	thread_id=13	exec_time=0	error_code=0
          

          This means that when binlog is enabled actual binlog event is written to binlog i.e. there is one Write_rows event for select nextval(s3) but this is still not replicated by Galera.

          jplindst Jan Lindström (Inactive) added a comment - ralf.gebhardt@mariadb.com Because sequence handlerton is DB_TYPE_SEQUENCE and it has not defined HTON_WSREP_REPLICATION i.e. that it should be replicated by Galera. Adding this is not enough to make it happen and that reason is most likely because somewhere we should do binlogging when WSREP_EMULATE_BINLOG(thd) is set. However, I could not figure out where that binlogging is skipped. 2022-04-13 10:07:43 12 [Note] WSREP: empty rbr buffer, query: select nextval(s3), @@wsrep_node_name 2022-04-13 10:07:43 12 [Note] WSREP: wsrep_after_commit: 1, 1, 5, 1 2022-04-13 10:07:43 12 [Note] WSREP: wsrep_after_statement for 12 client_state exec client_mode local trans_state committed 2022-04-13 10:07:43 12 [Note] WSREP: wsrep_thd_binlog_reset 2022-04-13 10:07:43 12 [Note] WSREP: pending events in stmt cache, sql: select nextval(s3), @@wsrep_node_name 2022-04-13 10:07:43 2 [Note] WSREP: empty rbr buffer to apply: 6 2022-04-13 10:07:43 2 [Note] WSREP: wsrep_commit_empty(2) 2022-04-13 10:07:43 2 [Note] WSREP: Commit not finished for applier 2 2022-04-13 10:07:43 2 [Note] WSREP: wsrep_after_apply 2 2022-04-13 10:07:43 2 [Note] WSREP: wsrep_thd_binlog_reset Message empty rbr buffer means that no binlog events where created. If I start servers with --log-bin=1 I can see from binlog that: START TRANSACTION /*!*/; # at 734 # at 794 #220413 10:17:44 server id 1 end_log_pos 794 CRC32 0xa6af7955 Annotate_rows: #Q> select nextval(s3), @@wsrep_node_name #220413 10:17:44 server id 1 end_log_pos 846 CRC32 0x94a146b1 Table_map: `test`.`s3` mapped to number 57 # at 846 #220413 10:17:44 server id 1 end_log_pos 896 CRC32 0x0750ac56 Write_rows: table id 57 flags: STMT_END_F   BINLOG ' GHlWYhMBAAAANAAAAE4DAAAAADkAAAAAAAEABHRlc3QAAnMzAAgICAgICAgBCAAAsUahlA== GHlWYhcBAAAAMgAAAIADAAAAADkAAAAAAAEACIH8AgAAAAAAAAAAAAAAAAAAAFasUAc= '/*!*/; # Number of rows: 1 # at 896 #220413 10:17:44 server id 1 end_log_pos 974 CRC32 0x898aa339 Query thread_id=13 exec_time=0 error_code=0 This means that when binlog is enabled actual binlog event is written to binlog i.e. there is one Write_rows event for select nextval(s3) but this is still not replicated by Galera.

          If I correctly understand this necessary binlog events are in stmt cache but binlog_commit is newer called to flush them. Instead binlog reset is called that notices that there is still events in stmt cache.

          jplindst Jan Lindström (Inactive) added a comment - If I correctly understand this necessary binlog events are in stmt cache but binlog_commit is newer called to flush them. Instead binlog reset is called that notices that there is still events in stmt cache.

          This is not enough to get nextval(s3) to replicate in Galera:

          diff --git a/sql/ha_sequence.cc b/sql/ha_sequence.cc
          index 2447242593c..8948553ba4d 100644
          --- a/sql/ha_sequence.cc
          +++ b/sql/ha_sequence.cc
          @@ -452,7 +452,8 @@ static int sequence_initialize(void *p)
                                          HTON_HIDDEN |
                                          HTON_TEMPORARY_NOT_SUPPORTED |
                                          HTON_ALTER_NOT_SUPPORTED |
          -                               HTON_NO_PARTITION);
          +                               HTON_NO_PARTITION |
          +                               HTON_WSREP_REPLICATION);
             DBUG_RETURN(0);
           }
          

          Note that I have already fixed the issue that CREATE|ALTER SEQUENCE did not replicate by Galera .

          jplindst Jan Lindström (Inactive) added a comment - This is not enough to get nextval(s3) to replicate in Galera: diff --git a/sql/ha_sequence.cc b/sql/ha_sequence.cc index 2447242593c..8948553ba4d 100644 --- a/sql/ha_sequence.cc +++ b/sql/ha_sequence.cc @@ -452,7 +452,8 @@ static int sequence_initialize(void *p) HTON_HIDDEN | HTON_TEMPORARY_NOT_SUPPORTED | HTON_ALTER_NOT_SUPPORTED | - HTON_NO_PARTITION); + HTON_NO_PARTITION | + HTON_WSREP_REPLICATION); DBUG_RETURN(0); } Note that I have already fixed the issue that CREATE|ALTER SEQUENCE did not replicate by Galera .

          seppo I have now found the reason. We do write binlog events to stmt cache but not in wsrep_client_server cache. However, I do not know how to get those binlog events there.

          jplindst Jan Lindström (Inactive) added a comment - seppo I have now found the reason. We do write binlog events to stmt cache but not in wsrep_client_server cache. However, I do not know how to get those binlog events there.
          seppo Seppo Jaakola added a comment -

          jplindst take a look at wsrep_get_trans_cache(), wsrep patch uses this to get access to binlog cache where to read binlog events from. I suppose the binlog events related to sequence usage go only to statement cache, and not to transactional cache, which wsrep patch is polling.

          seppo Seppo Jaakola added a comment - jplindst take a look at wsrep_get_trans_cache(), wsrep patch uses this to get access to binlog cache where to read binlog events from. I suppose the binlog events related to sequence usage go only to statement cache, and not to transactional cache, which wsrep patch is polling.

          People

            mkaruza Mario Karuza (Inactive)
            valerii Valerii Kravchuk
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.