[MDEV-16304] Sequences with INCREMENT=0 produce wrong values with some combinations of START WITH and auto_increment_* options Created: 2018-05-27  Updated: 2018-08-30  Resolved: 2018-08-30

Status: Closed
Project: MariaDB Server
Component/s: Sequences
Affects Version/s: 10.3.7
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Juan Assignee: Michael Widenius
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

CentOS Linux release 7.4.1708 (Core)



 Description   

Sequences produce duplicate initial values in multiple nodes after restarting if sequence has been used on node 1 while node 2 was off-line.

On a 2 node cluster with a sequence created with

MariaDB [test]> create sequence s start with 100 increment 0;

as recommended in the documentation

resulting (on node 1) in:

MariaDB [test]> show global variables like 'auto_increment_%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| auto_increment_increment | 2     |
| auto_increment_offset    | 1     |
+--------------------------+-------+
2 rows in set (0.003 sec)
 
MariaDB [test]> select * from s;
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
| next_not_cached_value | minimum_value | maximum_value       | start_value | increment | cache_size | cycle_option | cycle_count |
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
|                   100 |             1 | 9223372036854775806 |         100 |         0 |       1000 |            0 |           0 |
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
1 row in set (0.001 sec)

Now get one value from the sequence:

MariaDB [test]> select next value for s;
+------------------+
| next value for s |
+------------------+
|              101 |
+------------------+
1 row in set (0.000 sec)
 
MariaDB [test]> select * from s;
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
| next_not_cached_value | minimum_value | maximum_value       | start_value | increment | cache_size | cycle_option | cycle_count |
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
|                  2100 |             1 | 9223372036854775806 |         100 |         0 |       1000 |            0 |           0 |
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
1 row in set (0.000 sec)

So in the case of this 2-node cluster, the server has reserved 999 numbers starting from next_not_cached_value + auto_increment_offset with an interval of auto_increment_increment, resulting in the next available non-cached value being 2100.

If we now bring down the cluster (first node 2, then node 1), bootstrap node 1 again, we have a problem:

Node 1:

~# galera_new_cluster
~# mysql -A test
MariaDB [test]> select * from s;
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
| next_not_cached_value | minimum_value | maximum_value       | start_value | increment | cache_size | cycle_option | cycle_count |
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
|                  4100 |             1 | 9223372036854775806 |         100 |         0 |       1000 |            0 |           0 |
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
1 row in set (0.000 sec)
 
MariaDB [test]> select nextval(s);
+------------+
| nextval(s) |
+------------+
|       4100 |
+------------+
1 row in set (0.000 sec)

For some reason, on restart in the bootstrap node the offset is not applied to the next_not_cached_value for the first invocation of "next value".

now start
Node 2:

~# systemctl start mariadb
~# mysql -A test
MariaDB [test]> select * from s;
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
| next_not_cached_value | minimum_value | maximum_value       | start_value | increment | cache_size | cycle_option | cycle_count |
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
|                  4100 |             1 | 9223372036854775806 |         100 |         0 |       1000 |            0 |           0 |
+-----------------------+---------------+---------------------+-------------+-----------+------------+--------------+-------------+
1 row in set (0.000 sec)
 
MariaDB [test]> select next value for s;
+------------------+
| next value for s |
+------------------+
|             4100 |
+------------------+
1 row in set (0.000 sec)

QED



 Comments   
Comment by Elena Stepanova [ 2018-05-28 ]

Sorry, I don't understand the nature of the problem here. For the result which you consider to be wrong, please indicate what result you would expect instead, and why.

Comment by Juan [ 2018-05-29 ]

The confusion is understandable. A SQL sequence is not what is called a sequence in math. A SQL sequence is a series of unique numbers intended to be used as, and enhance, the concept of auto_increment.

A 'sequence' in SQL as defined in ANSI SQL:2003 should return a list of unique numbers. Reference implementations such as Oracle and IBM also expect uniqueness.

In the case described, the number 4100, which is the 'next value' returned for the sequence in node 1, is equal to the number 4100, which is the 'next value' returned for the sequence in node 2. In other words, multiple calls to 'next value' for the same sequence in multiple Galera nodes return the same, duplicate, value. ie. sequences don't work in Galera for MariaDB 10.3.7, as they produce duplicate numbers in the same cluster.

So to be clear, it is wrong for 2 calls to 'next value' for the same sequence to return the same value. In the case of a Galera installation it is wrong for two calls to 'next value' for the same sequence from 2 nodes to produce the same value.

Comment by Elena Stepanova [ 2018-05-29 ]

Thanks.
The confusion was not about the scientific meaning of sequences, but about the form of the bug report.
Anyway, the scenario is overcomplicated. The problem is unrelated to re-bootstrap (or to Galera, for that matter). If you had run nextval(s) on the 2nd node right away, before any restart, you would have seen it already – the second node would have returned 100 instead of the expected 102. The simple matter of fact is that with certain combinations of sequence parameters and auto_increment_* values, the offset does not get applied. The rest is just variations.

Here are some examples where it raises from questionable to obviously wrong.

set global auto_increment_increment= 2, auto_increment_offset= 1;
create sequence s start with 1 increment 0;
select nextval(s);
nextval(s)
1

This is questionable. Should it really be 1, or should it be 2 (start value + offset)? It might seem okay that it's 1 at the first glance, but then

set global auto_increment_increment= 2, auto_increment_offset= 1;
create sequence s start with 100 increment 0;
select nextval(s);
nextval(s)
101

This seems logical, and here the offset is applied? So, it's inconsistent with the previous example.

Further, if we have

set global auto_increment_increment= 2, auto_increment_offset= 2;
create sequence s start with 2 increment 0;
select nextval(s);
nextval(s)
2

Same case as with the first example above, again questionable, it might seem okay, but then

set global auto_increment_increment= 2, auto_increment_offset= 2;
create sequence s start with 100 increment 0;
select nextval(s);
nextval(s)
100

Now, this is obviously wrong.

In terms of your rebootstrap scenario, it boils down to this.
First, you create the sequence on node 1 of a 2-node cluster, thus the node has auto_increment_increment=2 and auto_increment_offset=1, so the "good" logic works:

set global auto_increment_increment= 2, auto_increment_offset= 1;
create or replace sequence s start with 100 increment 0;
select nextval(s);
nextval(s)
101

but then, when you rebootstrap the cluster and start only one node, which means at this point it has auto_increment_increment=1 and auto_increment_offset=1, so this logic works instead

flush tables;
set global auto_increment_increment= 1, auto_increment_offset= 1;
select nextval(s);
nextval(s)
2100

or, in other terms,

set global auto_increment_increment= 1, auto_increment_offset= 1;
create or replace sequence s start with 100 increment 0;
select nextval(s);
nextval(s)
100

– no offset.

Then you start the second node, which gets auto_increment_increment=2 and auto_increment_offset=2, so it works like this

set global auto_increment_increment= 2, auto_increment_offset= 2;
create or replace sequence s start with 2100 increment 0;
select nextval(s);
nextval(s)
2100

All in all, the logic is flawed in the calculation of the initial offset, it's just less obvious when start with value is default. I've briefly looked into the place where it's calculated, there are many rather conflicting conditions and branches, so it's not surprising.
MDEV-16313 is probably another indication of the same problem (for a debug version), and not just a bogus assertion failure.

Comment by Michael Widenius [ 2018-08-30 ]

First auto_increment_increment= X and auto_increment_offset= Y doesn't affect the initial value for a sequence, except if it's not part of the sequence that these two variables are generating.

set global auto_increment_increment= 2, auto_increment_offset= 2;
create sequence s start with 100 increment 0;
select nextval(s);
nextval(s)
100

Is perfectly fine as 100 is part of the sequence 0,2,4,...98,100,102 and so one.

The problem is probably, as Elena stated above, related to the new nodes doesn't have the variables set properly.
For things to work, all nodes should at all times have the same auto_increment_increment value and at all times have a fixed (for that node) and different to all other nodes auto_increment_offset.

Comment by Elena Stepanova [ 2018-08-30 ]

Given the above, I happily close the issue.

Generated at Thu Feb 08 08:27:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.