[MDEV-33277] In-place migration from MySQL 5.7 causes invalid AUTO_INCREMENT values Created: 2024-01-19  Updated: 2024-02-08

Status: Stalled
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4, 10.6.14
Fix Version/s: 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3, 11.4

Type: Bug Priority: Major
Reporter: John Flatness Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 1
Labels: None

Attachments: File data-10.1.tar.xz     File data-5.7.tar.xz    
Issue Links:
Relates
relates to MDEV-6076 Persistent AUTO_INCREMENT for InnoDB Closed
relates to MDEV-12123 Page contains nonzero PAGE_MAX_TRX_ID Closed
relates to MDEV-20892 AUTO_INCREMENT is set lower than the ... Closed
relates to MDEV-33308 CHECK TABLE is resetting TABLE_SHARE:... Confirmed

 Description   

After a migration of an RDS MySQL 5.7 instance with a large number of tables to MariaDB 10.6.14, significant numbers of tables had incorrect auto_increment values selected that were less than the maximum value of the column. This application heavily uses IODKU, so too-low auto_increment has quite significant negative effects, causing overwritten rows when creation of new ones is instead expected.

This problem occurred across many different tables with different structures and columns (though the auto_increment column in all cases was a simple int primary key). All used the InnoDB engine.

The problem does not appear immediately; I attempted to discover and manually change the auto_increment values of affected tables using information_schema.tables to look for tables where auto_increment was less than or equal to the max of the relevant column. As best I could tell, the auto_increment only became "wrong" once an insert (or possibly any update) was made. The result was that a scan over all tables for this problem would fix some tables, then later others would show the problem.

Ultimately the only solution was to run ALTER TABLE x AUTO_INCREMENT = y for every table, even where the new autoinc value "y" was the same as what information_schema and SHOW CREATE TABLE reported already. As far as I can tell the problem no longer occurs after taking this step.

I'm aware that since MDEV-6076 MariaDB stores the AUTO_INCREMENT for InnoDB persistently, and this was not the case for MySQL 5.7, the source DB for these tables. As far as I can see this is the same issue as MDEV-20892 as well as a few others.

The affected tables ranged from ones that were several years old and originally created by older MySQL versions to those created by 5.7. Prior discussion on other issues leads me to believe it's therefore unlikely that the source of the problem is junk data written into the field repurposed for the persistent autoincrement value. Unfortunately as this is an RDS instance I don't have direct access to the data files to investigate.

I only noted values that were too low (though admittedly too high would not have drawn the same kind of notice). As far as I can tell the initialization of the in-memory autoincrement goes wrong in these cases. My working theories include that initialization is setting the value to the first "gap" in the column, rather than above the max, or alternatively, the first INSERT/IODKU is "setting" the autoinc when it shouldn't.

Some discussion in the prior issues noted that since 10.2 is getting old, this issue while still not solved wouldn't really occur much anymore. I filed this in part just to note that moving from MySQL continues to create a "vector" for this problem.



 Comments   
Comment by Marko Mäkelä [ 2024-01-19 ]

MDEV-20892 was closed as incomplete, because we did not know the entire history of the data directory for which it was reproducible, and we were not provided any data file samples.

If the data file had originally been created before MySQL 5.1.48, then this could share an explanation with MDEV-27800.

However, a more likely explanation could be that the adjustment MDEV-12123 for MDEV-6076 is not present in MySQL. The commit message mentions ALTER TABLE…IMPORT TABLESPACE, but similar issues would occur when starting on a MySQL data directory.

By the way, just like with upgrades from MySQL 5.6 to 5.7, you may expect some glitches related to undo logs, because in MySQL 5.7 the format was changed in an incompatible way. In MariaDB Server 10.2, the 5.6 compatible format was retained; see MDEV-12289.

A possible fix would be to execute

ALTER TABLE t AUTO_INCREMENT=1;

on every InnoDB table that contains an AUTO_INCREMENT column. This should update the counter to the current maximum value of the column. Would that work for you?

Comment by John Flatness [ 2024-01-19 ]

In terms of my own situation, I've already done an ALTER TABLE...AUTO_INCREMENT on everything (though I manually calculated the desired value rather than using 1). As far as I can tell that's resolved the problem.

Comment by Michael Widenius [ 2024-01-20 ]

Regarding ALTER TABLE t AUTO_INCREMENT=1;
What do you think about doing the above automatically as part of mariadb-upgrade if we notice the table was created in MySQL 5.6 or 5.7 ?
I would be even better to have a server method to fix the auto increment!

We could add to ha_check_for_upgrade() that if the table has an auto_increment column and the table was created with either of the above versions, we could return HA_ADMIN_NEEDS_CHECK which would ensure that the handler::check() functions is called. InnoDB could then check that the cached auto_increment column looks reasonable and fix it if not.

Comment by John Flatness [ 2024-01-23 ]

Just in my own situation, if mariadb_upgrade had set the persistent auto_increment for all these tables it would have avoided my problems (I presume, anyway).

In the course of writing this message I was able to trigger the problem at least one way through poking at a much smaller staging server which had also been brought over from MySQL 5.7 to MariaDB 10.6.

The table in question (though the details of the table don't seem to matter much)

CREATE TABLE `example` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `description` text DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;

and its contents (it hadn't been touched since the upgrade):

+----+----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | name                 | description                                                                                                                                                                                                                                                                                                                                                  |
+----+----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|  1 | Text                 | A resource consisting primarily of words for reading. Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text.                                                                                                                           |
|  3 | Moving Image         | A series of visual representations imparting an impression of motion when shown in succession. Examples include animations, movies, television programs, videos, zoetropes, or visual output from a simulation.                                                                                                                                              |
|  4 | Oral History         | A resource containing historical information obtained in interviews with persons having firsthand knowledge.                                                                                                                                                                                                                                                 |
|  5 | Sound                | A resource primarily intended to be heard. Examples include a music playback file format, an audio compact disc, and recorded speech or sounds.                                                                                                                                                                                                              |
|  6 | Still Image          | A static visual representation. Examples include paintings, drawings, graphic designs, plans and maps. Recommended best practice is to assign the type Text to images of textual materials.                                                                                                                                                                  |
|  7 | Website              | A resource comprising of a web page or web pages and all related assets ( such as images, sound and video files, etc. ).                                                                                                                                                                                                                                     |
|  8 | Event                | A non-persistent, time-based occurrence. Metadata for an event provides descriptive information that is the basis for discovery of the purpose, location, duration, and responsible agents associated with an event. Examples include an exhibition, webcast, conference, workshop, open day, performance, battle, trial, wedding, tea party, conflagration. |
|  9 | Email                | A resource containing textual messages and binary attachments sent electronically from one person to another or one person to many people.                                                                                                                                                                                                                   |
| 10 | Lesson Plan          | A resource that gives a detailed description of a course of instruction.                                                                                                                                                                                                                                                                                     |
| 11 | Hyperlink            | A link, or reference, to another resource on the Internet.                                                                                                                                                                                                                                                                                                   |
| 12 | Person               | An individual.                                                                                                                                                                                                                                                                                                                                               |
| 13 | Interactive Resource | A resource requiring interaction from the user to be understood, executed, or experienced. Examples include forms on Web pages, applets, multimedia learning objects, chat services, or virtual reality environments.                                                                                                                                        |
| 14 | Dataset              | Data encoded in a defined structure. Examples include lists, tables, and databases. A dataset may be useful for direct machine processing.                                                                                                                                                                                                                   |
| 15 | Physical Object      | An inanimate, three-dimensional object or substance. Note that digital representations of, or surrogates for, these objects should use Moving Image, Still Image, Text or one of the other types.                                                                                                                                                            |
| 16 | Service              | A system that provides one or more functions. Examples include a photocopying service, a banking service, an authentication service, interlibrary loans, a Z39.50 or Web server.                                                                                                                                                                             |
| 17 | Software             | A computer program in source or compiled form. Examples include a C source file, MS-Windows .exe executable, or Perl script.                                                                                                                                                                                                                                 |
| 18 | Document             | A resource containing textual data.  Note that facsimiles or images of texts are still of the genre text.                                                                                                                                                                                                                                                    |
| 19 | People               | An individual, biographical data, birth and death, etc.                                                                                                                                                                                                                                                                                                      |
| 20 | Place                | Important spaces on the mall (See the "Places" writeboard in basecamp.)                                                                                                                                                                                                                                                                                      |
| 21 | Object               | Representation of an object, e.g. plate, instruments, tools.                                                                                                                                                                                                                                                                                                 |
+----+----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

On first inspection of the running server, SHOW CREATE TABLE shows this table with an AUTO_INCREMENT of 22, as expected. On info from another related issue, I ran

od -Ax -t x1 -N 8 -j 0xc038

on the table's ibd file to check/confirm the persistent autoinc value: as expected for an untouched table from MySQL 5.7, that returned all zeros.

On a hunch as to the mechanism of the problem, I ran several close variants of this query (a simplification of what the application often runs when editing existing rows):

insert into `example` (`id`, `name`) values (last_insert_id(3), 'Moving Image test') on duplicate key update `name` = 'Moving Image test';

(I assume the last_insert_id() call here is irrelevant but it is in the queries the application uses so I included it in my tests)

The displayed AUTO_INCREMENT value from SHOW CREATE TABLE was still correct throughout this process after running several IODKUs. I'd noticed from poking around at another pre-5.7 table that the ibd file didn't immediately show a change to the persistent value even after an autoincrementing insertion; figuring this was because the write was not yet on disk, I ran FLUSH TABLES and ultimately restarted the MariaDB server.

On restart it now shows the problem: SHOW CREATE TABLE now shows AUTO_INCREMENT=4, and an IODKU at that point passing a NULL id indeed causes an update of the row with id 4 rather than creating a new one. In other words, the ID passed to those earlier IODKUs had been used as the "last" value for determining the AUTO_INCREMENT somewhere. An important detail: when checked again after the restart, that value of 3 was written out to the table on disk as the persistent autoinc (as shown by the "od" command).

This test isn't an exact match for what happened on the live server. The live server hadn't been restarted at all since the upgrade but still showed this issue, but it has much more traffic and many more tables than the table open cache... I assume my restart of this smaller server maybe simulated a flushing of the table that could have happened on the live server just due to things being ejected from working sets or caches and written to disk.

I'll have to check some more to see if just this is sufficient but the heart of the issue seems to be that running the IODKU that was actually an UPDATE set the persistent autoinc (but not the in-memory one) to the value used in the query. This matches with the behavior I saw where the "bad" autoinc values were always within the bounds of used IDs on the tables (in other words, never "too high," seemingly not random).

Comment by John Flatness [ 2024-01-23 ]

OK, I tried this again in a simplified form, this time just this query one time (on another untouched copy of a table with the same structure):

insert into `example2` (`id`, `name`) values (3, 'Moving Image test') on duplicate key update `name` = 'Moving Image test';

and restarted the server. It shows the exact same symptoms: "od" shows 0x3 written into the ibd after the restart, and SHOW CREATE TABLE reports the AUTO_INCREMENT at 4, well under the max for the column and itself a conflicting value with an existing row.

So no last_insert_id(), no multiple queries, no flush tables, etc, just the one IODKU and a restart.

Still not really certain this is the exact problem as what I saw in production (not sure how, or if it's even possible, to have the in-memory autoinc counter reinitialized without a restart) but it's a way to get at least the same kind of result.

Comment by Marko Mäkelä [ 2024-01-23 ]

Thank you for confirming that the workaround works.

I think that this should also affect upgrades from MariaDB Server 10.1 or earlier. I think that one customer did hit this problem when upgrading from 10.1 to 10.3, but sadly I did not remember the change MDEV-12123 back then.

The change MDEV-12123 was not applied to MariaDB Server 10.1. I was able to compile the last commit of the MariaDB Server 10.1 branch as follows:

mkdir build
cd build
cmake -DCMAKE_C_COMPILER=gcc-10 -DCMAKE_CXX_COMPILER=g++-10 -DDISABLE_LIBMYSQLCLIENT_SYMBOL_VERSIONING=TRUE \
-DPLUGIN_{PERFSCHEMA,PARTITION,ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO \
-DCMAKE_BUILD_TYPE=RelWithDebInfo .. -G Ninja
cmake --build .

When using a newer version of GCC (I currently normally use 13.2.0), something in the C++ library would cause a conflict with byte somewhere.

I was so far unable to produce a nonzero PAGE_MAX_TRX_ID a.k.a. PAGE_ROOT_AUTO_INC on the clustered index root page. I think that I’d better compile mysql-5.7.44 (the last release of the series) to reproduce this.

Comment by Marko Mäkelä [ 2024-01-23 ]

I did not get a nonzero field from MySQL 5.7.44 either, but I will keep trying. Maybe there is a difference with regard to trailing space. On my system, with both MySQL 5.7.44 and MariaDB Server "10.1.49" (which was never released), the table data would end at byte offset 0xcedd, well before the start of the page directory (0xffee). I think that I must grow the records a little, so that a page split will occur.

Comment by John Flatness [ 2024-01-23 ]

nonzero PAGE_MAX_TRX_ID... on the clustered index root page

Is this what I would have been checking with

od -Ax -t x1 -N 8 -j 0xc038

?

If so, please note that the affected tables here have that value at zero.

Comment by Marko Mäkelä [ 2024-01-23 ]

Here is a simplified version of the SQL that produces a root page with the equivalent contents:

--source include/have_innodb.inc
CREATE TABLE example (
  id int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name varchar(255) NOT NULL,
  description text DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;
 
INSERT INTO example VALUES
(1,'Text',repeat('x',226)),
(3,'Moving Image',repeat('x',207)),
(4,'Oral History',repeat('x',108)),
(5,'Sound',repeat('x',143)),
(6,'Still Image',repeat('x',187)),
(7,'Website',repeat('x',120)),
(8,'Event',repeat('x',348)),
(9,'Email',repeat('x',138)),
(10,'Lesson Plan',repeat('x',72)),
(11,'Hyperlink',repeat('x',58)),
(12,'Person',repeat('x',14)),
(13,'Interactive Resource',repeat('x',213)),
(14,'Dataset',repeat('x',138)),
(15,'Physical Object',repeat('x',193)),
(16,'Service',repeat('x',176)),
(17,'Software',repeat('x',124)),
(18,'Document',repeat('x',105)),
(19,'People',repeat('x',55)),
(20,'Place',repeat('x',71)),
(21,'Object',repeat('x',60));
 
UPDATE example SET name='Moving Image test' WHERE id=3;
 
--source include/shutdown_mysqld.inc

Because the duplicate key error would be flagged on the PRIMARY KEY, we might as well issue an UPDATE statement straight ahead. I then found a threshold for a page split by modifying the test as follows:

--source include/have_innodb.inc
CREATE TABLE example (
  id int unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name varchar(255) NOT NULL,
  description text DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci;
 
INSERT INTO example VALUES
(1,'Text',repeat('x',452)),
(3,'Moving Image',repeat('x',494)),
(4,'Oral History',repeat('x',496)),
(5,'Sound',repeat('x',4486)),
(6,'Still Image',repeat('x',4774)),
(7,'Website',repeat('x',440)),
(8,'Event',repeat('x',96)),
(9,'Email',repeat('x',276)),
(10,'Lesson Plan',repeat('x',544)),
(11,'Hyperlink',repeat('x',116)),
(12,'Person',repeat('x',28)),
(13,'Interactive Resource',repeat('x',426)),
(14,'Dataset',repeat('x',276)),
(15,'Physical Object',repeat('x',386)),
(16,'Service',repeat('x',352)),
(17,'Software',repeat('x',248)),
(18,'Document',repeat('x',210)),
(19,'People',repeat('x',110)),
(20,'Place',repeat('x',142)),
(21,'Object',repeat('x',120));
 
UPDATE example SET name='Moving Image test',description=repeat('x',1575)
WHERE id=3;
 
--source include/shutdown_mysqld.inc

If I change the 1575 to 1576 in the UPDATE statement, the page will be split (the 16-bit field PAGE_LEVEL at 0xc040 will be 0x0001 instead of 0x0000), but the PAGE_MAX_TRX_ID will remain at 0. With the 1575, there will be 4 wasted bytes between the end of the last record and the sparse page directory (0xffea to 0xffed).

I will have to check MDEV-12123 and other changes more carefully to understand how we could get a nonzero PAGE_MAX_TRX_ID to the clustered index root page.

Comment by Marko Mäkelä [ 2024-01-23 ]

zerocrates, yes, the PAGE_MAX_TRX_ID field of interest is the 8 bytes at the byte offset 0xc038 (3*16384 + 38 + 18). I guess that the problem is not that those bytes would be nonzero garbage, but something wrong in the upgrade logic. I’ll move to testing that.

Comment by Marko Mäkelä [ 2024-01-23 ]

I tried the following test on 10.6. It is showing the expected LAST_INSERT_ID() of 43.

--source include/have_innodb.inc
--source include/innodb_checksum_algorithm.inc
 
CREATE TABLE t1(id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) NOT NULL)
ENGINE=InnoDB;
INSERT INTO t1 VALUES(42,'The answer');
 
let MYSQLD_DATADIR=`select @@datadir`;
let PAGE_SIZE=`select @@innodb_page_size`;
 
--source include/shutdown_mysqld.inc
 
perl;
do "$ENV{MTR_SUITE_DIR}/include/crc32.pl";
my $file = "$ENV{MYSQLD_DATADIR}/test/t1.ibd";
open(FILE, "+<$file") || die "Unable to open $file";
binmode FILE;
my $ps= $ENV{PAGE_SIZE};
my $page;
die "Unable to read $file" unless sysread(FILE, $page, $ps) == $ps;
my $full_crc32 = unpack("N",substr($page,54,4)) & 0x10; # FIL_SPACE_FLAGS
sysseek(FILE, 3*$ps, 0) || die "Unable to seek $file\n";
die "Unable to read $file" unless sysread(FILE, $page, $ps) == $ps;
substr($page,56,8)=pack("x[8]");
my $polynomial = 0x82f63b78; # CRC-32C
if ($full_crc32)
{
    my $ck = mycrc32(substr($page, 0, $ps-4), 0, $polynomial);
    substr($page, $ps-4, 4) = pack("N", $ck);
}
else
{
    my $ck= pack("N",mycrc32(substr($page, 4, 22), 0, $polynomial) ^
		 mycrc32(substr($page, 38, $ps - 38 - 8), 0, $polynomial));
    substr($page,0,4)=$ck;
    substr($page,$ps-8,4)=$ck;
}
sysseek(FILE, 3*$ps, 0) || die "Unable to rewind $file\n";
syswrite(FILE, $page, $ps)==$ps || die "Unable to write $file\n";
close(FILE) || die "Unable to close $file";
EOF
 
--source include/start_mysqld.inc
INSERT INTO t1 VALUES(NULL,'the question');
SELECT LAST_INSERT_ID();
DROP TABLE t1;

I also tried changing the value written to the PAGE_MAX_TRX_ID to something else, like this:

substr($page,56,8)=pack("x[4]N",101);

The LAST_INSERT_ID() would be reported as 102, just like expected.

So, it looks like there really must have been some garbage PAGE_MAX_TRX_ID in the file, but we do not know yet how.

Comment by Marko Mäkelä [ 2024-01-23 ]

I started a fishing expedition by running MySQL 5.7.44 compiled with the following patch, to catch any clustered index root page writes where the PAGE_MAX_TRX_ID would be nonzero:

diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
index 726e5b4f54d..61a65c24faf 100644
--- a/storage/innobase/buf/buf0flu.cc
+++ b/storage/innobase/buf/buf0flu.cc
@@ -857,6 +857,10 @@ buf_flush_init_for_writing(
 {
 	ib_uint32_t	checksum = BUF_NO_CHECKSUM_MAGIC;
 
+	if (page_get_page_no(page) == 3 && !page_get_space_id(page)) {
+		ut_a(!memcmp(page + 0x38, field_ref_zero, 8));
+	}
+
 	ut_ad(block == NULL || block->frame == page);
 	ut_ad(block == NULL || page_zip_ == NULL
 	      || &block->page.zip == page_zip_);

This could create false alarms when using nonzero innodb_undo_tablespaces. But, no tests in the default suites failed due to this.

The commit message of MDEV-12123 mentions ALTER TABLE…IMPORT TABLESPACE. The change does add some zeroing of PAGE_MAX_TRX_ID to the function btr_root_raise_and_insert(), which would splits the root page during an INSERT. I think that this change would only limit the damage if something else had already caused a nonzero PAGE_MAX_TRX_ID to be written to a clustered index leaf page and then propagate towards the root page. The main change is to PageConverter::update_index_page(), which before the MDEV-12123 would have written a nonzero PAGE_MAX_TRX_ID also to clustered index pages. Let us try:

--source include/have_innodb.inc
CREATE TABLE t1 (id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES(42);
 
--let $datadir=`select @@datadir`
FLUSH TABLES t1 FOR EXPORT;
--copy_file $datadir/test/t1.ibd $datadir/test/t2.ibd
--copy_file $datadir/test/t1.cfg $datadir/test/t2.cfg
UNLOCK TABLES;
ALTER TABLE t1 DISCARD TABLESPACE;
--move_file $datadir/test/t2.ibd $datadir/test/t1.ibd
--move_file $datadir/test/t2.cfg $datadir/test/t1.cfg
ALTER TABLE t1 IMPORT TABLESPACE;

This test reproduces the issue:

od -Ax -t x1 -j 0xc038 -N 8 var/mysqld.1/data/test/t1.ibd 

mysql-5.7.55

00c038 00 00 00 00 00 00 05 0d
00c040

mariadb-10.1.48

00c038 00 00 00 00 00 00 05 08
00c040

Before MDEV-15158 and possibly other related changes in MariaDB Server 10.3, the InnoDB transaction ID would start at a multiple of 256 or 512 on server startup. Here we had 0x50d or 0x508 written to the PAGE_MAX_TRX_ID field. After an upgrade from data-5.7.tar.xz or data-10.1.tar.xz to 10.6 and

insert into t1 values(null);
select * from t1;

I would see a new record with id=1294 or id=1289, instead of the expected id=43.

Comment by Marko Mäkelä [ 2024-01-23 ]

Because MariaDB Server 10.4 is soon reaching its end of life (EOL), I do not think that it makes sense to fix this bug there.

Comment by Marko Mäkelä [ 2024-01-23 ]

monty, I think that I need your help with regard to the changes outside InnoDB. My suggested fix would be as follows:

When opening a table (maybe only as part of running an upgrade script), if the .frm file indicates that there is an AUTO_INCREMENT column in an InnoDB table and the table definition had been created or last modified before MariaDB Server 10.2.4 or by MySQL, we’d reset the PAGE_ROOT_AUTO_INC in the clustered index root page, and possibly mark the .frm file in such a way that the check will only be executed once.

Comment by John Flatness [ 2024-01-23 ]

So, in this comment in particular I described my reproducer: https://jira.mariadb.org/browse/MDEV-33277?focusedCommentId=278594&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-278594

It starts with a table with a zero PAGE_ROOT_AUTO_INC (coming from MySQL) and just doing one "updating" IODKU in 10.6 and restarting the server causes the field to be written as the duplicate value of the key. I assume this is a pretty much totally different route that leads to similar results? As yours is about creating a table that will come into the upgrade with a spurious value in that field, but mine starts with zero there and seems to be more a logic issue in the code that intentionally sets the persistent autoinc, if I had to hazard a guess.

Regardless, I'd imagine your proposed solution would address both, plus maybe other undiscovered routes to the same result, by setting a proper PAGE_ROOT_AUTO_INC upfront. So the different methods of reproduction might not matter much, I suppose.

Comment by Marko Mäkelä [ 2024-01-23 ]

zerocrates, I think that the value that is written is the last assigned value.

Comment by Michael Widenius [ 2024-01-24 ]

zerocrates I am not sure what you mean with your last comment "restarting the server causes the field to be written as the duplicate value of the key".
a) Do you mean that you get a duplicate key error on your next insert that should use the next available auto-increment (bug)
b) After doing an insert, the PAGE_ROOT_AUTO_INC contains the value of the last inserted primary key, regardless if auto-increment is generated or if the explicit inserted primary key is larger than any other primary key (correct behaviour)

Comment by John Flatness [ 2024-01-24 ]

The sequence goes like this:

First the table has no PAGE_ROOT_AUTO_INC value. Server reports AUTO_INCREMENT as MAX(pk) + 1.

I do a INSERT... ON DUPLICATE KEY UPDATE that doesn't insert (I choose a value for the PK that is in use, so it performs an update). The server reports AUTO_INCREMENT still at the same value as before.

Restart the server. PAGE_ROOT_AUTO_INC for the table's .ibd file now contains the value used in the IODKU (the pk of the updated row). The server now reports AUTO_INCREMENT for the table as that PAGE_ROOT_AUTO_INC value plus 1. The AUTO_INCREMENT is now "too low."

A normal INSERT not specifying a value for the PK at this point produces "ERROR 1062 (23000): Duplicate entry 'x' for key 'PRIMARY'". Doing an IODKU instead with NULL as the PK value does an UPDATE on an existing row rather than doing an insert as expected.

Comment by Marko Mäkelä [ 2024-01-24 ]

The field PAGE_ROOT_AUTO_INC is supposed to contain the last assigned AUTO_INCREMENT value of the table, provided it is larger than the previous value of the field. It is supposed to be set also if an INSERT or UPDATE statement specifies a value for the AUTO_INCREMENT column.

Comment by Marko Mäkelä [ 2024-01-24 ]

Based on discussion with monty, I created https://github.com/MariaDB/server/pull/3022, currently in draft state. The first commit would automatically correct obviously wrong AUTO_INCREMENT values. Correcting values that are larger than the current maximum value of the AUTO_INCREMENT column in the table would require CHECK TABLE…FOR UPGRADE to be implemented for InnoDB. That is what the second commit aims and currently fails to do.

Comment by Marko Mäkelä [ 2024-01-24 ]

I don’t think that we need to implement CHECK TABLE … FOR UPGRADE in InnoDB to fix this. A few examples, for cases where the table had been created before MariaDB 10.2.10:

  1. The AUTO_INCREMENT column is a shorter type, say, INT UNSIGNED, and we find a garbage PAGE_ROOT_AUTO_INC value that does not fit (is at least 2³²): We will reset the field.
  2. The PAGE_ROOT_AUTO_INC value is less than the maximum value of the column: We will reset it to the maximum.

Any other cases can be fixed manually by executing ALTER TABLE…AUTO_INCREMENT=1.

If the column is defined as BIGINT UNSIGNED AUTO_INCREMENT, then the worst damage that can happen is that we the counter will jump close to 2⁴⁸ (assuming that the nonzero values only come from ALTER TABLE…IMPORT TABLESPACE and are transaction identifiers. In this case, we will still have 2⁶⁴-2⁴⁸=1.8664e19 available values for the AUTO_INCREMENT column. Note: the full range 2⁶⁴ is not much bigger than that: 1.8667e19.

Comment by Marko Mäkelä [ 2024-01-25 ]

I filed MDEV-33308 for a CHECK TABLE anomaly that I found while writing a regression test for this.

Comment by Michael Widenius [ 2024-01-26 ]

I have updated 10.5-MDEV-33277 with changes to do proper reporting with CHECK TABLE, CHECK TABLE ... FOR UPGRADE and mariadb-upgrade

Comment by Debarun Banerjee [ 2024-02-07 ]

https://github.com/MariaDB/server/pull/3022 looks fine to me.

Generated at Thu Feb 08 10:37:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.