[MDEV-30531] Corrupt index(es) on busy table when using FOREIGN KEY with CASCADE or SET NULL Created: 2023-02-01  Updated: 2024-01-18  Resolved: 2023-09-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.11, 10.10.2, 11.3, 11.1.2
Fix Version/s: 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2

Type: Bug Priority: Critical
Reporter: Bento Assignee: Marko Mäkelä
Resolution: Fixed Votes: 5
Labels: corruption, foreign-keys, regression, sporadic
Environment:

Ubuntu 22.04.1
Kernel 5.15.0-27
MariaDB 10.10.2-MariaDB-1:10.10.2+maria~ubu2204

Filesystem:
ZFS (zfs-2.1.4-0ubuntu0.1) on 4 NVME drives
MariaDB in its own ZFS dataset using the O_DIRECT flush method.


Attachments: Text File ZFS Dataset Properties.txt     File mariadb-corruption-for-mariadb-ticket.tar.gz     Text File my.cnf.txt     PNG File screenshot-1.png    
Issue Links:
Blocks
is blocked by MDEV-31767 InnoDB tables are being flagged as co... Closed
Problem/Incident
is caused by MDEV-29869 mtr failure: innodb.deadlock_wait_thr... Closed
Relates
relates to MDEV-33205 [ERROR] InnoDB: We detected index cor... Needs Feedback

 Description   

For several months now I have been having issues with index corruption on one table. I am not entirely sure when it started, as it was not immediately obvious after, for example, upgrading MariaDB. It worked fine for months on the same machine, OS and filesystem. I am running 10.10 now, but it also happened on 10.6. I know one other person running the same type of software who also has this issue; they are not using ZFS but EXT4, so the filesystem is probably not the issue.

This table is constantly being written to and read from. A program analyzes/processes the data every few minutes, and then deletes the rows it has just processed.

The only way to "fix" it is to truncate the table, but it takes anywhere from 1h to 3 days for the issue to come up again showing:

SQLSTATE[HY000]: General error: 1712 Index TABLENAME is corrupted

journald then shows the following output (a different index each time; database and table names replaced for privacy):

Feb 09 20:30:54 SERVER_HOSTNAME mariadbd[3829372]: 2023-02-09 20:30:54 0 [ERROR] InnoDB: Flagged corruption of `INDEX_NAME` in table `DBNAME`.`TBLNAME` in purge

It seems to start with "tried to purge non-delete-marked record in index":

Jan 28 09:38:41 SERVER_HOSTNAME mariadbd[962260]: 2023-01-28  9:38:41 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ix_TBLNAME_collection` of table `DBNAME`.`TBLNAME`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x0018CA9D),[8]     7Sc(0x0000000000375363)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x0018CA9D),[8]     7Sc(0x0000000000375363)}
Jan 28 09:38:41 SERVER_HOSTNAME mariadbd[962260]: 2023-01-28  9:38:41 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ix_TBLNAME_partcheck` of table `DBNAME`.`TBLNAME`: tuple: TUPLE (info_bits=0, 2 fields): {[1] (0x80),[8]     7Sc(0x0000000000375363)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1] (0x80),[8]     7Sc(0x0000000000375363)}
Jan 29 20:52:19 SERVER_HOSTNAME mariadbd[962260]: 2023-01-29 20:52:19 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ux_collection_id_filenumber` of table `DBNAME`.`TBLNAME`: tuple: TUPLE (info_bits=0, 3 fields): {[4] #  (0x002387DE),[4]    (0x00000001),[8]     Ks (0x00000000004B731F)}, record: COMPACT RECORD(info_bits=0, 3 fields): {[4] #  (0x002387DE),[4]    (0x00000001),[8]     Ks (0x00000000004B731F)}
Jan 30 18:46:44 SERVER_HOSTNAME mariadbd[962260]: 2023-01-30 18:46:44 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ix_TBLNAME_collection` of table `DBNAME`.`TBLNAME`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x0006B600),[8]      @ (0x00000000000F400A)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x0006B600),[8]      @ (0x00000000000F400A)}
Jan 30 18:46:44 SERVER_HOSTNAME mariadbd[962260]: 2023-01-30 18:46:44 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ix_TBLNAME_partcheck` of table `DBNAME`.`TBLNAME`: tuple: TUPLE (info_bits=0, 2 fields): {[1] (0x80),[8]      @ (0x00000000000F400A)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1] (0x80),[8]      @ (0x00000000000F400A)

The log also shows:

Jan 30 19:50:18 SERVER_HOSTNAME mariadbd[962260]: 2023-01-30 19:50:18 328202837 [ERROR] Got error 180 when reading table './dbname/tblname'
Jan 30 19:50:33 SERVER_HOSTNAME mariadbd[962260]:  6: len 30; hex 616c742e62696e6172692023-01-30 19:50:33 328203835 [ERROR] Got error 180 when reading table './dbname/tblname'

And it also shows thousands of these but I have no idea if that's related:

Jan 29 20:53:49 SERVER_HOSTNAME mariadbd[962260]: RECORD LOCKS space id 22307 page no 7747 n bits 112 index PRIMARY of table `DBNAME`.`TBLNAME` trx id 2609064680 lock mode S locks rec but not gap
Jan 29 20:53:50 SERVER_HOSTNAME mariadbd[962260]: RECORD LOCKS space id 22307 page no 7760 n bits 88 index PRIMARY of table `DBNAME`.`TBLNAME` trx id 2609071557 lock mode S locks rec but not gap waiting
Jan 29 20:53:50 SERVER_HOSTNAME mariadbd[962260]: RECORD LOCKS space id 22307 page no 7760 n bits 88 index PRIMARY of table `DBNAME`.`TBLNAME` trx id 2609071682 lock_mode X locks rec but not gap
Jan 29 20:53:50 SERVER_HOSTNAME mariadbd[962260]: RECORD LOCKS space id 22307 page no 7758 n bits 96 index PRIMARY of table `DBNAME`.`TBLNAME` trx id 2609071682 lock_mode X locks rec but not gap waiting
Jan 29 20:53:50 SERVER_HOSTNAME mariadbd[962260]: RECORD LOCKS space id 22307 page no 7758 n bits 96 index PRIMARY of table `DBNAME`.`TBLNAME` trx id 2609071557 lock mode S locks rec but not gap



 Comments   
Comment by Marko Mäkelä [ 2023-02-01 ]

If the InnoDB change buffer has ever been enabled on that server (it was disabled by default in MDEV-27734 and my.cnf.txt does not mention innodb_change_buffering), that might explain some corruption.

However, if tables become corrupted after TRUNCATE TABLE or OPTIMIZE TABLE, I do not think that we can blame the change buffer. A more likely explanation might be that starting with MDEV-24854, MariaDB enables unbuffered I/O on InnoDB data files. I am aware of corruption problems on btrfs in the past, when O_DIRECT was enabled, but MDEV-27900 or a newer Linux kernel might have fixed that.

I found another open bug MDEV-29276 that mentions ZFS. Does the corruption disappear if you set innodb_flush_method=fsync?

Comment by Marko Mäkelä [ 2023-02-01 ]

I just noticed innodb_doublewrite=0 in my.cnf.txt. Sorry, but that is really asking for trouble, and I suspect that it cannot work reliably. My long-term experience from Linux is that a signal sent to a process may interrupt any buffered or synchronous write (I also noticed innodb_use_native_aio=0) at a multiple of 4096 bytes. Yes, ZFS may be a copy-on-write file system that supports file system snapshots, but an individual 16-kilobyte write might be observed as one 4-kilobyte write and one 12-kilobyte write. If the file was created with innodb_checksum_algorithm=full_crc32, then such corruption should fairly reliably be caught by page checksums, though.

I have been hoping that some of my colleagues would update our knowledge base with information under which circumstances the doublewrite buffer can be safely disabled. One such setting is when using unbuffered I/O (innodb_flush_method=O_DIRECT) on ext4fs created with a bigalloc size that is innodb_page_size, provided that the Linux kernel is recent enough and the underlying storage device actually supports atomic writes of innodb_page_size. (I do not know how to determine that.)

Comment by Bento [ 2023-02-01 ]

Disabling innodb_use_native_aio and innodb_doublewrite seems to be very common for MySQL/MariaDB on ZFS setups: https://www.percona.com/blog/mysql-zfs-performance-update/ and I've been running like this for about a year without problems.

I do however see that they use innodb_flush_method=fsync instead of O_DIRECT. That doesn't explain why the other guy is having the same problem while running on plain old EXT4 though.

I can try setting innodb_flush_method=fsync.

Comment by Bento [ 2023-02-06 ]

So I changed innodb_flush_method to fsync, restarted MariaDB (this is a production machine) and it still failed after about 36 hours.

Comment by Bento [ 2023-02-13 ]

Anyone?

Comment by Marko Mäkelä [ 2023-02-20 ]

I do not see anything in that blog post that would convince me that writes up to innodb_page_size on ZFS are atomic. I have corrected similar claims of the same author in the past: https://www.percona.com/blog/2015/06/17/update-on-the-innodb-double-write-buffer-and-ext4-transactions/

Comment by Andrew Hutchings [ 2023-02-20 ]

Hi Fossil,

Yes, Percona are recommending a certain setup for MySQL performance; it is not a recommendation for reliability, which appears to be important for your use case. The first half of Marko's comment is quite important here. It looks like you may have created the table in question without full_crc32 enabled. You can check this by doing:

select name, (flag & 0x10) = 0x10 as full_crc32 from information_schema.innodb_sys_tablespaces;

We highly recommend you enable innodb_doublewrite.
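
For reference, both settings discussed in this thread live in my.cnf; a minimal fragment might look like the following (values are illustrative, and neither option is dynamic, so a server restart is required):

```ini
[mariadb]
# re-enable the doublewrite buffer to protect against torn page writes
innodb_doublewrite = 1
# revert to the pre-10.6 default flush method while testing
innodb_flush_method = fsync
```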

Comment by Bento [ 2023-02-20 ]

Ok I will enable that. I think when I talked to the other person with this issue he was on EXT4 and not using either of those settings though.

Will report back soon.

I ran that query and all tables have full_crc32=1.

Comment by Bento [ 2023-02-23 ]

Okay so I waited until the table corrupted again, truncated the table and stopped MariaDB, edited my.cnf with innodb_doublewrite = 1 and started it again.

Within 24h the corruption happened again so that's not the problem.

Also the other person who is running with all the stock values and is using EXT4 has the exact same issue.

Anyone who can dig a bit deeper?

(Tagging a person with @ doesn't seem to do anything btw)

Comment by Marko Mäkelä [ 2023-02-24 ]

Fossil, thank you for reproducing the corruption with the doublewrite buffer enabled. If the mariadbd process was never killed in the middle of a page write, the doublewrite buffer is not needed. One would need to be rather unlucky to run into corruption because of torn page writes, but I wanted to ensure that the corruption is not caused by something as trivial as unsafe use of innodb_doublewrite=0. Because corruption related to this may be hard to reproduce, there could be a common misconception that it is safe to disable the doublewrite buffer when it is actually unsafe.

Unfortunately, I have a feeling that this could be a bug in the file system or in the underlying storage. Are there any error messages reported by the kernel? Anything suspicious reported by smartctl (if a S.M.A.R.T. interface is available)?

Do you know which Linux kernel version the other person on ext4fs is using? Was the server built with io_uring or with libaio support?

File systems are not free from bugs. One devastating example https://lwn.net/Articles/864363/ was mentioned in the legendary blog post about page_compressed. Do you or the other person use page_compressed?
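
A quick way to answer that would be a query along these lines (a sketch; PAGE_COMPRESSED shows up in CREATE_OPTIONS when a table uses it):

```sql
-- list any tables created with PAGE_COMPRESSED
SELECT table_schema, table_name, create_options
FROM information_schema.tables
WHERE create_options LIKE '%PAGE_COMPRESSED%';
```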

I recently encountered some discussion on Linux file system testing. As far as I understand, because ZFS is not in the Linux kernel core, it might not receive that much https://github.com/kdave/xfstests testing. https://bugzilla.kernel.org/show_bug.cgi?id=212337 is an example that might be a problem with a part of the test framework, or related to a genuine problem.

Comment by Darko Krišan [ 2023-02-24 ]

I am that other user that has the same issue as @Bento. The kernel version is 5.4.0-139-generic and the filesystem is EXT4, on Ubuntu 20.04.
I have installed the latest MariaDB 10.11.2 and the issue is even worse: the indexes in that particular table get corrupted fast. The interesting bit is that this table is the only one that has this issue; all the other tables are issue-free, but this one is the busiest of them all, with a large amount of reads/writes.

Do you or the other person use page_compressed?

No, no compression is used at all. And all tables are using full_crc32.

Yesterday I completely recreated that table from scratch, but corruption started about 2 minutes after I started filling it with data.

It started for me, if I remember correctly, with version 10.6.5, and every version after that. Unfortunately, as that is a production server, I cannot roll back to an older version and test it.

Comment by Marko Mäkelä [ 2023-02-24 ]

DariusIII, MDEV-24854 in MariaDB Server 10.6 changed the default value to innodb_flush_method=O_DIRECT. If you revert to the old default innodb_flush_method=fsync, does it then work reliably for you?

Comment by Darko Krišan [ 2023-02-24 ]

@marko I have been using O_DIRECT for years, so that change makes no difference in this case.

Comment by Marko Mäkelä [ 2023-02-24 ]

DariusIII, I see. When did you upgrade to a 5.4 kernel? Have you considered upgrading to (say) Ubuntu 22.04, which would have a newer kernel with hopefully fewer bugs in the block layer or the file system? I assume that your server is built with libaio, because while the 5.4 kernel supports io_uring, Ubuntu 20.04 does not include liburing; you’d have to compile that library yourself.

We do most of our internal stress testing on /dev/shm and ext4, with focus on crash recovery. Data sets are typically smaller, but we have caught some Linux kernel bugs, such as MDEV-26674 and MDEV-29610 that affect io_uring. I believe that bug in an old Linux kernel is more likely than a bug in MariaDB.

Comment by Bento [ 2023-02-24 ]

I am using 22.04 and kernel 5.15 and have the same issue though.

Anyway, I just used the Ubuntu 22.04 ISO to install this system a year ago. Did not mess with anything related to iouring etc.

grep -i uring /boot/config-$(uname -r)
CONFIG_IO_URING=y

Comment by Darko Krišan [ 2023-02-24 ]

I ran the same command @bento posted, and the result is:

CONFIG_IO_URING=y

Oh, and about the kernel updates: I am always using the latest updates from Ubuntu, nothing added by hand.

Comment by Bento [ 2023-02-24 ]

I have to add that the kernel config also shows AIO:

CONFIG_AIO=y

I don't know how to check what is actually used.

Comment by Michael Widenius [ 2023-03-05 ]

Bento or Darko, can you try to disable io_uring with
--skip-innodb-use-native-aio
and see if that helps?

If it helps, then there is probably still some issue in the Linux kernel related to io_uring or AIO.
If it does not help, then the bug is probably on the MariaDB / InnoDB side.

Comment by Bento [ 2023-03-05 ]

Will try. I had innodb_use_native_aio = 0 set in my.cnf already though.

Comment by Michael Widenius [ 2023-03-06 ]

Bento, any chance you could attach a copy of the corrupted table to the Jira issue or FTP it to
ftp://ftp.mariadb.org/private/ ?
Doing a clean shutdown first would help a lot!

Comment by Bento [ 2023-03-06 ]

Uploaded to FTP: binaries.ibd and .frm

Did a `service mariadb stop` first

Comment by Marko Mäkelä [ 2023-03-09 ]

Fossil, which errors are being reported for the file binaries.ibd that you uploaded? I see that it uses the innodb_checksum_algorithm=full_crc32 format, and there are no page checksum errors reported for it.

Comment by Bento [ 2023-03-09 ]

Tagging doesn't work: @Marko

Anyway. The errors are the ones I have pasted in the original bug report all the way at the top.

SQLSTATE[HY000]: General error: 1712 Index binaries is corrupted

Jan 30 19:50:18 SERVER_HOSTNAME mariadbd[962260]: 2023-01-30 19:50:18 328202837 [ERROR] Got error 180 when reading table './dbname/binaries.ibd'
Jan 30 19:50:33 SERVER_HOSTNAME mariadbd[962260]:  6: len 30; hex 616c742e62696e6172692023-01-30 19:50:33 328203835 [ERROR] Got error 180 when reading table './dbname/binaries.ibd'

The index is always different; it could also be multiple ones failing:

Feb 09 20:30:54 SERVER_HOSTNAME mariadbd[3829372]: 2023-02-09 20:30:54 0 [ERROR] InnoDB: Flagged corruption of `ix_binary_partcheck` in table `DBNAME`.`binaries` in purge

Comment by Marko Mäkelä [ 2023-03-09 ]

Fossil, in my very first comment I suggested that some secondary indexes could be corrupted due to previous use of the InnoDB change buffer. I do not think that this type of corruption can have anything to do with the low level page I/O.

To fix the corruption, you could execute the following:

SET GLOBAL innodb_change_buffering=none;
-- rebuilds the whole table; could be very I/O intensive:
OPTIMIZE TABLE binaries;
-- alternatively:
ALTER TABLE binaries DROP INDEX ix_binaries_binaryhash, DROP INDEX ux_collection_id_filenumber, DROP INDEX ix_binaries_collection, DROP INDEX ix_binaries_partcheck;
ALTER TABLE binaries ADD UNIQUE INDEX ix_binaries_binaryhash(binaryhash), ADD INDEX …;

We have many open bugs related to indexed virtual columns (MDEV-5800), but there are no virtual columns defined in your table, and none of the UNIQUE KEY definitions seem to involve hidden virtual columns via MDEV-371 either.

Therefore, my main suspect for causing this corruption is the InnoDB change buffer. In my recent talk https://fosdem.org/2023/schedule/event/innodb_change_buffer/ I covered one possible scenario that can cause this type of corruption.

Comment by Bento [ 2023-03-09 ]

@Marko (tagging still doesnt work)

innodb_change_buffering was already disabled:

SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering'
innodb_change_buffering = none

By the way. When I try to OPTIMIZE the corrupted table I get this error back. I see I haven't posted that before:

Duplicate entry '1721887-0' for key 'ux_collection_id_filenumber'

[EDIT]
So because I can't use a WHERE clause on collections_id=1721887, I manually browsed through the table until I found the collections_id 1721887 rows. And indeed there are many duplicate rows where collections_id = 1721887 and filenumber = 0.

The index for those 2 columns should be unique so I don't know how it inserted them in the first place. But after insertion it starts to complain about duplicate values and gets marked as corrupted.

[EDIT 2]
I found tons and tons of other entries where rows with a certain collections_id have dozens of entries with filenumber = 0.

Not sure how that would be possible and why it complains only about collections_id 1721887 and not ones inserted before those.

[EDIT 3]
I just manually inserted 2 rows with the same collections_id and filenumber... Am I just being thick and missing something or??

The SHOW CREATE literally says:

UNIQUE KEY `ux_collection_id_filenumber` (`collections_id`,`filenumber`)

So I created a new test table identical to this one, removed the foreign key constraints though. Then tried to insert duplicate rows and it did not accept it, so that works as intended.
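
The sanity check described above amounts to something like this (a sketch; the test table is a stand-in, not the real schema):

```sql
CREATE TABLE binaries_test (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  collections_id INT UNSIGNED NOT NULL,
  filenumber INT UNSIGNED NOT NULL,
  UNIQUE KEY ux_collection_id_filenumber (collections_id, filenumber)
) ENGINE=InnoDB;

INSERT INTO binaries_test (collections_id, filenumber) VALUES (1721887, 0);
-- on a healthy table, this second insert must fail with error 1062
-- (Duplicate entry '1721887-0' for key 'ux_collection_id_filenumber')
INSERT INTO binaries_test (collections_id, filenumber) VALUES (1721887, 0);
```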

So then I truncated the original BINARIES table (the only way to get it to function again after corruption) and tried to insert two identical rows again. This time it did not accept them, as it should.

So somewhere something goes wrong and corrupts the table. Maybe this is normal that it accepts dupes after corruption because it can't check for UNIQUE keys anymore due to said corruption but I don't know for sure.

Anyway, at least it's some more info lol.

Comment by Marko Mäkelä [ 2023-03-10 ]

Fossil, Jira uses a notation like

[~marko]

for user names.

Was InnoDB change buffering ever enabled in the past? If yes, then the bug that was fixed as part of MDEV-30009 could have contributed to the uniqueness violation. Also, with enough bad luck, if you inserted duplicate entries while SET unique_checks=0 was in effect, the duplicates may not have been detected if the insert went through the InnoDB change buffer. Normally, the buffering of inserts into unique indexes is disallowed, but unique_checks=0 disables that safety check. That variable is set in scripts generated by mysqldump or mariadb-dump.
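
To illustrate the hazard Marko describes (a sketch; the SET statements are what a dump script typically emits, the INSERT is illustrative):

```sql
-- scripts generated by mysqldump/mariadb-dump set this before loading data:
SET unique_checks = 0;
-- with the change buffer enabled, an insert into a secondary UNIQUE index
-- could now be buffered without the duplicate-key check being performed
INSERT INTO binaries (collections_id, filenumber) VALUES (1721887, 0);
-- re-enabling the check does not re-verify rows inserted in the meantime
SET unique_checks = 1;
```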

I created a table that corresponds to binaries.frm and then executed the following:

select * from information_schema.innodb_sys_virtual;

Because the result was empty, there are no hidden virtual columns created due to MDEV-371.

In MDEV-24402 I implemented CHECK TABLE…EXTENDED for InnoDB. It could be a good idea to run that on all your InnoDB tables and run OPTIMIZE TABLE on every table for which any errors or warnings are being reported.
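
The suggested routine, per table, would look roughly like this (a sketch):

```sql
-- MDEV-24402: EXTENDED makes InnoDB cross-check every record against
-- each index, so this can be slow on large tables
CHECK TABLE binaries EXTENDED;
-- only if errors or warnings were reported: rebuild the table
OPTIMIZE TABLE binaries;
```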

Comment by Bento [ 2023-03-10 ]

marko this table was never restored using mysqldump etc., so unique_checks=0 was never used.

Change buffering could have been enabled by default in the past on an older MariaDB version, but I have truncated the table many times now, because that's the only way to temporarily fix the corruption after it happens. I also ran OPTIMIZE before; of course that only works when the table is not corrupted.

Also I think DariusIII even dropped the entire table and recreated it by hand but that did not fix things either.

I think the non-uniqueness is caused by inserting things into the table AFTER it already became corrupted. So it might not be related at all.

I run OPTIMIZE every once in a while on all my tables. It's only this binaries one that gets messed up. Same for DariusIII

Comment by Darko Krišan [ 2023-03-10 ]

Bento Yes, I have dropped the table and recreated it multiple times, and got the same issue over and over.

Comment by Darko Krišan [ 2023-03-11 ]

marko This happened just now:

I ran

check table binaries extended;

I have truncated the table less than 12 hours ago.

Comment by Marko Mäkelä [ 2023-03-11 ]

DariusIII, the only other cause of this that comes to my mind is that the transaction system metadata is corrupted. I can name two ways it could happen. One would be that if the database was originally created before MySQL 5.1.48 and upgraded to MariaDB 10.3 or later before MDEV-27800 had been fixed, the transaction IDs could become corrupted.

Another possible cause is a race condition that affects the purge of committed transaction history. While we only reproduced bad symptoms of MDEV-30671 in connection with innodb_undo_log_truncate=ON, I would not exclude a possibility of corruption when using the default setting innodb_undo_log_truncate=OFF.

I am guessing here, because no server error log contents have been made available.

Do the problems persist after initializing the database and restoring the contents from a logical backup (SQL dump)?

Comment by Darko Krišan [ 2023-03-12 ]

marko This is a production system that has used MariaDB from its start, a couple of years ago. I believe truncating and recreating the table from scratch should have eliminated any committed transaction history issue. The database has multiple indexes, unique and non-unique, and around 20 different tables, but the issue is with this one and only specific table.

Maybe this error from syslog could help?

Mar 11 03:22:37 mariadbd[1656]: 2023-03-11  3:22:37 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ix_binary_partcheck` of table `nntmux`.`binaries`: tuple: TUPLE (info_bits=0, 2 fields): {[1] (0x01),[8]       _(0x000000000000B45F)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1] (0x01),[8]       _(0x000000000000B45F)}
Mar 11 03:22:37 mariadbd[1656]: 2023-03-11  3:22:37 0 [ERROR] InnoDB: Flagged corruption of `ix_binary_partcheck` in table `nntmux`.`binaries` in purge
Mar 11 03:22:37 mariadbd[1656]: 2023-03-11  3:22:37 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ix_binary_collection` of table `nntmux`.`binaries`: tuple: TUPLE (info_bits=0, 2 fields): {[4]  \ (0x00005CC0),[8]       _(0x000000000000B45F)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]  \ (0x00005CC0),[8]       _(0x000000000000B45F)}
Mar 11 03:22:37 mariadbd[1656]: 2023-03-11  3:22:37 0 [ERROR] InnoDB: Flagged corruption of `ix_binary_collection` in table `nntmux`.`binaries` in purge  

I now found an ongoing issue with similar symptoms: https://jira.mariadb.org/browse/MDEV-20640, and also a couple of other issues linked there with similar symptoms.

Comment by Bento [ 2023-03-12 ]

Marko: What DariusIII said is true for my case too. I even mysqldump-ed and recreated the entire database in the beginning of December 2022 and upgraded from 10.6.x to 10.10.x.

It looks like (I'm about 90% sure) when the table becomes corrupted I can still run OPTIMIZE to "fix" the corruption if it just happened. But after a few hours (for example, if it happens at night) the only thing I can do is TRUNCATE to get it going again.

innodb_undo_log_truncate = OFF

on my install by the way.

Is there anything else I can do/run/monitor to get some more useful output to resolve this problem? It's been plaguing me for 5 months now.

Might be related: MDEV-26951

Comment by Marko Mäkelä [ 2023-03-13 ]

DariusIII and Fossil, MDEV-20640 and MDEV-26951 are known bugs around indexed virtual columns. As far as I understood, your tables do not include any virtual columns. If they do, bugs like that would be the simple explanation.

Currently, I do not have any idea what could cause such corruption.

Comment by Darko Krišan [ 2023-03-13 ]

marko I really have no idea what else to check. This is a really weird issue, as the table is the same as it has been for years.

Comment by Marko Mäkelä [ 2023-03-14 ]

Would the indexes become corrupted when using MariaDB 10.6.12?

Comment by Bento [ 2023-03-14 ]

I don't know about 10.6.12 specifically, but I was on one of the later 10.6 versions in December 2022 and also had this issue. That's why I tried upgrading to 10.10.x.

Comment by Michael Widenius [ 2023-03-14 ]

Bento, please change to use innodb_doublewrite = 1 and see if this helps!

A couple of questions:

  • What MariaDB version are you using now?
  • Have you restored the database from a backup?
  • Has the server crashed between corruptions (between the truncate and the next error)?
  • Is it always the same table that gets corrupted?
  • Can you share the full error log with us?
  • Any chance you can upgrade to 10.10.3? There were some bugs fixed in it that may help.

It may also help to run CHECK TABLE on the problematic table once in a while.
If CHECK TABLE reports corruption, the information it prints can be very helpful.

Comment by Bento [ 2023-03-14 ]

alice All the files are already in the original bug report. And I have enabled doublewrite as per Marko's request a bit further up, did not solve the issue.

monty

  • Still using 10.10.2-MariaDB-1:10.10.2+maria~ubu2204
  • Yes I restored it in December into 10.10.2
  • On the road right now but will share ASAP
  • Can try that when I get home next week (it's a production machine)
Comment by Marko Mäkelä [ 2023-03-14 ]

Fossil, which MariaDB version were you using before the problems started? Can you reproduce this corruption with the latest MariaDB 10.5? Or when using a different storage engine than InnoDB?

I am starting to suspect that the cause of this corruption could be something outside InnoDB. There were some changes to the memory management in the SQL layer in 10.6.

Comment by Bento [ 2023-03-14 ]

marko Hard to say, the issue did not immediately appear after updating so it was some 10.6.x version.

I think dariusiii tried MyRocks but I believe that also caused corruption for him. Not sure though.

Since it happens on a production machine, reverting to 10.5 is a PITA but I will see if I can do that in the next 2 weeks or so. Currently not at home/office.

Comment by Darko Krišan [ 2023-03-14 ]

bento marko I tried MyRocks, but it had other issues, not related to this problem but to how MyRocks works and handles data.

As i said in my previous comment:

It started for me, if I remember correctly, with version 10.6.5, and every version after that. Unfortunately, as that is a production server, I cannot roll back to an older version and test it.

Comment by Marko Mäkelä [ 2023-03-15 ]

It would be great if one of you could revert to the latest 10.5 for test purposes. InnoDB uses the same file format in 10.5 and 10.6; a minor difference is that 10.6 implements crash-safe DDL operations. The biggest InnoDB changes between 10.5 and 10.6 are crash-safe DDL, some changes to locking (such as MDEV-20612), and new low-level synchronization primitives (MDEV-21452, MDEV-24142 and related changes).

In the SQL layer, there were substantial changes, such as this one and a few parent commits. If the trouble is not reproducible with the latest 10.5, a next step might be to create a hybrid of 10.5 with InnoDB adapted from 10.6. If also that turns out fine, we should look at other changes between 10.5 and 10.6.

Comment by Bento [ 2023-03-15 ]

marko I think we're both running 10.10; are the data files backwards compatible with 10.5? My DB is fairly large, so if I have to mysqldump it, then downgrade and then import it again, it would mean quite some downtime for production.

Comment by Marko Mäkelä [ 2023-03-16 ]

Unfortunately, a downgrade from 10.8 or later to earlier versions than 10.8 is tricky. An optimistic attempt at removing ib_logfile0 after a clean shutdown and then starting 10.6 or older is bound to fail with error messages "LSN in the future" and all sorts of corruption due to that. Starting with MDEV-27199, shutdown no longer writes any log sequence number to the FIL_PAGE_FILE_FLUSH_LSN field in the system tablespace. The shutdown LSN would have to be patched to the file manually before starting the older server.

We do not test or guarantee downgrades between major versions. I do not know if anything else would be incompatible between 10.5 and 10.10. For InnoDB, this is the only change.

I think that to get meaningful results, it would be best to start from scratch from a logical dump. Only in that way can we be sure that there was no hidden corruption somewhere in the InnoDB system tablespace or undo logs. It is much easier to reason about the failure if we know that only one MariaDB server version ever touched the files, no physical backup was restored, and no crash recovery took place.

Comment by Bento [ 2023-03-19 ]

I'm going to run a 10.5 Docker container on the production machine and change the underlying code so that the affected table, and any tables it has FKs to, run only on that DB, while the rest of the application keeps querying the original DB.

Comment by Marko Mäkelä [ 2023-03-28 ]

Fossil, can you give a status update?

Comment by Bento [ 2023-03-30 ]

marko Working on it. Had to work on some other code before I can start testing this on 10.5. Will report back asap.

Comment by Marko Mäkelä [ 2023-04-18 ]

For the record, for some performance regression testing in MDEV-30986 I created a hacky port of InnoDB from MariaDB 10.6 to 10.5. Something like that could be the next step to try if the corruption is not reproducible with MariaDB 10.5 but is reproducible with 10.6.

Comment by Bento [ 2023-04-18 ]

marko I am working on changing the code so it uses two database instances. Will be able to test soon.

For now I have cronned an OPTIMIZE statement to run every hour which seems to keep the issue away for now.
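
A crontab entry for such an hourly rebuild might look like this (a sketch; the credentials file and schema name are assumptions):

```
# m h  dom mon dow  command
0 * * * *  mariadb --defaults-extra-file=/root/.my.cnf -e 'OPTIMIZE TABLE dbname.binaries;'
```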

Comment by Bento [ 2023-04-28 ]

I have started testing with MariaDB 10.5 now, in Docker, running alongside 10.10.3 which runs directly on the host. Only the tables needed to reproduce the corruption are in the 10.5 DB, and I have edited the code to query those tables only on the 10.5 instance while the rest remains on the main DB.

Will see how it goes, not sure what the performance will be with it running in Docker.

[EDIT] Inserts seem a lot slower for now.

Comment by Marko Mäkelä [ 2023-04-28 ]

A number of InnoDB performance improvements went into 10.6. You might also need to explicitly set innodb_flush_method=O_DIRECT; the default was changed in 10.6 by MDEV-24854.

Comment by Bento [ 2023-04-28 ]

Doesn't help. Using fsync too in the native 10.10.3 install as per your advice. Anyway, will let it run like this for a while.

Comment by Bento [ 2023-05-05 ]

marko Been running 10.5 for 6 days now without any sign of corruption.

Comment by Marko Mäkelä [ 2023-05-08 ]

Fossil, thank you for the update. Next, can you please try to update to the latest 10.6 and see what happens, with explicitly set innodb_flush_method=fsync? It could be useful to wait for the upcoming 10.6.13 release, so that you will avoid the MDEV-29835 hangs, which apparently became easier to hit in the 10.6.12 release.

Comment by Marko Mäkelä [ 2023-05-10 ]

Fossil, you wrote that you encountered this issue on MariaDB Server 10.6 in December 2022. MariaDB Server 10.6.11 had been released in November 2022.

I found some changes as early as MariaDB Server 10.6.10 that could potentially explain this if your active working set is much larger than the InnoDB buffer pool. I am still investigating this. I would advise you to stay on 10.5 for now. Has that been stable for you?

Another user can reproduce corruption with ROW_FORMAT=COMPRESSED tables, but I see that you are not using any form of compression. Using ROW_FORMAT=COMPRESSED could greatly increase the chances of hitting the corruption.
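You can check whether any of your tables use that row format with a query along these lines (a sketch using information_schema):

```sql
-- List all InnoDB tables that currently use ROW_FORMAT=COMPRESSED
SELECT TABLE_SCHEMA, TABLE_NAME, ROW_FORMAT
FROM information_schema.TABLES
WHERE ENGINE = 'InnoDB' AND ROW_FORMAT = 'Compressed';
```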

Comment by Bento [ 2023-05-10 ]

marko I encountered this issue before December too, I only updated to 10.10 in December trying to get rid of the issue.

I am still just running the table that causes corruption in 10.5, the rest still runs in the "old" 10.10 instance.

My InnoDB Buffer pool setting is 96G. I have 35G of "raw" data in /var/lib/mysql.

Comment by Marko Mäkelä [ 2023-05-10 ]

I had a hypothesis for a possible cause of this bug, but I believe that it was incorrect. See the last comment in MDEV-30753: a buffer-fix should be enough to prevent a block from being evicted, and some recent optimizations in 10.6+ that remove redundant buffer page lookups should be fine.

Currently, our hope is that someone can reproduce this corruption in a controlled fashion and share the scripts or give debug access to the environment.

Comment by Marko Mäkelä [ 2023-05-29 ]

Fossil or DariusIII, does MariaDB Server 10.5 appear to be stable for you? Would the corruptions emerge as soon as you upgrade to 10.6?

Unfortunately, due to MDEV-31234, I cannot recommend upgrading to the currently latest point releases 10.5.20 or 10.6.13.

Currently, I am aware of one bug that could affect crash recovery and mariadb-backup; see MDEV-31353 for an example. I do not know yet if it affects the 10.6 release. Before MariaDB Server 10.6, a crash during DDL operations could be expected to lead to a corrupted database.

Today I diagnosed the race condition MDEV-31347 that I think should explain corruptions that look like MDEV-12434. The corruption here is different: no tablespace ID mismatch is being reported.

Comment by Bento [ 2023-05-29 ]

marko Yeah, 10.5 has been running the affected tables for a while now without any issues.

Comment by Marko Mäkelä [ 2023-06-02 ]

I was thinking that MDEV-31390 could explain this, but in the end I concluded that the code path cannot lead to any corruption. Any bug in that area should have been caught in our internal testing by the debug assertions that were added in MDEV-26933.

I had also considered whether MDEV-31347 could explain this, but I think that it should have resulted in server error log messages about a page number mismatch. No such messages have been mentioned in this ticket so far.

Comment by Marko Mäkelä [ 2023-06-05 ]

Are there any FOREIGN KEY constraints attached to the indexed columns? We have some open bugs in that area, such as MDEV-30869. Between 10.5 and 10.6, MDEV-21175 removed work-arounds for some previously missing locking around FOREIGN KEY constraints.

Comment by Darko Krišan [ 2023-06-05 ]

marko Yes, there is a foreign key in the binaries table.

Comment by Marko Mäkelä [ 2023-06-07 ]

Fossil, does your corrupting table involve FOREIGN KEY constraints as well?

Comment by Bento [ 2023-06-07 ]

Yeah, it's the same table.

Comment by Marko Mäkelä [ 2023-06-07 ]

I think that this is effectively blocked by the FOREIGN KEY related bug fixes that nikitamalyavin has been working on.

Comment by Nikita Malyavin [ 2023-06-09 ]

marko, MDEV-30869 is caused by a work-in-progress MDEV-30021 fix, which is not present in any mainline branch. MDEV-30021 is itself debug-only at this moment.

Returning this to InnoDB.

Comment by Marko Mäkelä [ 2023-06-09 ]

Fossil or DariusIII, can you share the complete table definitions of all tables that are connected via FOREIGN KEY constraints to a table where an index will be corrupted?

Table and column names can be replaced with something like t1,t2,t3,t4 and c1,c2,c3,c4, but they must remain unique and consistent. This would give us some ideas on how to reproduce the bug. Also, any hints of how these tables are being modified would be helpful. The indexed column whose index gets corrupted could also be modified due to an ON…CASCADE or ON…SET NULL constraint. For those cases, it would be useful to know also the SQL statements that can cause such constraints to be executed.

One last question: Does the application involve updating any PRIMARY KEY columns, or deleting and inserting records with the same primary key?

Comment by Marko Mäkelä [ 2023-06-09 ]

Another user who is hitting this has a table similar to the following. I omitted any unrelated non-indexed columns of the child table and obfuscated the names. I do not know the structure of the parent table, but I assume that it does not matter much for ON DELETE CASCADE.

CREATE TABLE parent(id int PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE child(
s tinyint unsigned NOT NULL,
id int unsigned NOT NULL,
o int NOT NULL,
PRIMARY KEY (id,s),
KEY (o),
CONSTRAINT FOREIGN KEY (id) REFERENCES parent(id) ON DELETE CASCADE
) ENGINE=InnoDB; 
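The kind of statement that exercises the ON DELETE CASCADE path here is simply a delete of a parent row (a sketch with arbitrary values, assuming the obfuscated column types line up so that the constraint is accepted):

```sql
-- Each DELETE on parent fires a cascaded DELETE on all matching child rows,
-- which in turn removes entries from the secondary index KEY (o) on child.
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES (0, 1, 42);   -- (s, id, o)
DELETE FROM parent WHERE id = 1;       -- cascades: removes the child row too
```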

Comment by Darko Krišan [ 2023-06-09 ]

Here is the structure of three connected tables:

CREATE TABLE `binaries` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(1000) NOT NULL DEFAULT '',
  `collections_id` int(11) unsigned NOT NULL DEFAULT 0,
  `filenumber` int(10) unsigned NOT NULL DEFAULT 0,
  `totalparts` int(11) unsigned NOT NULL DEFAULT 0,
  `currentparts` int(10) unsigned NOT NULL DEFAULT 0,
  `binaryhash` binary(16) NOT NULL DEFAULT '0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  `partcheck` tinyint(1) unsigned NOT NULL DEFAULT 0,
  `partsize` bigint(20) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ix_binary_binaryhash` (`binaryhash`),
  KEY `ix_binary_partcheck` (`partcheck`),
  KEY `ix_binary_collection` (`collections_id`),
  CONSTRAINT `FK_collections` FOREIGN KEY (`collections_id`) REFERENCES `collections` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED
 
CREATE TABLE `parts` (
  `binaries_id` bigint(20) unsigned NOT NULL DEFAULT 0',
  `messageid` varchar(255) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT '',
  `number` bigint(20) unsigned NOT NULL DEFAULT 0,
  `partnumber` mediumint(10) unsigned NOT NULL DEFAULT 0,
  `size` mediumint(20) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`binaries_id`,`number`),
  CONSTRAINT `FK_binaries` FOREIGN KEY (`binaries_id`) REFERENCES `binaries` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED
 
CREATE TABLE `collections` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `subject` varchar(255) NOT NULL DEFAULT '',
  `fromname` varchar(255) NOT NULL DEFAULT '',
  `date` datetime DEFAULT NULL,
  `xref` varchar(2000) NOT NULL DEFAULT '',
  `totalfiles` int(11) unsigned NOT NULL DEFAULT 0,
  `groups_id` int(11) unsigned NOT NULL DEFAULT 0 ,
  `collectionhash` varchar(255) NOT NULL DEFAULT '0',
  `collection_regexes_id` int(11) NOT NULL DEFAULT 0 ,
  `dateadded` datetime DEFAULT NULL,
  `filecheck` tinyint(3) unsigned NOT NULL DEFAULT 0,
  `filesize` bigint(20) unsigned NOT NULL DEFAULT 0,
  `releases_id` int(11) DEFAULT NULL,
  `noise` char(32) NOT NULL DEFAULT '',
  `added` timestamp NOT NULL DEFAULT current_timestamp(),
  PRIMARY KEY (`id`),
  UNIQUE KEY `ix_collection_collectionhash` (`collectionhash`),
  KEY `fromname` (`fromname`),
  KEY `DATE` (`date`),
  KEY `group_id` (`groups_id`),
  KEY `ix_collection_filecheck` (`filecheck`),
  KEY `ix_collection_dateadded` (`dateadded`),
  KEY `ix_collection_releaseid` (`releases_id`),
  KEY `ix_collection_added` (`added`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED

Comment by Marko Mäkelä [ 2023-06-09 ]

Oh, I see, this is ROW_FORMAT=COMPRESSED after all. I had mentioned that attribute on 2023-05-10, but got no response. Common themes with the other affected user would be ROW_FORMAT=COMPRESSED and ON DELETE CASCADE.

It is possible that some refactoring in 10.6, such as MDEV-27058 (10.6.6) or MDEV-13542 (10.6.9) broke something with regard to ROW_FORMAT=COMPRESSED.

It would help a lot if one of you could upgrade to 10.6.14 after converting the affected table binaries to ROW_FORMAT=DYNAMIC (the default). Would that be stable?
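For reference, the conversion would be something like the following (a sketch; it rebuilds the table, so it may take a while and block writes on a large table):

```sql
-- Rebuild the table with the default row format
ALTER TABLE binaries ROW_FORMAT=DYNAMIC;

-- Verify the result afterwards
SELECT ROW_FORMAT FROM information_schema.TABLES
WHERE TABLE_NAME = 'binaries';
```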

Comment by Bento [ 2023-06-09 ]

Not on mine. I am not using compressed, you can check the files I uploaded in this issue a few months ago. Every table is DYNAMIC.

Comment by Roel Van de Paar [ 2023-06-09 ]

DariusIII Hi! Thank you for providing the DDLs for the tables. The definition for the parts table appears to have an error:

unsigned NOT NULL DEFAULT 0',

I'm curious, were these definitions the exact outputs from a SHOW CREATE TABLE command or similar, or could there have been an error during the copy/paste process? I ask this to ensure that the omitted ' isn't contributing to or associated with the reported corruption. While it seems unlikely, it is not impossible.

Comment by Roel Van de Paar [ 2023-06-09 ]

Tables as provided by DariusIII: generalized table/col names for testing:

CREATE TABLE t3 (c1 int(11) unsigned NOT NULL AUTO_INCREMENT, c2 varchar(255) NOT NULL DEFAULT '', c3 varchar(255) NOT NULL DEFAULT '', c4 datetime DEFAULT NULL, c5 varchar(2000) NOT NULL DEFAULT '', c6 int(11) unsigned NOT NULL DEFAULT 0, c7 int(11) unsigned NOT NULL DEFAULT 0, c8 varchar(255) NOT NULL DEFAULT '0', c9 int(11) NOT NULL DEFAULT 0, c10 datetime DEFAULT NULL, c11 tinyint(3) unsigned NOT NULL DEFAULT 0, c12 bigint(20) unsigned NOT NULL DEFAULT 0, c13 int(11) DEFAULT NULL, c14 char(32) NOT NULL DEFAULT '', c15 timestamp NOT NULL DEFAULT current_timestamp(), PRIMARY KEY (c1), UNIQUE KEY k8 (c8), KEY k1 (c3), KEY k2 (c4), KEY k3 (c7), KEY k4 (c11), KEY k5 (c10), KEY k6 (c13), KEY k7 (c15) ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED;
CREATE TABLE t1 (c1 bigint(20) unsigned NOT NULL AUTO_INCREMENT, c2 varchar(1000) NOT NULL DEFAULT '', c3 int(11) unsigned NOT NULL DEFAULT 0, c4 int(10) unsigned NOT NULL DEFAULT 0, c5 int(11) unsigned NOT NULL DEFAULT 0, c6 int(10) unsigned NOT NULL DEFAULT 0, c7 binary(16) NOT NULL DEFAULT '0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0', c8 tinyint(1) unsigned NOT NULL DEFAULT 0, c9 bigint(20) unsigned NOT NULL DEFAULT 0, PRIMARY KEY (c1), UNIQUE KEY k1 (c7), KEY k2 (c8), KEY k3 (c3), CONSTRAINT fk3 FOREIGN KEY (c3) REFERENCES t3(c1) ON DELETE CASCADE ON UPDATE CASCADE ) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED;
CREATE TABLE t2 (c1 bigint(20) unsigned NOT NULL DEFAULT 0, c2 varchar(255) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT '', c3 bigint(20) unsigned NOT NULL DEFAULT 0, c4 mediumint(10) unsigned NOT NULL DEFAULT 0, c5 mediumint(20) unsigned NOT NULL DEFAULT 0, PRIMARY KEY (c1,c3), CONSTRAINT fk1 FOREIGN KEY (c1) REFERENCES t1(c1) ON DELETE CASCADE ON UPDATE CASCADE ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED;

Comment by Darko Krišan [ 2023-06-10 ]

roel These are from SHOW CREATE TABLE, but I have removed the COMMENT parts of the definitions (as they are irrelevant), so it could be an error on my side when deleting that.

Comment by Roel Van de Paar [ 2023-06-10 ]

DariusIII Ack, understood. Thank you for confirming.

Comment by Roel Van de Paar [ 2023-06-10 ]

I started a number of test runs with the given table definitions. Will report back in case I see anything relevant. marko Please let me know if any startup variable or other changes would likely help reproducibility, thanks.

Comment by Bento [ 2023-06-10 ]

Here are mine, should be the same tables but not using COMPRESSED.

CREATE TABLE `binaries` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(1000) NOT NULL DEFAULT '',
  `collections_id` int(10) unsigned NOT NULL DEFAULT 0,
  `filenumber` int(10) unsigned NOT NULL DEFAULT 0,
  `totalparts` int(10) unsigned NOT NULL DEFAULT 0,
  `currentparts` int(10) unsigned NOT NULL DEFAULT 0,
  `partcheck` tinyint(1) NOT NULL DEFAULT 0,
  `partsize` bigint(20) unsigned NOT NULL DEFAULT 0,
  `binaryhash` binary(16) NOT NULL DEFAULT '0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  `fromname` varchar(500) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `ix_binaries_binaryhash` (`binaryhash`),
  UNIQUE KEY `ux_collection_id_filenumber` (`collections_id`,`filenumber`),
  KEY `ix_binaries_collection` (`collections_id`),
  KEY `ix_binaries_partcheck` (`partcheck`),
  CONSTRAINT `FK_Collections` FOREIGN KEY (`collections_id`) REFERENCES `collections` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=294495 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

CREATE TABLE `parts` (
  `binaries_id` bigint(20) unsigned NOT NULL DEFAULT 0,
  `messageid` varchar(255) NOT NULL DEFAULT '',
  `number` bigint(20) unsigned NOT NULL DEFAULT 0,
  `partnumber` int(10) unsigned NOT NULL DEFAULT 0,
  `size` int(10) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`binaries_id`,`number`),
  CONSTRAINT `FK_binaries` FOREIGN KEY (`binaries_id`) REFERENCES `binaries` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci

CREATE TABLE `collections` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `subject` varchar(255) NOT NULL DEFAULT '',
  `fromname` varchar(255) NOT NULL DEFAULT '',
  `date` datetime DEFAULT NULL,
  `xref` varchar(2000) NOT NULL DEFAULT '',
  `totalfiles` int(10) unsigned NOT NULL DEFAULT 0,
  `groups_id` int(10) unsigned NOT NULL DEFAULT 0,
  `collectionhash` varchar(255) NOT NULL DEFAULT '0',
  `collection_regexes_id` int(11) NOT NULL DEFAULT 0 COMMENT 'FK to collection_regexes.id',
  `dateadded` datetime DEFAULT NULL,
  `added` timestamp NOT NULL DEFAULT current_timestamp(),
  `filecheck` tinyint(1) NOT NULL DEFAULT 0,
  `filesize` bigint(20) unsigned NOT NULL DEFAULT 0,
  `releases_id` int(11) DEFAULT NULL,
  `noise` char(32) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `ix_collection_collectionhash` (`collectionhash`),
  KEY `fromname` (`fromname`),
  KEY `date` (`date`),
  KEY `groups_id` (`groups_id`),
  KEY `ix_collection_dateadded` (`dateadded`),
  KEY `ix_collection_filecheck` (`filecheck`),
  KEY `ix_collection_releaseid` (`releases_id`)
) ENGINE=InnoDB AUTO_INCREMENT=61856 DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_unicode_ci

Comment by Roel Van de Paar [ 2023-06-13 ]

In my testrun, I ran into a new iteration of MDEV-19216:

!strcmp(index->table->name.m_name, "SYS_FOREIGN") || !strcmp(index->table->name.m_name, "SYS_FOREIGN_COLS")|SIGABRT|btr_node_ptr_max_size|btr_cur_t::search_leaf|row_ins_sec_index_entry_low|row_ins_sec_index_entry

Reduction has not been successful thus far, but this bug may be related. Note the BINARY column. Other testcases are present in MDEV-19216 leading to very similar stacks (and the same assert) as the one seen in the test run (which focuses on the given table DDLs). NTS: 515166/4737

Comment by Marko Mäkelä [ 2023-07-03 ]

Fossil, thank you. We now know that this corruption can occur with both ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED.

The assertion failure that Roel posted to MDEV-19216 suggests that the maximum size of a non-leaf page record cannot be determined correctly. Ideally, this code path, which was added in the fix of MDEV-14637, would be unreachable, but we need it for the sloppily defined system tables SYS_FOREIGN and SYS_FOREIGN_COLS. I have not yet analyzed that failure in detail, but it does not look related to this bug. Possibly, the assertion fails due to the (mis)use of the data type BINARY(0).

Is there any script that would simulate the actual workload on these tables?

Comment by Timo Lottmann [ 2023-07-11 ]

I got a similar problem with MariaDB > 10.6. I created a simple Java project to simulate it. You need Docker, Java JDK 17, and Gradle.

Just extract: mariadb-corruption-for-mariadb-ticket.tar.gz

Start MariaDB via Docker. I added 10.11.4 with default settings.

cd /path/to/project/mariadb-corruption-for-mariadb-ticket
docker compose up

Start gradle

cd /path/to/project/mariadb-corruption-for-mariadb-ticket
./gradlew run

After a few minutes an index gets corrupted:

[ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.

You can use your own MariaDB instance if it uses port 3808, or just change the port number in DatabaseCorruption.java.

Please let me know if this is useful or if you need more information.

Comment by Roel Van de Paar [ 2023-07-15 ]

timolottmann, thank you very much. I was able to use this information to turn the testcase into a Bash script without any required dependencies. This was really helpful.

Comment by Roel Van de Paar [ 2023-07-15 ]

This Bash testcase with further reduced SQL:

#!/bin/bash
# Start server with --max_connections=10000
# Set variables and ensure ramloc is a ramdisk or tmpfs (i.e. /dev/shm)
 
user="root"
socket="./socket.sock"
db="test"
client="./bin/mariadb"
errorlog="./log/master.err"
ramloc="/dev/shm"
threads=2000   # Number of concurrent threads
queries=100    # Number of t1/t2 INSERTs per thread/per test round
rounds=999999  # Number of max test rounds
 
# Setup
${client} -u ${user} -S ${socket} -D ${db} -e "
DROP TABLE IF EXISTS t1;
DROP TABLE IF EXISTS t2;
CREATE TABLE t1 (c1 INT NOT NULL AUTO_INCREMENT, c2 INT NOT NULL, PRIMARY KEY (c1), UNIQUE KEY u1 (c1,c2)) ENGINE=InnoDB AUTO_INCREMENT=1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4; 
CREATE TABLE t2 (c1 DATETIME NOT NULL, c2 DOUBLE NOT NULL, t1_c1 INT NOT NULL, PRIMARY KEY (t1_c1,c1)) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4;
"
 
insert_rows(){
  SQL=
  for ((i=0;i<${queries};i++)); do
    SQL="${SQL}INSERT INTO t1 (c2) VALUES (0); INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()));"
  done
  ${client} -u ${user} -S ${socket} -D ${db} -e "${SQL}"
  rm -f ${ramloc}/${prefix}_md_proc_${1}  # Thread done
}
 
abort(){ jobs -p | xargs -P100 kill >/dev/null 2>&1; rm -Rf ${ramloc}/${prefix}_md_proc_*; exit 1; }
trap abort SIGINT
 
count=0
prefix="$(echo "${RANDOM}${RANDOM}${RANDOM}" | cut -b1-5)"
rm -f ${ramloc}/${prefix}_md_proc_*
for ((r=0;r<${rounds};r++)); do  # Test rounds
  for ((i=0;i<${threads};i++)); do  # Worker threads
    if [ ! -r ${ramloc}/${prefix}_md_proc_${i} ]; then  # Thread idle
      touch ${ramloc}/${prefix}_md_proc_${i}  # Thread busy
      insert_rows ${i} &
      count=$[ ${count} + 1 ]
      if [ $[ ${count} % 100 ] -eq 0 ]; then  # Limit disk I/O, check once every new 100 threads
        echo "Count: ${count}" | tee lastcount.log
        TAIL="$(tail -n10 ${errorlog} | tr -d '\n')"
        if [[ "${TAIL}" == *"ERROR"* ]]; then
          echo '*** Error found:'
          grep -i 'ERROR' ${errorlog}
          abort
        elif [[ "${TAIL}" == *"down complete"* ]]; then
          echo '*** Server shutdown'
          abort
        elif ! ${client}-admin ping -u ${user} -S ${socket} > /dev/null 2>&1; then
          echo '*** Server gone (killed/crashed)'
          abort
        fi
      fi
    fi
  done
done

Leads to:

11.1.2 3883eb63dc5e663558571c33d086c9fd3aa0cf8f (Optimized)

2023-07-15 15:41:26 0 [Note] /test/MD220623-mariadb-11.1.2-linux-x86_64-opt/bin/mariadbd: ready for connections.
Version: '11.1.2-MariaDB'  socket: '/test/MD220623-mariadb-11.1.2-linux-x86_64-opt/socket.sock'  port: 12801  MariaDB Server
2023-07-15 15:41:38 1509 [Note] InnoDB: Number of transaction pools: 2
2023-07-15 16:19:46 72616 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
2023-07-15 16:19:46 72616 [ERROR] mariadbd: Index for table 't2' is corrupt; try to repair it
2023-07-15 16:19:46 73218 [ERROR] mariadbd: Index for table 't2' is corrupt; try to repair it
2023-07-15 16:19:46 73195 [ERROR] mariadbd: Index for table 't2' is corrupt; try to repair it
2023-07-15 16:19:46 73187 [ERROR] mariadbd: Index for table 't2' is corrupt; try to repair it
... last message repeated ...

Note it can take anywhere from half a minute to 40 minutes or more to reproduce the issue on a fast machine with medium load. The issue is very sporadic, and system load, the number of threads, the number of INSERTs per thread, and other factors may affect reproducibility.

Issue observed on 10.6.15 c2d44ecb904dd5558d116433b6cceb5592182447 (Optimized) and 11.1.2 3883eb63dc5e663558571c33d086c9fd3aa0cf8f (Optimized).

Comment by Roel Van de Paar [ 2023-07-15 ]

No specific server settings/variables are required.

Comment by Roel Van de Paar [ 2023-07-15 ]

It is likely that at least two issues are being discussed in this ticket.

Comment by Thirunarayanan Balathandayuthapani [ 2023-07-19 ]

#3  0x000055904ab4d175 in btr_cur_t::pessimistic_search_leaf (
    this=this@entry=0x1482959e68b0, 
    tuple=tuple@entry=0x147ff807a020, mode=mode@entry=PAGE_CUR_LE, 
    mtr=mtr@entry=0x1482959e6e40)
    at /test/11.2_dbg/storage/innobase/btr/btr0cur.cc:1501
#4  0x000055904ab5f11b in btr_cur_t::search_leaf (
    this=this@entry=0x1482959e68b0, 
    tuple=tuple@entry=0x147ff807a020, mode=mode@entry=PAGE_CUR_LE, 
    latch_mode=<optimized out>, latch_mode@entry=BTR_MODIFY_TREE, 
    mtr=mtr@entry=0x1482959e6e40)
    at /test/11.2_dbg/storage/innobase/btr/btr0cur.cc:1437
#5  0x000055904aa3519f in btr_pcur_open (mtr=0x1482959e6e40, 
    cursor=0x1482959e68b0, latch_mode=BTR_MODIFY_TREE, 
    mode=PAGE_CUR_LE, tuple=0x147ff807a020)
    at /test/11.2_dbg/storage/innobase/include/btr0pcur.h:431
#6  row_ins_clust_index_entry_low (flags=flags@entry=0, 
    mode=mode@entry=BTR_MODIFY_TREE, 
    index=index@entry=0x1483b406a430, n_uniq=n_uniq@entry=2, 
    entry=entry@entry=0x147ff807a020, n_ext=n_ext@entry=0, 
    thr=0x1480881235f8)
    at /test/11.2_dbg/storage/innobase/row/row0ins.cc:2684
#7  0x000055904aa37f70 in row_ins_clust_index_entry (
    index=index@entry=0x1483b406a430, 
    entry=entry@entry=0x147ff807a020, thr=thr@entry=0x1480881235f8, 
    n_ext=n_ext@entry=0)
    at /test/11.2_dbg/storage/innobase/row/row0ins.cc:3260
#8  0x000055904aa395c8 in row_ins_index_entry (thr=0x1480881235f8, 
    entry=0x147ff807a020, index=0x1483b406a430)
    at /test/11.2_dbg/storage/innobase/row/row0ins.cc:3375
--Type <RET> for more, q to quit, c to continue without paging--
#9  row_ins_index_entry_step (thr=0x1480881235f8, node=<optimized out>) at /test/11.2_dbg/storage/innobase/row/row0ins.cc:3543
#10 row_ins (thr=0x1480881235f8, node=<optimized out>) at /test/11.2_dbg/storage/innobase/row/row0ins.cc:3660
#11 row_ins_step (thr=thr@entry=0x1480881235f8) at /test/11.2_dbg/storage/innobase/row/row0ins.cc:3789
#12 0x000055904aa5bbe5 in row_insert_for_mysql (mysql_rec=mysql_rec@entry=0x1480881127c8 "\377\231\260\246p\034", prebuilt=0x148088122e80, 
    ins_mode=ROW_INS_NORMAL) at /test/11.2_dbg/storage/innobase/row/row0mysql.cc:1317
#13 0x000055904a8f77c3 in ha_innobase::write_row (this=0x148088122620, record=0x1480881127c8 "\377\231\260\246p\034")
    at /test/11.2_dbg/storage/innobase/handler/ha_innodb.cc:7820
#14 0x000055904a5f46d5 in handler::ha_write_row (this=0x148088122620, buf=0x1480881127c8 "\377\231\260\246p\034")
    at /test/11.2_dbg/sql/handler.cc:7802
#15 0x000055904a2a365f in write_record (thd=thd@entry=0x148088000d48, table=table@entry=0x148334029198, info=info@entry=0x1482959e7bf0, 
    sink=sink@entry=0x0) at /test/11.2_dbg/sql/sql_insert.cc:2204
#16 0x000055904a2ad505 in mysql_insert (thd=thd@entry=0x148088000d48, table_list=<optimized out>, fields=..., values_list=..., 
    update_fields=..., update_values=..., duplic=DUP_ERROR, ignore=false, result=0x0) at /test/11.2_dbg/sql/sql_insert.cc:1154
#17 0x000055904a2edf0f in mysql_execute_command (thd=thd@entry=0x148088000d48, 
    is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/11.2_dbg/sql/sql_parse.cc:4449
#18 0x000055904a2daaab in mysql_parse (thd=thd@entry=0x148088000d48, rawbuf=<optimized out>, length=<optimized out>, 
    parser_state=parser_state@entry=0x1482959e8300) at /test/11.2_dbg/sql/sql_parse.cc:7774
#19 0x000055904a2e8176 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x148088000d48, 
    packet=packet@entry=0x14808800ae59 "INSERT INTO t2 VALUES (CURRENT_TIMESTAMP, 0, (SELECT LAST_INSERT_ID()))", 
    packet_length=packet_length@entry=71, blocking=blocking@entry=true) at /test/11.2_dbg/sql/sql_class.h:1370
#20 0x000055904a2ea64d in do_command (thd=0x148088000d48, blocking=blocking@entry=true) at /test/11.2_dbg/sql/sql_parse.cc:1405
#21 0x000055904a45029b in do_handle_one_connection (connect=<optimized out>, connect@entry=0x55904cde2df8, 
    put_in_cache=put_in_cache@entry=true) at /test/11.2_dbg/sql/sql_connect.cc:1416
#22 0x000055904a45076a in handle_one_connection (arg=0x55904cde2df8) at /test/11.2_dbg/sql/sql_connect.cc:1318
#23 0x000014840a282609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#24 0x0000148409e6e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

We do get DB_CORRUPTION in the above code path. Debugging more to find out the reason.

Comment by Marko Mäkelä [ 2023-07-24 ]

We have reproduced some corruption that should be due to code refactoring that first appeared in MariaDB Server 10.6.12. This ticket claims that corruption occurs in 10.6.11 already.

Comment by Bento [ 2023-07-24 ]

I see that 10.6.11 was released on Nov 7th 2022; I think by then I already had these issues, making me think it would have occurred in an earlier version than that, like 10.6.10.

Comment by Sebastian Stamm [ 2023-07-24 ]

The corruption we experienced started with 10.6.12 and was solved by downgrading to 10.6.11.
But I did not get errors like "tried to purge non-delete-marked record in index", only Index for table '...' is corrupt; try to repair it.
So maybe there are multiple issues.

Comment by Marko Mäkelä [ 2023-07-24 ]

sstamm, your issue as well as what we have reproduced internally involves a clustered index, while this bug was about secondary indexes. I filed MDEV-31767 for that.

Comment by Marko Mäkelä [ 2023-07-25 ]

I just realized that the removal of the function buf_wait_for_read() in MDEV-27058 (MariaDB Server 10.6.6) could contribute to the originally reported corruption. If my hypothesis is correct, then fixing MDEV-31767 should fix the originally reported corruption in this ticket as well. I think we will only know this once Fossil and DariusIII are able to test something that includes the MDEV-31767 fix.

(For unique secondary indexes, there was also the bug MDEV-31120, introduced in MariaDB Server 10.6.12 and only reproducible when using the InnoDB change buffer that was disabled by default in MDEV-27734.)

Comment by Bento [ 2023-07-25 ]

Let me know if/when there's a Docker amd64 image I could test and I'll do it right away.

Comment by Marko Mäkelä [ 2023-07-25 ]

Fossil, great, I will. Our Docker maintainer may be busy this week. We might be able to include the MDEV-31767 fix in the upcoming 10.6.15 release, which should be out rather soon. I am not familiar with the regular release process of our Docker images. There might be some days of delay between the release of the underlying packages and the images.

Comment by Daniel Black [ 2023-07-25 ]

About 1-2 hours after a commit is pushed to a main branch, the buildbot-built images show up as quay.io/mariadb-foundation/mariadb-devel:10.6, which when I wrote this did include the exact fix for this bug.

Comment by Bento [ 2023-07-26 ]

Wait, the fix for this ticket is already in 10.6.x?

Comment by Marko Mäkelä [ 2023-07-26 ]

Fossil, I just pushed the fix of MDEV-31767 to the 10.6 branch. Regular packages of this development snapshot should appear at https://ci.mariadb.org/36987/ within the next two hours, and I hope that quay.io/mariadb-foundation/mariadb-devel:10.6 will be updated soon too. The previous push into the 10.6 branch should have been 2 days ago (Monday).

Comment by Bento [ 2023-07-26 ]

Cool. Is this fix also present in 10.10.x?

Now running 10.10.5-MariaDB-1:10.10.5+maria~ubu2204 as my main instance and I am still running 10.5 in Docker for the crashing tables.

Comment by Marko Mäkelä [ 2023-07-26 ]

So far, I have merged this to 10.9 and to 10.10 (https://ci.mariadb.org/37004/). The mandatory staging builds are taking time. For some reason, I did not find any build related to the push to the 10.9 branch. It could be related to some overload on the CI system (we are close to scheduled quarterly releases).

Comment by Bento [ 2023-07-26 ]

Okay. Since I am running 10.10.5 in production it would be cool if I could test that one or 10.11.x. That way I can just upgrade my main database and see how it normally runs with all the tables in one DB. That would be the best test I think.

But if needed I can test any version in Docker with just the failing tables.

Comment by Marko Mäkelä [ 2023-07-26 ]

I have pushed the merge of the MDEV-31767 fix to all branches (up to 11.2).

Comment by Daniel Black [ 2023-07-26 ]

All updated.

To see what commits are in a pulled image:

$ podman pull quay.io/mariadb-foundation/mariadb-devel:10.10
$ podman inspect --format="{{json .Annotations}}" quay.io/mariadb-foundation/mariadb-devel:10.10 | jq
{
  "org.opencontainers.image.authors": "MariaDB Foundation",
  "org.opencontainers.image.base.digest": "sha256:4e64b60f60c9c58b5b8001158442cc8ef2fc0d5b60743c20c60c4136d5883cdf",
  "org.opencontainers.image.base.name": "localhost/mariadb-37004-amd64:latest",
  "org.opencontainers.image.description": "This is not a Release.\\nBuild of the MariaDB Server from CI as of commit b1b47264d2ef1fa80b90f308e81c49d9a1011d56",
  "org.opencontainers.image.documentation": "https://hub.docker.com/_/mariadb",
  "org.opencontainers.image.licenses": "GPL-2.0",
  "org.opencontainers.image.revision": "b1b47264d2ef1fa80b90f308e81c49d9a1011d56",
  "org.opencontainers.image.source": "https://github.com/MariaDB/mariadb-docker/tree/28adeb9071427d1e18aa9317f41d62fa4a0f3769/10.10",
  "org.opencontainers.image.title": "MariaDB Server 10.10 CI build",
  "org.opencontainers.image.version": "10.10.6+b1b47264d2ef1fa80b90f308e81c49d9a1011d56"
}
$ # Alternatively
$ podman run --rm  quay.io/mariadb-foundation/mariadb-devel:10.10 mariadbd --help --verbose | grep  version-source-revision
2023-07-26 22:51:17 0 [Warning] Could not open mysql.plugin table: "Table 'mysql.plugin' doesn't exist". Some options may be missing from the help text
version-source-revision                                      b1b47264d2ef1fa80b90f308e81c49d9a1011d56

Note: I always use podman, but the docker CLI has the same syntax.

Form a URL from the org.opencontainers.image.revision value: https://github.com/MariaDB/server/commits/b1b47264d2ef1fa80b90f308e81c49d9a1011d56

marko's "the fix" link earlier is b102872ad50cce5959ad95369740766d14e9e48c MDEV-31767 InnoDB tables are being flagged as corrupted on an I/O bou… which is in the URL form.

Thanks for volunteering to test, Fossil.

Comment by Bento [ 2023-07-27 ]

Cool. Is there an Ubuntu .deb package too, by any chance?

Comment by Daniel Black [ 2023-07-27 ]

There's even a repo; it's what the containers use:

$ podman run --rm -ti quay.io/mariadb-foundation/mariadb-devel:10.10  bash
root@4300db4363e0:/# cat /etc/apt/sources.list.d/mariadb.list 
deb [trusted=yes] https://ci.mariadb.org/37004/amd64-ubuntu-2204-deb-autobake/debs ./
deb [trusted=yes] https://buildbot.mariadb.net/archive/builds/mariadb-4.x/latest/kvm-deb-jammy-amd64-gal/debs/ ./

Adjust the URLs for other Ubuntu versions/architectures.

Comment by Marko Mäkelä [ 2023-07-28 ]

danblack, thank you for making it easier to access the development snapshots.

Fossil and others, I am eager to know if MDEV-31767 might have fixed this. Suppose that you observed these corruptions about every N days. I think that we should wait at least 3N to 5N failure-free days before drawing any conclusions.

Comment by Marko Mäkelä [ 2023-08-25 ]

Fossil, DariusIII, did you have a chance to update a server to a version where MDEV-31767 was fixed, such as 10.6.15 or 10.11.5? Did the corruptions disappear?

Comment by Bento [ 2023-09-01 ]

marko I have not tried it yet as I was out on vacation. But I spoke to DariusIII and he was running 11.0.3 and still got the same corruption.

Comment by Marko Mäkelä [ 2023-09-01 ]

DariusIII, can you confirm this first hand? Had you rebuilt the tables (OPTIMIZE TABLE or ALTER TABLE…FORCE) that were claimed as corrupted?

Comment by Darko Krišan [ 2023-09-01 ]

marko Yes, I installed the latest 11.0 version, switched to the latest 11.1 afterwards, and still had the same issue. As soon as the tables started being used, indexes were corrupted within a couple of minutes. Optimizing the affected table fixes the issue for some time.

Comment by Roel Van de Paar [ 2023-09-02 ]

For completeness, I rechecked using the script listed above on a fixed 11.3 branch (both dbg + opt) and no issues were observed.

Comment by Roel Van de Paar [ 2023-09-02 ]

DariusIII Thank you for the feedback. Can you also confirm the table rebuild for all affected/used tables which Marko mentioned? Thank you

Comment by Baptiste Jamin [ 2023-09-02 ]

We are having the exact same problem on `mysqld Ver 10.11.5-MariaDB-1:10.11.5+maria~deb12-log for debian-linux-gnu on x86_64 (mariadb.org binary distribution)`.

Table rebuild only temporarily fixes the problem.

The filesystem is EXT4, with fsync enabled instead of O_DIRECT, and DMA disabled.

[Edit]

Could it be related to this commit? https://github.com/MariaDB/server/commit/0b47c126e31cddda1e94588799599e138400bcf8

What I don't understand here is that, for almost the same error, we mark the table as corrupted here:

https://github.com/MariaDB/server/blob/7ba9c7fb84b5f28e4736656b57d9508b70ca6369/storage/innobase/row/row0purge.cc#L476 but not here
https://github.com/MariaDB/server/blob/7ba9c7fb84b5f28e4736656b57d9508b70ca6369/storage/innobase/row/row0purge.cc#L392

Comment by Elena Stepanova [ 2023-09-02 ]

The test case below is non-deterministic, run with --repeat=N. It usually fails for me within a few attempts on one machine and within 20-40 attempts on another.
The tables in the test case are from the comments above. Some tables, fields, indexes and table options were removed, but nothing was added.

With this test, my bisect points at this commit in 10.6.11

commit 78a04a4c22d54dc4f67f067fc9b7a0bc717ebfdd
Author: Vlad Lesin
Date:   Wed Oct 26 11:58:22 2022 +0300
 
    MDEV-29869 mtr failure: innodb.deadlock_wait_thr_race

However, this doesn't correlate with information in the comments, where the problem was said to have been encountered on 10.6.5 already. It is possible that the bisect result is inaccurate and the commit merely increased the probability of the failure; or maybe there was yet another problem in versions before 10.6.11.

--source include/have_innodb.inc
 
CREATE TABLE collections (
  id int(11) unsigned NOT NULL AUTO_INCREMENT,
  collectionhash varchar(255) NOT NULL DEFAULT '0',
  PRIMARY KEY (id),
  UNIQUE KEY ix_collection_collectionhash (collectionhash)
) ENGINE=InnoDB;
CREATE TABLE binaries (
  id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  collections_id int(11) unsigned NOT NULL DEFAULT 0,
  binaryhash binary(16) NOT NULL DEFAULT '0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (id),
  UNIQUE KEY ix_binary_binaryhash (binaryhash),
  CONSTRAINT FK_collections FOREIGN KEY (collections_id) REFERENCES collections (id) ON DELETE CASCADE
) ENGINE=InnoDB;
 
INSERT INTO collections (id) VALUES (NULL);
 
--connect (con1,localhost,root,,)
INSERT INTO binaries (id,collections_id) VALUES (NULL,1);
--send
  REPLACE INTO collections (id) VALUES (NULL);
 
--connection default
--error 0,ER_LOCK_DEADLOCK,ER_NO_REFERENCED_ROW_2
REPLACE INTO binaries (id) VALUES (NULL);
 
--source include/restart_mysqld.inc
 
CHECK TABLE binaries, collections EXTENDED;
 
# Cleanup
DROP TABLE binaries, collections;

10.6 2325f8f3

CHECK TABLE binaries, collections EXTENDED;
Table	Op	Msg_type	Msg_text
test.binaries	check	Warning	InnoDB: Index ix_binary_binaryhash is marked as corrupted
test.binaries	check	Warning	InnoDB: Index FK_collections is marked as corrupted
test.binaries	check	error	Corrupt
test.collections	check	status	OK

2023-09-02 18:21:47 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ix_binary_binaryhash` of table `test`.`binaries`: tuple: TUPLE (info_bits=0, 2 fields): {[16]0               (0x30000000000000000000000000000000),[8]        (0x0000000000000001)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[16]0               (0x30000000000000000000000000000000),[8]        (0x0000000000000001)}
2023-09-02 18:21:47 0 [ERROR] InnoDB: Flagged corruption of `ix_binary_binaryhash` in table `test`.`binaries` in purge
2023-09-02 18:21:47 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `FK_collections` of table `test`.`binaries`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x00000001),[8]        (0x0000000000000001)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x00000001),[8]        (0x0000000000000001)}
2023-09-02 18:21:47 0 [ERROR] InnoDB: Flagged corruption of `FK_collections` in table `test`.`binaries` in purge

Intermediate, more concurrent and more continuous versions of the test case also triggered the assertion failure:

2023-09-02 18:21:24 0 [ERROR] InnoDB: Flagged corruption of `ix_binary_collection` in table `test`.`binaries` in purge
mysqld: /data/src/10.6/storage/innobase/btr/btr0cur.cc:1011: dberr_t btr_cur_t::search_leaf(const dtuple_t*, page_cur_mode_t, btr_latch_mode, mtr_t*): Assertion `index()->is_btree() || index()->is_ibuf()' failed.
230902 18:21:24 [ERROR] mysqld got signal 6 ;
 
#9  0x00007f7c54253df2 in __GI___assert_fail (assertion=0x55f694f1fd80 "index()->is_btree() || index()->is_ibuf()", file=0x55f694f1ee00 "/data/src/10.6/storage/innobase/btr/btr0cur.cc", line=1011, function=0x55f694f1fde0 "dberr_t btr_cur_t::search_leaf(const dtuple_t*, page_cur_mode_t, btr_latch_mode, mtr_t*)") at ./assert/assert.c:101
#10 0x000055f693a9610b in btr_cur_t::search_leaf (this=0x7f7c27ae58f0, tuple=0x6170002cbd40, mode=PAGE_CUR_GE, latch_mode=BTR_SEARCH_LEAF, mtr=0x7f7c27ae5cf0) at /data/src/10.6/storage/innobase/btr/btr0cur.cc:1011
#11 0x000055f693554aad in btr_pcur_open (tuple=0x6170002cbd40, mode=PAGE_CUR_GE, latch_mode=BTR_SEARCH_LEAF, cursor=0x7f7c27ae58f0, mtr=0x7f7c27ae5cf0) at /data/src/10.6/storage/innobase/include/btr0pcur.h:431
#12 0x000055f693869045 in row_ins_check_foreign_constraint (check_ref=0, foreign=0x6160004eeb20, table=0x618000010520, entry=0x6170002cbd40, thr=0x620000023c90) at /data/src/10.6/storage/innobase/row/row0ins.cc:1627
#13 0x000055f693958803 in row_upd_check_references_constraints (node=0x6200000238a0, pcur=0x6200000239d0, table=0x618000010520, index=0x6160004e5220, offsets=0x7f7c27ae6370, thr=0x620000023c90, mtr=0x7f7c27ae6a10) at /data/src/10.6/storage/innobase/row/row0upd.cc:252
#14 0x000055f6939651d3 in row_upd_clust_rec_by_insert (node=0x6200000238a0, index=0x6160004e5220, thr=0x620000023c90, referenced=true, foreign=false, mtr=0x7f7c27ae6a10) at /data/src/10.6/storage/innobase/row/row0upd.cc:2281
#15 0x000055f693967ba2 in row_upd_clust_step (node=0x6200000238a0, thr=0x620000023c90) at /data/src/10.6/storage/innobase/row/row0upd.cc:2682
#16 0x000055f6939684d2 in row_upd (node=0x6200000238a0, thr=0x620000023c90) at /data/src/10.6/storage/innobase/row/row0upd.cc:2758
#17 0x000055f69396941e in row_upd_step (thr=0x620000023c90) at /data/src/10.6/storage/innobase/row/row0upd.cc:2900
#18 0x000055f6938bc3aa in row_update_for_mysql (prebuilt=0x620000023120) at /data/src/10.6/storage/innobase/row/row0mysql.cc:1688
#19 0x000055f693506747 in ha_innobase::update_row (this=0x61d0001e28b8, old_row=0x61d0001e36c0 "\304\001", new_row=0x61d0001e32b8 "") at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:8688
#20 0x000055f692af3a4d in handler::ha_update_row (this=0x61d0001e28b8, old_data=0x61d0001e36c0 "\304\001", new_data=0x61d0001e32b8 "") at /data/src/10.6/sql/handler.cc:7699
#21 0x000055f6925789c4 in mysql_update (thd=0x62b0000af218, table_list=0x62b0000b6400, fields=..., values=..., conds=0x0, order_num=1, order=0x62b0000b6eb8, limit=1, ignore=false, found_return=0x7f7c27ae7f40, updated_return=0x7f7c27ae7f60) at /data/src/10.6/sql/sql_update.cc:1092
#22 0x000055f692250d26 in mysql_execute_command (thd=0x62b0000af218, is_called_from_prepared_stmt=false) at /data/src/10.6/sql/sql_parse.cc:4423
#23 0x000055f69226a6fb in mysql_parse (thd=0x62b0000af218, rawbuf=0x62b0000b6238 "/* WRK-2 QNO 1037 */ UPDATE test.collections SET `id` = 0 ORDER BY `collectionhash` LIMIT 1", length=91, parser_state=0x7f7c27ae89f0) at /data/src/10.6/sql/sql_parse.cc:8048
#24 0x000055f6922403bc in dispatch_command (command=COM_QUERY, thd=0x62b0000af218, packet=0x62900012c219 "/* WRK-2 QNO 1037 */ UPDATE test.collections SET `id` = 0 ORDER BY `collectionhash` LIMIT 1", packet_length=91, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1896
#25 0x000055f69223d0f0 in do_command (thd=0x62b0000af218, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1409
#26 0x000055f6926a951e in do_handle_one_connection (connect=0x608000002638, put_in_cache=true) at /data/src/10.6/sql/sql_connect.cc:1416
#27 0x000055f6926a8edf in handle_one_connection (arg=0x608000002638) at /data/src/10.6/sql/sql_connect.cc:1318
#28 0x000055f693305c50 in pfs_spawn_thread (arg=0x617000006998) at /data/src/10.6/storage/perfschema/pfs.cc:2201
#29 0x00007f7c542a7fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#30 0x00007f7c543285bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Comment by Baptiste Jamin [ 2023-09-03 ]

This bug is exactly the same as https://bugs.mysql.com/bug.php?id=86485

The test script triggers the crash almost instantly:

```
Sep 03 05:57:24 vultr mariadbd[8448]: 2023-09-03 5:57:24 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `c` of table `test_corruption`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[1]a(0x61),[4] (0x80000005)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1]a(0x61),[4] (0x80000005)}
Sep 03 05:57:24 vultr mariadbd[8448]: 2023-09-03 5:57:24 0 [ERROR] InnoDB: Flagged corruption of `c` in table `test_corruption`.`t` in purge
```

set sql_mode="";
drop table if exists t;
create table t (
  `a` int,
  `b` int not null,
  `c` char(2) generated always as ('aa') virtual,
  unique key(`b`),
  unique key(`c`(1))
) engine=innodb default charset=utf8 row_format=compact;
 
drop procedure if exists p;
delimiter $
create procedure p()
begin
  declare continue handler for sqlexception begin end;
  repeat
    if rand()>0.5 then insert into t(a,b) values(floor(rand()*10),floor(rand()*10)); end if;
    if rand()>0.5 then delete from t where a<floor(rand()*10); end if;
    if rand()>0.5 then update t set b=concat(a,a); end if;
    if rand()>0.5 then replace into t(a,b) values(floor(rand()*10),floor(rand()*10)); end if;
  until 1=2 end repeat;
end $
delimiter ;
 
call p();

I checked, and the patch has never been backported into MariaDB: https://github.com/percona/percona-server/commit/ef4a3ee7329afe4dedd8aeee9274fb41fce8aafd

It seems to be the solution.

Comment by Roel Van de Paar [ 2023-09-03 ]

I tried running the script in the last comment (i.e. https://bugs.mysql.com/bug.php?id=86485) against:

11.3 (dbg) @ 7ba9c7fb84b5f28e4736656b57d9508b70ca6369 (Build date 2 Sep)
10.6 (dbg) @ d794d3484b2cbd069d68dc8d017a6f18e9a3090e (Build date 1 Aug)
10.6 (opt) @ d794d3484b2cbd069d68dc8d017a6f18e9a3090e (Build date 1 Aug)

I observed no corruption in any of these when running for about 30 minutes on each. However, when restarting 11.3 and trying the same again, it crashed immediately. If you do not get an immediate failure, Ctrl+C the script and re-run it in the same client session, or reboot the instance and try again. This script also reproduces the index()->is_btree() assertion seen by Elena quite reliably at times:

11.3.0 7ba9c7fb84b5f28e4736656b57d9508b70ca6369 (Debug)

Version: '11.3.0-MariaDB-debug'  socket: '/test/MD020923-mariadb-11.3.0-linux-x86_64-dbg/socket.sock'  port: 11670  MariaDB Server
2023-09-04  9:24:51 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `c` of table `test`.`t`: tuple: TUPLE (info_bits=0, 2 fields): {[1]a(0x61),[4]    (0x80000006)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[1]a(0x61),[4]    (0x80000006)}
2023-09-04  9:24:51 0 [ERROR] InnoDB: Flagged corruption of `c` in table `test`.`t` in purge
mariadbd: /test/git-bisect/11.3_dbg/storage/innobase/btr/btr0cur.cc:1001: dberr_t btr_cur_t::search_leaf(const dtuple_t*, page_cur_mode_t, btr_latch_mode, mtr_t*): Assertion `index()->is_btree()' failed.

11.3.0 7ba9c7fb84b5f28e4736656b57d9508b70ca6369 (Debug)

Core was generated by `/test/MD020923-mariadb-11.3.0-linux-x86_64-dbg/bin/mariadbd --no-defaults --cor'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22585690433088)
    at ./nptl/pthread_kill.c:44
[Current thread is 1 (Thread 0x148aa40e2640 (LWP 2650661))]
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=22585690433088) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=22585690433088) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=22585690433088, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x0000148ad6042476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x0000148ad60287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000148ad602871b in __assert_fail_base (fmt=0x148ad61dd150 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5638ce17b509 "index()->is_btree()", file=0x5638ce17b7a8 "/test/git-bisect/11.3_dbg/storage/innobase/btr/btr0cur.cc", line=1001, function=<optimized out>) at ./assert/assert.c:92
#6  0x0000148ad6039e96 in __GI___assert_fail (assertion=0x5638ce17b509 "index()->is_btree()", file=0x5638ce17b7a8 "/test/git-bisect/11.3_dbg/storage/innobase/btr/btr0cur.cc", line=1001, function=0x5638ce17e2c0 "dberr_t btr_cur_t::search_leaf(const dtuple_t*, page_cur_mode_t, btr_latch_mode, mtr_t*)") at ./assert/assert.c:101
#7  0x00005638cdbdb0a0 in btr_cur_t::search_leaf (this=this@entry=0x148aa40dd850, tuple=tuple@entry=0x148a300de030, mode=mode@entry=PAGE_CUR_LE, latch_mode=latch_mode@entry=BTR_MODIFY_LEAF, mtr=mtr@entry=0x148aa40de100) at /test/git-bisect/11.3_dbg/storage/innobase/btr/btr0cur.cc:1001
#8  0x00005638cdad2626 in row_ins_sec_index_entry_low (flags=flags@entry=0, mode=mode@entry=BTR_MODIFY_LEAF, index=index@entry=0x148a30067da0, offsets_heap=<optimized out>, offsets_heap@entry=0x148a300de440, heap=heap@entry=0x148a300de8d0, entry=entry@entry=0x148a300de030, trx_id=0, thr=0x148a300c7910) at /test/git-bisect/11.3_dbg/storage/innobase/row/row0ins.cc:3026
#9  0x00005638cdad4c08 in row_ins_sec_index_entry (index=index@entry=0x148a30067da0, entry=entry@entry=0x148a300de030, thr=thr@entry=0x148a300c7910, check_foreign=<optimized out>) at /test/git-bisect/11.3_dbg/storage/innobase/row/row0ins.cc:3323
#10 0x00005638cdb4c0e4 in row_upd_sec_index_entry (node=node@entry=0x148a300b4608, thr=thr@entry=0x148a300c7910) at /test/git-bisect/11.3_dbg/storage/innobase/row/row0upd.cc:2011
#11 0x00005638cdb4e164 in row_upd_sec_step (thr=0x148a300c7910, node=0x148a300b4608) at /test/git-bisect/11.3_dbg/storage/innobase/row/row0upd.cc:2038
#12 row_upd (thr=0x148a300c7910, node=0x148a300b4608) at /test/git-bisect/11.3_dbg/storage/innobase/row/row0upd.cc:2762
#13 row_upd_step (thr=thr@entry=0x148a300c7910) at /test/git-bisect/11.3_dbg/storage/innobase/row/row0upd.cc:2877
#14 0x00005638cdafa36d in row_update_for_mysql (prebuilt=0x148a300b3bd0) at /test/git-bisect/11.3_dbg/storage/innobase/row/row0mysql.cc:1684
#15 0x00005638cd9b5962 in ha_innobase::update_row (this=0x148a300b1fa0, old_row=0x148a300b1b98 "\371\004", new_row=0x148a300b1b88 "\371\004") at /test/git-bisect/11.3_dbg/storage/innobase/handler/ha_innodb.cc:8611
#16 0x00005638cd6ca25f in handler::ha_update_row (this=0x148a300b1fa0, old_data=0x148a300b1b98 "\371\004", new_data=0x148a300b1b88 "\371\004") at /test/git-bisect/11.3_dbg/sql/handler.cc:7882
#17 0x00005638cd4d78fd in Sql_cmd_update::update_single_table (this=<optimized out>, thd=0x148a30000d58) at /test/git-bisect/11.3_dbg/sql/sql_update.cc:987
#18 0x00005638cd4d8756 in Sql_cmd_update::execute_inner (this=0x148a30095728, thd=0x148a30000d58) at /test/git-bisect/11.3_dbg/sql/sql_update.cc:3059
#19 0x00005638cd40e6fc in Sql_cmd_dml::execute (this=0x148a30095728, thd=0x148a30000d58) at /test/git-bisect/11.3_dbg/sql/sql_select.cc:33344
#20 0x00005638cd3ce977 in mysql_execute_command (thd=0x148a30000d58, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/git-bisect/11.3_dbg/sql/sql_parse.cc:4371
#21 0x00005638cd5cd148 in sp_instr_stmt::exec_core (this=0x148a30099168, thd=<optimized out>, nextp=0x148aa40e0184) at /test/git-bisect/11.3_dbg/sql/sp_instr.cc:956
#22 0x00005638cd5d13f2 in sp_lex_keeper::reset_lex_and_exec_core (this=this@entry=0x148a300991a0, thd=thd@entry=0x148a30000d58, nextp=nextp@entry=0x148aa40e0184, open_tables=open_tables@entry=false, instr=instr@entry=0x148a30099168, rerun_the_same_instr=rerun_the_same_instr@entry=false) at /test/git-bisect/11.3_dbg/sql/sp_instr.cc:296
#23 0x00005638cd5d1e25 in sp_lex_keeper::validate_lex_and_exec_core (this=this@entry=0x148a300991a0, thd=thd@entry=0x148a30000d58, nextp=nextp@entry=0x148aa40e0184, open_tables=open_tables@entry=false, instr=instr@entry=0x148a30099168) at /test/git-bisect/11.3_dbg/sql/sp_instr.cc:475
#24 0x00005638cd5d2759 in sp_instr_stmt::execute (this=0x148a30099168, thd=0x148a30000d58, nextp=0x148aa40e0184) at /test/git-bisect/11.3_dbg/sql/sp_instr.cc:862
#25 0x00005638cd314382 in sp_head::execute (this=this@entry=0x148a30085908, thd=thd@entry=0x148a30000d58, merge_da_on_success=merge_da_on_success@entry=true) at /test/git-bisect/11.3_dbg/sql/sp_head.cc:1280
#26 0x00005638cd316726 in sp_head::execute_procedure (this=0x148a30085908, thd=thd@entry=0x148a30000d58, args=0x148a30005fb8) at /test/git-bisect/11.3_dbg/sql/sp_head.cc:2267
#27 0x00005638cd3c1976 in do_execute_sp (thd=thd@entry=0x148a30000d58, sp=sp@entry=0x148a30085908) at /test/git-bisect/11.3_dbg/sql/sql_parse.cc:3025
#28 0x00005638cd3c60a8 in Sql_cmd_call::execute (this=0x148a30013348, thd=0x148a30000d58) at /test/git-bisect/11.3_dbg/sql/sql_parse.cc:3252
#29 0x00005638cd3d2b19 in mysql_execute_command (thd=thd@entry=0x148a30000d58, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/git-bisect/11.3_dbg/sql/sql_parse.cc:5733
#30 0x00005638cd3d43a8 in mysql_parse (thd=thd@entry=0x148a30000d58, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x148aa40e1200) at /test/git-bisect/11.3_dbg/sql/sql_parse.cc:7760
#31 0x00005638cd3d653c in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x148a30000d58, packet=packet@entry=0x148a3000af69 "call p()", packet_length=packet_length@entry=8, blocking=blocking@entry=true) at /test/git-bisect/11.3_dbg/sql/sql_class.h:247
#32 0x00005638cd3d8417 in do_command (thd=0x148a30000d58, blocking=blocking@entry=true) at /test/git-bisect/11.3_dbg/sql/sql_parse.cc:1406
#33 0x00005638cd52f2ae in do_handle_one_connection (connect=<optimized out>, connect@entry=0x5638d0ed5028, put_in_cache=put_in_cache@entry=true) at /test/git-bisect/11.3_dbg/sql/sql_connect.cc:1445
#34 0x00005638cd52f50d in handle_one_connection (arg=0x5638d0ed5028) at /test/git-bisect/11.3_dbg/sql/sql_connect.cc:1347
#35 0x0000148ad6094b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#36 0x0000148ad6126a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

elenst's MTR script does, however, immediately reproduce the corruption (run with --repeat=1000, though the result is near-instantaneous):

11.3.0 7ba9c7fb84b5f28e4736656b57d9508b70ca6369

2023-09-04  8:56:39 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ix_binary_binaryhash` of table `test`.`binaries`: tuple: TUPLE (info_bits=0, 2 fields): {[16]0               (0x30000000000000000000000000000000),[8]        (0x0000000000000001)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[16]0               (0x30000000000000000000000000000000),[8]        (0x0000000000000001)}
2023-09-04  8:56:39 0 [ERROR] InnoDB: Flagged corruption of `ix_binary_binaryhash` in table `test`.`binaries` in purge
2023-09-04  8:56:39 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `FK_collections` of table `test`.`binaries`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x00000001),[8]        (0x0000000000000001)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x00000001),[8]        (0x0000000000000001)}
2023-09-04  8:56:39 0 [ERROR] InnoDB: Flagged corruption of `FK_collections` in table `test`.`binaries` in purge

And on the same 10.6 revision:

10.6.15 d794d3484b2cbd069d68dc8d017a6f18e9a3090e (Debug)

2023-09-04  9:00:19 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ix_binary_binaryhash` of table `test`.`binaries`: tuple: TUPLE (info_bits=0, 2 fields): {[16]0               (0x30000000000000000000000000000000),[8]        (0x0000000000000001)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[16]0               (0x30000000000000000000000000000000),[8]        (0x0000000000000001)}
2023-09-04  9:00:19 0 [ERROR] InnoDB: Flagged corruption of `ix_binary_binaryhash` in table `test`.`binaries` in purge
2023-09-04  9:00:19 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `FK_collections` of table `test`.`binaries`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x00000001),[8]        (0x0000000000000001)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x00000001),[8]        (0x0000000000000001)}
2023-09-04  9:00:19 0 [ERROR] InnoDB: Flagged corruption of `FK_collections` in table `test`.`binaries` in purge

The optimized 10.6/11.3 builds of the same revisions are also affected in the same way.

A UBSAN+ASAN build showed the same corruption, but no additional UBSAN or ASAN issues.

Comment by Marko Mäkelä [ 2023-09-04 ]

baptistejamin and Roel, the script in https://bugs.mysql.com/bug.php?id=86485 defines a unique index on a virtual column. I think that we have multiple bugs open about indexed virtual columns, such as MDEV-30024, which has one thing in common with that test case: there is a column prefix index defined on a virtual column. On 2023-03-09 in this ticket, I noted that there are no indexed virtual columns involved in the reported case.

In this ticket, I am only interested in test cases that reproduce corruption without using ROW_FORMAT=COMPRESSED (due to bugs like MDEV-31574), indexed virtual columns of the normal or HASH variety (MDEV-371), or the change buffer (disabled by default in MDEV-27734, removed in MDEV-29694).

Comment by Marko Mäkelä [ 2023-09-04 ]

I updated MDEV-30024 with some remarks about the MySQL 8.0.11 change that baptistejamin pointed to. I suspect that it works incorrectly with non-ASCII data.

Comment by Marko Mäkelä [ 2023-09-04 ]

The test case that elenst posted does reproduce the failure for me. It uses FOREIGN KEY and no virtual columns, just like the table that DariusIII and Fossil are using. The test reproduced errors from CHECK TABLE on the first try I ran it. I’m now trying to reproduce it under rr record, which is a bit harder but will allow me to debug a full trace of events from the start to the corruption. Without https://rr-project.org this kind of bug would be really tricky to analyze (even when a reasonably repeatable test case exists).

Comment by Marko Mäkelä [ 2023-09-04 ]

I assume that the server restart in the test was intended to make it more likely that the purge of transaction history would complete. The following achieves that more efficiently:

--source include/have_innodb.inc
 
SET @freq=@@GLOBAL.innodb_purge_rseg_truncate_frequency;
SET GLOBAL innodb_purge_rseg_truncate_frequency=1;
 
CREATE TABLE collections (
  id int(11) unsigned NOT NULL AUTO_INCREMENT,
  collectionhash varchar(255) NOT NULL DEFAULT '0',
  PRIMARY KEY (id),
  UNIQUE KEY ix_collection_collectionhash (collectionhash)
) ENGINE=InnoDB;
CREATE TABLE binaries (
  id bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  collections_id int(11) unsigned NOT NULL DEFAULT 0,
  binaryhash binary(16) NOT NULL DEFAULT '0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0',
  PRIMARY KEY (id),
  UNIQUE KEY ix_binary_binaryhash (binaryhash),
  CONSTRAINT FK_collections FOREIGN KEY (collections_id) REFERENCES collections (id) ON DELETE CASCADE
) ENGINE=InnoDB;
 
INSERT INTO collections (id) VALUES (NULL);
 
--connect (con1,localhost,root,,)
INSERT INTO binaries (id,collections_id) VALUES (NULL,1);
--send
  REPLACE INTO collections (id) VALUES (NULL);
 
--connection default
--error 0,ER_LOCK_DEADLOCK,ER_NO_REFERENCED_ROW_2
REPLACE INTO binaries (id) VALUES (NULL);
 
SET GLOBAL innodb_max_purge_lag_wait=0;
CHECK TABLE binaries, collections EXTENDED;
SET GLOBAL innodb_purge_rseg_truncate_frequency=@freq;
 
# Cleanup
DROP TABLE binaries, collections;

With this, I got an rr replay trace much sooner:

10.6 2325f8f33971c29503639126c69e549aa53b5585

2023-09-04  9:19:09 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `ix_binary_binaryhash` of table `test`.`binaries`: tuple: TUPLE (info_bits=0, 2 fields): {[16]0               (0x30000000000000000000000000000000),[8]        (0x0000000000000001)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[16]0               (0x30000000000000000000000000000000),[8]        (0x0000000000000001)}
2023-09-04  9:19:09 0 [ERROR] InnoDB: Flagged corruption of `ix_binary_binaryhash` in table `test`.`binaries` in purge
2023-09-04  9:19:09 0 [ERROR] InnoDB: tried to purge non-delete-marked record in index `FK_collections` of table `test`.`binaries`: tuple: TUPLE (info_bits=0, 2 fields): {[4]    (0x00000001),[8]        (0x0000000000000001)}, record: COMPACT RECORD(info_bits=0, 2 fields): {[4]    (0x00000001),[8]        (0x0000000000000001)}
2023-09-04  9:19:09 0 [ERROR] InnoDB: Flagged corruption of `FK_collections` in table `test`.`binaries` in purge
^ Found warnings in /dev/shm/10.6/mysql-test/var/18/log/mysqld.1.err

At the time the error was reported by purge, this secondary index consisted of a single page that only contained the record that is mentioned in the above output.

Comment by Marko Mäkelä [ 2023-09-04 ]

The clustered index of the table binaries contains only a single delete-marked record, which had been last modified during the execution of REPLACE INTO collections (id) VALUES (NULL):

10.6 2325f8f33971c29503639126c69e549aa53b5585

#1  0x00005574e4fd0459 in btr_rec_set_deleted<true> (block=..., rec=..., mtr=...)
    at /mariadb/10.6/storage/innobase/btr/btr0cur.cc:4352
#2  0x00005574e4fc3269 in btr_cur_del_mark_set_clust_rec (block=..., rec=..., index=..., offsets=..., 
    thr=..., entry=..., mtr=...) at /mariadb/10.6/storage/innobase/btr/btr0cur.cc:4417
#3  0x00005574e515cdca in row_upd_del_mark_clust_rec (node=..., index=..., offsets=..., thr=..., 
    referenced=..., foreign=..., mtr=...) at /mariadb/10.6/storage/innobase/row/row0upd.cc:2494
#4  0x00005574e515e6ef in row_upd_clust_step (node=..., thr=...)
    at /mariadb/10.6/storage/innobase/row/row0upd.cc:2636
#5  0x00005574e515ea98 in row_upd (node=..., thr=...)
    at /mariadb/10.6/storage/innobase/row/row0upd.cc:2758
#6  0x00005574e515ed88 in row_upd_step (thr=...) at /mariadb/10.6/storage/innobase/row/row0upd.cc:2900
#7  0x00005574e511ffd1 in row_update_cascade_for_mysql (thr=..., node=..., table=...)
    at /mariadb/10.6/storage/innobase/row/row0mysql.cc:2015
#8  0x00005574e5106bd8 in row_ins_foreign_check_on_constraint (thr=..., foreign=..., pcur=..., 
    entry=..., mtr=...) at /mariadb/10.6/storage/innobase/row/row0ins.cc:1351
#9  0x00005574e5107752 in row_ins_check_foreign_constraint (check_ref=..., foreign=..., table=..., 
    entry=..., thr=...) at /mariadb/10.6/storage/innobase/row/row0ins.cc:1744
#10 0x00005574e515cb1a in row_upd_check_references_constraints (node=..., pcur=..., table=..., 
    index=..., offsets=..., thr=..., mtr=...) at /mariadb/10.6/storage/innobase/row/row0upd.cc:252
#11 0x00005574e515cf2f in row_upd_del_mark_clust_rec (node=..., index=..., offsets=..., thr=..., 
    referenced=..., foreign=..., mtr=...) at /mariadb/10.6/storage/innobase/row/row0upd.cc:2502
#12 0x00005574e515e6ef in row_upd_clust_step (node=..., thr=...)
    at /mariadb/10.6/storage/innobase/row/row0upd.cc:2636
#13 0x00005574e515ea98 in row_upd (node=..., thr=...)
    at /mariadb/10.6/storage/innobase/row/row0upd.cc:2758
#14 0x00005574e515ed88 in row_upd_step (thr=...) at /mariadb/10.6/storage/innobase/row/row0upd.cc:2900
#15 0x00005574e511ebe7 in row_update_for_mysql (prebuilt=...)
    at /mariadb/10.6/storage/innobase/row/row0mysql.cc:1688
#16 0x00005574e4f2a66a in ha_innobase::delete_row (this=..., record=...)
    at /mariadb/10.6/storage/innobase/handler/ha_innodb.cc:8809

At that time, the table collections contained a single record that had been delete-marked by the same transaction 0x1d. That row had been inserted by transaction 0x18, before the binaries record had been inserted (by transaction 0x1a).

Currently it looks like the error might be that row_upd_step() in the above stack trace failed to process all indexes. If I understood it correctly, the REPLACE here is being executed as an INSERT (which will be rolled back) followed by a DELETE and possibly one more INSERT, which I suppose would be rolled back due to a duplicate key error. The corruption is flagged because the secondary index contains a record that is expected either not to exist or to be delete-marked, so that the secondary index remains an appropriate subset of the clustered index.

Comment by Marko Mäkelä [ 2023-09-04 ]

The relevant part seems to be that lock_wait() returns DB_LOCK_WAIT instead of DB_SUCCESS to row_update_cascade_for_mysql():

```
		switch (trx->error_state) {
		case DB_LOCK_WAIT:
			if (lock_wait(thr) == DB_SUCCESS) {
				continue;
			}

			/* fall through */
		default:
			/* Other errors are handled for the parent node. */
			thr->fk_cascade_depth = 0;
			return trx->error_state;
```

DB_LOCK_WAIT is supposed to be a transient error that is replaced with DB_LOCK_WAIT_TIMEOUT or DB_DEADLOCK in this very function, lock_wait().

The value DB_LOCK_WAIT is returned from trx_t::error_state. The MDEV-29869 fix (the MariaDB 10.6.11 change that elenst noted as increasing the probability of failure) could in fact cause this, because it removed some assignments that would reset trx_t::error_state to DB_SUCCESS. The function lock_wait() must ensure that trx_t::error_state never remains at DB_LOCK_WAIT.
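The invariant described above can be sketched with a minimal, self-contained model. This is NOT the actual MariaDB code; trx_t, wait_outcome, and the enum values below are simplified stand-ins for illustration only. The point is that lock_wait() must resolve the transient DB_LOCK_WAIT into a final error code before returning, so that callers such as row_update_cascade_for_mysql() can never observe it.

```cpp
#include <cassert>

// Simplified model of the invariant discussed above; NOT the actual
// MariaDB sources. The names mirror InnoDB, but the types are stand-ins.
enum dberr_t { DB_SUCCESS, DB_LOCK_WAIT, DB_LOCK_WAIT_TIMEOUT, DB_DEADLOCK };

struct trx_t { dberr_t error_state = DB_SUCCESS; };

// Possible outcomes of waiting for a record lock (decided by the lock
// system in the real server; passed in explicitly here for illustration).
enum class wait_outcome { granted, timed_out, deadlocked };

// The invariant: on return, trx.error_state is never the transient
// DB_LOCK_WAIT. The MDEV-29869 regression removed assignments that reset
// error_state, which allowed DB_LOCK_WAIT to leak out to callers.
dberr_t lock_wait(trx_t& trx, wait_outcome outcome) {
  switch (outcome) {
  case wait_outcome::granted:    trx.error_state = DB_SUCCESS;           break;
  case wait_outcome::timed_out:  trx.error_state = DB_LOCK_WAIT_TIMEOUT; break;
  case wait_outcome::deadlocked: trx.error_state = DB_DEADLOCK;          break;
  }
  assert(trx.error_state != DB_LOCK_WAIT);  // must hold on every return path
  return trx.error_state;
}
```

In this model, a transaction whose error_state is DB_LOCK_WAIT while enqueued always leaves lock_wait() with DB_SUCCESS, DB_LOCK_WAIT_TIMEOUT, or DB_DEADLOCK.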

Comment by Baptiste Jamin [ 2023-09-04 ]

With latin1 there is no problem:

```
DROP TABLE t1;

CREATE TABLE t1 (col1 INT,
col2 VARCHAR(100),
col3 VARCHAR(80) GENERATED ALWAYS AS (SUBSTR(col2,1,70)),
PRIMARY KEY (col1), UNIQUE KEY uidx(col3(10))) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO t1(col1, col2) VALUES(1, CONCAT(1, REPEAT('z',90)));
REPLACE INTO t1(col1, col2) SELECT col1, col2 FROM t1;
```

The issue does occur with utf8, however:

```
DROP TABLE t1;

CREATE TABLE t1 (col1 INT,
col2 VARCHAR(100),
col3 VARCHAR(80) GENERATED ALWAYS AS (SUBSTR(col2,1,70)),
PRIMARY KEY (col1), UNIQUE KEY uidx(col3(10))) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO t1(col1, col2) VALUES(1, CONCAT(1, REPEAT('z',90)));
REPLACE INTO t1(col1, col2) SELECT col1, col2 FROM t1;
```

Comment by Marko Mäkelä [ 2023-09-04 ]

baptistejamin, your findings are relevant (and had already been made) in MDEV-30024, but not here.

The bug in this ticket involves FOREIGN KEY constraints and no GENERATED or VIRTUAL columns.

Comment by Marko Mäkelä [ 2023-09-06 ]

vlad.lesin, please review my suggested fix (on top of MDEV-32096).

Comment by Vladislav Lesin [ 2023-09-06 ]

I agree with the fix provided by Marko. As Marko fixed the issue, there is nothing more to do from my side; I am returning this task to Marko.

Comment by Matthias Leich [ 2023-09-07 ]

origin/bb-10.6-MDEV-30531 40061be1bdfb4c13afe39dae151764e4afd0a3df 2023-09-06T13:48:20+03:00
performed well in RQG testing: no previously unknown bad effects.
Please be aware that I do not have a test that replays exactly what is described above.
There are, however, many RQG tests using DDL and DML with foreign keys that fail on bb-10.6-MDEV-30531 as well as on many other source trees.
Hence I am unable to tell whether MDEV-30531 is really fixed.

Comment by Marko Mäkelä [ 2023-09-11 ]

DariusIII, Fossil, and anyone else who was affected by this: At https://ci.mariadb.org/38245/ you should soon find pre-built packages for a 10.6 development snapshot that includes the fix. I would love to know if this fixes the trouble for you.

Comment by Marko Mäkelä [ 2023-09-20 ]

I double-checked the impact of the unexpected return code DB_LOCK_WAIT from lock_wait(). If this occurs in a FOREIGN KEY check in row_ins_check_foreign_constraint(), the error should be handled fine by row_mysql_handle_errors(), just like it is handled for any other locking conflict that arises during DML operations.

Theoretically, this bug could impact the use of the InnoDB internal SQL interpreter, but I think that any locking conflicts in such use (updating persistent statistics or FULLTEXT INDEX) should be prevented by coarser locks.

Corruption due to the invalid lock_wait() return value is only possible in row_update_cascade_for_mysql(), which handles ON (UPDATE|DELETE) (CASCADE|SET NULL) for FOREIGN KEY constraints.
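Why an early abort in the cascade path leads to corruption can be sketched with a toy model. This is a simplified illustration, not the real InnoDB code; cascade_update, index_done, and the enum values are invented for this sketch. If a stale DB_LOCK_WAIT leaks into the cascade loop, the loop takes the error branch (compare the switch statement excerpted in the 2023-09-04 comment) and returns early, leaving the remaining indexes unprocessed; the clustered index and the secondary indexes then disagree, which is later reported as index corruption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the cascade path; NOT the real InnoDB code. cascade_update
// and index_done are invented for this sketch.
enum dberr_t { DB_SUCCESS, DB_LOCK_WAIT };

// Each entry of index_done stands for one index of the child table that the
// ON (UPDATE|DELETE) (CASCADE|SET NULL) action must maintain. A stale
// transient error code makes the loop bail out early, so the indexes from
// position i onward are never updated -- the kind of clustered/secondary
// index mismatch that is later flagged as "Index ... is corrupted".
dberr_t cascade_update(std::vector<bool>& index_done, dberr_t error_state) {
  for (std::size_t i = 0; i < index_done.size(); ++i) {
    if (error_state != DB_SUCCESS)
      return error_state;   // early return: indexes i..n-1 left unprocessed
    index_done[i] = true;
  }
  return DB_SUCCESS;
}
```

With error_state == DB_SUCCESS every index is processed; with a leaked DB_LOCK_WAIT none of them are, even though the clustered record may already have been changed.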

Comment by Bento [ 2023-09-20 ]

Is there a repo/packages for Ubuntu 22.04 and MariaDB 10.10.x?

Comment by Daniel Black [ 2023-09-21 ]

"The fix" per Marko's reference above hasn't been merged yet, but in appreciation of your bug report the following packages will appear soon.

The repository below includes just that fix (minor conflicts adjusted) on top of the 10.10 branch. There hasn't been a 10.6 -> 10.10 merge for a while, so other fixes may not be included.

These will show up in the following repository:

10.10

deb [trusted=yes] https://ci.mariadb.org/38612/amd64-ubuntu-2204-deb-autobake/debs ./

Other 10.10 debs/rpms are available by modifying the above URL.

Testing appreciated.

Container images for testing:

  • quay.io/mariadb-foundation/mariadb-devel:10.10-mdev-30531-danielblack-pkgtest
  • quay.io/mariadb-foundation/mariadb-devel:10.6

Comment by Bento [ 2023-09-23 ]

I am now running quay.io/mariadb-foundation/mariadb-devel:10.10-mdev-30531-danielblack-pkgtest in Docker for just the affected tables. Will report back.

Comment by Bento [ 2023-09-28 ]

It's been nearly 5 days and no corruption yet while running in Docker.

Comment by Marko Mäkelä [ 2023-09-28 ]

That is great news, thank you. I think that we should wait for another week, to be more certain.

Comment by Bento [ 2023-10-04 ]

Still running peachy in Docker. Would be great if this fix could be pushed into a new release soon.

Comment by Bento [ 2023-10-06 ]

Any way of pushing a new release so I can ditch my split setup this week? I am now encountering other errors and need to put the tables back into one and the same DB.

Comment by Marko Mäkelä [ 2023-10-06 ]

Fossil, I am glad that this fix has worked for you. I agree that this bug (as well as some others, such as MDEV-32096) would be a strong reason to push out an unscheduled release, but I do not decide that. I see that the tentative date for the next quarterly releases is October 26. Let us hope that it will not slip much.

Comment by Bento [ 2023-10-06 ]

Roger. If I install the build from the repo Daniel provided and MariaDB pushes out a new release on the main repos, it will just pull the newer release from the main repo and overwrite the temporary version from Daniel's repo, right?

Comment by Daniel Black [ 2023-10-16 ]

Sorry for the delay. Busy week.

The version in the temporary repo is the same as the version in the next release. It might need some encouragement to install the new version: sudo apt-get --reinstall install.

Comment by Bento [ 2023-10-25 ]

Cheers. Any ETA on a new release containing this fix? October 26 is tomorrow.

Comment by Sergei Golubchik [ 2023-10-25 ]

It was delayed because the MySQL release was a week late (and users expect MariaDB to have all security fixes from the corresponding MySQL release). It should be out soon, but a few days later than planned.

Comment by Bento [ 2023-11-12 ]

Any updates on the new version release?

Comment by Sergei Golubchik [ 2023-11-12 ]

Yes, it's built and is being uploaded to the mirrors.

Comment by Bento [ 2023-11-13 ]

Great, can't wait to upgrade my production machines. So it will be available today then?

Comment by Bento [ 2023-11-15 ]

Upgraded from 10.10.6 to 11.1.3 and my CPU usage is 100% now on all cores. Load went from 10-ish to 50+.

New optimizer? Any way to turn this off?

Comment by Bento [ 2023-11-15 ]

Okay, so... more tables had corrupted indexes, but these were never reported. I had a few duplicate entries in columns with a unique index.

I did run OPTIMIZE and also ANALYZE TABLE, but neither reported any errors. Is there a better way of scanning ALL tables with unique indexes and looking for dupes?

Comment by Sergei Golubchik [ 2023-12-05 ]

As far as I understand, CHECK TABLE should detect dupes. You can run

mariadb-check -A

to run CHECK TABLE on all tables in all databases (see --help for more command-line options, as usual).

Comment by Bento [ 2023-12-05 ]

It didn't. But I manually removed all the dupes now.

I am still having issues with this new optimizer picking weird plans all the time. I have some slow queries on a certain column, so I add an index on it; then suddenly a ton of other queries start using that index instead of their "old" one, and execution time goes from 50 ms to 30 seconds.

Comment by Marko Mäkelä [ 2023-12-10 ]

The MDEV-24402 CHECK TABLE…EXTENDED (which can also be invoked via mariadb-check; see MDEV-30129) should hopefully find all inconsistencies.

When a table is being rebuilt by OPTIMIZE TABLE or ALTER TABLE…FORCE, secondary indexes will be ignored and not checked. I don’t think that ANALYZE TABLE would report any index corruption.
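For reference, the extended check mentioned above can also be run directly in SQL on a single table; db1 and t1 below are placeholder names:

```
-- Cross-checks every secondary index record against the clustered index
-- (MDEV-24402); considerably slower than a plain CHECK TABLE.
CHECK TABLE db1.t1 EXTENDED;
```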

Generated at Thu Feb 08 10:16:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.