[MDEV-30531] Corrupt index(es) on busy table when using FOREIGN KEY with CASCADE or SET NULL Created: 2023-02-01 Updated: 2024-01-18 Resolved: 2023-09-11 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6.11, 10.10.2, 11.3, 11.1.2 |
| Fix Version/s: | 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Bento | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 5 |
| Labels: | corruption, foreign-keys, regression, sporadic | ||
| Environment: |
Ubuntu 22.04.1 Filesystem: |
||
| Attachments: |
|
||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||
| Description |
|
For several months now I have been having issues with index corruption on one table. Not entirely sure when it started as it wasn't directly obvious after upgrading MariaDB for example. It worked fine for months on the same machine, OS and filesystem. I am running 10.10 now but it also happened on 10.6. I know one other person running the same type of software who also has this issue, they are not using ZFS but EXT4 so filesystem is probably not the issue. This table is constantly being written to and read from. A program analyzes/processes the data every few mins, and then deletes the rows it just processed when it's done. The only way to "fix" it is to truncate the table, but it takes anywhere from 1h to 3 days for the issue to come up again showing:
journald then shows the following output (different indexes) (Database & table names replaced for privacy)
It seems to start with tried to purge non-delete-marked record in index:
Also shows:
And it also shows thousands of these but I have no idea if that's related:
|
| Comments |
| Comment by Marko Mäkelä [ 2023-02-01 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
If the InnoDB change buffer has ever been enabled on that server (it was disabled by default in However, if tables become corrupted after TRUNCATE TABLE or OPTIMIZE TABLE, I do not think that we can blame the change buffer. A more likely explanation might be that starting with I found another open bug | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-01 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I just noticed innodb_doublewrite=0 in my.cnf.txt I have been hoping that some of my colleagues would update our knowledge base with information under which circumstances the doublewrite buffer can be safely disabled. One such setting is when using unbuffered I/O (innodb_flush_method=O_DIRECT) on ext4fs created with a bigalloc size that is innodb_page_size, provided that the Linux kernel is recent enough and the underlying storage device actually supports atomic writes of innodb_page_size. (I do not know how to determine that.) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-02-01 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
innodb_use_native_aio and innodb_doublewrite seem to be very common on MySQL/MariaDB on ZFS setups: https://www.percona.com/blog/mysql-zfs-performance-update/ and I've been running like this for about a year without problems. I do however see that they use innodb_flush_method=fsync instead of O_DIRECT. That doesn't explain why the other guy is having the same problem while running on plain old EXT4 though. I can try setting innodb_flush_method=fsync. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-02-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
So I changed innodb_flush_method to fsync, restarted MariaDB (this is a production machine) and it still failed after about 36 hours. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-02-13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Anyone? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I do not see anything in that blog post that would convince me that writes up to innodb_page_size on ZFS are atomic. I have corrected similar claims of the same author in the past: https://www.percona.com/blog/2015/06/17/update-on-the-innodb-double-write-buffer-and-ext4-transactions/ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Andrew Hutchings [ 2023-02-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Fossil, Yes, Percona are recommending a certain setup for MySQL performance, this is not a recommendation for reliability which appears to be important for your use-case. The first half of Marko's comment is quite important here. It looks like you may have created the table in question without full_crc32 enabled. You can check this by doing:
We highly recommend you enable innodb_doublewrite. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-02-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Ok I will enable that. I think when I talked to the other person with this issue he was on EXT4 and not using either of those settings though. Will report back soon. I ran that query and all tables have full_crc32=1. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-02-23 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Okay so I waited until the table corrupted again, truncated the table and stopped MariaDB, edited my.cnf with innodb_doublewrite = 1 and started it again. Within 24h the corruption happened again so that's not the problem. Also the other person who is running with all the stock values and is using EXT4 has the exact same issue. Anyone who can dig a bit deeper? (Tagging a person with @ doesn't seem to do anything btw) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, thank you for reproducing the corruption with the doublewrite buffer enabled. If the mariadbd process was never killed in the middle of a page write, the doublewrite buffer is not needed. One would need to be rather unlucky to run into corruption because of torn page writes, but I wanted to be ensure that the corruption is not caused by something as trivial as unsafe use of innodb_doublewrite=0. Because reproducing corruption related to this may be hard, there could be a common misconception that it is safe to disable the doublewrite buffer when it is actually unsafe. Unfortunately, I have a feeling that this could be a bug in the file system or in the underlying storage. Are there any error messages reported by the kernel? Anything suspicious reported by smartctl (if a S.M.A.R.T. interface is available)? Do you know which Linux kernel version the other person on ext4fs is using? Was the server built with io_uring or with libaio support? File systems are not free from bugs. One devastating example https://lwn.net/Articles/864363/ was mentioned in the legendary blog post about page_compressed. Do you or the other person use page_compressed? I recently encountered some discussion on Linux file system testing. As far as I understand, because ZFS is not in the Linux kernel core, it might not receive that much https://github.com/kdave/xfstests testing. https://bugzilla.kernel.org/show_bug.cgi?id=212337 is an example that might be a problem with a part of the test framework, or related to a genuine problem. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-02-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I am that other user that has same issue as @Bento. Kernel version is 5.4.0-139-generic and filesystem is EXT4 on Ubuntu 20.04. Do you or the other person use page_compressed? No, no compression is used at all. And all tables are using full_crc32. Yesterday i have completely recreated that table from scratch, but corruption started like 2 minutes after i started filling it with data. It started for me , if i remember correctly, with version 10.6.5 and every version after that. Unfortunately as that is a production server i cannot rollback to older version and test it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
DariusIII, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-02-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
@marko I have been using O_DIRECT for years, so that change makes no difference in this case. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
DariusIII, I see. When did you upgrade to a 5.4 kernel? Have you considered upgrading to (say) Ubuntu 22.04, which would have a newer kernel with hopefully fewer bugs in the block layer or the file system? I assume that your server is built with libaio, because while the 5.4 kernel supports io_uring, Ubuntu 20.04 does not include liburing; you’d have to compile that library yourself. We do most of our internal stress testing on /dev/shm and ext4, with focus on crash recovery. Data sets are typically smaller, but we have caught some Linux kernel bugs, such as | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-02-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
DariusIII, I see. When did you upgrade to a 5.4 kernel? Have you considered upgrading to (say) Ubuntu 22.04, which would have a newer kernel with hopefully fewer bugs in the block layer or the file system? I assume that your server is built with libaio, because while the 5.4 kernel supports io_uring, Ubuntu 20.04 does not include liburing; you’d have to compile that library yourself. We do most of our internal stress testing on /dev/shm and ext4, with focus on crash recovery. Data sets are typically smaller, but we have caught some Linux kernel bugs, such as | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-02-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I am using 22.04 and kernel 5.15 and have the same issue though. Anyway, I just used the Ubuntu 22.04 ISO to install this system a year ago. Did not mess with anything related to iouring etc.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-02-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I ran same command @bento posted and result is: CONFIG_IO_URING=y Oh, and about the kernel update, i am always using latest updates from ubuntu, nothing added by hand. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-02-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have to add that the kernel config also shows AIO:
I don't know how to check what is actually used. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2023-03-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Bento or Darko, can your try to disable uring with If it helps, then there is probably still some issue in the Linux kernel related to uring or AIO. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Will try. I had innodb_use_native_aio = 0 set in my.cnf already though. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2023-03-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Bento, any chance you could a copy of corrupted table to the Jira issue or FTP it to | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Uploaded to FTP: binaries.ibd and .frm Did a `service mariadb stop` first | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, which errors are being reported for the file binaries.ibd that you uploaded? I see that it uses the innodb_checksum_algorithm=full_crc32 format, and there are no page checksum errors reported for it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Tagging doesn't work: @Marko Anyway. The errors are the ones I have pasted in the original bug report all the way at the top.
The idex is always different, could also be multiple ones failing:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, in my very first comment I suggested that some secondary indexes could be corrupted due to previous use of the InnoDB change buffer. I do not think that this type of corruption can have anything to do with the low level page I/O. To fix the corruption, you could execute the following:
We have many open bugs related to indexed virtual columns ( Therefore, my main suspect for causing this corruption is the InnoDB change buffer. In my recent talk https://fosdem.org/2023/schedule/event/innodb_change_buffer/ I covered one possible scenario that can cause this type of corruption. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
@Marko (tagging still doesnt work) innodb_change_buffering was already disabled:
By the way. When I try to OPTIMIZE the corrupted table I get this error back. I see I haven't posted that before:
[EDIT] The index for those 2 columns should be unique so I don't know how it inserted them in the first place. But after insertion it starts to complain about duplicate values and gets marked as corrupted. [EDIT 2] Not sure how that would be possible and why it complains only about collections_id 1721887 and not ones inserted before those. [EDIT 3] The SHOW CREATE literally says:
So I created a new test table identical to this one, removed the foreign key constraints though. Then tried to insert duplicate rows and it did not accept it, so that works as intended. So then I truncated the original BINARIES table. The only way to get it to function again after corruption and tried to insert two identical rows again. Now it did not accept it, as it should. So somewhere something goes wrong and corrupts the table. Maybe this is normal that it accepts dupes after corruption because it can't check for UNIQUE keys anymore due to said corruption but I don't know for sure. Anyway, at least it's some more info lol. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, Jira uses a notation like
for user names. Was the InnoDB change buffering ever enabled in the past? If yes, then the bug that was fixed as part of I created a table that corresponds to binaries.frm and then executed the following:
Because the result was empty, there are no hidden virtual columns created due to In | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko this table was never restored using mysqldump etc. so unique_checks are not used. Change buffering could have been enabled by default in the past on an older MariaDB version, but I truncated the table many times now because it's the only way to temp. fix the corruption after it happens. Also did OPTIMIZE before, can only do that when it's not corrupted of course. Also I think DariusIII even dropped the entire table and recreated it by hand but that did not fix things either. I think the non uniqueness is caused by inserting things into the table AFTER it already became corrupted. So it might not be related at all. I run OPTIMIZE every once in a while on all my tables. It's only this binaries one that gets messed up. Same for DariusIII | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-03-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Bento Yes i have dropped the table and recreated it multiple times and got same issue over and over. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-03-11 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko This happened just now: I ran
I have truncated the table less than 12 hours ago. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-11 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
DariusIII, the only other cause of this that comes to my mind is that the transaction system metadata is corrupted. I can name two ways how it could happen. One would be that if the database was originally created before MySQL 5.1.48 and upgraded to MariaDB 10.3 or later before Another possible cause is a race condition that affects the purge of committed transaction history. While we only reproduced bad symptoms of I am guessing here, because no server error log contents has been made available. Do the problems persist after initializing the database and restoring the contents from a logical backup (SQL dump)? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-03-12 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko This is a production system that uses mariadb from its start, couple of years ago. I believe truncating and recreating the table from scratch should have eliminated any commited transaction history issue. Database has multiple indexes, unique and non unique, has around 20 different tables, but issue is with this one and only specific table. Maybe this error from syslog could help?
I now found an ongoing issue with similar symptoms: https://jira.mariadb.org/browse/MDEV-20640, and also couple of another issues listed there with similar issues. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-12 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Marko: What DariusIII said is true for my case too. I even mysqldump-ed and recreated the entire database in the beginning of December 2022 and upgraded from 10.6.x to 10.10.x. It looks like (90% sure) when the table becomes corrupted I can still run OPTIMIZE to "fix" the corruption if the corruption just happened. But after a few hours (like if it happens at night) the only thing I can do is TRUNCATE to get it going again.
on my install by the way. Is there anything else I can do/run/monitor to get some more useful output to resolve this problem? It's been plaguing me for 5 months now. Might be related: MDEV-26951 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
DariusIII and Fossil, Currently, I do not have any idea what could cause such corruption. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-03-13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko I really have no idea what else to check, as this is really a weird issue, as table is same as it was for years. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Would the indexes become corrupted when using MariaDB 10.6.12? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I don't know about 10.60.12 specifically but I was on one of the later 10.6 versions in Dec 2022 and also had this issue. Thats why I tried upgrading to 10.10.x | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2023-03-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Bento, please change to use innodb_doublewrite = 1 and see if this helps! A couple of questions: What MariaDB version are you using now? It may also help if you once in a while do a 'check table' on the problematic table. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
alice All the files are already in the original bug report. And I have enabled doublewrite as per Marko's request a bit further up, did not solve the issue.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, which MariaDB version were you using before the problems started? Can you reproduce this corruption with the latest MariaDB 10.5? Or when using a different storage engine than InnoDB? I am starting to suspect that the cause of this corruption could be something outside InnoDB. There were some changes to the memory management in the SQL layer in 10.6. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko Hard to say, the issue did not immediately appear after updating so it was some 10.6.x version. I think dariusiii tried MyRocks but I believe that also caused corruption for him. Not sure though. Since it happens on a production machine, reverting to 10.5 is a PITA but I will see if I can do that in the next 2 weeks or so. Currently not at home/office. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-03-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
bento marko I tried MyRocks, but it had other issues, not related to this problem but to how MyRocks works and handles data. As i said in my previous comment:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It would be great if one of you could revert to the latest 10.5 for test purposes. InnoDB uses the same file format in 10.5 and 10.6; a minor difference is that 10.6 implements crash-safe DDL operations. The biggest InnoDB changes between 10.5 and 10.6 are crash-safe DDL, some changes to locking (such as In the SQL layer, there were substantial changes, such as this one and a few parent commits. If the trouble is not reproducible with the latest 10.5, a next step might be to create a hybrid of 10.5 with InnoDB adapted from 10.6. If also that turns out fine, we should look at other changes between 10.5 and 10.6. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko I think we're both running 10.10, are the data files backwards compatible with 10.5? My DB is fairly large so if I have to mysqldump it, then downgrade and then import it again it woud mean quite some downtime for production. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Unfortunately, a downgrade from 10.8 or later to earlier versions than 10.8 is tricky. An optimistic attempt at removing ib_logfile0 after a clean shutdown and then starting 10.6 or older is bound to fail with error messages "LSN in the future" and all sorts of corruption due to that. Starting with We do not test or guarantee downgrades between major versions. I do not know if anything else would be incompatible between 10.5 and 10.10. For InnoDB, this is the only change. I think that to get meaningful results, it would be best to start from the scratch from a logical dump. Only in that way we can be sure that there was no hidden corruption somewhere in the InnoDB system tablespace or undo logs. It is much easier to reason about the failure if we know that only one MariaDB server version ever touched the files, and no physical backup was restored or no crash recovery took place. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I'm going to run a 10.5 Docker container on the production machine and change the underlying code so the table affected and any tables it has FK's too etc. run only on that DB while the rest of the application keeps querying the original DB. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-03-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, can you give a status update? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-03-30 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko Working on it. Had to work on some other code before I can start testing this on 10.5. Will report back asap. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-04-18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
For the record, for some performance regression testing in | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-04-18 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko I am working on changing the code so it uses two database instances. Will be able to test soon. For now I have cronned an OPTIMIZE statement to run every hour which seems to keep the issue away for now. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-04-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have started testing with MySQL 10.5 now in Docker. Running alongside 10.10.3 which runs directly on the host. Only the tables needed to test the corruption are in the 10.5 DB and I have edited the code to query these tables only on the 10.5 instance while the rest remains on the main DB. Will see how it goes, not sure what the performance will be with it running in Docker. [EDIT] Inserts seem a lot slower for now. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-04-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
A number of InnoDB performance improvements went into 10.6. You might also need to explicitly set innodb_flush_method=O_DIRECT; the default was changed in 10.6 by | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-04-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Doesn't help. Using fsync too in the native 10.10.3 install as per your advice. Anyway, will let it run like this for a while. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-05-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko Been running 10.5 for 6 days no without any sign of corruption. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-05-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, thank you for the update. Next, can you please try to update to the latest 10.6 and see what happens, with explicitly set innodb_flush_method=fsync? It could be useful to wait for the upcoming 10.6.13 release, so that you will avoid the | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-05-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, you wrote that you encountered this issue on MariaDB Server 10.6 in December 2022. MariaDB Server 10.6.11 had been released in November 2022. I found some changes as early as MariaDB Server 10.6.10 that could potentially explain this if your active working set is much larger than the InnoDB buffer pool. I am still investigating this. I would advise you to stay on 10.5 for now. Has that been stable for you? Another user can reproduce corruption with ROW_FORMAT=COMPRESSED tables, but I see that you are not using any form of compression. Using ROW_FORMAT=COMPRESSED could greatly improve the chances of hitting the corruption. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-05-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko I encountered this issue before December too, I only updated to 10.10 in December trying to get rid of the issue. I am still just running the table that causes corruption in 10.5, the rest still runs in the "old" 10.10 instance. My InnoDB Buffer pool setting is 96G. I have 35G of "raw" data in /var/lib/mysql. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-05-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I had a hypothesis for a possible cause of this bug, but I believe that it was incorrect. See the last comment in Currently, our hope is that someone can reproduce this corruption in a controlled fashion and share the scripts or give debug access to the environment. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-05-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil or DariusIII, does MariaDB Server 10.5 appear to be stable for you? Would the corruptions emerge as soon as you upgrade to 10.6? Unfortunately, due to Currently, I am aware of one bug that could affect crash recovery and mariadb-backup; see Today I diagnosed the race condition | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-05-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko Yeah, 10.5 has been running the affected tables for some while now without any issues. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-02 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I was thinking that I had also thought if | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Are there any FOREIGN KEY constraints attached to the indexed columns? We have some open bugs in that area, such as MDEV-30869. Between 10.5 and 10.6, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-06-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko Yes, there is a foreign key in binaries table. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-07 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, does your corrupting table involve FOREIGN KEY constraints as well? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-06-07 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yeah its the same table. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-07 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I think that this is effectively blocked by the FOREIGN KEY related bug fixes that nikitamalyavin has been working on. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Nikita Malyavin [ 2023-06-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko, MDEV-30869 is caused by a work-in-progress MDEV-30021 fix, which is not presented in any mainline branch. MDEV-30021 is itself debug-only at this moment. Returning to innodb. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil or DariusIII, can you share the complete table definitions of all tables that are connected via FOREIGN KEY constraints to a table where an index will be corrupted? Table and column names can be replaced with something like t1,t2,t3,t4 and c1,c2,c3,c4, but they must remain unique and consistent. This would give us some ideas on how to reproduce the bug. Also, any hints of how these tables are being modified would be helpful. The indexed column whose index gets corrupted could also be modified due to an ON…CASCADE or ON…SET NULL constraint. For those cases, it would be useful to know also the SQL statements that can cause such constraints to be executed. One last question: Does the application involve updating any PRIMARY KEY columns, or deleting and inserting records with the same primary key? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Another user who is hitting this has a table similar to the following. I omitted any unrelated non-indexed columns of the child table and obfuscated the names. I do not know the structure of the parent table, but I assume that it does not matter much for ON DELETE CASCADE.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-06-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here is the structure of three connected tables:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Oh, I see, this is ROW_FORMAT=COMPRESSED after all. I had mentioned that attribut on 2023-05-10, but got no response to that. Common themes with the other affected user would be ROW_FORMAT=COMPRESSED and ON DELETE CASCADE. It is possible that some refactoring in 10.6, such as It would help a lot if one of you could upgrade to 10.6.14 after converting the affected table binaries to ROW_FORMAT=DYNAMIC (the default). Would that be stable? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-06-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Not on mine. I am not using compressed, you can check the files I uploaded in this issue a few months ago. Every table is DYNAMIC. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-06-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
DariusIII Hi! Thank you for providing the DDLs for the tables. The definition for the parts table appears to have an error:
I'm curious, were these definitions the exact outputs from a SHOW CREATE TABLE command or similar, or could there have been an error during the copy/paste process? I ask this to ensure that the omitted ' isn't contributing to or associated with the reported corruption. While it seems unlikely, it is not impossible. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-06-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Tables as provided by DariusIII: generalized table/col names for testing:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-06-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
roel These are from SHOW CREATE TABLE, but i have removed COMMENT part of definition (as they are irrelevant), so it could be an error on my side when deleting that | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-06-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
DariusIII Ack, understood. Thank you for confirming. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-06-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I started a number of testruns with the given table defs. Will report back in case I see anything relevant. marko Please let me know if any startup variable or other changes would likely help reproducibility, thanks. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-06-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Here are mine, should be the same tables but not using COMPRESSED.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-06-13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
In my testrun, I ran into a new iteration of
Reduction not successful this far, but this bug may be related. Note the BINARY column. Other testcases are present in | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, thank you. We now know that this corruption can occur with both ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED. The assertion failure that Roel posted to Is there any script that would simulate the actual workload on these tables? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Timo Lottmann [ 2023-07-11 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I got a smiliar problem with MariaDB > 10.6. I created a simple Java project to simulate it. You need Docker, Java JDK 17 and Gradle. Just extract: mariadb-corruption-for-mariadb-ticket.tar.gz Start MariaDB via Docker. I added 10.11.4 with default settings.
Start gradle
After a few minutes an index gets corrupted:
You can use your own MariaDB instance, if you use port 3808 or just change the port number in DatabaseCorruption.java. Please let me know, if this is useful or if you need more information. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-07-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
timolottmann, thank you very much. I was able to use this information to flow the testcase into a Bash script w/o any required dependencies. This was really helpful. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-07-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This Bash testcase with further reduced SQL:
Leads to:
Note it can take anywhere from half a minute to 40 minutes or more to reproduce the issue, on a fast machine with medium load. The issue is very sporadic and system load, number of threads, number of insert's per thread, and other factors may affect issue reproducibility. Issue observed on 10.6.15 c2d44ecb904dd5558d116433b6cceb5592182447 (Optimized) and 11.1.2 3883eb63dc5e663558571c33d086c9fd3aa0cf8f (Optimized). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-07-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
No specific server settings/variables are required. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-07-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It is likely that at least two issues are being discussed in this ticket. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Thirunarayanan Balathandayuthapani [ 2023-07-19 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We do get DB_CORRUPTION in the above code path. Debugging more to find out the reason. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We have reproduced some corruption that should be due to code refactoring that first appeared in MariaDB Server 10.6.12. This ticket claims that corruption occurs in 10.6.11 already. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-07-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I see that 10.6.11 was released on Nov 7th 2022, I think by then I already had these issues making me thing it would have occurred in an earlier version than that like 10.6.10. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sebastian Stamm [ 2023-07-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The corruption we experienced in started with 10.6.12 and was solved by downgrading to .11. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-24 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
sstamm, your issue as well as what we have reproduced internally involves a clustered index, while this bug was about secondary indexes. I filed | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-25 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I just realized that the removal of the function buf_wait_for_read() in (For unique secondary indexes, there was also the bug | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-07-25 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Let me know if/when there's a Docker amd64 image I could test and I'll do it right away. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-25 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, great, I will. Our Docker maintainer may be busy this week. We might be able to include the | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-07-25 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
About a 1-2 hrs after a commit to a main branch the bb pushes the images show up as quay.io/mariadb-foundation/mariadb-devel:10.6 which when I wrote this did include the exact fix for this bug. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Wait the fix for this ticket is already in 10.6.x? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, I just pushed the fix of | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Cool. Is this fix also present in 10.10.x? Now running 10.10.5-MariaDB-1:10.10.5+maria~ubu2204 as my main instance and I am still running 10.5 in Docker for the crashing tables. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
So far, I have merged this to 10.9 and to 10.10 (https://ci.mariadb.org/37004/). The mandatory staging builds are taking time. For some reason, I did not find any build related to the push to the 10.9 branch. It could be related to some overload on the CI system (we are close to scheduled quarterly releases). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Okay. Since I am running 10.10.5 in production it would be cool if I could test that one or 10.11.x. That way I can just upgrade my main database and see how it normally runs with all the tables in one DB. That would be the best test I think. But if needed I can test any version in Docker with just the failing tables. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have pushed the merge of the | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-07-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
To see what commits are in a pulled image:
note: I always use podman, but docker cli has the same syntax. Form a URL with the org.opencontainers.image.revision - https://github.com/MariaDB/server/commits/b1b47264d2ef1fa80b90f308e81c49d9a1011d56 marko's "the fix" link earlier is b102872ad50cce5959ad95369740766d14e9e48c MDEV-31767 InnoDB tables are being flagged as corrupted on an I/O bou… which is in the URL form. Thanks for volunteering to test Fossil. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-07-27 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Cool. Is there a Ubuntu .deb package too by any chance? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-07-27 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
there's even a repo, its what the containers use:
Adjust urls to other Ubuntu versions/architectures. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
danblack, thank you for making it easier to access the development snapshots. Fossil and others, I am eager to know if | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-08-25 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, DariusIII, did you have a chance to update a server to a version where | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-09-01 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko I have not tried it yet as I was out on vacation. But I spoke to DariusIII and he was running 11.0.3 and still got the same corruption. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-01 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
DariusIII, can you confirm this first hand? Had you rebuilt the tables (OPTIMIZE TABLE or ALTER TABLE…FORCE) that were claimed as corrupted? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Darko Krišan [ 2023-09-01 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko Yes, i have installed latest 11.0 version, switched to latest 11.1 afterwards and still had the same issue. As soon as tables started being used indexes were corrupted in couple of minutes. Optimizing the affected table fixes the issue for some time. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-09-02 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
For completeness, I rechecked using the script listed above on a fixed 11.3 branch (both dbg + opt) and no issues were observed. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-09-02 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
DariusIII Thank you for the feedback. Can you also confirm the table rebuild for all affected/used tables which Marko mentioned? Thank you | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Baptiste Jamin [ 2023-09-02 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We are having the exact same problem on : `mysqld Ver 10.11.5-MariaDB-1:10.11.5+maria~deb12-log for debian-linux-gnu on x86_64 (mariadb.org binary distribution)`. Table rebuild only temporarily fixes the problem. FS is EXT4, with fsync enabled instead of O_DIRECT and DMA disabled. [Edit] Could it be related to this commit?https://github.com/MariaDB/server/commit/0b47c126e31cddda1e94588799599e138400bcf8 What I don't understand here is for almost the same error, we mark the table as corrupted here: https://github.com/MariaDB/server/blob/7ba9c7fb84b5f28e4736656b57d9508b70ca6369/storage/innobase/row/row0purge.cc#L476 but not here | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2023-09-02 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The test case below is non-deterministic, run with --repeat=N. It usually fails for me within a few attempts on one machine and within 20-40 attempts on another. With this test, my bisect points at this commit in 10.6.11
It doesn't however correlate with information in the comments, when the problem was said to be encountered on 10.6.5 already. It is possible that the bisect result is inaccurate and the commit just increased the probability of the failure; or maybe there was yet another problem in versions before 10.6.11.
Intermediate more concurrent and more continuous versions of the test case also triggered the assertion failure:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Baptiste Jamin [ 2023-09-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This bug is the exact same as https://bugs.mysql.com/bug.php?id=86485 The test script makes the crash in an instant: ``` Sep 03 05:57:24 vultr mariadbd[8448]: 2023-09-03 5:57:24 0 [ERROR] InnoDB: Flagged corruption of `c` in table `test_corruption`.`t` in purge
I checked and the patch has never been backported into Maria : https://github.com/percona/percona-server/commit/ef4a3ee7329afe4dedd8aeee9274fb41fce8aafd Seems to be the solution | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Roel Van de Paar [ 2023-09-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I tried running the script in the last comment (i.e. https://bugs.mysql.com/bug.php?id=86485) against:
And observed no corruptions in any of these when running for about 30 minutes on each. However, when restarting 11.3 and trying the same again, it crashed immediately. If you do not get an immediate failure, CTRL+c the script, and re-run the same script in the same client session again, or reboot the instance and try again. This script also produces the index()->is_btree() assertion seen by Elena quite well at times:
elenst's MTR script does however immediately reproduce (using --repeat 1000, but result is near instantaneous) the corruption:
And on the same 11.6 revision:
The optimized 10.6/11.3 builds of the same revisions are also affected in the same way. A UBSAN+ASAN build showed the same corruption, but no additional UBSAN nor ASAN bugs. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
baptistejamin and Roel, the script in https://bugs.mysql.com/bug.php?id=86485 defines a unique index on a virtual column. I think that we have multiple bugs open about indexed virtual columns, such as In this ticket, I am only interested in test cases that reproduce corruption without using ROW_FORMAT=COMPRESSED (due to bugs like | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I updated | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The test case that elenst posted does reproduce the failure for me. It uses FOREIGN KEY and no virtual columns, just like the table that DariusIII and Fossil are using. The test reproduced errors from CHECK TABLE on the first try I ran it. I’m now trying to reproduce it under rr record, which is a bit harder but will allow me to debug a full trace of events from the start to the corruption. Without https://rr-project.org this kind of bugs would be really tricky to analyze (even when a reasonably repeatable test case exists). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I assume that the server restart in the test was for making it more likely that the purge of transaction history will complete. The following does that more efficiently:
With this, I got an rr replay trace much sooner:
At the time the error was reported by purge, this secondary index consisted of a single page that only contained the record that is mentioned in the above output. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The clustered index of the table binaries contains only a single delete-marked record, which had been last modified during the execution of REPLACE INTO collections (id) VALUES (NULL):
At that time, the table collections contained a single delete-marked record that had been delete-marked by the same transaction 0x1d. That row had been inserted by transaction 0x18, before the binaries record had been inserted (by transaction 0x1a). Currently it looks like the error might be that row_upd_step() in the above stack trace failed to process all indexes. If I understood it correctly, the REPLACE here is being executed as an INSERT (which will be rolled back) followed by a DELETE and possibly one more INSERT, which I suppose would be rolled back due to a duplicate key error. The corruption is flagged because the secondary index contains a record that is expected to not exist or to be delete-marked, so that the secondary index would be an appropriate subset of the clustered index. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The relevant part seems to be that lock_wait() returns DB_LOCK_WAIT instead of returning DB_SUCCESS, to row_update_cascade_for_mysql():
The DB_LOCK_WAIT is supposed to be a transient error which is supposed to be replaced with DB_LOCK_WAIT_TIMEOUT or DB_DEADLOCK, in this very function lock_wait(). The value DB_LOCK_WAIT is returned from trx_t::error_state. The | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Baptiste Jamin [ 2023-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Using latin1 makes no problem: ``` CREATE TABLE t1 (col1 INT, INSERT INTO t1(col1, col2) VALUES(1, CONCAT(1, REPEAT('z',90))); Issue with utf8 however: ``` DROP TABLE t1; CREATE TABLE t1 (col1 INT, INSERT INTO t1(col1, col2) VALUES(1, CONCAT(1, REPEAT('z',90))); | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
baptistejamin, your findings are relevant (and had already been made) in The bug in this ticket involves FOREIGN KEY constraints and no GENERATED or VIRTUAL columns. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
vlad.lesin, please review my suggested fix (on top of | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Vladislav Lesin [ 2023-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I agree with the fix, provided by Marko. As Marko fixed the issue, there is nothing to do from my side, I am returning this task to Marko. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Matthias Leich [ 2023-09-07 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-11 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
DariusIII, Fossil, and anyone else who was affected by this: At https://ci.mariadb.org/38245/ you should soon find pre-built packages for a 10.6 development snapshot that includes the fix. I would love to know if this fixes the trouble for you. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I double-checked the impact of the unexpected return code DB_LOCK_WAIT from lock_wait(). If this occurs in a FOREIGN KEY check in row_ins_check_foreign_constraint(), the error should be handled fine by row_mysql_handle_errors(), just like it is handled for any other locking conflict that arises during DML operations. Theoretically, this bug could impact the use of the InnoDB internal SQL interpreter, but I think that any locking conflicts in such use (updating persistent statistics or FULLTEXT INDEX) should be prevented by coarser locks. Corruption due to the invalid lock_wait() return value is only possible in row_update_cascade_for_mysql(), which handles ON (UPDATE|DELETE) (CASCADE|SET NULL) for FOREIGN KEY constraints. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-09-20 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Is there a repo/packages for Ubuntu 22.04 and MariaDB 10.10.x? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-09-21 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
"the fix" per marko's ref above hasn't been merged, but in appreciation for your bug report the following package will appear soon. The below repository includes just the above fix (minor conflicts adjusted) on top of the 10.10 branch. There hasn't been a 10.6 ->10.10 merge for a while so other fixes may not be included. These will show up as repo:
Other 10.10 deb/rpms available modifying above URL. Testing appreciated. Container images for testing:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-09-23 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I am now running quay.io/mariadb-foundation/mariadb-devel:10.10-mdev-30531-danielblack-pkgtest in Docker for just the affected tables. Will report back. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-09-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It's been nearly 5 days and no corruption yet while running in Docker. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-09-28 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
That is great news, thank you. I think that we should wait for another week, to be more certain. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-10-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Still running peachy in Docker. Would be great if this fix could be pushed into a new release soon. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-10-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Any way of pushing a new release so I can ditch my split setup this week? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-10-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Fossil, I am glad that this fix has worked for you. I agree that this bug (as well as some others, such as | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-10-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Roger. If I install the release from the repo Daniel provided and MariaDB pushes out a new release on the main repos, it will just pull the newer release from the main repo and overwrite the temp version from Daniels repo right? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-10-16 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Sorry for the delay. Busy week. The version in the temporary repo is the same as the version in the next release. It might need some encouragement to install the new version sudo apt-get --reinstall install. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-10-25 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Cheers. Any ETA on a new release containing this fix? October 26 is tomorrow. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2023-10-25 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It was delayed, because MySQL release was a week late (and users expect MariaDB have all security fixes from the corresponding MySQL release). Should be soon, but few days later than planned. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-11-12 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Any updates on the new version release? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2023-11-12 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yes, it's built and is being uploaded to mirrors | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-11-13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Great, can't wait to upgrade my production machines. So it will be available today then? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-11-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Upgraded from 10.10.6 to 11.1.3 and my CPU usage is 100% now on all cores. Load went from 10-ish to 50+. New optimizer? Any way to turn this off? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-11-15 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Okay so.... More tables had corrupted indexes but these were never reported. I had a few duplicate entries on columns with a unique index. I did do OPTIMIZE and also ANALYZE TABLE but it did not report any errors. Is there a better way of scanning ALL tables with unique indexes and look for dupes? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2023-12-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
As far as I understand, CHECK TABLE should detect dupes, you can run
to run CHECK TABLE on all tables in all databases (see --help for more command line options, as usual) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Bento [ 2023-12-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It didn't. But I manually removed all the dupes now. I am still having issues with this new optimizer picking weird plans all the time. I have some slow queries on a certain column so I add an index on it, then suddenly a ton of other queries start using that index instead of their "old" one and execution time goes from 50ms to 30 seconds. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-12-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The When a table is being rebuilt by OPTIMIZE TABLE or ALTER TABLE…FORCE, secondary indexes will be ignored and not checked. I don’t think that ANALYZE TABLE would report any index corruption. |