[MDEV-26537] InnoDB corrupts files due to incorrect st_blksize calculation Created: 2021-09-04 Updated: 2023-08-08 Resolved: 2021-09-10 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.2.40, 10.3.31, 10.4.21, 10.5.12, 10.6.4 |
| Fix Version/s: | 10.2.41, 10.3.32, 10.4.22, 10.5.13, 10.6.5 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | László Károlyi | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None | ||
| Environment: |
FreeBSD 13.0-RELEASE-p4, running within a FreeBSD jail (amd64) |
||
| Attachments: |
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Hey, I have a job that migrates an old database to in a different format (my custom, php based forum migrating to a django-based one). Recently, I've seen the mariadb server crashing randomly, leaving other databases and their tables corrupted. I can't just pinpoint out the exact issue, but I can provide the traceback and coredump, in hopes of you finding and fixing the bug ASAP. This posted crashlog is from executing mariadb-check -c --all-databases -S /tmp/mysql.sock, that according to its output, crashes on a certain CHECK TABLE command. The crash log from mysqld.err: To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.5.12-MariaDB Thread pointer: 0x81e206b98 Trying to get some variables. Connection ID (thread ID): 138 Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash. Since I couldn't upload the core file here (max filesize exceeded), please download the gzipped coredump from here: https://wetransfer.com/downloads/58bcaac03b364010532ebddd214c7dbd20210904093632/a91858186386ac9749129b6f2632db3520210904093659/8c9aa0 |
| Comments |
| Comment by László Károlyi [ 2021-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The situation is getting out of hand. My mariadb started crashing left and right, sometimes even with unrecoverable errors where I had to load a dump back for the failing database from the day before. I've recompiled the package on the production server to see if libraries are at fault (different version which should be unlikely), and can confirm that it is NOT the case. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It looks like you are executing CHECK TABLE by invoking the mariadb-check tool. For some table, CHECK TABLE is crashing because an InnoDB index page is corrupted. Yes, our error reporting could be better. A core dump would not help us much. You can try to determine the table name and the page number from the core dump. A copy of the page and the output of the SHOW CREATE TABLE statement could allow us to diagnose the bug. As far as I can tell, CHECK TABLE…QUICK would not perform the check that is failing; it is only counting the records in the leaf pages. You could still be able to recover all data by invoking mariadb-dump (mysqldump) or similar. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hey, thanks for your response. Since this is a production server, I reverted back to 10.5.11 now (compiled it for myself again), which seems to be running safely for now, after dumping back the crashed tables. I am simply unable to test on this server, it is that essential. As for which table it was, I can't tell you any good news. They were all kinds of tables from my nextcloud database. Sometimes it would start up again successfully with the broken table unavailable, other times (today when I mentioned it got more severe) it wouldn't. Nothing ever helped but dropping and dumping back the broken tables from the dump which I saved. Luckily I have a daily backup script that dumps all databases. I can safely say that this bug got introduced with 10.5.12. The crashes started happening after I upgraded to that version. This is all I can say for now, in hopes of having it fixed with 10.5.13. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I would find it very surprising if such corruption was introduced in 10.5.12. We conduct extensive internal testing on Linux, covering also crash recovery. There were several changes to improve the write performance in the 10.5.12 release. While we do not implement any asynchronous I/O interface for FreeBSD, our internal testing does cover the "simulated AIO" code that I think should be the only option outside Linux and Microsoft Windows. If there is a recent regression, it will not magically fix itself. To diagnose and fix this, I would need a somewhat repeatable test case. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Unfortunately, I can't risk crashing mariadb on my production server. The first crash happened on Aug 28th, and I don't have the error log for that. The recent crashes were on nextcloud's tables, but I've seen other InnoDB related crashes as well, sometimes when loading a dumpfile, sometimes when an ALTER TABLE happened. Hence, I can't provide now a reproducible example since I have many tables/databases. I'm already overwhelmed with work, but one idea I have is to install a virtual machine and try to load those dumps to see if I can reproduce the InnoDB crash. Don't hold your breath until then though. I really hope that someone else will be able to reproduce this bug and come up with a similar bug report. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
So while I was at it (and before I had no time for it anymore), I installed a VM real quick and installed a mariadbd into it. At first it loaded the passed dump nicely, but then it started the same crashes I had on my server. What causes it, unfortunately I can't tell you. But at least it became reproducible. All I can tell is that it happens from loading the aforementioned dump, and that's also not always, it seems. I can send you the compressed virtualbox VM image privately with instructions on how to reproduce the issue. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
A bit more info on this: my research shows that it seems to happen with a higher probability when I try and load two dumps at the same time, so that multiple clients are trying to use mariadb on different databases. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-07 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
If you can reproduce the corruption starting from a logical dump with 10.5.12 and not with 10.5.11, it could be related to the recent write performance improvements. I am happy to diagnose the problem. I would need an environment where I can compile MariaDB from source code and repeat the problem. A virtual machine image could be a good starting point. Given that reproducing the problem likely is sensitive to the relative performance of CPU cores and I/O, it might be easiest if you can provide ssh access to a suitable environment. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-07 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have already prepared an OVA from the box I installed, I'll extend it with the possibility of compiling mariadb for yourself on it. You'll be able to import it in any VM handler as an appliance. I'll send you the wetransfer link and the instructions to your mariadb.com email address. Meanwhile for instant messages and faster communication, you can contact me on matrix under @karolyi:ksol.io if you like. Cheers! | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Motivated by this report, I went through some internal bug reports and diagnosed a fairly recently found race condition in | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hey, (I'm not sure to use german or english here so I'll continue with english) As per your last email, I tried setting innodb_flush_neighbors=ON but mariadb said /usr/local/libexec/mariadbd: unknown variable 'innodb_flush_neighbors=OFF. After changing innodb_change_buffering=none, the issue remains the same, unfortunately. Meanwhile I joined the zulipchat instance, where do I need to go there? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Looking up the option in the documentation from my comment before, the accepted option was innodb_flush_neighbors=0 but that didn't make the crash go away, unfortunately. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The comments in this system are public and preferred to be in English. Thank you for narrowing this down. So, this ought to be something else than | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello, I just want to add some information. The "change" that causes this crash is common in all the latest MariaDB versions for the 10.3, 10.4 and 10.5 trees. https://mariadb.com/kb/en/mariadb-10512-release-notes/ In common in all the releases is:
So the problem was added in one of these three changes probably. My guess is the InnoDB one, but just a guess. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I can repeat a crash on my system in the provided VirtualBox image. But, I would need some additional help regarding the build options and debugging information. I am not seeing complete stack traces in gdb. All InnoDB function names seem to be replaced with ??. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
If you are building MariaDB from the ports, probably this will help: https://wiki.freebsd.org/BenWoods/DebuggingPorts | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
falsovsky, thank you for the update. If the 10.3 and 10.4 series are affected, this could be completely unrelated to the write performance improvements in 10.5 and 10.6. I was not able to reproduce any crash with the same test case on Linux. I tried the latest 10.5 as well as 10.5.12, even with innodb_use_native_aio=0 to mimic FreeBSD (disabling the io_uring that was in the build). I think that this needs help from the FreeBSD community because we do not have that platform on our CI systems nor anyone who is developing on the platform. I am happy to help once this has been narrowed down a little further. I would suggest that you read the source code revision history (git log mariadb-10.3.30..mariadb-10.3.31 storage/innobase and similar). Maybe karolyi could test his dump on the 10.3 and 10.4 series as well? The earlier version reproduces the crash, the easier it should be to narrow down the cause. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello Marko! I just did a git bisect between these two releases. https://github.com/MariaDB/server/compare/mariadb-10.4.20...mariadb-10.4.21
After building MariaDB a thousand times, I think I identified the evil commit: https://github.com/MariaDB/server/commit/58252fff15acfe7c7b0452a87e202e3f8e454e19 I created a patch ( freebsdfix.patch Do you want me to try to build mariadb-10.4.21 with debugging and give a somewhat useful stack trace? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-08 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hey Pedro, thanks for your great work. Just a question, are you allowed to disclose how you were able to crash the server? In my case it was loading a dump but if a more easy way exists (that is not loading a huge dump twice simultaneously), it might help in tracking down the cause as well. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hey, The best way that I managed to crash it almost always, without having to wait too long was: I have a SQL dump (~100MB of my private projects, about 10 databases with a few tables). In one terminal I run mysql and 'source dump.sql;' on it, and while it is importing, on other three terminals I start running "mysqldump -A > /dev/null" at the same time, and keep repeating them. MariaDB almost always crashes before the dump is imported. Sorry if I don't have a "scientific way" of doing it. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I managed to build MariaDB with debugging symbols and ran it inside gdb. Version: '10.4.21-MariaDB' socket: '/tmp/mysql.sock' port: 3306 MariaDB Server Thread 34 received signal SIGABRT, Aborted. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
falsovsky, great work! And very nice that you can repeat this with a much smaller data dump than the 788 MiB (which made the data directory expand to about 6.8 GiB when loaded twice in parallel).
I remember that posix_fallocate() has been the source of some surprises. On Linux, GNU libc would not only wrap the system call, but also implement a fallback that would write NUL bytes. That fallback is missing in musl libc, which is used in Alpine Linux (MDEV-18462) and in the GNU libc replacement in Docker on Microsoft Windows. I must say that I only tested falsovsky, would you or someone you know be able to contribute a proposed fix? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I tried to build MariaDB on Debian GNU/Linux after
(and creating a wrapper script for g++ similar to the musl-gcc) but it turned out that I would need a special libncurses. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
On the FreeBSD virtual machine image provided by karolyi, I tried the following with innodb_buffer_pool_size=5m and the release binary of MariaDB 10.5.12:
It took a while, but the server did crash. In the data directory, files for tables t1‥t5 and u1‥u5 had been created. The u5.ibd file was slightly larger than the files for the completed t1‥t4 and u1‥u4, which were all identical size. This would seem to suggest that the file size is wrongly shrinking at some point. InnoDB cannot shrink data files for tables. TRUNCATE and similar operations would create a new smaller file. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Part of the When using innodb_flush_method=O_DIRECT (on systems where it has been implemented), all file reads and writes must be multiples of the block size. That was being violated on the system where the SSD sector size was 4096 bytes and a data file was not a multiple of that, for some reason. The test innodb.log_data_file_size artificially constructs such a scenario by shrinking an .ibd file using some Perl code. Which block size would be reported on FreeBSD? falsovsky, could you try to debug the root cause of the failure? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
karolyi, falsovsky, can you try/test the build with fdatasync explicitly disabled - {{ cmake -DHAVE_FDATASYNC=0 -DHAVE_DECL_FDATASYNC=0 .}} and see if the fallback to fsync corrects the behaviour. The theory is fdatasync doesn't updated the size metadata returned by fstat somewhere in the BSD Kernel/libc. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Daniel, Just built it with those two options, but sadly no change, still crashes. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
For the record, I tested the following on Linux with GNU libc:
This would cause us to always use the fallback code that invokes a pwrite() wrapper in a loop. If FreeBSD supports posix_fallocate(), the above patch would help in narrowing down the problem. If it fixes the corruption, the problem is in the posix_fallocate() code path, else it is in the fallback code. Also, does the following test pass on FreeBSD?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yo Marko, Just built it with that patch, it still crashes in my test case. TEST RESULT TIME (ms) or COMMENT worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019 Completed: All 5 tests were successful. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hey @Pedro, can you please share how you could compile mariadb so that the innodb storage would be compiled with said debug as well? When using the debug flags in /etc/make.conf, although mariadb was compiled with the debug flags, I only got the ?? chars when trying to print a backtrace on the core file from gdb. It seems to me (us, because we did it together with Marko), that the debug flags specifications don't reach down to the innodb storage level, when using the debug mode specified in the URL you shared. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
falsovsky, thank you. What if you revert the #if 0 and add #error after the original #ifdef HAVE_POSIX_FALLOCATE line? That is, is the posix_fallocate() code being used at all on FreeBSD? In the revert freebsdfix.patch The intention was that current_size would be rounded down to a multiple of statbuf.st_blksize, so that we will be writing multiples of that in order to enlarge the file to the desired size. (Some last bytes in the file that do not form a multiple of a block size will be overwritten with NUL bytes.) It looks like we are using pwrite() anywhere else than Microsoft Windows. So, I would think that the only cause of the corruption should be a funny value of statbuf.st_blksize. Would the following (untested) patch fix the corruption? And what would it write to the server error log?
(Sorry, I do not have a proper development environment in the FreeBSD VM image, so I have to rely on your help.) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Lazlo, Building a DEBUG version of MariaDB doesn't work, because it gets ULTRA SLOW, and this bug is probably a race condition, and it doesn't crash while it is build with debug way. The only way I managed to get some info was to build it outside of the ports tree. Manually on my home. export CFLAGS="-ggdb -g3" | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello Marko, That patch seems to fix the issue, at least I haven't been able to crash it yet. =) 2021-09-09 21:17:24 8 [Note] InnoDB: ./gitea/push_mirror.ibd:65536 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hey Pedro, thanks, just like I thought, a debug build of mariadb on FreeBSD from ports isn't really a debug build. Keep in mind that there are patches under `/usr/ports/databases/mariadb105-server/files/`, you might want to apply those to see if they don't break anything. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
falsovsky, thank you. Could it be that the st_blksize is reporting the current file size? https://www.unix.com/man-page/freebsd/2/stat/ says:
Okay, it does not promise it to be a power of two (your output includes 0x18000 and 0x1c000 which are not powers of 2), or that it would remain constant over the lifetime of the file handle. That was what I would have expected. The rounding down to a block size is only needed due to the constraints of O_DIRECT I/O. I see that O_DIRECT is documented in the FreeBSD 13 man 2 fcntl, so I suppose that we do support innodb_flush_method=O_DIRECT on FreeBSD. It is not the default setting before It looks like we could fix the corruption with a preprocessor directive that would avoid reading st_blksize on FreeBSD. But, that would basically revert the fix of When it comes to | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I am setting this to Blocker status, so that we should address this before the next releases. I could trivially disable the
But, I would rather like to have a suggestion how to read the actual block size on various BSDs. And I would need your help in that. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
One more idea: Could the st_blksize be reported differently if the file is in O_DIRECT mode? That is, would the output from my diagnostic patch be different if innodb_flush_method=O_DIRECT had been specified? Maybe someone can check the FreeBSD source code. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This appears to affect IBM AIX as well. A safe fix could be to assume that the file system block size is 4096 bytes, while in reality it could be a smaller power of 2. I think that a larger storage device block size is highly unlikely, because 4096 bytes is a very popular virtual memory page size. Only the smallest page size of the UltraSPARC would be larger (8192 bytes). We also use st_blksize in a few other places, most notably when dealing with page_compressed tables. It remains to be seen in which way those tables are broken on the affected operating systems. As far as I can tell, the only occasion where os_file_set_size() may currently be invoked on something else than an integer multiple of 4096 bytes is with ROW_FORMAT=COMPRESSED tables that use 1024 or 2048 bytes page size. The test innodb.check_ibd_filesize,4k is even covering that. We will have to make sure that when such files are being extended, we will not overwrite the last 1024, 2048, or 3072 bytes of actual payload with NUL bytes. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I conducted some more checks and pushed a tentative fix for testing as well as a port of the fix to 10.5. We will run some internal stress tests to make sure that there is no regression for ROW_FORMAT=COMPRESSED tables. It looks like INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES.FS_BLOCK_SIZE is reporting the st_blksize. For page_compressed tables, ‘insane’ values of st_blksize>4096 should be mapped to 512 bytes already. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hello Marko, A friend of mine made me the following source to be able to test what you said on FreeBSD.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Just a hunch, but searching around I found getbsize. Not sure if it helps. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by László Károlyi [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Also, one should consider that in the case of ZFS, the DB files can be in multiple zfs mounts with multiple blocksizes, possibly. A certain discovery has to be done in this case for the block sizes. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
falsovsky, yes, the test program delivers a result that I would have expected before this bug was filed. Do you have an idea why the call in InnoDB would return such values? Note that we got a similar corruption on IBM AIX, so it definitely is not FreeBSD specific. I think that it is easiest to extend the files with 4096-byte aligned writes when posix_fallocate() is not available, and to hope that such writes will be compatible with O_DIRECT. For the rare case that someone uses ROW_FORMAT=COMPRESSED tables with a smaller page size and the table is being extended to an odd number of pages, the writes of those pages may extend the otherwise preallocated file with 1, 2, or 3 kilobytes. In that way, we can avoid adding more operating system specific tweaks. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Matthias Leich [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pedro de Oliveira [ 2021-09-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Will this fix only be applied to 10.5 tree or will it be also applied on the other versions? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-09-11 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have pushed the fix to the 10.2 branch and merged up to 10.5 so far. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2021-09-21 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It was fixed in FreeBSD Port collection on September 12. New, fixed, versions:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-11-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It turns out that the fix causes a debug assertion failure on crash recovery (
Both space.size and space.committed_size would be 64, while space.size_in_header would be 65 for the copy of the data directory. The space.size is based on the file size, and we’d never extend files in increments smaller than 4096 bytes. This patch has not gone through our stress tests yet. In any case, only cmake -DCMAKE_BUILD_TYPE=Debug should be affected by this. |