[MDEV-31443] assert [FATAL] InnoDB: Unable to find charset-collation for 254 after upgrade from 10.11.4 to 11.0 Created: 2023-06-09 Updated: 2023-09-08 Resolved: 2023-07-05 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 11.0.2 |
| Fix Version/s: | 11.0.3, 11.1.2, 11.2.1 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Erlandas | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | assertion, crash | ||
| Environment: |
Ubuntu 22.04 |
||
| Attachments: |
|
||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Description |
|
Once the I have upgraded from 10.11.4 version - MariaDB did not start. Crash dump is not showing symbols (dbgsym package is installed): Reading symbols from /usr/sbin/mariadbd... Error log is showing: 2023-06-09 9:23:25 0 [Note] InnoDB: Number of transaction pools: 1 To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help Server version: 11.0.2-MariaDB-1:11.0.2+maria~ubu2204 source revision: 0005f2f06c8e1aea4915887decad67885108a929 Thread pointer: 0x0 Kernel version: Linux version 5.15.0-73-generic (buildd@bos03-amd64-060) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38 ) #80-Ubuntu SMP Mon May 15 15:18:26 UTC 2023 |
| Comments |
| Comment by Erlandas [ 2023-06-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I was unable to start MariaDB on multiple attempts and had to remove 11.0.2 version and install 10.11.4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-06-13 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The error message mentions prtype=254. I think that we may be missing some adjustment for data that had been originally written with something older than MySQL 4.1.2, before proper support for character sets was implemented. Originally, the prtype (precise type) did not include any charset-collation code. I think that we need some adjustment like this code in dict_load_column_low():
To work around this problem, Erland22 could shut down 10.11 after executing the following:
That should complete the change buffer merge, and then ibuf_upgrade() in 11.0 would have nothing to do. But, that would also destroy any evidence. We’d need a copy of an old enough data directory in order to reproduce and fix this bug. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Joshua Lückers [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I'm experiencing the same issue on my work machine. I would love to provide a data directory but I don't feel comfortable sharing all the databases I have (I have a lot of them locally). Is there a way to track down what database/data directory is causing this issue? P.s: I downgraded to an older version so everything works again. Is there e.g a query I can use to find the culprit without needed to "break" anything again? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
JoshuaL, in a debug build of the server you can set a startup option innodb_change_buffer_dump=ON to have the contents of the change buffer dumped to the server error log. That should allow some tablespaces with the problematic metadata to be identified. I do not know if there are ready-made packages for debug builds. (The -debuginfo or -dbgsym packages typically contain debugging information for release builds.) I have asked our testers if it would be possible to load some data into MySQL 4.0 so that I can reproduce the failure and test a fix. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I am having the same issue, with MariaDB packages from Arch Linux testing repository, mariadb 11.0.2-1. For me it is another prtype, though: 2023-06-25 8:02:07 0 [ERROR] [FATAL] InnoDB: Unable to find charset-collation for 271 The data originates from a very old MySQL-Installation, but I did dump in MySQL 5.1.73 and imported to MariaDB 10.1.36 in 2018. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Erlandas [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thank you Marko, I would prefer to pull out the problem instead of destroying it. Here is my config (see the end of it where I have added the innodb_change_buffer_dump=ON Should MariaDB 10.11.4 restart now make a dump of the problem into log file? Please confirm and I'll make a restart. Just a note: _ innodb_fast_shutdown_ setting is being used since 10.6 version or so, before that (and before switching to MariaDB 10.1 from MySQL 5.6 I believe - such switch was not being used).
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Erland22, before you restart with innodb_change_buffer_dump=ON, please make sure that innodb_fast_shutdown=1 (not 0) is in effect. If the server fails to start up due to an unknown option innodb_change_buffer_dump, then it is not a debug version. eworm, if you loaded an SQL dump into a freshly created MariaDB 10.1.36 database, that would simplify our testing. Do you happen to have a copy of the CREATE TABLE statements that you could share? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
This is a large database with lots of dbs and about 4000 tables... Will have to identify the effected table first. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The .frm files should embed a MySQL or MariaDB version number of the server where the table was last created or rebuilt (not counting rebuild by TRUNCATE TABLE):
The version number is encoded like 100908 for 10.9.8. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I had one table from 10.1.36 and one from 10.4.8... Rebuilt both with ALTER TABLE ... FORCE; - will check later if the situation changed. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
No change, still fails with the same message. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It is possible that the change buffer contains entries for tables that have been dropped a long time ago but were not removed from the change buffer. The change buffer makes use of lazy deletion. I hope that someone can dump their change buffer. Here you might find a usable 10.6 debug build: https://ci.mariadb.org/36101/amd64-ubuntu-2004-debug/ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Erlandas [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have the following:
How do I check if I have debug version? I have installed Debug symbols package, but they are not showing up (as if that package is not installed). Or what could I install on Ubuntu 22.04 in order to get change buffer pulled? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Erlandas [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have CREATE TABLES statements pulled how or where I could safely (not publicly) share it with you? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-26 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Running Arch here... Also have wanted to go with the slow shutdown here... But looks like it hangs, repeatedly (for an hour now) giving: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-27 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
eworm, I use cmake -DCMAKE_BUILD_TYPE=Debug (and a few other options). Depending on the compiler and options, the GCC flag -Werror that this would imply could cause the build to fail. Does the build fail also when you use cmake -DMYSQL_MAINTAINER_MODE=WARN to disable the -Werror. Yes, there should be progress messages output every 15 seconds if the upgrade of the change buffer is taking a long time, and the number of change buffer pages to process should steadily decrease. Can you please attach a debugger to the hung server process and produce full stack traces of the hang?
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-27 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Ok, here we go... Built mariadb 10.9.7 with cmake -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=WARN and dumped the change buffer. Please see uploaded file docker-mariadb@debug.service.log.gz... Regarding the hung server process on change buffer apply... Should I open a new issue for that? I guess it is not strictly related. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-27 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
eworm, thank you. I think that also the hang is related to the upgrade logic that was implemented in The type information will be encoded in the 4th field of each record, like this:
Because the length (16) is not divisible by 6, this is not in the innodb_change_buffering=inserts format, and the first 4 bytes are a 16-bit counter and additional metadata. (16-4=12)/6=2, meaning that there is metadata for 2 columns:
These 6 bytes are decoded in the function dtype_new_read_for_order_and_null_size(). The character-set/collation code would be the 15 least significant bits of each field (the last 4 hexadecimal digits masked with 0x7fff). The most significant bit of those 16 bits is the NOT NULL flag. Here we have charset-collation 0 for both fields, and the second field is NOT NULL. For other than CHAR, VARCHAR, TEXT columns, the charset-collation code should be ignored. The prtype without the flags DATA_BINARY_TYPE (1024) and DATA_NOT_NULL (256) or the charset-collation code is in the second byte. Here it would be 3 for both fields. If you could produce a full stack trace for the FATAL error, that would point me directly to the relevant change buffer record. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-27 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Uploaded gdb output... Hope that helps. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
eworm, thank you, gdb_output.log
The first command should switch to the stack frame of ibuf_insert_to_index_page(). The 32 most significant bits of the page identifier are the tablespace identifier, and the least significant 32 bits are the page number. The last command should identify the file name of the table. I would like to see the table definition. You can obfuscate the name of the table and the columns to protect any confidential data. What I am mostly interested in are the PRIMARY KEY columns and any indexed columns. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
And here we go... Out put is:
I've uploaded the table structure in t_table.sql, where all names were replaced. The table was created with MariaDB 10.5.11, checked with the method described above. Probably worth noting: The table has columns that contain German Umlaut characters ('ä', 'ö', 'ü'). Can that cause the issues we see here? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thank you. Somehow I got the block wrong, but it does not matter. The index here points to a dummy object that has been constructed based on the contents of change buffer records, which I guess explains why the prefix_len is nonzero even though there are no column prefix indexes in your table. The second field is the PRIMARY KEY(column01), a 4-byte integer. The first field should be something with 7 bytes, but I can't figure out what it would be. Can you also dump the following:
If you are starting a new process and not debugging a previous core dump, the addresses can change. In that case:
The encoding of dictionary object names was changed to UTF-8 basic multilingual plane (utf8mb3) back in MySQL 5.1. That should work just fine. Sometimes users forget to declare that the connection character set encoding is UTF-8 instead of the default latin1, while passing UTF-8 data. In that case, the data dictionary would see some "double encoded" names, as in the following (assuming that you are using UTF-8 in the shell):
Each of these 6 non-ASCII characters would then be encoded in the filename-safe encoding (@ followed by 2 non-hex chars, or 4 hexadecimal digits, that is, 3 or 5 ASCII chars per source character). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
No double encoding I have characters encoded in UTF-8 as expected. That should be fine then. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thank you. I think that this should be the minimal table definition to reproduce this:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Some sanity checking: mtype=12=DATA_VARMYSQL and prtype=271=256+15=DATA_NOT_NULL+DATA_MYSQL_TRUE_VARCHAR do match column06, which is VARCHAR in the post-MySQL-5.0.3 .frm file format and NOT NULL. The len=60 in bytes matches VARCHAR(20) with up to 3 bytes per character. For some reason, the charset-collation part of prtype had not been set. Perhaps in some older minor versions of MariaDB there was some bug related to that. I would expect our stress test for the upgrade to have generated that kind of data. eworm, do you happen to have complete server error logs so that you could list all MariaDB versions that have ever touched this data directory? It could help the search. Also, in case this should depend on the data, can you try to determine the page number? Here is an alternative way of extracting it as well as the change buffer record:
That would help me to find the related the change buffer record. In [^docker-mariadb@debug.service.log.gz] there are many records for many pages of this tablespace 0x7f1. Maybe they all share the same metadata, but I did not try to check that yet. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Upgrade path is:
This is not from error log though, but from package manager log and git repository where I store the database structure. All packages are from official Arch Linux repositories, unless we could not update due to temporary incompatibilities with our applications. I think we had some bugfix releases in production where Arch Linux was ahead with next feature release. BTW, this is a Galera cluster... Does not matter, no? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Some of your gdb commands fail...
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-06-29 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Oh, some extra questions... Would an ALTER TABLES t_table FORCE; fix this? And any idea if this can cause the hang on slow shutdown? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-30 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
eworm, it is very strange that you are seeing a page number 0 also in the page frame and not only in the block descriptor. The smallest page number in an .ibd that can contain a secondary index page is 4. I think that we may have to debug this deeper, so please preserve a copy of this corrupted data for further analysis. What was the last ALTER TABLE statement? Did it rebuild the table? Which server version number do you see in the .frm file?
That version number would change on any ALTER TABLE, including some that do not affect InnoDB at all, say, changing the default value of a column. The GDB @ operator (for interpreting a pointer as an array) should be more than 20 years old. Would the following work?
or "manually"
Because the data lengths should be 7 and 4, I guess that the following should work:
Alternatively, if you have the data pointer address, you could do
Yes, rebuilding the table should fix this, as should a slow shutdown of the 10.x server before upgrading. Dropping and re-adding the secondary indexes may cause trouble if you are using an older server version where Thank you for mentioning Galera cluster. If you used snapshot transfer (SST), the data may have been corrupted by that at some time of the past. Before Basically, each time you perform Galera SST with other than wsrep_sst_method=mysqldump, you will copy any corruption from the donor node and potentially add more corruption. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-06-30 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
In [^docker-mariadb@debug.service.log.gz] there are several records that look like this:
Apart for page 0xff (255), there are some for pages 0x9f7e, 0xd2ea, 0x15175, and so on. All the secondary index keys that I paid attention to looked like K followed by 6 decimal digits, so I would assume that they are for the same index. I think that it would be good to follow the page access patterns, with some breakpoints. The change buffer merge applies to secondary index leaf pages, and it never causes page splits or merges. I would like to have copies of the data pages that are being accessed right before the crash. I suppose it will crash in the same way each time. First, I would suggest the following commands to find out which pages were accessed right before the crash:
The page_id.m_id that GDB should display as a function parameter should be 0x7f1<<32 + page number, that is, 8731668512768 + page number. In some builds, the function parameters may be garbage until after the function preamble has been executed. Should that be the case, you would have to set the breakpoint on a specific source code line inside the function, say, buf0buf.cc:2649 in mariadb-11.0.2. For sanity checking, I would like to see the last 3 page numbers that were accessed before the crash. I would also like to see a page dump of the last page before there is any attempt to apply any changes to it. That you can send to me by email, to protect your data. A page dump can be obtained from the buffer pool like this, if a breakpoint in buf_page_get_gen has been hit:
You will have to replace $1 with whatever GDB reports as the return value of the function. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-07-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We switched the wsrep_sst_method from rsync to mariabackup half way the cycle with MariaDB 10.5.6. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-07-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The last ALTER TABLE to that table (at 10.5.11) change column25 from varchar(20) to varchar(50). | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Thank you. Extending a VARCHAR ( The corruption could very well be blamed to wsrep_sst_method=rsync malfunctioning in the past. I would still like to make sure that the upgrade procedure tolerates some corruption like this. If all else fails, we could write a log message, saying that there is some corruption, and you should fix that by OPTIMIZE TABLE or ALTER TABLE…FORCE. This should always work, because the change buffer only covers secondary indexes, not the clustered index, where the data is stored. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Chris Page [ 2023-07-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I have run into the same issue and can confirm that running `SET GLOBAL innodb_fast_shutdown=0;` prior to stopping my MariaDB 10 instance allows me to successfully install MariaDB 11. However, can I confirm this is a viable workaround to the issue, i.e. it won't cause any future consistency issues? Thanks, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
chrispage1, yes, a prior slow shutdown should empty the change buffer. Your experience suggests that no corruption might not be involved at all, but just some incorrect logic in the upgrade procedure. Originally, I was considering to simply refuse upgrade if the change buffer is not empty. I thought that it would be more user-friendly to implement an upgrade procedure. I hope that I will soon get more details from someone, so that the problem can be fixed. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Chris Page [ 2023-07-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
What else do you need for testing? This test was on a snapshot and I have two other servers to upgrade. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
chrispage1, I would need an interactive debugging session with access to the data, so that I can check some data pages that were accessed in an attempted upgrade that leads to this failure. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Chris Page [ 2023-07-03 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I could provide you with SSH access over a screen session? It's the least I could do to support MariaDB. Unfortunately I won't be able to send my data off as there is some pretty sensitive stuff in there. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I had an interactive debugging session with eworm today. I did not see any obvious sign of data corruption in that. The fatal error would occur on the very first record that is being inserted into page 255:
That page only contains the key K100248 with various PRIMARY KEY column values. One of the inserts into the page should probably invoke page reorganization (to empty the PAGE_FREE list and to reduce the PAGE_GARBAGE to 0), but the code crashed already on the INSERT. Only the PAGE_MAX_TRX_ID had been updated from the change buffer leaf page to the secondary index leaf page. In other news, mleich informed me that he finally reproduced this bug. I should get access to his data shortly, so that I can debug and fix this more conveniently, having access to the full data. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Matthias Leich [ 2023-07-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
For some reason, the upgrade testing for The rewritten function dtype_new_read_for_order_and_null_size() was actually losing the charset-collation information, by shifting a 32-bit variable twice by 16 bits. In fact, my compiler (clang-16 -O2) was smart enough to optimize away the entire line that the patch below is changing:
There are also some other issues that mleich’s testing revealed. I pushed a tentative fix for more extensive internal testing. Ready-built packages of that (if anyone in the community want to test this) should be available in https://ci.mariadb.org/36376/ in some time, for some time. There appears to be something wrong with eworm’s data set, because a slow shutdown of MariaDB Server 10.9.7 would hang in change buffer merge. That we will debug separately. It would be interesting to know if this fix would allow that dataset to be upgraded to 11.0. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I reproduced the issue with data.tar.xz The packages for this follow-up should appear in https://ci.mariadb.org/36377/. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-07-04 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Tested MariaDB 11.0.2 with 1b198aa2ff9a7f4d2a580c7915798056fdbdbe66 and cbdd6179cf870bc13b12b90fb365d7aca46cc55a on top. It does no longer crash, but bail out with an error:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-07-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The container quay.io/mariadb-foundation/mariadb-devel:11.0-mdev-31443-pkgtest was also generated from the build and available for testing. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Today, I hope to figure out what is triggering the corruption report for eworm. I believe that it may be genuine corruption, because a slow shutdown of MariaDB 10.9.7 with a copy of that database would hang in change buffer merge, periodically spitting out "progress" messages from srv_shutdown_print(), with unchanged content. Possibly, I could come up with some logic that when innodb_force_recovery=1 is set, such corruption will be ignored and the upgrade would go through. The impact of that would be that some secondary indexes of some InnoDB tables could end up being corrupted. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-07-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Oh, by the way... Anybody interested in Arch Linux packages for testing? | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The upgrade actually went through fine after applying one more fix:
This consistency check was too strict. In [^docker-mariadb@debug.service.log.gz] there are several buffered entries to a redundant secondary index, like this:
On the table, there is a PRIMARY KEY (actually a NOT NULL UNIQUE column) as well as a non-unique secondary index defined on the PRIMARY KEY column. We can apply these changes just fine. We do not know yet what caused the slow shutdown of MariaDB Server 10.9.7 to hang on this data. I suggest the following steps:
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2023-07-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
There are some readily built packages in https://ci.mariadb.org/36411/ for the final fix. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-07-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Pushed to Arch Linux testing repositories as mariadb 11.0.2-2... | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Christian Hesse [ 2023-07-05 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
We had another debug session for MariaDB 10.9.7 slow shutdown hang. We could not find the exact issue, but solved it anyway: With the extracted tablespace IDs I could get the table names from information_schema.innodb_sys_tables, then run OPTIMIZE TABLE on all of these. Finally the slow shutdown succeeded. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Artem Russakovskii [ 2023-08-02 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Is the fix above going to also work for | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Artem Russakovskii [ 2023-08-14 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
11.0.3 is out https://mariadb.com/kb/en/mariadb-11-0-3-release-notes/. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Artem Russakovskii [ 2023-08-30 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I can confirm upgrading to 11.0.3 fixed the issue for me. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Artem Russakovskii [ 2023-08-30 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Aaand a few minutes in, the server started crashing. Not sure if related to this bug or not, but here: |