[MDEV-24578] MariaDB 10.5 fails to join Galera cluster of MariaDB 10.1, root cause SST failure, symptom is InnoDB: Assertion failure Created: 2021-01-12 Updated: 2021-12-23 Resolved: 2021-12-23 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera SST, Storage Engine - InnoDB |
| Affects Version/s: | 10.5 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Otto Kekäläinen | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
(This was originally filed as https://jira.mariadb.org/browse/MDEV-24162, but mistakenly marked as a duplicate of another issue.) While trying to join an existing Galera cluster of two MariaDB 10.1.48 nodes with a fresh install of MariaDB 10.5.8, the mariabackup part runs fine, but once all files have been transferred and mariadbd tries to start, it fails on:
Because the daemon is managed by systemd, it automatically tries to restart. The second start fails on a slightly different error:
Any further restarts repeat this latter error. Full journald output from both runs attached. I have also produced core dumps of both crashes and they are attached. In first process 63031 there is:
The second process has:
The failure happened originally on 10.5.8. I created new builds from the latest 10.5 HEAD with debug symbols to ensure that this bug is in fact not yet fixed in git: it still exists, and if 10.5.9 were released today, it would still fail like this. The attached README.txt includes notes on the procedure I used to create and install the debug build with symbols. |
| Comments |
| Comment by Marko Mäkelä [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||||
|
In the log, we see that the ‘nuclear option’ has been used to switch the server to ‘garbage in, garbage out’ mode:
The write-ahead log (redo log) is the glue that guarantees the atomicity and durability of operations in the buffer pool. For example, let us consider a single-statement UPDATE transaction, updating an indexed column. It would consist of the following mini-transactions that atomically change a bunch of pages:
(Starting with MariaDB 10.3, the first two mini-transactions should be a single mini-transaction.)

The buffer pool is a cache of the data files. It is not a write-through cache. Typically, outstanding modifications could reside in the buffer pool for a long time. Only when a log checkpoint is performed must everything up to the new checkpoint LSN be written back to the data files. Between checkpoints, pages can be written in any order. For example, it could happen that the clustered index page is written to the data file before the undo log page has been written. If the server were killed, the ib_logfile* deleted, and the server restarted, then the index and undo pages could be out of sync with each other.

The log file is the glue that guarantees that the server will recover to a particular logical point in time. If that glue is removed, the file system could have copies of pages corresponding to different points in time. The system tablespace pages 975 and 977 that are mentioned in your "is in the future" output could be undo log pages. Because apparently no such message was issued for the clustered index page, it seems possible that the undo log page corresponds to a newer point in time than the index page. As you can imagine, it is possible that the entire undo log record for the old modification had been purged, freed, and reused for something else. Now, it could be the case that the 10.5 server is finding the old index page again and trying to execute purge again. But the purge had actually been executed in the past! Only, for some reason, the log file as well as the modifications to the index page were lost.

Let me ask you how this corruption was reached:
Also, how exactly was the upgrade performed?
And last but not least, how would you suggest we improve our documentation? | |||||||||||||||||||||||||||||||||||||
| Comment by Otto Kekäläinen [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||||
|
The message `log sequence number 1587610736226 is in the future` is from the restart after the first crash. You should focus on reading gdb-63031.log; that is where the first crash happens. It is quite understandable that the restart fails (visible in gdb-6501.log), as mariadbd crashed and left the files in an inconsistent state. I only provided gdb-6501.log as additional information in case it helps shed some more light on the first crash and the state after it.

> Let me ask you how this corruption was reached

1. MariaDB 10.5 is installed and configured in stand-alone mode (wsrep_on=OFF in config). Everything works. WSREP_SST: [INFO] Waiting for SST streaming to complete! CRASH

These lines indicate the restart: Note that journald does not guarantee the order of each exact line. Thus the crash+stacktrace is not the last output. Or maybe the crash is from one thread and the rest of mariadbd outputs things before it shuts down.

> Also, how exactly was the upgrade performed?

Nothing special. Just `systemctl restart mariadb` -> the join starts with SST from the donor.

> .. someone ‘fixed’ the problem by deleting the logs?

Nobody did anything. Only wsrep_on=ON and a server restart. The donor 10.1 runs all the time and does not crash. The crash is in 10.5.9 and it is unable to run. | |||||||||||||||||||||||||||||||||||||
| Comment by Otto Kekäläinen [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||||
|
marko Just to be clear, nobody set `innodb_force_recovery=6`, deleted logs, or did anything else manually. If looking at the gdb stack trace or journald log makes it appear as if somebody did, it must have been the server itself, which sounds pretty serious. | |||||||||||||||||||||||||||||||||||||
| Comment by Otto Kekäläinen [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||||
|
Seems that InnoDB assertion failures are generic error messages that don't actually tell what has gone wrong (https://jira.mariadb.org/browse/MDEV-13542). The root cause is something that corrupted the data, and the assertion failure is just the first check that triggers on the bad data. | |||||||||||||||||||||||||||||||||||||
| Comment by Otto Kekäläinen [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||||
|
OK, this is not a 10.5.7/8 regression. The server also fails to start with MariaDB 10.5.6 (although there are no assertion errors in 10.5.6):
I have been able to upgrade a MariaDB 10.1.x Galera cluster to MariaDB 10.5.6 before on a near-identical setup. So in this case maybe mariabackup is doing something funny while transferring the files. Looking at the data dirs after mariabackup is done with the SST, they seem to be OK, although on the donor side there are ib_logfile0 (48 MB) and ib_logfile1 (48 MB), while the receiving MariaDB 10.5.6 side has just one ib_logfile0 (96 MB). Both have an ibdata1 of equal size, but on the receiving side the timestamp is 5 minutes older. | |||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||||
|
otto, if the logs really were not wrongly discarded at any point, then this "is in the future" problem is a bug in the backup workflow. Can you get more information on that?

InnoDB crashing on corruption is a design issue (or you might even say, an attitude problem), and hopefully some day it will be fixed. The line of thinking during InnoDB development has been that error handling is hard, and it is difficult to cover all errors. Part of this thinking was that users should have properly working backups (and that there would be no bugs in the backup tools; see the linked issues). | |||||||||||||||||||||||||||||||||||||
| Comment by Otto Kekäläinen [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||||
|
Just to be sure, I checked that these system settings are identical on both donor and receiver:
It did not help, however. Also, I see that InnoDB aligns the log sizes on the fly anyway:
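For completeness, a donor/joiner settings comparison like the one described above can be scripted. This is a hedged sketch, not anything from the wsrep scripts: `compare_setting` and the sample values are hypothetical, and in practice each value would come from something like `mariadb -e "SELECT @@innodb_log_file_size"` on the respective host.

```shell
# Hypothetical helper: report whether one setting matches between hosts.
compare_setting() {
    name=$1
    donor_val=$2
    joiner_val=$3
    if [ "$donor_val" = "$joiner_val" ]; then
        echo "$name: OK ($donor_val)"
    else
        echo "$name: MISMATCH donor=$donor_val joiner=$joiner_val" >&2
        return 1
    fi
}

# Sample values (48 MB log file, as on the donor in this report):
compare_setting innodb_log_file_size 50331648 50331648
compare_setting innodb_data_file_path ibdata1:12M:autoextend ibdata1:12M:autoextend
```

Any mismatch makes the helper return nonzero, so a wrapper script can abort before attempting a join.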
| |||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||||
|
I implemented the log resizing (rebuild) on server startup in MySQL 5.6. That should be safe; at least I am not aware of any problems with it. I have not reviewed the Galera snapshot transfer scripts. I hope that they are not ignoring a nonzero exit status from mariabackup, and that mariabackup did not wrongly return a zero exit status even after encountering an error. (We have had bugs of the latter form in the past.) Can you get complete logs of all mariabackup invocations? | |||||||||||||||||||||||||||||||||||||
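The failure-propagation pattern described here can be sketched in shell. This is a hedged illustration, not the real wsrep SST script: `sst_backup_stage` is a hypothetical name, and the stub commands stand in for an actual mariabackup invocation.

```shell
# Hypothetical sketch: the backup stage must only count as successful when
# the backup command itself exits 0; any failure is propagated to the caller.
sst_backup_stage() {
    "$@"                       # stand-in for: mariabackup --backup ...
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "SST: backup stage failed with status $rc, aborting join" >&2
        return "$rc"
    fi
    echo "SST: backup stage completed"
}

# Demo with stubs instead of mariabackup:
sst_backup_stage true                                   # success path
sst_backup_stage false || echo "joiner aborts instead of continuing"
```

The point of the sketch is the `return "$rc"`: if the script instead fell through and continued streaming files, the joiner would end up starting on an unprepared or partial backup, which matches the symptoms in this report.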
| Comment by Otto Kekäläinen [ 2021-01-13 ] | |||||||||||||||||||||||||||||||||||||
|
Ok, I solved this now. The root cause is that SST with mariabackup from 10.1 to 10.5 apparently does not work: either it does not properly run the prepare step, or the InnoDB logic that upgrades the redo logs fails. Solution: get the joiner to use IST instead of SST. The key here is to do it quickly: if too much time passes and IST can't catch up, the joiner starts running SST, which will always fail. | |||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2021-01-14 ] | |||||||||||||||||||||||||||||||||||||
|
otto, to narrow down the cause further, I would suggest that you try to point out the problematic steps or omissions in the SST scripts. Ideally, a patched script would be good. I have a few remarks that may be useful for ensuring that this will be fixed by our Galera team:
In this reported case, I do not think that we encountered the last point, but I want to mention it for completeness. | |||||||||||||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2021-01-14 ] | |||||||||||||||||||||||||||||||||||||
|
otto Does it work if you do a Galera rolling upgrade 10.1->10.2->10.3->10.4->10.5, i.e. not directly 10.1->10.5? To analyze this further, it would be nice to have the mariabackup logs. | |||||||||||||||||||||||||||||||||||||
| Comment by Otto Kekäläinen [ 2021-01-17 ] | |||||||||||||||||||||||||||||||||||||
|
marko It seems to me that the SST join process never runs `mysql_upgrade`, and thus InnoDB fails on an assertion. When the IST runs, there are also `Table mysql/innodb_index_stats has length mismatch in the column name table_name.` warnings right up until the upgrade starts:
jplindst Trying a 10.1->10.2->10.3->10.4->10.5 upgrade would be a waste of time. I think we can just conclude that SST is not a supported upgrade method, but that with IST even the 10.1 -> 10.5 upgrade works (as this experience showed). | |||||||||||||||||||||||||||||||||||||
| Comment by Otto Kekäläinen [ 2021-01-18 ] | |||||||||||||||||||||||||||||||||||||
|
In my opinion, to close this bug we should either

1) Documentation: emit a warning / fatal error that an SST from a much older version is not possible, so that the process stops with a sensible error message and does not proceed to crash InnoDB (and cause "false" alarms and people doing gdb traces in vain).

OR

2) Upgrade proofing: alternatively, make the Galera join process detect that the SST is from an older version and trigger a mariadb-upgrade run before InnoDB starts. | |||||||||||||||||||||||||||||||||||||
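Option 1) could look something like the following. This is a hedged sketch only: `check_sst_versions` and `major_minor` are hypothetical names, and the version strings are passed as arguments for illustration; a real implementation would obtain the donor's version during the SST handshake and the joiner's own from something like `mariadbd --version`.

```shell
# Hypothetical guard: refuse an SST whose donor is from a different
# major.minor series than the joiner, with a sensible error message.
major_minor() {
    echo "$1" | cut -d. -f1-2
}

check_sst_versions() {
    donor=$(major_minor "$1")
    joiner=$(major_minor "$2")
    if [ "$donor" != "$joiner" ]; then
        echo "FATAL: SST from $1 to $2 crosses a major version;" \
             "use a rolling upgrade instead" >&2
        return 1
    fi
}

check_sst_versions 10.5.8 10.5.9 && echo "same series: SST allowed"
check_sst_versions 10.1.48 10.5.8 || echo "cross-major: join refused early"
```

Failing fast like this would replace the InnoDB assertion crash with an actionable message, which is exactly what the first proposal asks for.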
| Comment by Jan Lindström (Inactive) [ 2021-01-19 ] | |||||||||||||||||||||||||||||||||||||
|
The supported method for doing upgrades is documented e.g. here: https://mariadb.com/kb/en/upgrading-from-mariadb-101-to-mariadb-102-with-galera-cluster/ . SST is not an upgrade method because it just copies the database state from the donor to the joiner. This state copy includes all tables, including the mysql database. If the tables are not compatible with the server version on the joiner, there is not much we can do. | |||||||||||||||||||||||||||||||||||||
| Comment by Christopher Yeleighton [ 2021-02-13 ] | |||||||||||||||||||||||||||||||||||||
Also, assertions are not the right way to deal with inconsistent data. The purpose of assertions is to catch logic errors in the development phase. |