[MDEV-14622] Running mysql_upgrade in background upon package installation or upgrade causes deadlocks and other side-effects
Created: 2017-12-11 Updated: 2022-01-29 Resolved: 2021-11-19
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Platform Debian |
| Affects Version/s: | 10.1, 10.2, 10.3, 10.4, 10.5 |
| Fix Version/s: | 10.5.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Elena Stepanova | Assignee: | Otto Kekäläinen |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None |
| Description |
|
Among other things, the MariaDB Debian startup scripts invoke debian-start after server startup; it starts a block of functions in the background and exits. The block of functions includes the mysql_upgrade script (called from within upgrade_system_tables_if_necessary).
On non-systemd installations, this block proceeds after the main script ends and eventually performs the necessary checks. This is a questionable decision, because the server might start being used before mysql_upgrade has run. It would be more reasonable either to wait until it finishes, if mysql_upgrade is really necessary, or not to run it at all, if it is not. However, that is a minor problem.
On systemd installations, when the main script finishes, the background block is aborted too, so mysql_upgrade has almost no chance of being executed. This goes unnoticed; the server reports that debian-start was executed successfully:
but both syslog and the general log, if enabled, confirm that the background jobs never finished. Also, the result of these jobs is ignored: even if mysql_upgrade fails, it is not reflected in the service status in any way. |
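The point about ignored results can be illustrated with a minimal sketch. It is not taken from debian-start; the helper name `run_and_report` is hypothetical. It only shows the alternative the description hints at: run the job in the foreground and hand its exit status back to the caller instead of discarding it.

```python
import subprocess
import sys


def run_and_report(cmd: list[str]) -> int:
    """Run cmd in the foreground and return its exit status.

    Hypothetical helper: the caller waits for the job and can turn a
    failure into its own non-zero exit status instead of ignoring it.
    """
    completed = subprocess.run(cmd)
    if completed.returncode != 0:
        print(f"{cmd[0]} failed with status {completed.returncode}",
              file=sys.stderr)
    return completed.returncode


# Assumed usage in a start-up hook (not executed here):
#   sys.exit(run_and_report(["mysql_upgrade"]))
```

With this shape, a failing upgrade step would surface in whatever status the service manager records for the hook, at the cost of blocking until the step completes.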
| Comments |
| Comment by Marc Olzheim [ 2018-01-02 ] |
|
Indeed, I discovered the same problem while migrating from Ubuntu 14.04 to 16.04, and thus to systemd. Subprocesses started from systemd's ExecStartPost= seem to be cleaned up once the command exits. This patch does the trick for me:
|
| Comment by Jean Weisbuch [ 2018-01-26 ] |
|
The execution of debian-start can take very long (several minutes) on servers with thousands of tables. For example, on a server with about 120000 tables, mysql_upgrade itself takes more than 6 minutes and check_for_crashed_tables takes almost 5 minutes.
The check could also be disabled, as the engine itself checks a table the first time its file is opened, and that behavior can be customized using myisam_recover_options and aria_recover_options. |
| Comment by Daniel Black [ 2018-04-01 ] |
|
https://www.freedesktop.org/software/systemd/man/systemd.service.html#ExecStartPre= says this should not be used for long-running processes and also describes how such processes are killed off. I suspect the way forward is creating one (or more individual) services with "After=mariadb.service/BindsTo=mariadb.service"; that way the length of their execution and their status are independent of mariadb.service. As independent services, other services could depend on, for instance, the completion of the table check. Options to increase the parallelism of these operations could also be considered. |
| Comment by Otto Kekäläinen [ 2018-04-09 ] |
|
Oracle MySQL seems to have removed this script completely: https://salsa.debian.org/mariadb-team/mysql/commit/f12dd3fb5387113585a981e2b8d234e81c6a630d It is legacy anyway, and I don't think anybody has reviewed it in much depth since systemd was introduced and/or the pre/postinstall scripts were updated to fix other bugs. |
| Comment by Jean Weisbuch [ 2018-04-17 ] |
|
check_for_crashed_tables could probably be removed altogether, as automatic detection and recovery of crashed tables at opening has improved. Keeping the automatic mysql_upgrade execution shouldn't be necessary if we are sure that the post-install script works well (in my experience it does not in the official MySQL 5.7 packages, at least on systems not using systemd, where mysql_upgrade is not run at package upgrades). check_root_accounts is nice to have, but its removal won't break anything. If you are sure that mysql_upgrade is correctly executed by the post-install script on package upgrade, I think it could be removed, meaning that the MYCHECK_RCPT variable in mariadb-server-10.*.mysql.default could also be removed. |
| Comment by Elena Stepanova [ 2020-07-05 ] |
|
The problems from running mysql_upgrade in the background still exist; moreover, they seem to worsen in higher versions, as both mysql_upgrade and the post-install scripts of engine-specific packages get more SQL logic, and thus the probability of conflicts grows.
The obvious problem is that when the server installation ends, the client may start running something while mysql_upgrade is still running in the background, and since it deals with system tables, various strange effects may occur.
The less obvious but even bigger problem is that when the server is installed together with other packages, mysql_upgrade starts as soon as the server installation is finished while other packages are still being installed, which causes race conditions. Specifically, a frequent outcome in buildbot is a hang upon Spider installation. It happens due to a deadlock between Spider creating its own objects and mysql_upgrade modifying system tables. One I managed to catch live is below, there may be other ones.
Technically the deadlock isn't permanent: it would probably resolve itself after lock_wait_timeout is exceeded, but since that is 1 day by default, for all practical purposes it makes no difference. Besides, even if it were set to a lower value, nothing good would come out of either mysql_upgrade or the Spider logic failing with a lock wait timeout.
|
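The remark about the timeout can be illustrated generically. The following is a minimal sketch using plain Python threads, not the server's actual locks; the actor names and the function `run_pair` are hypothetical. Two actors take two locks in opposite order, and a timeout breaks the deadlock only by making one of them fail.

```python
import threading


def run_pair(timeout_a: float, timeout_b: float) -> dict:
    """Two actors acquire two locks in opposite order, each with a timeout."""
    lock_1 = threading.Lock()  # stands in for a resource actor A takes first
    lock_2 = threading.Lock()  # stands in for a resource actor B takes first
    both_hold_first = threading.Barrier(2)
    outcome = {}

    def actor(name, first, second, timeout):
        with first:
            # Make sure both actors hold their first lock before going on,
            # so the opposite-order wait is guaranteed to happen.
            both_hold_first.wait()
            if second.acquire(timeout=timeout):
                second.release()
                outcome[name] = "ok"
            else:
                outcome[name] = "lock wait timeout"

    a = threading.Thread(target=actor, args=("upgrade", lock_1, lock_2, timeout_a))
    b = threading.Thread(target=actor, args=("plugin", lock_2, lock_1, timeout_b))
    a.start()
    b.start()
    a.join()
    b.join()
    return outcome
```

Called as `run_pair(0.2, 2.0)`, the actor with the shorter timeout gives up and releases its lock, after which the other proceeds. The wait ends, but one side has still failed, which is the outcome the comment warns about.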
| Comment by Otto Kekäläinen [ 2020-07-06 ] |
|
Good debugging. I am sure mysql_upgrade is not run optimally now, nor is the server restarted optimally after installs/upgrades. Something to mitigate the parallel runs can maybe be engineered in the Debian maintainer scripts, but as suggested in https://jira.mariadb.org/browse/MDEV-14622?focusedCommentId=109469&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-109469 we could probably remove this debian-start script completely? If the RPM installs can do without it, why not the Debian ones as well? Somebody with a deeper understanding of the philosophy behind the mysql_upgrade script, and of when and how it should run, could comment here. And why do we even need a mysql_upgrade script? Can't the mariadbd daemon itself just auto-upgrade its tables if they seem outdated? |
| Comment by Otto Kekäläinen [ 2020-07-07 ] |
|
Buildbot seems to have tests for this now, making all builds fail:
|
| Comment by Elena Stepanova [ 2020-07-07 ] |
|
No, that's not it. The logic is the opposite: buildbot has to wait for mysql_upgrade to finish before it continues the test, in order to reduce the probability of failures caused by it running in the background. It doesn't eliminate the problem completely, as the main issue is the conflict between the installation scripts themselves when we install packages in bulk and have no control over it (e.g. a race between mysql_upgrade after server installation and engine packages still being installed at the same time). The waiting logic has been in buildbot for a very long time. The cause of the recent "all builds fail" was a typo in the buildbot config file; it affected Debian installation tests yesterday evening and upgrade tests today for half a day, and should already be fixed by now. |
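The kind of waiting described above could look roughly like the sketch below. It is not buildbot's actual logic; `upgrade_running` and `wait_until_idle` are hypothetical names, and the sketch assumes a system where `pgrep` is available. As the comment notes, such waiting reduces the probability of failures but does not remove the race between installation scripts.

```python
import subprocess
import time
from typing import Callable


def upgrade_running() -> bool:
    """Return True if a process named exactly mysql_upgrade exists.

    Assumes pgrep is installed; it exits with 0 when something matches.
    """
    result = subprocess.run(["pgrep", "-x", "mysql_upgrade"],
                            stdout=subprocess.DEVNULL)
    return result.returncode == 0


def wait_until_idle(is_running: Callable[[], bool],
                    timeout: float,
                    interval: float = 1.0) -> bool:
    """Poll is_running until it returns False; give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while is_running():
        if time.monotonic() >= deadline:
            return False  # still running when the deadline passed
        time.sleep(interval)
    return True


# Assumed usage before continuing a test (not executed here):
#   wait_until_idle(upgrade_running, timeout=600)
```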
| Comment by Jean Weisbuch [ 2020-07-07 ] |
|
I agree that it should be the server itself that checks at startup that its internal data structure (the "mysql" database) is up to date and then corrects/updates it before accepting connections, so you don't risk having a half-working server online with some of its functions disabled or limited as long as the structure is not right. Checking the non-internal databases for UPGRADE during startup would be problematic on servers with a great number of databases/tables and would require a different logic. |
| Comment by Elena Stepanova [ 2020-07-08 ] |
|
> it should be the server itself that checks at startup that its internal data structure (the "mysql" database) is up-to-date then corrects/update
That's all good as an idea for the remote future, but we have several GA versions which will still be supported for years. They obviously can't take that kind of change, but are still suffering from this problem. Something has to be done about all of it much earlier than a radical change making the server itself take care of it is implemented, if ever. |
| Comment by Jean Weisbuch [ 2020-07-08 ] |
|
In this case I believe that executing "mysql_upgrade --upgrade-system-tables" (if it's enough to fix the issues you are talking about) at the end of "mariadb-server-$VERSION.postinst", just after it has started the server, could be a simple solution.
A dirty but simple fix could be to put a "flag" file in the datadir after executing "mysql_upgrade --upgrade-system-tables" to signal that "mysql_upgrade --force" must be executed in the background after the server is started. It could be executed by debian-start with nohup or in a similar non-blocking fashion, and the flag would be removed once the mysql_upgrade execution has finished. |
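The flag-file idea can be sketched as follows. This is only an illustration of the proposal, not code from any package: the file name `.full_upgrade_pending` and both function names are hypothetical, and removing the flag only on a zero exit status is an assumption of the sketch (the comment only says the flag is removed once the run has finished).

```python
import subprocess
from pathlib import Path
from typing import Optional

# Hypothetical file name; no real package is known to use it.
FLAG_NAME = ".full_upgrade_pending"


def mark_full_upgrade_pending(datadir: Path) -> None:
    """Leave a flag in the datadir after the system tables were upgraded."""
    (datadir / FLAG_NAME).touch()


def run_pending_full_upgrade(datadir: Path, cmd: list[str]) -> Optional[int]:
    """Run cmd if the flag exists; return its exit status, or None if no flag.

    Assumption of this sketch: the flag is removed only on success, so a
    failed run is retried the next time this function is called.
    """
    flag = datadir / FLAG_NAME
    if not flag.exists():
        return None
    status = subprocess.run(cmd).returncode
    if status == 0:
        flag.unlink()
    return status


# Assumed usage from a start-up hook (not executed here):
#   run_pending_full_upgrade(Path("/var/lib/mysql"), ["mysql_upgrade", "--force"])
```

The helper itself blocks until the command exits; whether the caller launches it in the background, as the comment suggests, is left to the caller.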
| Comment by Otto Kekäläinen [ 2020-08-03 ] |
|
I've added this to https://jira.mariadb.org/browse/MCOL-4057 to track it for ColumnStore packaging. Currently, the way ColumnStore works requires restarting the whole server; just adding the plugin does not activate it (unlike other plugins).
Triggers are also used during installation, but for ColumnStore they are just about ldconfig, not the service.
|
| Comment by Otto Kekäläinen [ 2020-08-22 ] |
|
I may have managed to reproduce this on Salsa-CI as well; there the Spider installation step gets stuck and does not progress: |
| Comment by Otto Kekäläinen [ 2021-02-06 ] |
|
I believe this was fixed in
Feel free to re-open this, elenst, if you come across the same failure again. |
| Comment by Daniel Black [ 2021-11-19 ] |
|
Edit: spider moved to |
| Comment by Daniel Black [ 2021-11-19 ] |
|
closing again - |
| Comment by Michael Widenius [ 2022-01-18 ] |
|
Just a note: the parallel run problem is fixed in Daniel's and my changes to mysql_upgrade. They will soon be pushed to 10.2 (final testing is ongoing). |