[MDEV-20717] Plugin system variables and activation options can break "mysqld --wsrep_recover" Created: 2019-10-01 Updated: 2021-05-02 Resolved: 2021-01-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Configuration, Galera, Plugins, Variables, wsrep |
| Affects Version/s: | None |
| Fix Version/s: | 10.2.37, 10.3.28, 10.4.18, 10.5.9 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Geoff Montee (Inactive) | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
To determine a Galera Cluster node's position, users can execute the following:
Or if the node is using systemd, then users can execute the following wrapper script:
MariaDB's systemd unit file also automatically calls this wrapper script to recover the Galera position before starting the server. See here: https://github.com/MariaDB/server/blob/mariadb-10.4.8/support-files/mariadb.service.in#L70
When mysqld is started in this recovery mode, it returns a successful exit code, even if the datadir isn't initialized. For example:
This process needs to work with a non-initialized datadir. This is because with Galera Cluster, deleting the contents of the datadir is one of the most common ways to trigger SSTs. However, two things can currently cause this to fail:
This is a very big problem on systems that use systemd, since this recovery process happens automatically. More details below.
Plugin System Variables
If the MariaDB configuration file references plugin system variables without the loose prefix, then that will lead to failures. For example, let's say that we have the following:
This will cause the recovery process to fail when the datadir is empty. For example:
The workaround for this is to add the loose prefix. https://mariadb.com/kb/en/library/mysqld-options/#-loose- For example, let's say that we have the following:
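To make the workaround concrete, here is a minimal sketch that writes such a configuration fragment. The `server_audit` plugin, the `audit.cnf` file name, and the `write_loose_conf` helper are assumptions chosen for illustration and are not taken from the original report; only the `loose` prefix itself comes from the text above.

```shell
# Hypothetical helper: write a config fragment into the given directory.
# The plugin (server_audit) and the file name are illustrative assumptions.
write_loose_conf() {
    conf_dir="$1"
    cat > "$conf_dir/audit.cnf" <<'EOF'
[mariadb]
# With the loose prefix, an unknown variable produces a warning
# instead of aborting startup when the plugin is not loaded.
loose_server_audit_logging = ON
EOF
}
```

It could be called as `write_loose_conf /etc/my.cnf.d`, where the directory is again an assumption that depends on the distribution's include path.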
This will allow the recovery process to succeed when the datadir is empty. For example:
Plugin Activation Options
If the MariaDB configuration file sets plugin activation options to FORCE or FORCE_PLUS_PERMANENT without an associated plugin_load_add option, then that will lead to failures. https://mariadb.com/kb/en/library/plugin-overview/#configuring-plugin-activation-at-server-startup
For example, let's say that we have the following:
This will cause the recovery process to fail when the datadir is empty. For example:
This situation is most likely to occur when the user has previously installed the plugin dynamically by executing INSTALL SONAME. https://mariadb.com/kb/en/library/plugin-overview/#installing-a-plugin-dynamically The workaround for this is to add the plugin_load_add to the MariaDB configuration file. https://mariadb.com/kb/en/library/plugin-overview/#installing-a-plugin-with-plugin-load-options https://mariadb.com/kb/en/library/mysqld-options/#-plugin-load-add For example, let's say that we have the following:
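As a minimal sketch of this workaround, the helper below writes a fragment that pairs the activation option with plugin_load_add. The `server_audit` plugin, the `audit.cnf` file name, and the `write_plugin_load_conf` helper are assumptions for illustration, not the configuration from the original report.

```shell
# Hypothetical helper: write a config fragment into the given directory.
# The plugin (server_audit) and the file name are illustrative assumptions.
write_plugin_load_conf() {
    conf_dir="$1"
    cat > "$conf_dir/audit.cnf" <<'EOF'
[mariadb]
# Load the plugin library at startup so that the activation option below
# is recognized without relying on the mysql.plugin table.
plugin_load_add = server_audit
server_audit = FORCE_PLUS_PERMANENT
EOF
}
```

The point of the pairing is that a plugin installed only through INSTALL SONAME is registered inside the datadir, so an empty datadir leaves the activation option referring to a plugin the server has not loaded.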
This will allow the recovery process to succeed when the datadir is empty. For example:
Suggested Fix
I think the suggested fix should probably be something like this:
|
| Comments |
| Comment by Jan Lindström (Inactive) [ 2019-12-12 ] | |||||||||||
|
Feature request. | |||||||||||
| Comment by Geoff Montee (Inactive) [ 2019-12-16 ] | |||||||||||
|
This seems more like a bug than a feature request to me, since this breaks SSTs/ISTs in a very common/normal situation. At least some workarounds already exist though. | |||||||||||
| Comment by Jan Lindström (Inactive) [ 2020-06-22 ] | |||||||||||
|
Is there some real reason why you can't first use mysql_install_db and then wsrep-recover on an initialized datadir? https://galeracluster.com/library/training/tutorials/restarting-cluster.html gives me the understanding that galera-recover is intended for a recovery process where you already have an initial database. | |||||||||||
| Comment by Geoff Montee (Inactive) [ 2020-06-24 ] | |||||||||||
|
Hi jplindst,
We are not talking about manually running galera_recovery. We are talking about galera_recovery running automatically when you start the server. The usual process to initiate an SST on a system with a corrupt data directory is:
1. Delete the current data directory:
2. Start the server:
galera_recovery will automatically run in step #2. If we want users to run mysql_install_db between these steps, then it adds an additional step, and it is a change in procedure from what people are used to. Obviously, we can take this approach, but it is not the most user-friendly approach.
The galera_recovery script is run every time the server starts. See the ExecStartPre directive from the systemd unit file:
https://github.com/MariaDB/server/blob/mariadb-10.5.4/support-files/mariadb.service.in#L74 Since the galera_recovery script is run for every startup, it needs to support every startup scenario that the server supports. e.g.:
Does that help? | |||||||||||
| Comment by Julius Goryavsky [ 2021-01-20 ] | |||||||||||
|
GeoffMontee Hi! I see two problems in the proposed solution:
1) We do not know which variables belong to a plugin. If the plugin is not loaded, we will assume that its variables do not exist (unless they have the "loose" prefix). How do we distinguish a plugin variable from a simply wrong variable name? It seems to me that either this is a user problem (the user must add the loose prefix themselves), or, alternatively, we need separate sections for plugin variables in the configuration file.
2) While the plugins are not loaded, we do not know which of them are encryption plugins and which are not. We have a plugin type in the I_S table, but it appears there only after loading. And in the mysql.plugin table we only have a name and a shared library (without the plugin type). We can ignore all plugin-related failures if plugins fail to load in recovery mode, but this can lead to problems if the encryption plugin fails. | |||||||||||
| Comment by Geoff Montee (Inactive) [ 2021-01-20 ] | |||||||||||
|
Hi sysprg,
In this specific case, the data directory is completely empty. Can all non-essential system variables be treated as "loose" during --wsrep-recover only if the underlying data directory is completely empty? I am not sure if that could cause other problems though.
In this specific case, the data directory is completely empty. If the data directory is completely empty, then I am not sure why an encryption plugin would matter. In fact, if the data directory is completely empty, then I don't think there is anything for --wsrep-recover to do. If --wsrep-recover finds an empty data directory, can't it just ignore unknown system variables and return its "special" start position, as it usually does in this case?:
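A rough sketch of that short-circuit, written as shell in the style of a wrapper script rather than as the actual server change. The helper names `datadir_is_empty` and `empty_datadir_position` are hypothetical, and the position string is assumed to be Galera's usual undefined position (all-zero UUID with seqno -1).

```shell
# Assumed "undefined" Galera position: all-zero UUID with seqno -1.
UNDEFINED_POSITION="00000000-0000-0000-0000-000000000000:-1"

# Hypothetical helper: succeed when the directory is missing or has no entries.
datadir_is_empty() {
    [ ! -d "$1" ] || [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

# Hypothetical helper: for an empty datadir, print the undefined position and
# return 0; otherwise return 1 so the caller can run the normal recovery.
empty_datadir_position() {
    if datadir_is_empty "$1"; then
        echo "$UNDEFINED_POSITION"
        return 0
    fi
    return 1
}
```

With this shape, configuration parsing for the recovery step would only be needed when `empty_datadir_position` returns non-zero, which is the case where there is actually a position to recover.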
Any unknown system variables would still be caught when the server starts up normally after the SST, right? Thanks! |