Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20717

Plugin system variables and activation options can break "mysqld --wsrep_recover"

Details

    Description

      To determine a Galera Cluster node's position, users can execute the following:

      mysqld --wsrep_recover
      

      Or if the node is using systemd, then users can execute the following wrapper script:

      galera_recovery
      

      See here:

      https://mariadb.com/kb/en/library/getting-started-with-mariadb-galera-cluster/#determining-the-most-advanced-node

      MariaDB's systemd unit file also automatically calls this wrapper script to recover the Galera position before starting the server:

      https://github.com/MariaDB/server/blob/mariadb-10.4.8/support-files/mariadb.service.in#L70

      When mysqld is started in this recovery mode, it returns a successful exit code, even if the datadir isn't initialized. For example:

      [ec2-user@ip-172-30-0-96 ~]$ sudo rm -fr /var/lib/mysql/*
      [ec2-user@ip-172-30-0-96 ~]$ sudo galera_recovery
      WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
      --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
      [ec2-user@ip-172-30-0-96 ~]$ echo $?
      0
      

      This process needs to work with a non-initialized datadir. This is because with Galera Cluster, deleting the contents of the datadir is one of the most common ways to trigger SSTs.

      However, two things can currently cause this to fail:

      • Plugin system variables in MariaDB configuration files.
      • Plugin activation options in MariaDB configuration files.

      This is a very big problem on systems that use systemd, since this recovery process happens automatically.

      More details below.

      Plugin System Variables

      If the MariaDB configuration file references plugin system variables without the loose prefix, then that will lead to failures.

      For example, let's say that we have the following:

      [ec2-user@ip-172-30-0-96 ~]$ cat /etc/my.cnf.d/audit.cnf
      [mariadb]
      server_audit_logging = 1
      server_audit_events = 'CONNECT,QUERY,TABLE'
      

      This will cause the recovery process to fail when the datadir is empty. For example:

      [ec2-user@ip-172-30-0-96 ~]$ sudo rm -fr /var/lib/mysql/*
      [ec2-user@ip-172-30-0-96 ~]$ sudo galera_recovery
      WSREP: Failed to start mysqld for wsrep recovery: '2019-10-01 21:13:48 0 [Note] /usr/sbin/mysqld (mysqld 10.3.17-MariaDB-log) starting as process 17991 ...
      2019-10-01 21:13:48 0 [ERROR] Plugin 'aws_key_management' registration as a ENCRYPTION failed.
      2019-10-01 21:13:48 0 [Warning] The parameter innodb_file_format is deprecated and has no effect. It may be removed in future releases. See https://mariadb.com/kb/en/library/xtradbinnodb-file-format/
      2019-10-01 21:13:48 0 [Note] InnoDB: Using Linux native AIO
      2019-10-01 21:13:48 0 [Note] InnoDB: The first innodb_system data file 'ibdata1' did not exist. A new tablespace will be created!
      2019-10-01 21:13:48 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2019-10-01 21:13:48 0 [Note] InnoDB: Uses event mutexes
      2019-10-01 21:13:48 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
      2019-10-01 21:13:48 0 [Note] InnoDB: Number of pools: 1
      2019-10-01 21:13:48 0 [Note] InnoDB: Using SSE2 crc32 instructions
      2019-10-01 21:13:48 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
      2019-10-01 21:13:48 0 [Note] InnoDB: Completed initialization of buffer pool
      2019-10-01 21:13:48 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
      2019-10-01 21:13:48 0 [Note] InnoDB: Setting file './ibdata1' size to 12 MB. Physically writing the file full; Please wait ...
      2019-10-01 21:13:48 0 [Note] InnoDB: File './ibdata1' size is now 12 MB.
      2019-10-01 21:13:48 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 50331648 bytes
      2019-10-01 21:13:48 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 50331648 bytes
      2019-10-01 21:13:48 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
      2019-10-01 21:13:48 0 [Note] InnoDB: New log files created, LSN=45786
      2019-10-01 21:13:48 0 [Note] InnoDB: Doublewrite buffer not found: creating new
      2019-10-01 21:13:48 0 [Note] InnoDB: Doublewrite buffer created
      2019-10-01 21:13:48 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
      2019-10-01 21:13:48 0 [Note] InnoDB: Creating foreign key constraint system tables.
      2019-10-01 21:13:48 0 [Note] InnoDB: Creating tablespace and datafile system tables.
      2019-10-01 21:13:48 0 [Note] InnoDB: Creating sys_virtual system tables.
      2019-10-01 21:13:48 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2019-10-01 21:13:48 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2019-10-01 21:13:48 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2019-10-01 21:13:48 0 [Note] InnoDB: Waiting for purge to start
      2019-10-01 21:13:48 0 [Note] InnoDB: 10.3.17 started; log sequence number 0; transaction id 7
      2019-10-01 21:13:48 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
      2019-10-01 21:13:48 0 [Note] InnoDB: Creating #1 encryption thread id 140312278198016 total threads 2.
      2019-10-01 21:13:48 0 [Note] InnoDB: Creating #2 encryption thread id 140312269805312 total threads 2.
      2019-10-01 21:13:48 0 [Note] Plugin 'FEEDBACK' is disabled.
      2019-10-01 21:13:48 0 [ERROR] Could not open mysql.plugin table. Some plugins may be not loaded
      2019-10-01 21:13:48 0 [ERROR] /usr/sbin/mysqld: unknown variable 'server_audit_logging=1'
      2019-10-01 21:13:48 0 [ERROR] Aborting'
      [ec2-user@ip-172-30-0-96 ~]$ echo $?
      1
      

      The workaround for this is to add the loose prefix.

      https://mariadb.com/kb/en/library/mysqld-options/#-loose-

      For example, let's say that we have the following:

      [ec2-user@ip-172-30-0-96 ~]$ sudo cat /etc/my.cnf.d/audit.cnf
      [mariadb]
      loose_server_audit_logging = 1
      loose_server_audit_events = 'CONNECT,QUERY,TABLE'
      

      This will allow the recovery process to succeed when the datadir is empty. For example:

      [ec2-user@ip-172-30-0-96 ~]$ sudo rm -fr /var/lib/mysql/*
      [ec2-user@ip-172-30-0-96 ~]$ sudo galera_recovery
      WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
      --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
      [ec2-user@ip-172-30-0-96 ~]$ echo $?
      0
      

      Plugin Activation Options

      If the MariaDB configuration file sets plugin activation options to FORCE or FORCE_PLUS_PERMANENT without an associated plugin_load_add option, then that will lead to failures.

      https://mariadb.com/kb/en/library/plugin-overview/#configuring-plugin-activation-at-server-startup

      For example, let's say that we have the following:

      [ec2-user@ip-172-30-0-96 ~]$ sudo cat /etc/my.cnf.d/audit.cnf
      [mariadb]
      server_audit = FORCE_PLUS_PERMANENT
      loose_server_audit_logging = 1
      loose_server_audit_events = 'CONNECT,QUERY,TABLE'
      

      This will cause the recovery process to fail when the datadir is empty. For example:

      [ec2-user@ip-172-30-0-96 ~]$ sudo rm -fr /var/lib/mysql/*
      [ec2-user@ip-172-30-0-96 ~]$ sudo galera_recovery
      WSREP: Failed to start mysqld for wsrep recovery: '2019-10-01 21:19:53 0 [Note] /usr/sbin/mysqld (mysqld 10.3.17-MariaDB-log) starting as process 18334 ...
      2019-10-01 21:19:53 0 [ERROR] Plugin 'aws_key_management' registration as a ENCRYPTION failed.
      2019-10-01 21:19:53 0 [Warning] The parameter innodb_file_format is deprecated and has no effect. It may be removed in future releases. See https://mariadb.com/kb/en/library/xtradbinnodb-file-format/
      2019-10-01 21:19:53 0 [Note] InnoDB: Using Linux native AIO
      2019-10-01 21:19:53 0 [Note] InnoDB: The first innodb_system data file 'ibdata1' did not exist. A new tablespace will be created!
      2019-10-01 21:19:53 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2019-10-01 21:19:53 0 [Note] InnoDB: Uses event mutexes
      2019-10-01 21:19:53 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
      2019-10-01 21:19:53 0 [Note] InnoDB: Number of pools: 1
      2019-10-01 21:19:53 0 [Note] InnoDB: Using SSE2 crc32 instructions
      2019-10-01 21:19:53 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
      2019-10-01 21:19:53 0 [Note] InnoDB: Completed initialization of buffer pool
      2019-10-01 21:19:53 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
      2019-10-01 21:19:53 0 [Note] InnoDB: Setting file './ibdata1' size to 12 MB. Physically writing the file full; Please wait ...
      2019-10-01 21:19:53 0 [Note] InnoDB: File './ibdata1' size is now 12 MB.
      2019-10-01 21:19:53 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 50331648 bytes
      2019-10-01 21:19:53 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 50331648 bytes
      2019-10-01 21:19:53 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
      2019-10-01 21:19:53 0 [Note] InnoDB: New log files created, LSN=45786
      2019-10-01 21:19:53 0 [Note] InnoDB: Doublewrite buffer not found: creating new
      2019-10-01 21:19:53 0 [Note] InnoDB: Doublewrite buffer created
      2019-10-01 21:19:53 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
      2019-10-01 21:19:53 0 [Note] InnoDB: Creating foreign key constraint system tables.
      2019-10-01 21:19:53 0 [Note] InnoDB: Creating tablespace and datafile system tables.
      2019-10-01 21:19:53 0 [Note] InnoDB: Creating sys_virtual system tables.
      2019-10-01 21:19:53 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2019-10-01 21:19:53 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2019-10-01 21:19:53 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2019-10-01 21:19:53 0 [Note] InnoDB: Waiting for purge to start
      2019-10-01 21:19:53 0 [Note] InnoDB: 10.3.17 started; log sequence number 0; transaction id 7
      2019-10-01 21:19:53 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
      2019-10-01 21:19:53 0 [Note] InnoDB: Creating #1 encryption thread id 140458902673152 total threads 2.
      2019-10-01 21:19:53 0 [Note] InnoDB: Creating #2 encryption thread id 140458894280448 total threads 2.
      2019-10-01 21:19:53 0 [Note] Plugin 'FEEDBACK' is disabled.
      2019-10-01 21:19:53 0 [ERROR] Could not open mysql.plugin table. Some plugins may be not loaded
      2019-10-01 21:19:53 0 [ERROR] /usr/sbin/mysqld: unknown variable 'server_audit=FORCE_PLUS_PERMANENT'
      2019-10-01 21:19:53 0 [ERROR] Aborting'
      [ec2-user@ip-172-30-0-96 ~]$ echo $?
      1
      

      This situation is most likely to occur when the user has previously installed the plugin dynamically by executing INSTALL SONAME.

      https://mariadb.com/kb/en/library/plugin-overview/#installing-a-plugin-dynamically

      The workaround for this is to add the plugin_load_add to the MariaDB configuration file.

      https://mariadb.com/kb/en/library/plugin-overview/#installing-a-plugin-with-plugin-load-options

      https://mariadb.com/kb/en/library/mysqld-options/#-plugin-load-add

      For example, let's say that we have the following:

      [ec2-user@ip-172-30-0-96 ~]$ cat /etc/my.cnf.d/audit.cnf
      [mariadb]
      plugin_load_add = server_audit
      server_audit = FORCE_PLUS_PERMANENT
      loose_server_audit_logging = 1
      loose_server_audit_events = 'CONNECT,QUERY,TABLE'
      

      This will allow the recovery process to succeed when the datadir is empty. For example:

      [ec2-user@ip-172-30-0-96 ~]$ sudo rm -fr /var/lib/mysql/*
      [ec2-user@ip-172-30-0-96 ~]$ sudo galera_recovery
      WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
      --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
      [ec2-user@ip-172-30-0-96 ~]$ echo $?
      0
      

      Suggested Fix

      I think the suggested fix should probably be something like this:

      • If mysqld is started in --wsrep_recover mode, and if any plugin system variables are specified, then they should be treated as loose automatically.
      • If mysqld is started in --wsrep_recover mode, and if any plugin activation options are set to FORCE or FORCE_PLUS_PERMANENT, then they should be treated the same as ON, so that any errors are non-fatal. Should there be an exception for encryption plugins, since an encryption plugin may be needed to recover the position from an encrypted database server?

      Attachments

        Activity

          Feature request.

          jplindst Jan Lindström (Inactive) added a comment - Feature request.

          This seems more like a bug than a feature request to me, since this breaks SSTs/ISTs in a very common/normal situation. At least some workarounds already exist though.

          GeoffMontee Geoff Montee (Inactive) added a comment - This seems more like a bug than a feature request to me, since this breaks SSTs/ISTs in a very common/normal situation. At least some workarounds already exist though.

          Is there some real reason why you cant first use mysql_install_db and then wsrep-recover on initialized datadir? https://galeracluster.com/library/training/tutorials/restarting-cluster.html gives me understanding that galera-recover is intended for recovery process where you already have initial database.

          jplindst Jan Lindström (Inactive) added a comment - Is there some real reason why you cant first use mysql_install_db and then wsrep-recover on initialized datadir? https://galeracluster.com/library/training/tutorials/restarting-cluster.html gives me understanding that galera-recover is intended for recovery process where you already have initial database.

          Hi jplindst,

          Is there some real reason why you cant first use mysql_install_db and then wsrep-recover on initialized datadir?

          We are not talking about manually running galera_recovery. We are talking about galera_recovery running automatically when you start the server.

          The usual process to initiate an SST on a system with a corrupt data directory is:

          1. Delete the current data directory:

          $ rm -fr /var/lib/mysql/*
          

          2. Start the server:

          $ sudo systemctl start mariadb
          

          galera_recovery will automatically run in step #2.

          If we want users to run mysql_install_db between these steps, then it adds an additional step, and it is a change in procedure from what people are used to. Obviously, we can take this approach, but it is not the most user-friendly approach.

          gives me understanding that galera-recover is intended for recovery process where you already have initial database.

          The galera_recovery script is run every time the server starts. See the ExecStartPre directive from the systemd unit file:

          # Perform automatic wsrep recovery. When server is started without wsrep,
          # galera_recovery simply returns an empty string. In any case, however,
          # the script is not expected to return with a non-zero status.
          # It is always safe to unset _WSREP_START_POSITION environment variable.
          # Do not panic if galera_recovery script is not available. (MDEV-10538)
          ExecStartPre=/bin/sh -c "systemctl unset-environment _WSREP_START_POSITION"
          ExecStartPre=/bin/sh -c "[ ! -e @bindir@/galera_recovery ] && VAR= || \
           VAR=`cd @bindir@/..; @bindir@/galera_recovery`; [ $? -eq 0 ] \
           && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1"
          

          https://github.com/MariaDB/server/blob/mariadb-10.5.4/support-files/mariadb.service.in#L74

          Since the galera_recovery script is run for every startup, it needs to support every startup scenario that the server supports.

          e.g.:

          • If mysqld supports starting up with an empty datadir to initiate an SST, then the galera_recovery script should also support it--even if plugin system variables are defined in a configuration file.

          Does that help?

          GeoffMontee Geoff Montee (Inactive) added a comment - Hi jplindst , Is there some real reason why you cant first use mysql_install_db and then wsrep-recover on initialized datadir? We are not talking about manually running galera_recovery . We are talking about galera_recovery running automatically when you start the server. The usual process to initiate an SST on a system with a corrupt data directory is: 1. Delete the current data directory: $ rm -fr /var/lib/mysql/* 2. Start the server: $ sudo systemctl start mariadb galera_recovery will automatically run in step #2. If we want users to run mysql_install_db between these steps, then it adds an additional step, and it is a change in procedure from what people are used to. Obviously, we can take this approach, but it is not the most user-friendly approach. gives me understanding that galera-recover is intended for recovery process where you already have initial database. The galera_recovery script is run every time the server starts. See the ExecStartPre directive from the systemd unit file: # Perform automatic wsrep recovery. When server is started without wsrep, # galera_recovery simply returns an empty string. In any case, however, # the script is not expected to return with a non-zero status. # It is always safe to unset _WSREP_START_POSITION environment variable. # Do not panic if galera_recovery script is not available. (MDEV-10538) ExecStartPre=/bin/sh -c "systemctl unset-environment _WSREP_START_POSITION" ExecStartPre=/bin/sh -c "[ ! -e @bindir@/galera_recovery ] && VAR= || \ VAR=`cd @bindir@/..; @bindir@/galera_recovery`; [ $? -eq 0 ] \ && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1" https://github.com/MariaDB/server/blob/mariadb-10.5.4/support-files/mariadb.service.in#L74 Since the galera_recovery script is run for every startup, it needs to support every startup scenario that the server supports. e.g.: If mysqld supports starting up with an empty datadir to initiate an SST, then the galera_recovery script should also support it--even if plugin system variables are defined in a configuration file. Does that help?

          GeoffMontee Hi! I see two problems in the proposed solution:

          1) We do not know which variables are related to the plugin - if the plugin is not loaded, we will assume that these variables do not exist (if they do not have the "loose" prefix). How do we distinguish a plugin variable from just a wrong variable name? It seems to me that either this is a user problem (user must add the loose prefix himself), or, alternatively, we need to make separate sections for plugin's variables in the configuration file.

          2) While the plugins are not loaded, we do not know which of them are encryption plugins and which are not. We have a plugin type in the I_S table, but there it appears only after loading. And in the mysql.plugin table we only have a name and a shared library (without plugin type).

          We can ignore all plugins-related failures if they fail to load in recovery mode, but this can lead to problems if the encryption plugin fails.

          sysprg Julius Goryavsky added a comment - GeoffMontee Hi! I see two problems in the proposed solution: 1) We do not know which variables are related to the plugin - if the plugin is not loaded, we will assume that these variables do not exist (if they do not have the "loose" prefix). How do we distinguish a plugin variable from just a wrong variable name? It seems to me that either this is a user problem (user must add the loose prefix himself), or, alternatively, we need to make separate sections for plugin's variables in the configuration file. 2) While the plugins are not loaded, we do not know which of them are encryption plugins and which are not. We have a plugin type in the I_S table, but there it appears only after loading. And in the mysql.plugin table we only have a name and a shared library (without plugin type). We can ignore all plugins-related failures if they fail to load in recovery mode, but this can lead to problems if the encryption plugin fails.

          Hi sysprg,

          1) We do not know which variables are related to the plugin - if the plugin is not loaded, we will assume that these variables do not exist (if they do not have the "loose" prefix). How do we distinguish a plugin variable from just a wrong variable name? It seems to me that either this is a user problem (user must add the loose prefix himself), or, alternatively, we need to make separate sections for plugin's variables in the configuration file.

          In this specific case, the data directory is completely empty. Can all non-essential system variables be treated as "loose" during --wsrep-recover only if the underlying data directory is completely empty? I am not sure if that could cause other problems though.

          2) While the plugins are not loaded, we do not know which of them are encryption plugins and which are not. We have a plugin type in the I_S table, but there it appears only after loading. And in the mysql.plugin table we only have a name and a shared library (without plugin type).

          In this specific case, the data directory is completely empty. If the data directory is completely empty, then I am not sure why an encryption plugin would matter. In fact, if the data directory is completely empty, then I don't think there is anything for --wsrep-recover to do.

          If --wsrep-recover finds an empty data directory, can't it just ignore unknown system variables and return its "special" start position, as it usually does in this case?:

          $ sudo galera_recovery
          WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1
          --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1
          

          Any unknown system variables would still be caught when the server starts up normally after the SST, right?

          Thanks!

          GeoffMontee Geoff Montee (Inactive) added a comment - Hi sysprg , 1) We do not know which variables are related to the plugin - if the plugin is not loaded, we will assume that these variables do not exist (if they do not have the "loose" prefix). How do we distinguish a plugin variable from just a wrong variable name? It seems to me that either this is a user problem (user must add the loose prefix himself), or, alternatively, we need to make separate sections for plugin's variables in the configuration file. In this specific case, the data directory is completely empty. Can all non-essential system variables be treated as "loose" during --wsrep-recover only if the underlying data directory is completely empty? I am not sure if that could cause other problems though. 2) While the plugins are not loaded, we do not know which of them are encryption plugins and which are not. We have a plugin type in the I_S table, but there it appears only after loading. And in the mysql.plugin table we only have a name and a shared library (without plugin type). In this specific case, the data directory is completely empty. If the data directory is completely empty, then I am not sure why an encryption plugin would matter. In fact, if the data directory is completely empty, then I don't think there is anything for --wsrep-recover to do. If --wsrep-recover finds an empty data directory, can't it just ignore unknown system variables and return its "special" start position, as it usually does in this case?: $ sudo galera_recovery WSREP: Recovered position 00000000-0000-0000-0000-000000000000:-1 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1 Any unknown system variables would still be caught when the server starts up normally after the SST, right? Thanks!

          People

            jplindst Jan Lindström (Inactive)
            GeoffMontee Geoff Montee (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.