  1. MariaDB Server
  2. MDEV-31506

Updating cluster to a new major version reports "Upgrade after a crash is not supported" with rsync SST

Details

    • Type: Bug
    • Status: Confirmed
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.8.8, 10.6.14, 10.11.4, 10.7(EOL)
    • Fix Version/s: 10.11
    • None
    • Environment: Amazon Linux using centos7-amd64 MariaDB server builds

    Description

      We are running a 5-node MariaDB Galera cluster and are unable to adopt the latest LTS, 10.11. As a general approach, we have completely automated database node setup and only ever change the cluster by replacing existing nodes with new ones that have a changed configuration or a later database server version. Here is how we are trying to update the cluster:

      1. Configure & launch a new node into the existing cluster
      2. As part of its launch operations, the new node performs a state transfer from one node in the existing cluster (automatically; this is a standard operation)
      3. After the new node has launched successfully, we desync & terminate the oldest node of the cluster
      4. (repeat until all cluster nodes run the new version)
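
The replacement loop described in the steps above can be sketched as a small simulation. This is only an illustrative model, not the reporter's actual automation: the `Node` class, the `rolling_replace` function and the `new-N` node names are hypothetical, and the state transfer in step 2 is represented by nothing more than appending the new node to the cluster.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical model of a cluster member: a name and a server version.
    name: str
    version: str


def rolling_replace(cluster, target_version):
    """Replace nodes one at a time until every node runs target_version.

    Returns (steps, final_nodes), where steps is the ordered list of
    (joined_node_name, removed_node_name) pairs.
    """
    nodes = deque(cluster)  # oldest node first
    steps = []
    counter = 0
    while any(node.version != target_version for node in nodes):
        counter += 1
        new_node = Node(f"new-{counter}", target_version)
        # Steps 1-2: the new node joins; in a real cluster it would
        # receive a state transfer (SST) from an existing node here.
        nodes.append(new_node)
        # Step 3: desync and terminate the oldest node.
        removed = nodes.popleft()
        steps.append((new_node.name, removed.name))
    return steps, list(nodes)


if __name__ == "__main__":
    old_cluster = [Node(f"old-{i}", "10.6.14") for i in range(1, 6)]
    steps, final_nodes = rolling_replace(old_cluster, "10.7.8")
    for joined, removed in steps:
        print(f"joined {joined}, removed {removed}")
```

In this model the cluster briefly grows to six nodes at each step and never drops below five, which matches the "add first, then remove" order of the procedure.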

      Initially our cluster ran version 10.6.14, and we tried to update directly to 10.11.4 by configuring & launching a node into the cluster. MariaDB server startup failed and reported the following error:

      WSREP: Failed to start mysqld for wsrep recovery: '[Note] Starting MariaDB 10.11.4-MariaDB-log source revision 4e2b93dffef2414a11ca5edc8d215f57ee5010e5 as process 5688
      [Note] InnoDB: Compressed tables use zlib 1.2.7
      [Note] InnoDB: Number of transaction pools: 1
      [Note] InnoDB: Using crc32 + pclmulqdq instructions
      [Note] InnoDB: Using Linux native AIO
      [Note] InnoDB: Initializing buffer pool, total size = XGiB, chunk size = YMiB
      [Note] InnoDB: Completed initialization of buffer pool
      [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
      [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.5.10. You must start up and shut down MariaDB 10.7 or earlier.
      [ERROR] InnoDB: Plugin initialization aborted with error Generic error
      [Note] InnoDB: Starting shutdown...
      [ERROR] Plugin 'InnoDB' init function returned error.
      [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      [Note] Plugin 'FEEDBACK' is disabled.
      [ERROR] Unknown/unsupported storage engine: InnoDB
      [ERROR] Aborting'
      systemd[1]: mariadb.service: control process exited, code=exited status=1
      systemd[1]: Failed to start MariaDB 10.11.4 database server.
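
For automated node setups like the one described here, it may help to detect this specific failure in the startup log. The following is a minimal sketch that assumes the error line keeps the exact wording shown in the log above; the function name `redo_log_creator_version` and the pattern name are hypothetical.

```python
import re

# Matches the InnoDB error line quoted in the log above and captures the
# server version that created the redo log. The wording is taken from that
# log; other server versions may phrase the message differently.
UPGRADE_AFTER_CRASH = re.compile(
    r"InnoDB: Upgrade after a crash is not supported\. "
    r"The redo log was created with MariaDB (\d+\.\d+\.\d+)"
)


def redo_log_creator_version(log_text):
    """Return the version that created the redo log, or None if the
    'Upgrade after a crash is not supported' error is not present."""
    match = UPGRADE_AFTER_CRASH.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = (
        "[ERROR] InnoDB: Upgrade after a crash is not supported. "
        "The redo log was created with MariaDB 10.5.10. "
        "You must start up and shut down MariaDB 10.7 or earlier."
    )
    print(redo_log_creator_version(sample))
```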

      We then decided to try updating one major version at a time. Updating the cluster to version 10.7.8 succeeded. The attempt to update from 10.7.8 to 10.8.8 failed again with a similar error message.
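
The pattern in these results (10.6 to 10.7 works, 10.6 to 10.11 and 10.7 to 10.8 fail) is consistent with the error message's hint that the data must first be started and shut down with "MariaDB 10.7 or earlier". A pre-flight check for a node-launch script could look roughly like the sketch below. The 10.8 boundary is an assumption inferred from that message and from the outcomes reported here, and the function names are hypothetical.

```python
# Assumption: joiners at or above this series cannot recover a redo log
# copied from a donor below it (inferred from the "10.7 or earlier" hint
# in the error message and the upgrade attempts described above).
REDO_LOG_FORMAT_CHANGE = (10, 8)


def release_series(version):
    """Turn a version string such as '10.11.4' into a tuple such as (10, 11)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def sst_crosses_redo_log_change(donor_version, joiner_version):
    """Return True if a physical SST from donor to joiner would hand the
    joiner a redo log written before the assumed format change."""
    donor = release_series(donor_version)
    joiner = release_series(joiner_version)
    return donor < REDO_LOG_FORMAT_CHANGE <= joiner


if __name__ == "__main__":
    for donor, joiner in [("10.6.14", "10.11.4"),
                          ("10.6.14", "10.7.8"),
                          ("10.7.8", "10.8.8")]:
        verdict = "blocked" if sst_crosses_redo_log_change(donor, joiner) else "ok"
        print(f"{donor} -> {joiner}: {verdict}")
```

Comparing tuples of integers rather than strings matters here, because a plain string comparison would sort "10.11" before "10.8".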

      I found an earlier bug, MDEV-27437, about a quite similar issue, but it seems that it should not affect our use case, as we are using a later version of MariaDB server and the suggested wsrep_sst_method=rsync. Our server.cnf is attached.

          Activity

            danblack Daniel Black added a comment (edited)

            A possible workaround might be a container (Docker) solution as part of scripts/wsrep_sst_mariabackup.sh:clean_at_exist to do the recovery.

            The cause is that the rsync SST only does a FLUSH TABLES WITH READ LOCK, even though the docs suggest that a BACKUP STAGE BLOCK_COMMIT should be done.

            One thing that could be done between the flush tables and the rsync is:

            SET GLOBAL innodb_max_purge_lag_wait=0;
            

            (need to trigger a purge thread?)

            ref: MDEV-16952

            marko Marko Mäkelä added a comment

            The InnoDB redo log format was last changed in MDEV-14425. I remember that around the time it was implemented in early 2022, there was some discussion with janlindstrom that the Galera upgrade and SST would need to be adjusted so that there would be an option to ‘prepare’ the data using a server version that is compatible with the donor. But I do not remember any follow-up on that.

            As far as I can tell, this affects both the default wsrep_sst_method=rsync and wsrep_sst_method=mariabackup.

            I wonder how upgrades from earlier versions to 10.2 (with a MySQL 5.7 compatible redo log format) or 10.5 (MDEV-12353) worked. There was some talk on testing ‘rolling upgrades’, but I do not remember details.

            People

              sysprg Julius Goryavsky
              tomitukiainen Tomi Tukiainen
              Votes: 0
              Watchers: 3
