MariaDB Server / MDEV-26922

Invalid configuration checked after IST/SST

Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.3.31, 10.4.21, 10.5.12, 10.6.4
    • Fix Version/s: 10.5, 10.6
    • Component/s: Server
    • Labels: None

    Description

      After a MariaDB/Galera node starts, it performs IST/SST. If an invalid variable is found afterwards, the server crashes, as the log shows:

      2021-10-27 18:56:33 0 [Note] WSREP: Loading provider /usr/lib64/galera-4/libgalera_smm.so initial position: 08260eb4-3755-11ec-9a25-86269a5f3dc7:151
      2021-10-27 18:56:33 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera-4/libgalera_smm.so'
      2021-10-27 18:56:33 0 [Note] WSREP: wsrep_load(): Galera 26.4.9(r819f29c) by Codership Oy <info@codership.com> loaded successfully.
      ...
      2021-10-27 18:56:34 0 [Note] WSREP: Joiner monitor thread started to monitor
      2021-10-27 18:56:34 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.202.236.218' --datadir '/var/lib/mysql/' --parent '23191' --mysqld-args --wsrep_start_position=08260eb4-3755-11ec-9a25-86269a5f3dc7:151'
      2021-10-27 18:56:34 1 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 153, STRv: 3
      2021-10-27 18:56:34 1 [Note] WSREP: IST receiver addr using tcp://10.202.236.218:4568
      2021-10-27 18:56:34 1 [Note] WSREP: Prepared IST receiver for 0-153, listening at: tcp://10.202.236.218:4568
      2021-10-27 18:56:34 0 [Note] WSREP: Member 0.0 (node1) requested state transfer from '*any*'. Selected 1.0 (node2)(SYNCED) as donor.
      ...
      2021-10-27 18:56:37 0 [ERROR] /usr/sbin/mariadbd: unknown option '--enforce_gtid_consistency'
      2021-10-27 18:56:37 0 [ERROR] Aborting
      terminate called after throwing an instance of 'wsrep::runtime_error'
        what():  State wait was interrupted
      211027 18:56:37 [ERROR] mysqld got signal 6 ;
      

      The log shows that an invalid variable was found in the my.cnf file, but the server aborted only AFTER completing the entire SST process, which can take a long time.
      The abort may go unnoticed if the SST is left running overnight. In addition, once the invalid variable is removed and MariaDB is restarted, SST may need to run again, which again takes a long time.

      To reproduce, create a 2- or 3-node Galera cluster and put an invalid variable name in the config file under [mysqld].
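Since the abort can go unnoticed when a long SST is left unattended, one option is to scan the error log for this pattern. The following is a minimal sketch, not part of MariaDB: the function name and the two regular expressions are assumptions derived only from the log lines quoted in this report, and other versions or SST methods may word these lines differently.

```python
import re

# Assumed patterns, modelled on the log excerpt in this report.
UNKNOWN_OPTION = re.compile(r"\[ERROR\].*unknown option '(--[^']+)'")
SST_STARTED = re.compile(r"\[Note\] WSREP: Running: 'wsrep_sst_\w+ --role 'joiner'")


def find_post_sst_option_errors(log_text):
    """Return options reported as unknown after a joiner SST script was launched."""
    sst_seen = False
    rejected = []
    for line in log_text.splitlines():
        if SST_STARTED.search(line):
            sst_seen = True
        match = UNKNOWN_OPTION.search(line)
        if match and sst_seen:
            rejected.append(match.group(1))
    return rejected


# Three lines taken from the log in this report.
sample_log = """\
2021-10-27 18:56:34 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.202.236.218' --datadir '/var/lib/mysql/' --parent '23191' --mysqld-args --wsrep_start_position=08260eb4-3755-11ec-9a25-86269a5f3dc7:151'
2021-10-27 18:56:37 0 [ERROR] /usr/sbin/mariadbd: unknown option '--enforce_gtid_consistency'
2021-10-27 18:56:37 0 [ERROR] Aborting
"""

print(find_post_sst_option_errors(sample_log))  # ['--enforce_gtid_consistency']
```

A monitoring job could run such a check against the error log and raise an alert, so that the failed start is noticed before the next attempt.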

        Activity

          serg Sergei Golubchik added a comment:

          Generally, it's not possible to detect invalid parameters before SST. Invalid parameters can only be detected after all plugins are loaded (storage engines are plugins too), that is, after the server knows which parameters are valid plugin parameters. And SST can copy InnoDB files, so it has to be done before InnoDB is loaded.

          If you use the "mysqldump" SST method, which doesn't copy files, you'll likely see the "invalid option" error first (but I didn't try it).
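The constraint described in this comment can be illustrated with a rough pre-flight check run outside the server. This is only a sketch under stated assumptions: `classify_options`, the `known_core` allowlist, and the `plugin_prefixes` tuple are hypothetical and supplied by the caller, not taken from MariaDB. Options that look like plugin parameters are reported as unverifiable, because only the server can validate them once the plugins are loaded.

```python
def mysqld_option_names(cnf_text):
    """Yield option names from the [mysqld] section, with '-' normalised to '_'."""
    section = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;!":
            continue  # skip blank lines, comments, and !include directives
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "mysqld":
            name = line.split("=", 1)[0].strip()
            yield name.replace("-", "_")


def classify_options(cnf_text, known_core, plugin_prefixes):
    """Sort [mysqld] options into ok / unverifiable / suspect (hypothetical scheme)."""
    result = {"ok": [], "unverifiable": [], "suspect": []}
    for name in mysqld_option_names(cnf_text):
        if name in known_core:
            result["ok"].append(name)
        elif name.startswith(tuple(plugin_prefixes)):
            # Plugin parameters cannot be validated before the plugin is loaded.
            result["unverifiable"].append(name)
        else:
            result["suspect"].append(name)
    return result


cnf = """\
[mysqld]
datadir = /var/lib/mysql
innodb_buffer_pool_size = 1G
enforce_gtid_consistency = ON
"""

# Both arguments are illustrative; a real allowlist would be far longer.
report = classify_options(cnf, known_core={"datadir"}, plugin_prefixes=("innodb_",))
print(report["suspect"])  # ['enforce_gtid_consistency']
```

A "suspect" result is only a hint for a human to review before starting a joiner: the option may still belong to a plugin with a different prefix, and the sketch ignores modifiers such as `loose-`.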

          serg Sergei Golubchik added a comment:

          Ideally, though, SST wouldn't need to be repeated. sysprg, why does it happen and can it be avoided?
          sysprg Julius Goryavsky added a comment (edited):

          serg, as a first approximation, we have an abnormal termination here, and in this case the state of Galera is considered inconsistent. Therefore, the next time we start the server, we start SST from scratch again (since the current state is marked as inconsistent in the Galera-specific file). I need to look at the code in detail to understand whether we can initiate a normal shutdown with diagnostics here (and, as a consequence, a normal shutdown of Galera) instead of an abnormal termination. If that is possible, the state received as a result of SST will not be marked as inconsistent and will be saved rather than completely lost.
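To see what state a node was left in after such an abort, the saved Galera state can be inspected before restarting. The sketch below assumes that the "Galera-specific file" mentioned above is `grastate.dat` in the data directory, and that an all-zero `uuid` or a negative `seqno` means no usable position was saved. The comment does not confirm either point, and the sample contents are illustrative, not taken from a real node.

```python
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text):
    """Parse 'key: value' lines of a grastate.dat-style file into a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def has_usable_position(fields):
    """Assumption: a non-zero uuid with seqno >= 0 means a saved position exists."""
    try:
        seqno = int(fields.get("seqno", "-1"))
    except ValueError:
        return False
    return fields.get("uuid", ZERO_UUID) != ZERO_UUID and seqno >= 0


# Illustrative contents only; the uuid and seqno reuse values from the log in this report.
saved = """\
# GALERA saved state
version: 2.1
uuid:    08260eb4-3755-11ec-9a25-86269a5f3dc7
seqno:   151
safe_to_bootstrap: 0
"""

invalidated = """\
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
"""

print(has_usable_position(parse_grastate(saved)))
print(has_usable_position(parse_grastate(invalidated)))
```

If the check reports no usable position, a full SST on the next start should be expected, which matches the behaviour described in this report.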

          People

            sysprg Julius Goryavsky
            ctutte carlos tutte
            Votes: 0
            Watchers: 5
