[MDEV-26922] Invalid configuration checked after IST/SST Created: 2021-10-27  Updated: 2023-04-27

Status: Stalled
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.3.31, 10.4.21, 10.5.12, 10.6.4
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: carlos tutte Assignee: Julius Goryavsky
Resolution: Unresolved Votes: 0
Labels: None


 Description   

After a MariaDB/Galera node starts, it performs IST/SST, but if an invalid variable is found later, the server crashes, as the log shows:

2021-10-27 18:56:33 0 [Note] WSREP: Loading provider /usr/lib64/galera-4/libgalera_smm.so initial position: 08260eb4-3755-11ec-9a25-86269a5f3dc7:151
2021-10-27 18:56:33 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera-4/libgalera_smm.so'
2021-10-27 18:56:33 0 [Note] WSREP: wsrep_load(): Galera 26.4.9(r819f29c) by Codership Oy <info@codership.com> loaded successfully.
...
2021-10-27 18:56:34 0 [Note] WSREP: Joiner monitor thread started to monitor
2021-10-27 18:56:34 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.202.236.218' --datadir '/var/lib/mysql/' --parent '23191' --mysqld-args --wsrep_start_position=08260eb4-3755-11ec-9a25-86269a5f3dc7:151'
2021-10-27 18:56:34 1 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 153, STRv: 3
2021-10-27 18:56:34 1 [Note] WSREP: IST receiver addr using tcp://10.202.236.218:4568
2021-10-27 18:56:34 1 [Note] WSREP: Prepared IST receiver for 0-153, listening at: tcp://10.202.236.218:4568
2021-10-27 18:56:34 0 [Note] WSREP: Member 0.0 (node1) requested state transfer from '*any*'. Selected 1.0 (node2)(SYNCED) as donor.
...
2021-10-27 18:56:37 0 [ERROR] /usr/sbin/mariadbd: unknown option '--enforce_gtid_consistency'
2021-10-27 18:56:37 0 [ERROR] Aborting
terminate called after throwing an instance of 'wsrep::runtime_error'
  what():  State wait was interrupted
211027 18:56:37 [ERROR] mysqld got signal 6 ;

The log shows that an invalid variable was found in the my.cnf file, but the server aborted only AFTER completing the entire SST process, which can take a long time.
The abort can go unnoticed if the SST is left running overnight. Worse, after the invalid variable is removed and MariaDB is restarted, SST may have to run again, taking a long time once more.

To reproduce, create a 2- or 3-node Galera cluster and put an invalid variable name in the config file under [mysqld].
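
As a rough illustration of the reproduction setup, and of a possible operator-side pre-check, the sketch below scans a config fragment for option names under [mysqld] and flags any that appear on a deny-list. The deny-list, the function names, and the sample config are assumptions made for this example. The script has no way of knowing which options a given server build accepts; it only catches names the operator already knows to be problematic, such as the option rejected in the log above.

```python
def mysqld_options(config_text, section="mysqld"):
    """Return option names found under [section], with '-' normalized to '_'."""
    names = []
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;!":
            continue  # skip blank lines, comments, and !include directives
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()
            continue
        if current == section:
            # Keep only the part before '=', e.g. "wsrep_on" from "wsrep_on=ON".
            name = line.split("=", 1)[0].strip()
            names.append(name.replace("-", "_"))
    return names


def suspicious(config_text, deny_list):
    """Return the [mysqld] option names that also appear in deny_list."""
    deny = {name.replace("-", "_") for name in deny_list}
    return [name for name in mysqld_options(config_text) if name in deny]


# Hypothetical config fragment mirroring the reproduction steps.
SAMPLE = """\
[mysqld]
wsrep_on=ON
wsrep_sst_method=rsync
enforce_gtid_consistency=ON
"""

# The deny-list is supplied by the operator (an assumption of this sketch).
print(suspicious(SAMPLE, ["enforce-gtid-consistency"]))
```

Running a check like this before starting a joiner would not replace server-side validation, but it could catch a known-bad name before a long SST begins.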



 Comments   
Comment by Sergei Golubchik [ 2021-11-23 ]

Generally, it's not possible to detect invalid parameters before SST. Invalid parameters can only be detected after all plugins (storage engines are plugins too) are loaded, that is, after the server knows which parameters are valid plugin parameters. And SST can copy InnoDB files, so it has to be done before InnoDB is loaded.

If you use the "mysqldump" SST method, which doesn't copy files, you'll likely see the "invalid option" error first (but I didn't try it).
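
The ordering constraint described above can be illustrated with a toy model. This is not server code: the class, its methods, and the option lists are invented for the example. It only shows why a name that looks unknown before plugins load cannot be rejected yet, since it might belong to a plugin that has not registered its options.

```python
class OptionRegistry:
    """Toy model: the set of recognized options grows as plugins load."""

    def __init__(self, core_options):
        self.known = set(core_options)

    def load_plugin(self, plugin_options):
        # A plugin (for example a storage engine) registers its own options.
        self.known.update(plugin_options)

    def unknown(self, configured):
        # Options from the config that nothing has claimed so far.
        return sorted(set(configured) - self.known)


configured = ["datadir", "innodb_buffer_pool_size", "enforce_gtid_consistency"]
registry = OptionRegistry(core_options=["datadir", "wsrep_on"])

# Before plugins load, a genuine InnoDB option and an invalid one look the same.
before = registry.unknown(configured)

# A file-copying SST would run at this point, before the storage engine starts.
registry.load_plugin(["innodb_buffer_pool_size"])

# Only after plugin load can the truly invalid option be singled out.
after = registry.unknown(configured)

print(before)
print(after)
```

In this model, rejecting everything in `before` would wrongly abort on a valid InnoDB setting, which is why the check is deferred.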

Comment by Sergei Golubchik [ 2021-11-23 ]

Ideally, though, SST wouldn't need to be repeated. sysprg, why does it happen and can it be avoided?

Comment by Julius Goryavsky [ 2021-11-23 ]

serg As a first approximation, we have an abnormal termination here, and in that case the state of Galera is considered inconsistent. Therefore, the next time we start the server, we start SST from scratch again (since the current state is marked as inconsistent in the Galera-specific file). I need to look at the code in detail to understand whether we can initiate a normal shutdown with diagnostics here (and, as a consequence, a normal shutdown of Galera) instead of an abnormal termination. If that is possible, the state received as a result of SST will not be marked as inconsistent and will be saved rather than completely lost.
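
To make the "marked as inconsistent" point more concrete, here is a minimal sketch that inspects a saved-state file. It assumes the Galera-specific file is grastate.dat in the data directory and that it uses the simple `key: value` layout shown in the sample; the comment above does not name the file, so both are assumptions. The `position_is_known` helper is a hypothetical heuristic for illustration, not the logic Galera itself uses to choose between IST and SST.

```python
ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text):
    """Parse 'key: value' lines, ignoring blank lines and '#' comments."""
    state = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def position_is_known(state):
    """Heuristic (an assumption): a usable position needs a non-zero uuid and seqno != -1."""
    return state.get("uuid", ZERO_UUID) != ZERO_UUID and state.get("seqno", "-1") != "-1"


# Hypothetical file contents after an abnormal termination.
AFTER_CRASH = """\
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
"""

# Hypothetical file contents after a clean shutdown, using the position from the log.
AFTER_CLEAN_SHUTDOWN = """\
# GALERA saved state
version: 2.1
uuid:    08260eb4-3755-11ec-9a25-86269a5f3dc7
seqno:   153
safe_to_bootstrap: 0
"""

print(position_is_known(parse_grastate(AFTER_CRASH)))
print(position_is_known(parse_grastate(AFTER_CLEAN_SHUTDOWN)))
```

If the server could shut down normally on an invalid option, the saved state would presumably resemble the second sample, and the next start would not have to repeat the full transfer.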

Generated at Thu Feb 08 09:48:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.