[MDEV-18863] Galera SST scripts can't read mysqld_multi's [mysqldN] option groups Created: 2019-03-08  Updated: 2020-08-25  Resolved: 2019-08-19

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST, Scripts & Clients
Affects Version/s: 10.1, 10.2, 10.3, 10.4
Fix Version/s: 10.2.27, 10.1.43, 10.3.18, 10.4.8

Type: Bug Priority: Critical
Reporter: Geoff Montee (Inactive) Assignee: Julius Goryavsky
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Problem/Incident
causes MDEV-20614 Syntax error, and option put in wrong... Closed
causes MDEV-20728 /usr/sbin/mysqld: unknown variable 'd... Closed
causes MDEV-21039 Server fails to start with mysqld_mul... Closed
causes MDEV-21526 mysqld_multi no longer works with dif... Closed
Relates
relates to MDEV-12956 datadir must be specified in [mysqld]... Closed

 Description   

mysqld_multi uses special option groups with names like [mysqld1], [mysqld2], ..., [mysqldN].

https://mariadb.com/kb/en/library/mysqld_multi/#option-groups

It looks like SST scripts can't currently support these option groups. The only option group-related value it gets from the server is --defaults-group-suffix from the server, if that option was set for mysqld when the server was started:

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_common.sh#L95

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_common.sh#L196

The SST script does not get told by the server to read these option groups, so this means that the SST script will fail to read options like innodb-data-home-dir when it is in a mysqld_multi option group like [mysqld1]:

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_rsync.sh#L174

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_mariabackup.sh#L710

So maybe this problem could easily be fixed by changing mysqld_multi, so that it sets --defaults-group-suffix=N when it starts mysqld. For example, if you start instance number 1 with mysqld_multi, then it could pass --defaults-group-suffix=1 to mysqld, so that it reads group [mysqld1]. I believe that this should allow mysqld to also pass --defaults-group-suffix=N to the SST script, so that it also reads the proper option group.

Currently, it does not look like the mysqld_multi script sets --defaults-group-suffix=N. Instead, the script reads all of the options from the [mysqldN] option group in the configuration file itself, and then passes those options to mysqld on the command-line:

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/mysqld_multi.sh#L324

It looks like the SST scripts currently need to be able to read the following variables from the [mysqldN] option groups:

  • innodb-data-home-dir

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_rsync.sh#L174

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_mariabackup.sh#L710

  • innodb-log-group-home-dir

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_rsync.sh#L156

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_mariabackup.sh#L955

  • innodb-undo-directory

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_mariabackup.sh#L956

  • log-bin

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_mariabackup.sh#L1034

As an alternative fix, should they all be converted into command-line options that the server passes to the SST scripts, so that the SST scripts don't have to read them from the configuration file at all?



 Comments   
Comment by Daniel Black [ 2019-03-12 ]

GeoffMontee I believe this is all fixed in https://github.com/MariaDB/server/pull/1217

Note https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_rsync.sh#L156, in parse_cnf calls my_print default with all the mysqld default group/extra/file (https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_common.sh#L196). So any retrieval of mysqld config options in the script need only use `parse_cnf --mysqld

{varname}

`

Comment by Geoff Montee (Inactive) [ 2019-03-12 ]

Hi danblack,

So any retrieval of mysqld config options in the script need only use `parse_cnf --mysqld $varname`

That is the true when mysqld_multi is not being used However, that is not the case when mysqld_multi is in use.

Consider the following test configuration file:

$ cat /tmp/test.cnf
[mysqld]
port=3307
 
[server]
socket=/tmp/socket.sock
 
[mysqld1]
datadir=/tmp/datadir1
 
[mysqld2]
datadir=/tmp/datadir2

What do we get if we execute just "my_print_defaults --defaults-file=/tmp/test.cnf --mysqld"? In that case, it ignores the mysqld_multi option groups, as would be expected:

$ my_print_defaults --defaults-file=/tmp/test.cnf --mysqld
--port=3307
--socket=/tmp/socket.sock

There are two ways to get the proper configuration options for a mysqld_multi instance.

1.) Use default-group-suffix:

$ my_print_defaults --defaults-file=/tmp/test.cnf --defaults-group-suffix=1 --mysqld
--port=3307
--socket=/tmp/socket.sock
--datadir=/tmp/datadir1

2.) Explicitly specify the option group:

$ my_print_defaults --defaults-file=/tmp/test.cnf --mysqld mysqld1
--port=3307
--socket=/tmp/socket.sock
--datadir=/tmp/datadir1

So:

I believe this is all fixed in https://github.com/MariaDB/server/pull/1217

This issue is not fixed in that PR. That PR actually removes some references to WSREP_SST_OPT_SUFFIX_VALUE, but I see that is still being passed to my_print_defaults here:

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_common.sh#L196

To fix this, I think we would still need a couple changes:

1.) We would need to ensure that mysqld_multi sets --defaults-group-suffix to mysqld instead of manually reading the options itself here:

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/mysqld_multi.sh#L324

2.) We might also need to pass $WSREP_SST_OPT_CONF to mariabackup here:

https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_mariabackup.sh#L747

Comment by Daniel Black [ 2019-03-12 ]

Yep. Your right. I did assume mysqld_multi was doing something already like 2.

1) mysqld_multi to use --defaults-group-suffix sounds sane to me.
2) possibly - I don't know

In the same theme of supporting multi instance galera - was I preemtive in removing galera bits from https://github.com/MariaDB/server/pull/510? I'm not overly keep to change it immediately as even a month after a review noone has been bothered to merge it.

Comment by Geoff Montee (Inactive) [ 2019-03-12 ]

Hi danblack,

I don't personally think it's a good change to remove the Galera bits from the systemd template unit file for multi-instances. Ironically, the user who ran into this Galera issue with mysqld_multi decided to try out multi-instances with systemd instead, and that has met their needs so far. If Galera support were completely removed from the systemd template unit file, then that would be a step backwards for quite a few users. If there is an issue with the SST scripts when using multiple instances with systemd, then I think we should try to determine some way to fix those issues instead of getting rid of the Galera support.

Comment by Daniel Black [ 2019-03-12 ]

It was removed as a testing gap because I couldn't even get someone to look at it for a year and hoped some simplification might help it get merged. Some of the flexibility added means the existing systemd environment based method for initialization wasn't usable with PermissionsStartOnly=true removed. The mechanism in https://github.com/MariaDB/server/pull/1143 however nicely fits with with the pr 510 changes. With an existing user base I'll happily merge a pr 1143 like mechanism somehow. As this is a supported feature that users want I'd like to see some buildbot tests for this.

Comment by Geoff Montee (Inactive) [ 2019-03-13 ]

Hi danblack,

Something similar to PR #1143 sounds good to me, if the goal is to get rid of PermissionsStartOnly=true.

Do you know why PR #1143 uses EnvironmentFile to set environment variables instead of just using Environment to set them? It seems like Environment would work just fine to me, and adding files to the equation has the potential to introduce bugs.

Comment by Sergei Golubchik [ 2019-05-15 ]

as far as the original bug is concerned, I'd think mysqld must pass to sst script all its (mysqld's) command line arguments.

Not a hard-coded set of options, like now:

sql/wsrep_sst.cc

    if (my_defaults_file)
      ptr= strxnmov(ptr, end - ptr,
                    WSREP_SST_OPT_CONF, " '", my_defaults_file, "' ", NULL);
 
    if (my_defaults_extra_file)
      ptr= strxnmov(ptr, end - ptr,
                    WSREP_SST_OPT_CONF_EXTRA, " '", my_defaults_extra_file, "' ", NULL);
 
    if (my_defaults_group_suffix)
      ptr= strxnmov(ptr, end - ptr,
                    WSREP_SST_OPT_CONF_SUFFIX, " '", my_defaults_group_suffix, "' ", NULL);

(which, btw, will fail if he path contains a single quote character). Instead, it should simply append all strings from orig_argv array. It'll handle mysqld_multi case, and it'll work when, say, --log-bin is specificed manually on the command line.

Comment by Geoff Montee (Inactive) [ 2019-05-15 ]

Hi serg,

That sounds like it would be a good approach, but I think that would also require changes to the SST scripts because:

  • I believe that the SST scripts currently ignore most of mysqld's arguments instead of passing them to mariabackup or xtrabackup.
  • I believe that the SST scripts currently use different argument names than mysqld to refer to the same option in some cases. For example, SST scripts use --binlog instead of --log-bin to refer to the binary log basename.
  • I believe that the SST scripts currently use the same argument name as mysqld to refer to a different option in some cases. For example, SST scripts use the --port argument to refer to the port used for the SST transfer, while mysqld uses the --port option to refer to the port that it listens on.

See here:

https://github.com/MariaDB/server/blob/mariadb-10.4.4/scripts/wsrep_sst_common.sh#L33

If we go with your idea, then it looks like we would also have to make a lot of changes to how the SST scripts handle arguments, and we might need to rename some SST script arguments, so that they don't conflict with mysqld's arguments.

Comment by Jan Lindström (Inactive) [ 2019-05-16 ]

After some code reading it looks like a good approach. Just as addition we need to rename the hard coded arguments so that they have different names compared to one's in command line. It could be smaller change where all command line parameters are just put to sst command line and not used one's ignored on wsrep_sst_common.sh.

Comment by Sergei Golubchik [ 2019-05-16 ]

another option would be to add to SST scripts an option --mysqld or --mysqld-args that would mean "all options after it follow mysqld syntax". Then there would be no need to rename options in the server, like log-bin to --binlog or --port to something else.

Comment by Julius Goryavsky [ 2019-06-18 ]

To correct this issue, we need to transfer all parameters of the original mysqld call to the SST script , and in the SST scripts themselves we need to transfer of these parameters to utilities such as mariabackup. To prevent these parameters from mixing with the script's own parameters, they should be transferred to SST script after the special option "--mysqld-args", followed by the line of the original parameters, as received by mysqld call at the time of launch (further all these parameters will be passed to mariabackup, for example).

In addition, the SST scripts themselves must be refined so that they can read the parameters from the user-selected group, not just from the global mysqld configuration group. And also so that they can receive the parameters (which important for their work) as command-line arguments.

Patch here: https://github.com/MariaDB/server/compare/10.5...bb-10.5-MDEV-18863?expand=1

Comment by Jan Lindström (Inactive) [ 2019-06-19 ]

See my comments on git and please port this to 10.1

Comment by Julius Goryavsky [ 2019-07-30 ]

I tried to correct all the shortcomings on the results of the review and made some small improvements to the patch itself - this is basic version for the 10.1 branch: https://github.com/MariaDB/server/commit/6e7407eea73e24b0b710bc7fcdfae8d266744d91

Comment by Jan Lindström (Inactive) [ 2019-08-12 ]

ok to push in 10.1. Please then do 10.1->10.2 merge, retest and push. Similar thing should be done for 10.3 and 10.4.

Comment by Julius Goryavsky [ 2019-08-19 ]

After review and testing, the correction is applied to 10.1, 10.2, 10.3 and 10.4 branches

Comment by Marko Mäkelä [ 2019-08-20 ]

I did not see this being applied to 10.2. On 10.1, this broke the debug build for me.

I did the merge to 10.2. On my system, the += syntax that was added to scripts/wsrep_sst_common causes a failure of scripts/wsrep_sst_rsync, because in 10.2, that script is for /bin/sh (which is implemented by dash on Debian), not bash. I think that dependencies to Bash should be avoided, because a simpler shell should have a smaller overhead and fewer security holes.

Comment by Marko Mäkelä [ 2019-08-20 ]

I pushed a follow-up change that makes wsrep_sst_rsync use Bash instead of Bourne shell. The following Galera-related scripts are still declared to use the Bourne shell:

  • scripts/galera_new_cluster.sh
  • scripts/galera_recovery.sh
  • support-files/wsrep_notify.sh
  • mysql-test/std_data/wsrep_notify.sh (why does this exist, and why does it differ from the above?)

The following Galera-related scripts depend on Bash:

  • scripts/wsrep_sst_mariabackup.sh
  • scripts/wsrep_sst_mysqldump.sh
  • scripts/wsrep_sst_rsync.sh
  • scripts/wsrep_sst_xtrabackup-v2.sh
  • scripts/wsrep_sst_xtrabackup.sh

I see that MDEV-17835 in MariaDB 10.3.13 and 10.4.2 changed scripts/wsrep_sst_rsync.sh to use Bash instead of the Bourne shell. Only the 10.2 version was free from that dependency since MariaDB 10.2.3, which is when we applied spil’s contribution to remove Bash dependencies from the wsrep scripts.

Generated at Thu Feb 08 08:47:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.