[MDEV-18863] Galera SST scripts can't read mysqld_multi's [mysqldN] option groups Created: 2019-03-08 Updated: 2020-08-25 Resolved: 2019-08-19 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Galera SST, Scripts & Clients |
| Affects Version/s: | 10.1, 10.2, 10.3, 10.4 |
| Fix Version/s: | 10.2.27, 10.1.43, 10.3.18, 10.4.8 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Geoff Montee (Inactive) | Assignee: | Julius Goryavsky |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Description |
|
mysqld_multi uses special option groups with names like [mysqld1], [mysqld2], ..., [mysqldN]. https://mariadb.com/kb/en/library/mysqld_multi/#option-groups It looks like SST scripts can't currently support these option groups. The only option group-related value it gets from the server is --defaults-group-suffix from the server, if that option was set for mysqld when the server was started: The SST script does not get told by the server to read these option groups, so this means that the SST script will fail to read options like innodb-data-home-dir when it is in a mysqld_multi option group like [mysqld1]: So maybe this problem could easily be fixed by changing mysqld_multi, so that it sets --defaults-group-suffix=N when it starts mysqld. For example, if you start instance number 1 with mysqld_multi, then it could pass --defaults-group-suffix=1 to mysqld, so that it reads group [mysqld1]. I believe that this should allow mysqld to also pass --defaults-group-suffix=N to the SST script, so that it also reads the proper option group. Currently, it does not look like the mysqld_multi script sets --defaults-group-suffix=N. Instead, the script reads all of the options from the [mysqldN] option group in the configuration file itself, and then passes those options to mysqld on the command-line: It looks like the SST scripts currently need to be able to read the following variables from the [mysqldN] option groups:
As an alternative fix, should they all be converted into command-line options that the server passes to the SST scripts, so that the SST scripts don't have to read them from the configuration file at all? |
| Comments |
| Comment by Daniel Black [ 2019-03-12 ] | |||||||||||||||||||||||
|
GeoffMontee I believe this is all fixed in https://github.com/MariaDB/server/pull/1217 Note https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_rsync.sh#L156, in parse_cnf calls my_print default with all the mysqld default group/extra/file (https://github.com/MariaDB/server/blob/93ac7ae70ff000353538f732899b421a3f2ea7ce/scripts/wsrep_sst_common.sh#L196). So any retrieval of mysqld config options in the script need only use `parse_cnf --mysqld {varname}` | |||||||||||||||||||||||
| Comment by Geoff Montee (Inactive) [ 2019-03-12 ] | |||||||||||||||||||||||
|
Hi danblack,
That is the true when mysqld_multi is not being used However, that is not the case when mysqld_multi is in use. Consider the following test configuration file:
What do we get if we execute just "my_print_defaults --defaults-file=/tmp/test.cnf --mysqld"? In that case, it ignores the mysqld_multi option groups, as would be expected:
There are two ways to get the proper configuration options for a mysqld_multi instance. 1.) Use default-group-suffix:
2.) Explicitly specify the option group:
So:
This issue is not fixed in that PR. That PR actually removes some references to WSREP_SST_OPT_SUFFIX_VALUE, but I see that is still being passed to my_print_defaults here: To fix this, I think we would still need a couple changes: 1.) We would need to ensure that mysqld_multi sets --defaults-group-suffix to mysqld instead of manually reading the options itself here: 2.) We might also need to pass $WSREP_SST_OPT_CONF to mariabackup here: | |||||||||||||||||||||||
| Comment by Daniel Black [ 2019-03-12 ] | |||||||||||||||||||||||
|
Yep. Your right. I did assume mysqld_multi was doing something already like 2. 1) mysqld_multi to use --defaults-group-suffix sounds sane to me. In the same theme of supporting multi instance galera - was I preemtive in removing galera bits from https://github.com/MariaDB/server/pull/510? I'm not overly keep to change it immediately as even a month after a review noone has been bothered to merge it. | |||||||||||||||||||||||
| Comment by Geoff Montee (Inactive) [ 2019-03-12 ] | |||||||||||||||||||||||
|
Hi danblack, I don't personally think it's a good change to remove the Galera bits from the systemd template unit file for multi-instances. Ironically, the user who ran into this Galera issue with mysqld_multi decided to try out multi-instances with systemd instead, and that has met their needs so far. If Galera support were completely removed from the systemd template unit file, then that would be a step backwards for quite a few users. If there is an issue with the SST scripts when using multiple instances with systemd, then I think we should try to determine some way to fix those issues instead of getting rid of the Galera support. | |||||||||||||||||||||||
| Comment by Daniel Black [ 2019-03-12 ] | |||||||||||||||||||||||
|
It was removed as a testing gap because I couldn't even get someone to look at it for a year and hoped some simplification might help it get merged. Some of the flexibility added means the existing systemd environment based method for initialization wasn't usable with PermissionsStartOnly=true removed. The mechanism in https://github.com/MariaDB/server/pull/1143 however nicely fits with with the pr 510 changes. With an existing user base I'll happily merge a pr 1143 like mechanism somehow. As this is a supported feature that users want I'd like to see some buildbot tests for this. | |||||||||||||||||||||||
| Comment by Geoff Montee (Inactive) [ 2019-03-13 ] | |||||||||||||||||||||||
|
Hi danblack, Something similar to PR #1143 sounds good to me, if the goal is to get rid of PermissionsStartOnly=true. Do you know why PR #1143 uses EnvironmentFile to set environment variables instead of just using Environment to set them? It seems like Environment would work just fine to me, and adding files to the equation has the potential to introduce bugs. | |||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2019-05-15 ] | |||||||||||||||||||||||
|
as far as the original bug is concerned, I'd think mysqld must pass to sst script all its (mysqld's) command line arguments. Not a hard-coded set of options, like now:
(which, btw, will fail if he path contains a single quote character). Instead, it should simply append all strings from orig_argv array. It'll handle mysqld_multi case, and it'll work when, say, --log-bin is specificed manually on the command line. | |||||||||||||||||||||||
| Comment by Geoff Montee (Inactive) [ 2019-05-15 ] | |||||||||||||||||||||||
|
Hi serg, That sounds like it would be a good approach, but I think that would also require changes to the SST scripts because:
See here: https://github.com/MariaDB/server/blob/mariadb-10.4.4/scripts/wsrep_sst_common.sh#L33 If we go with your idea, then it looks like we would also have to make a lot of changes to how the SST scripts handle arguments, and we might need to rename some SST script arguments, so that they don't conflict with mysqld's arguments. | |||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2019-05-16 ] | |||||||||||||||||||||||
|
After some code reading it looks like a good approach. Just as addition we need to rename the hard coded arguments so that they have different names compared to one's in command line. It could be smaller change where all command line parameters are just put to sst command line and not used one's ignored on wsrep_sst_common.sh. | |||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2019-05-16 ] | |||||||||||||||||||||||
|
another option would be to add to SST scripts an option --mysqld or --mysqld-args that would mean "all options after it follow mysqld syntax". Then there would be no need to rename options in the server, like log-bin to --binlog or --port to something else. | |||||||||||||||||||||||
| Comment by Julius Goryavsky [ 2019-06-18 ] | |||||||||||||||||||||||
|
To correct this issue, we need to transfer all parameters of the original mysqld call to the SST script , and in the SST scripts themselves we need to transfer of these parameters to utilities such as mariabackup. To prevent these parameters from mixing with the script's own parameters, they should be transferred to SST script after the special option "--mysqld-args", followed by the line of the original parameters, as received by mysqld call at the time of launch (further all these parameters will be passed to mariabackup, for example). In addition, the SST scripts themselves must be refined so that they can read the parameters from the user-selected group, not just from the global mysqld configuration group. And also so that they can receive the parameters (which important for their work) as command-line arguments. Patch here: https://github.com/MariaDB/server/compare/10.5...bb-10.5-MDEV-18863?expand=1 | |||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2019-06-19 ] | |||||||||||||||||||||||
|
See my comments on git and please port this to 10.1 | |||||||||||||||||||||||
| Comment by Julius Goryavsky [ 2019-07-30 ] | |||||||||||||||||||||||
|
I tried to correct all the shortcomings on the results of the review and made some small improvements to the patch itself - this is basic version for the 10.1 branch: https://github.com/MariaDB/server/commit/6e7407eea73e24b0b710bc7fcdfae8d266744d91 | |||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2019-08-12 ] | |||||||||||||||||||||||
|
ok to push in 10.1. Please then do 10.1->10.2 merge, retest and push. Similar thing should be done for 10.3 and 10.4. | |||||||||||||||||||||||
| Comment by Julius Goryavsky [ 2019-08-19 ] | |||||||||||||||||||||||
|
After review and testing, the correction is applied to 10.1, 10.2, 10.3 and 10.4 branches | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2019-08-20 ] | |||||||||||||||||||||||
|
I did not see this being applied to 10.2. On 10.1, this broke the debug build for me. I did the merge to 10.2. On my system, the += syntax that was added to scripts/wsrep_sst_common causes a failure of scripts/wsrep_sst_rsync, because in 10.2, that script is for /bin/sh (which is implemented by dash on Debian), not bash. I think that dependencies to Bash should be avoided, because a simpler shell should have a smaller overhead and fewer security holes. | |||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2019-08-20 ] | |||||||||||||||||||||||
|
I pushed a follow-up change that makes wsrep_sst_rsync use Bash instead of Bourne shell. The following Galera-related scripts are still declared to use the Bourne shell:
The following Galera-related scripts depend on Bash:
I see that |