[MDEV-7631] Invalid WSREP_SST rows appear in mysqld-bin.index file Created: 2015-02-25 Updated: 2015-09-04 Resolved: 2015-06-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera SST |
| Affects Version/s: | 10.0.16-galera |
| Fix Version/s: | 10.0.20-galera |
| Type: | Bug | Priority: | Critical |
| Reporter: | Ragnar Rova | Assignee: | Nirbhay Choubey (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Sprint: | 10.1.6-1 | ||||||||
| Description |
|
Invalid rows appear in mysqld-bin.index file, causing server to fail starting. This occurs when WSREP galera replication was enabled. The rows do not point to a valid file but are instead output of wsrep sst. Example of contents of mysqld-bin.index:
on #maria irc channel was mentioned that: "looks like when mariadb forks for a sst it should close all fds for the sst" |
| Comments |
| Comment by Ragnar Rova [ 2015-02-25 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Sample logfile, contains three consecutive runs of 'service mysql bootstrap'. Server fails to start on third attempt. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-02-26 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I add a look and it looks like the process class used to run the sst script - class in sql/wsrep_utils.cc, could under some circumstances assume that STDIN_FD is still open. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Nirbhay Choubey (Inactive) [ 2015-02-26 ] | ||||||||||||||||||||||||||||||||||||||||||
|
That's interesting! | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-02-26 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I run bootstrap like this: service mysql bootstrap --init-file=/var/lib/mysql/reset_password.sql reset_password.sql: update mysql.user set password=PASSWORD('%(password)s') where User='root'; | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Nirbhay Choubey (Inactive) [ 2015-02-27 ] | ||||||||||||||||||||||||||||||||||||||||||
|
rrva We tried reproducing this issue, but haven't been able to repeat it so far. Can you share config files of nodes in the cluster? and other details like OS, etc? | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-03-03 ] | ||||||||||||||||||||||||||||||||||||||||||
|
See README inside attached file repro-galera-invalid-mysqld-bin-index.tar.gz. Docker environment and fabric scripts which reproduce the bug. Docker probably not needed but included for completeness. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-03-04 ] | ||||||||||||||||||||||||||||||||||||||||||
|
For now, I'm using the following workaround in my start scripts sed -i '/^\./mysqld-bin./!d' /var/lib/mysql/mysqld-bin.index | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-03-05 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Were you able to reproduce using the scripts/Dockerfiles I provided? We cannot use an ugly workaround like the above on a production system. We might even remove rows which partially contain a valid mysqld-bin.index filename (when the extra output is interleaved with the correct one). This is blocking us from using MariaDB + wsrep for now, and we'd really like to. Preferably a version with the wsrep_dirty_reads fix. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-04-09 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Any updates on this issue? | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-04-09 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Yes, sorry. We've repeated this issue using your files. Thanks! There is no fix yet, but there is still time till the next 10.0 release. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-05-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Anything I can do to help with this one? Unfortunately it was moved to the next release... | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-16 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I've done some debugging using sysdig, and it seems that when the script wsrep_rsync_sst runs, file descriptor 2 points to /var/lib/mysql/mysqld-bin.index. Excerpt of sysdig capture (can provide whole file with events from all processes):
| ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-18 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Could it be the following mysqld closes stderr (fd=2) at some point, so this fd becomes free for new open() calls ? (just speculation) | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-18 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I don't think mysqld closes stderr. But it recently started to close stdin. If it — at some point — freopen's stderr, then stderr can get fd=0, thus making fd=2 free. Wait… You have log-error in your my.cnf. This means, indeed, freopen for stderr! Try to remove it and see if that helps. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-06-18 ] | ||||||||||||||||||||||||||||||||||||||||||
|
looks like it. ./scripts/wsrep_sst_common.sh includes writing to FD 2 for for stderr so this implicitly forms part of the calling convention for sst scripts (should be documented http://galeracluster.com/documentation-webpages/scriptablesst.html) doesn't look like sql/wsrep_utils.cc wsp::process or wsrep_sst.c::sst_joiner_thread do anything special like making the FD 2 mean stderr | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-18 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Removing log-error in my.cnf does not help. If I patch wsrep_sst_rsync with the following, I get rid of the corruption of mysqld-bin.index at least
What do you think of the above as a workaround? Is output to stderr from the sst script for informative reasons or is it processed somehow so its expected to end up elsewhere? Using the patch above, it looks like this now.
I think a fix which guarantees that fd=0, 1, 2 never point to anything else than desired before invoking external commands is needed. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-18 ] | ||||||||||||||||||||||||||||||||||||||||||
Yes, we cannot expect scripts not to use these descriptors. I'll do something along these lines. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-18 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Attaching sysdig capture file mysqld_fd2_usage.scap.gz read it with sysdig -r or use the text output. evt.num=91048 and evt.num=91049 is the corrupting write call. there is one later at evt.num=333506 as well. backtracking from those events, maybe the management of fd=2 and/or mysqld-bin.index gives some clues. mysqld-bin.index after test run:
| ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-18 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Oh, sorry maybe you don't need those sysdig files, your fix sounds good. Thanks! | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
> I think a fix which guarantees that fd=0, 1, 2 never point to anything else than desired before invoking external commands is needed. more so closing every mysqld FD excluding stdout/stderr that is inherited in the subprocess (FD_CLOEXEC; see fcntl(2)) or something explicit. I believe the dup2 syscall is what you are after to ensure the log-error fd is assigned the correct number. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Ok. O_CLOEXEC when using open() on mysqld-bin.index is even better? man open
| ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I also found this http://danwalsh.livejournal.com/53603.html | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Okay, it turned out to be a CentOS 6 issue.
prints 2 for me on centos 7, gentoo, freebsd 9.2, ubuntu trusty and vivid. But on centos 6 it prints 0.
| ||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
By the way, normally MariaDB reopens stdout/stderr first and then closes stdin. But you're sending it SIGHUP, and on SIGHUP it closes and reopens log files. So, as a workaround you can stop doing kill -HUP. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Also a RHEL 6.5 issue. Test program prints err=0 there too. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Where am I doing kill -HUP? | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
The following fix works. Patch needs to be conditional on if O_CLOEXEC is available but otherwise reading the Dan Walsh blog I think it is correct behavior, possibly even the default for mysql_file_open?
| ||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I don't know. But I see in your sysdig trace that mysqld is doing what it only does for SIGHUP (prints lots of information to stdout and then reloads privilege tables, then reopens stdout/stderr). I think I'll do O_CLOSEXEC — this seems like a good thing to have. But in that case wsrep_sst logs will be lost, because fd=2 will be closed. So, while O_CLOSEXEC is good in general, the correct fix in your case is to preserve stderr file descriptor. Alas, CentOS 6 (and RHEL 6, not surprisingly) doesn't do it. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
nirbhay_c, I've attached cloexec.patch | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Found some info regarding freopen() https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=295859 man stderr states:
| ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Thanks a lot for the cloexec patch. Any side-effects? In addition to the cloexec.patch, which resolved my issue, I am thinking that a change in sql/log.cc reopen_fstreams Is needed, where dup2 is called to force fd=2 for stderr and fd=1 for stdout after freopen. This way subprocesses could rely on the fd numbers being the expected ones. (Or maybe fd=2 is already occupied by something else at this point in time?) For now, I am adding
At the top of /usr/bin/wsrep_sst_rsync. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2015-06-19 ] | ||||||||||||||||||||||||||||||||||||||||||
|
About freopen — this bugreport is from 2005. It was originally rejected not because it was wrong to preserve file descriptors, but because it was difficult to do, see this comment. And while man stdin was changed to say that freopen may change the descriptor, all recent distributions have fixed freopen too. Because otherwise freopen becomes almost useless, note that man freopen says
And freopen that changes file descriptors cannot be used for its primary use if the process forks children. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-06-20 ] | ||||||||||||||||||||||||||||||||||||||||||
|
also for the socket and accept syscalls there is a SOCK_CLOEXEC option with the same meaning. mysql_socket_socket/mysql_socket_accept sql/mysqld.cc or include/mysql/psi/mysql_socket.h ( inline_mysql_socket_socket/inline_mysql_socket_accept) | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Like @serg mentioned:
I agree. Since RHEL6 is such a common platform, I think that we must have a fix for preserving fd=0,1,2 of freopen() in MariaDB, since they will not patch glibc (they are on glibc-2.12 and that will stay for quite some time). Otherwise log output of wsrep_sst_rsync is lost. The cloexec patch is a good one nevertheless. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Attaching freopen-stdin-devnull.patch made by @serg which made logging of wsrep_sst_rsync output work again. Tested with the attached docker based testcase (which is CentOS 6.6). Also tested in combination with the cloexec.patch. Both work and and are good. With these two patches, my issue is resolved. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-06-22 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Needs some ifdefs for portability, fallback to fclose(stdin) when no dev/null or freopen fails | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-06-23 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I still think you should mark the socket/accept calls with the flag SOCK_CLOEXEC to avoid writing to open connections. https://mariadb.atlassian.net/browse/MDEV-7631?focusedCommentId=72512&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-72512 | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Nirbhay Choubey (Inactive) [ 2015-06-25 ] | ||||||||||||||||||||||||||||||||||||||||||
|
https://github.com/MariaDB/server/commit/0f4478105f5027ce589a0f08c6019d906d632029 | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-09-03 ] | ||||||||||||||||||||||||||||||||||||||||||
|
On a SST (rsync method) donor on a 10.0.21 instances I still see a lot of files open for tar generating binlogs.
| ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-09-03 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Hmm, these files must be opened through different functions than the patched ones in mysys/ {my_open, my_fopen}? | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||
|
seems so. I'll do a new bug report. | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Ragnar Rova [ 2015-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||
|
I did not actually test that there were no open files, just that the corruption of mysqld-bin.index did not occur, which the fix in this ticket did address. /var/lib/mysql/grastate.dat is opened by galera plugin, so they should get a report to add O_CLOEXEC as well I guess. If there are places were O_CLOEXEC cannot be added, one could consider also adding the closing of all currently open fds as well, but O_CLOEXEC was still good since it is free from race conditions. I found some methods here which are OS-specific: http://stackoverflow.com/questions/899038/getting-the-highest-allocated-file-descriptor/918469#918469 | ||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2015-09-04 ] | ||||||||||||||||||||||||||||||||||||||||||
|
rrva totallly agree. |