Details
Bug
Status: Open
Minor
Resolution: Unresolved
5.5.34
None
Linux
Description
Already reported upstream as http://bugs.mysql.com/71194.
When a MySQL daemon A is running and we try to start another instance B, and the following conditions hold:
- the network ports of A and B are different
- the Unix socket location of A and B is the same
then the new daemon B removes the Unix socket file that daemon A still needs.
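The mechanism can be reproduced without mysqld. The sketch below is a hypothetical illustration, not server code: `listen_unix` mimics the pre-fix startup sequence (unlink the socket path, then bind() and listen()), and the temporary path stands in for `/var/lib/mysql/mysql.sock`. After B starts, a client connecting by path reaches B, while A's listening socket is still open but no longer reachable through the file system.

```python
import os
import select
import socket
import tempfile

def listen_unix(path):
    """Start a listener the way the pre-fix server does (assumption based on
    the report): remove any existing file at path, then bind() and listen()."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.bind(path)
    s.listen(5)
    return s

def has_pending(listener, timeout=0.2):
    """True if a client connection is queued on listener within timeout seconds."""
    readable, _, _ = select.select([listener], [], [], timeout)
    return bool(readable)

def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "mysql.sock")
        a = listen_unix(path)   # daemon A
        b = listen_unix(path)   # daemon B unlinks A's socket file, then binds
        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)    # the path now resolves to B's socket
        result = {"A": has_pending(a), "B": has_pending(b)}
        for s in (client, a, b):
            s.close()
        return result

if __name__ == "__main__":
    print(demo())  # {'A': False, 'B': True}
```

This matches the fuser output in the report: only the second process is attached to the path afterwards.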
How to repeat:
Steps to reproduce:
$ /usr/libexec/mysqld --port 13306 --datadir /var/lib/mysql/ &
$ fuser /var/lib/mysql/mysql.sock
$ /usr/libexec/mysqld --port 13307 --datadir /var/lib/mysql2/ &
$ fuser /var/lib/mysql/mysql.sock
Actual results:
/var/lib/mysql/mysql.sock: 5683
/var/lib/mysql/mysql.sock: 5717
which means the socket path now belongs to the second daemon (PID 5717), and the first daemon (PID 5683) can no longer accept connections on the Unix socket.
Expected results:
/var/lib/mysql/mysql.sock: 5683
/var/lib/mysql/mysql.sock: 5683
i.e. the second daemon should not start at all.
Suggested fix:
Either check whether some process is attached to the socket or, as a portable solution, keep a lock file for the socket file that contains the PID of the process using the socket file.
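One way to realise the lock-file suggestion is sketched below. It is an assumption-laden illustration, not a proposed patch: the `<socket>.lock` name and `acquire_socket_lock` are hypothetical, and it uses flock(2), which is available on Linux and the BSDs but is not strictly POSIX. Because the kernel drops the lock when the holder exits, a crashed server does not leave a lock that blocks the next start; the PID written into the file only serves to name the owner in the error message.

```python
import errno
import fcntl
import os

def acquire_socket_lock(socket_path):
    """Hypothetical helper: lock '<socket_path>.lock' exclusively and write our
    PID into it. Keep the returned fd open for the lifetime of the server."""
    lock_path = socket_path + ".lock"
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        if e.errno not in (errno.EAGAIN, errno.EWOULDBLOCK):
            os.close(fd)
            raise
        # Someone else holds the lock: report the PID they recorded.
        owner = os.read(fd, 32).decode().strip()
        os.close(fd)
        raise RuntimeError(f"{socket_path} is in use by PID {owner or 'unknown'}")
    # We own the lock: replace any PID left by a previous (dead) owner.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd
```

Only after this succeeds would the server unlink a leftover socket file and bind(), since no other cooperating server can hold the lock at that point.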
I tested removing the unlink() of the Unix socket before the bind(), and the second instance indeed fails to start, reporting the error:
$ strace -s 99 -e trace=network sql/mysqld --skip-networking --datadir=/tmp/datadir2 --socket /tmp/s.sock --lc-messages-dir=`pwd`/sql/share --verbose
2017-12-30 15:44:20 139830008162496 [Note] sql/mysqld (mysqld 10.2.12-MariaDB) starting as process 22939 ...
2017-12-30 15:44:20 139830008162496 [Warning] Changed limits: max_open_files: 1024 max_connections: 151 table_cache: 431
socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Uses event mutexes
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Compressed tables use zlib 1.2.8
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Using Linux native AIO
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Number of pools: 1
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Using SSE2 crc32 instructions
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Completed initialization of buffer pool
2017-12-30 15:44:20 139829451699968 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Highest supported file format is Barracuda.
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: 128 out of 128 rollback segments are active.
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Creating shared tablespace for temporary tables
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2017-12-30 15:44:20 139830008162496 [Note] InnoDB: 5.7.20 started; log sequence number 1619282
2017-12-30 15:44:20 139829077591808 [Note] InnoDB: Loading buffer pool(s) from /tmp/datadir2/ib_buffer_pool
2017-12-30 15:44:20 139829077591808 [Note] InnoDB: Buffer pool(s) load completed at 171230 15:44:20
2017-12-30 15:44:20 139830008162496 [Note] Plugin 'FEEDBACK' is disabled.
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC, 0) = 20
setsockopt(20, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(20, {sa_family=AF_UNIX, sun_path="/tmp/s.sock"}, 110) = -1 EADDRINUSE (Address already in use)
2017-12-30 15:44:20 139830008162496 [ERROR] Can't start server : Bind on unix socket: Address already in use
2017-12-30 15:44:20 139830008162496 [ERROR] Do you already have another mysqld server running on socket: /tmp/s.sock ?
2017-12-30 15:44:20 139830008162496 [ERROR] Aborting
So removing the unlink is the minimal fix.
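The effect of dropping the unlink() can be checked in isolation. In this sketch (`bind_without_unlink` is a hypothetical name), bind() on an existing socket path fails with EADDRINUSE, as in the strace above. It also shows the cost of the minimal fix: a socket file left behind by a server that exited without removing it, for example after a crash, blocks the next start in the same way, because Linux reports EADDRINUSE whether or not anything is still listening.

```python
import errno
import os
import socket
import tempfile

def bind_without_unlink(path):
    """Bind a listening Unix socket at path without removing an existing file
    first. Returns the socket, or the errno name (e.g. 'EADDRINUSE') on failure."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.bind(path)
    except OSError as e:
        s.close()
        return errno.errorcode[e.errno]
    s.listen(5)
    return s

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "s.sock")
    first = bind_without_unlink(path)
    print(bind_without_unlink(path))  # EADDRINUSE: a live server holds the path
    first.close()                     # exit without unlinking, like a crash
    print(bind_without_unlink(path))  # EADDRINUSE: the stale file still blocks
```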
A more comprehensive check could test whether the existing socket is responsive and remove it only if it is not, as in https://github.com/grooverdan/mariadb-server/commit/f4191b0628531b3e0ebe1d2ce53eb8312433fde6 (this breaks the way parts of mtr work).
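The responsiveness check can be outlined as follows. This is an independent sketch of the idea, not the code from the linked commit; `claim_socket_path` is a hypothetical name. A successful connect() means another server is listening, so startup is refused; ECONNREFUSED means the file is stale and may be removed. The window between the probe and bind() remains, so a concurrent start can still make bind() fail with EADDRINUSE.

```python
import os
import socket

def claim_socket_path(path):
    """Hypothetical sketch: probe an existing socket file before binding.
    Refuse to start if it is live; remove it if it is stale."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except FileNotFoundError:
        pass                    # no socket file: nothing to clean up
    except ConnectionRefusedError:
        os.unlink(path)         # file exists but nobody is listening: stale
    else:
        raise RuntimeError(f"another server is already listening on {path}")
    finally:
        probe.close()
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.bind(path)                # can still raise EADDRINUSE if another server races us
    s.listen(5)
    return s
```

Note that the probe opens a real connection to the running server, which that server may log as an aborted connection.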
This deliberately sets aside your suggested fix, mainly because I consider it too susceptible to race conditions and too dependent on the behaviour of the other instance (which might be a different server version). The existing implementation and my variant are also susceptible to race conditions (in which case the second server aborts); however, since bind() has no truncate-style option for replacing an existing socket file, there is no race-free implementation.
Do you think this is on the right track?