[MDEV-15748] Unable to stop mariadb.service or mysqld run with wsrep Created: 2018-04-02 Updated: 2021-04-19 Resolved: 2021-01-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Server, wsrep |
| Affects Version/s: | 10.2.30 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Zdravelina Sokolovska (Inactive) | Assignee: | Seppo Jaakola |
| Resolution: | Incomplete | Votes: | 2 |
| Labels: | need_feedback |
| Description |
|
Unable to stop mariadb.service, or mysqld running with wsrep, with either systemctl (mariadb.service) or service mysql.
How to repeat: the error was received on all nodes, but mysqld is still found in the process list.
Mar 30 20:17:33 t4w6.xentio.lan systemd[1]: Starting MariaDB 10.3.5 database server...
[root@t4w3 ~]# ps aux | grep mysql
[root@t4w5 ~]# ps aux | grep mysql
[root@t4w6 ~]# ps aux | grep mysql
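A note on the check above: `ps aux | grep mysql` also matches the `grep` process itself, so a hit does not always mean a server is still running. The following is a minimal sketch, not part of the original report, of a stricter check that matches on the command name only. The helper name `residual_processes` and the default name `mysqld` are assumptions made for illustration.

```python
from __future__ import annotations

import subprocess


def residual_processes(ps_output: str, name: str = "mysqld") -> list[int]:
    """Return PIDs whose command name equals `name` in `ps -eo pid,comm` output.

    `residual_processes` is a hypothetical helper, not a MariaDB or systemd tool.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)  # split into PID and command name
        if len(parts) == 2 and parts[1].strip() == name:
            pids.append(int(parts[0]))
    return pids


if __name__ == "__main__":
    # Ask ps for PID and bare command name only, so grep-style
    # self-matches on the full command line cannot occur.
    out = subprocess.run(
        ["ps", "-eo", "pid,comm"], capture_output=True, text=True, check=True
    ).stdout
    print(residual_processes(out))
```

An empty list would suggest that no process with that exact command name is left on the node; a non-empty list gives the PIDs to inspect before trying to start the service again.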
|
| Comments |
| Comment by Mario Karuza (Inactive) [ 2018-07-13 ] |
|
winstone, can you attach logs?
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-07-16 ] |
|
mkaruza, logs attached. We actually get the WSREP error "Failed to open backend connection: -98 (Address already in use)" as a consequence of mariadb.service remaining in a failed state after skipping SIGKILL.
That might be related to Daniel Black's analysis of the service start failure: services shouldn't start if there are residual processes left over (in the SendSIGKILL=no case).
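For readers unfamiliar with the -98 error: on Linux, errno 98 is `EADDRINUSE`, which `bind()` returns when another socket already holds the address. The sketch below is an illustration only and is not taken from the attached logs. It reproduces the condition with a stand-in listener on an ephemeral local port, and the helper name `try_bind` is hypothetical.

```python
import errno
import socket


def try_bind(host: str, port: int) -> int:
    """Try to bind a TCP socket; return 0 on success or the errno on failure.

    `try_bind` is a hypothetical helper used only for this illustration.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()


if __name__ == "__main__":
    # Stand-in for a leftover server process that still holds its port.
    leftover = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    leftover.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    leftover.listen(1)
    port = leftover.getsockname()[1]

    # A second bind to the same address fails while the first socket is open.
    print(try_bind("127.0.0.1", port) == errno.EADDRINUSE)
    leftover.close()
```

This matches the scenario described in the comment: as long as the old process keeps its listening socket, a newly started instance cannot bind to the same address.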
| Comment by Mario Karuza (Inactive) [ 2018-07-18 ] |
|
winstone, do you have pc.wait_prim enabled? Can you paste the parameters that you provide to Galera?
| Comment by Mario Karuza (Inactive) [ 2018-07-18 ] |
|
Problem is duplicate of Node 1 could not run because it can't bind to address / port. Nodes 2 and 3 seem to have started successfully. There could be a problem if all 3 nodes die and one of them later does not rejoin the previously saved group. This will block the signal that kills the mysqld daemon.
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-07-18 ] |
|
mkaruza, the pc.wait_prim, pc.wait and pc.wait_prim_timeout wsrep provider options are set to their default values, i.e. they have not been changed.
| Comment by Mario Karuza (Inactive) [ 2018-07-19 ] |
|
As mentioned in the previous comment, the issue for Node 1 is a duplicate: it aborts due to the error 'Address already in use'.
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-07-19 ] |
|
Actually, the abort due to the error 'Address already in use' occurred in that case as a consequence of the current problem, i.e. not being able to stop mysqld running with wsrep.
| Comment by Mario Karuza (Inactive) [ 2018-07-20 ] |
|
winstone, then please provide concrete logs that lead to this problem. The attached traces don't show anything useful for analyzing it.
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-07-20 ] |
|
mkaruza, those are all the logs, including the error logs issued by WSREP and InnoDB, obtained by enabling error logging in the server cnf.
| Comment by Jan Lindström (Inactive) [ 2019-12-12 ] |
|
Is this really still repeatable?
| Comment by Jan Lindström (Inactive) [ 2020-12-17 ] |
|
Firstly, this looks like a bug, not a feature request. However, to analyze it we would really need some way to reproduce the case, and that could be problematic because it is also not known how to cause those long BF waits. Here is an idea of how testing could be done:
|