[MDEV-16488] Cannot bootstrap the cluster for [ERROR] WSREP: failed to open gcomm backend connection: 131: invalid UUID: 00000000 Created: 2018-06-14 Updated: 2019-05-13 Resolved: 2019-05-13 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 10.3.7 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Zdravelina Sokolovska (Inactive) | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | regression |
| Environment: | 3 Galera nodes, master-master, CentOS 7.4 |
| Attachments: |
|
| Description |
|
Cannot bootstrap the cluster because of [ERROR] WSREP: failed to open gcomm backend connection: 131: invalid UUID: 00000000. The problem occurred on a Galera cluster composed of 3 nodes. The cluster remained on standby in that state until the power-down/power-up events occurred. It seems that Node2 succeeded in syncing with the group, as it shows the same GTID.
Note: the error was received after trying to bootstrap 1 node with
The same error was also received from rejoining nodes with commands
Initial GTID positions:
The GTID positions seen in recovery logs:
Bootstrapping cluster
node 6
|
| Comments |
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-06-14 ] |
|
It seems that the nodes booted, for some reason, with empty gvwstate.dat files, and that caused the reception of
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-06-14 ] |
|
Current workaround: mv /var/lib/mysql/gvwstate.dat /var/lib/mysql/gvwstate.dat.bb
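The workaround above can be sketched as a small script that moves the file aside only when it is actually empty. This is a minimal sketch, not part of Galera or MariaDB: the function name `quarantine_empty_view_state` is hypothetical, and the default path and `.bb` suffix are taken from the `mv` command above. It assumes mysqld is stopped on the node before it runs.

```python
import os
import shutil


def quarantine_empty_view_state(path="/var/lib/mysql/gvwstate.dat", suffix=".bb"):
    """Move an empty gvwstate.dat aside, mirroring the manual `mv` workaround.

    Hypothetical helper: returns the backup path if the file was moved,
    or None if the file is missing or non-empty (it is then left untouched).
    """
    if not os.path.isfile(path):
        return None  # nothing to do
    if os.path.getsize(path) > 0:
        return None  # file has content; do not touch it
    backup = path + suffix
    shutil.move(path, backup)
    return backup
```

Checking the size first keeps a valid, non-empty view state file in place, so the script is safe to run on nodes that do not show the problem.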
| Comment by Mario Karuza (Inactive) [ 2018-07-10 ] |
|
winstone, do you have logs from before bootstrapping the Galera node with this problem? Can gvwstate.dat be attached? Is it just an "empty" file, or does it have anything written in it? I could not reproduce this problem, and looking at how this file is handled inside Galera, it should be highly unlikely to reach this state.
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-07-13 ] |
|
mkaruza, I have attached the logs from all nodes. The gvwstate.dat file was empty, with nothing written into it.
| Comment by Mario Karuza (Inactive) [ 2018-07-13 ] |
|
winstone, do you have logs from "before" this problem appeared? Also, if this gvwstate.dat.bb is empty, can you provide information on when it was last written to?
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-07-18 ] |
|
Actually, before the problem occurred, all nodes were in OPERATIONAL state (2 Synced and 1 Desynced) and the cluster was operational. mkaruza, the stat of the file: [root@t4w6 mysql]# stat gvwstate.dat.bb
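The same details that `stat` reports can also be collected from a script, which makes it easier to compare the last modification time against timestamps in the node logs. A minimal sketch: `describe_file` is a hypothetical helper name, and only the size and modification time are reported.

```python
import os
from datetime import datetime


def describe_file(path):
    """Return size and last-modification time of a file (hypothetical helper)."""
    st = os.stat(path)
    return {
        "size": st.st_size,                               # bytes
        "modified": datetime.fromtimestamp(st.st_mtime),  # local time
        "empty": st.st_size == 0,
    }
```

For example, `describe_file("/var/lib/mysql/gvwstate.dat.bb")` would show whether the moved-aside file is empty and when it was last written.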
| Comment by Mario Karuza (Inactive) [ 2018-07-19 ] |
|
Following analysis of the provided logs, together with the provided data (the modification date/time of gvwstate.dat) and the Galera source code, these are the conclusions:
1. gvwstate.dat is written in only one place in the code, in the function gcomm::PC::handle_up, and several conditions must be true for this to happen. One important condition is that the node must receive a PRIMARY view. If all conditions are met, the log message "save pc into disk" should appear in the node logs before gvwstate.dat is written. It does not appear in any of the provided node logs.
2. Node2, which was manually desynced.
As you can see, there are no logs between 17:35 on 13/6 and 6:54 on 14/6. The provided modification date/time for gvwstate.dat shows that "someone" modified it at 6:50:45, which is strange. After that, Node2 aborts because it reads an empty file.
3. Node3 also has the same problem with an empty gvwstate.dat file. The following log shows that there were 2 runs of mysqld: one with a correct gvwstate.dat (started at 6:49:26) and a second with an empty one (started at 6:57:34). In the first run the node only received a NON-PRIMARY view.
Overall, I think it should be considered that the gvwstate.dat file could have been changed outside of the Galera process. If something goes wrong, as in this case, it is clear that manual intervention is required.
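The first conclusion suggests a simple check: search each node's log for the "save pc into disk" message. A minimal sketch, assuming the logs are plain text files. The function names are hypothetical and the marker string is the one quoted in the analysis above. No assumption is made about the timestamp format, so matching lines are returned verbatim.

```python
SAVE_MARKER = "save pc into disk"  # message quoted in the analysis above


def find_view_state_writes(lines):
    """Return (line_number, line) pairs for lines that mention the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SAVE_MARKER in line
    ]


def scan_log(path):
    """Scan one log file; an empty result means no such message was logged."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_view_state_writes(handle)
```

An empty result for a node, combined with a recent modification time on gvwstate.dat, would be consistent with the file having been changed outside of the Galera process.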
| Comment by Zdravelina Sokolovska (Inactive) [ 2018-07-23 ] |
|
It was found that the cluster bootstrapping failure with
| Comment by Jan Lindström (Inactive) [ 2019-05-13 ] |
|
If Galera cache files are modified outside of the MariaDB process, further action requires manual correction, e.g. removing that file entirely. |