[MDEV-24615] MariaDB 10.5.8 Galera node fails to start with WSREP: std::bad_alloc Created: 2021-01-18 Updated: 2021-08-16 Resolved: 2021-08-16 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 10.5.8 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Otto Kekäläinen | Assignee: | Alexey |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | None |
| Issue Links: |
|
| Description |
|
After issuing `sudo systemctl restart mariadb` on a running MariaDB 10.5.8 Galera node, the node failed to restart and run IST to rejoin the cluster it had been a member of before the restart. Maybe galera.gcache got corrupted on shutdown?
I also came across this: https://stackoverflow.com/questions/64834855/mariadb-cant-start-wsrep-stdbad-alloc/ so it seems this is affecting multiple users. Deleting galera.gcache makes the server start, but the node then runs a full SST, and it is only a temporary solution, as whatever corrupted the galera.gcache file in the first place is still there.
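For the workaround above, moving the file aside instead of deleting it keeps the possibly corrupted cache available for later inspection. The following is a minimal sketch, not an official procedure: it assumes the server is already stopped and that galera.gcache sits directly in the data directory (the source does not give the path, so the caller must supply it). The helper name `set_aside_gcache` and the `.corrupt-<timestamp>` suffix are hypothetical.

```python
import time
from pathlib import Path


def set_aside_gcache(datadir):
    """Rename <datadir>/galera.gcache so the node starts without it.

    Assumption: the MariaDB service is stopped before this is called.
    Returns the new path, or None if there was no gcache file.
    """
    gcache = Path(datadir) / "galera.gcache"
    if not gcache.exists():
        return None
    # Hypothetical naming scheme for the preserved copy.
    backup = gcache.with_name(f"galera.gcache.corrupt-{int(time.time())}")
    gcache.rename(backup)
    return backup
```

As noted in the description, the node is then expected to rejoin through a full SST rather than IST, so this only works around the symptom.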
|
| Comments |
| Comment by Aurélien LEQUOY [ 2021-04-14 ] |
|
I hit the same problem on 10.5. One node in the cluster crashed (it was also a slave). I restarted it without Galera, and then got this error when I tried to restart it in the cluster (without it being a slave). |
| Comment by Aurélien LEQUOY [ 2021-04-14 ] |
|
Affected version: 10.5.9 |
| Comment by Hayden Seitz [ 2021-05-05 ] |
|
I have submitted a report for a similar error: https://jira.mariadb.org/browse/MDEV-25605 . Instead of "[ERROR] WSREP: std::bad_alloc", my cluster nodes are reporting "[ERROR] WSREP: deque::_M_new_elements_at_back". |
| Comment by Sergio Charrua [ 2021-05-28 ] |
|
I have the same issue here, with MariaDB 10.5.9. [mariadb-10.5]
Though I had wsrep_on=ON, there is only 1 node running (all other nodes are offline). This setup worked until yesterday, 14:30 GMT.
Yesterday's logs:
I also noticed thousands of relay log files, since 1st May, each only about 600 bytes in size. The only way I could solve the issue (mariadb not starting) was to set
Today I ran a new test, setting wsrep to ON again. This is the result after a restart of the mariadb service:
Today's log shows these errors:
I had to set wsrep=OFF once again to make it start.
Note: MariaDB had been working fine for a few weeks as a standalone node with the wsrep=ON option. |
| Comment by Alexey [ 2021-07-19 ] |
|
The gcache ring buffer is not synced to disk and is not guaranteed to be consistent on restart. Hence some bogus numbers can be read from disk, and potential exceptions must be handled. In particular, the insert here may throw an exception for a bogus seqno value. Fixed in commit 4bb58377cf3ee02e4c69ce329c2e099b07c79368 |
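
The failure mode described in this comment can be illustrated with a minimal sketch. This is not Galera's code and not the content of the referenced commit: the record layout, `MAX_SANE_GAP`, `make_record`, and `recover_index` are all hypothetical. They only show why a seqno read back from an unsynced buffer should be validated before it is used to build an in-memory index.

```python
import struct

# Hypothetical record header: 8-byte signed seqno, 4-byte payload length.
HEADER = struct.Struct("<qI")
# Hypothetical bound on how far seqnos within one buffer may plausibly differ.
MAX_SANE_GAP = 1_000_000


def make_record(seqno, payload):
    """Serialize one record in the hypothetical layout."""
    return HEADER.pack(seqno, len(payload)) + payload


def recover_index(buf):
    """Return {seqno: offset} for the leading run of plausible records.

    Scanning stops at the first record whose seqno or length looks bogus,
    instead of trusting the value and letting a later allocation fail.
    """
    index = {}
    offset = 0
    base = None
    while offset + HEADER.size <= len(buf):
        seqno, length = HEADER.unpack_from(buf, offset)
        end = offset + HEADER.size + length
        if seqno <= 0 or end > len(buf):
            break  # non-positive seqno or record runs past the buffer
        if base is None:
            base = seqno
        elif abs(seqno - base) > MAX_SANE_GAP:
            break  # seqno implausibly far from the first one seen
        index[seqno] = offset
        offset = end
    return index


good = make_record(100, b"a") + make_record(101, b"bc")
bogus = make_record(2**62, b"x")
index = recover_index(good + bogus + make_record(102, b"d"))
```

In this sketch the two leading records are indexed and everything from the bogus seqno onward is discarded, which corresponds to falling back to SST for the missing range rather than aborting startup.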