[MDEV-29346] update_rows_log_event hung causing galera cluster failure Created: 2022-08-22 Updated: 2023-12-12 |
|
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | Galera |
| Affects Version/s: | 10.6.5 |
| Fix Version/s: | 10.6 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Khai Ping | Assignee: | Julius Goryavsky |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Environment: |
3 Node Galera Cluster |
||
| Description |
|
We run multiple Galera clusters in a multi-master setup and noticed that a "sleeping" system thread can hang the whole cluster. When this system thread hangs as shown in the screenshot, the whole Galera cluster comes to a standstill and nothing can be written to the database. We have a log that prints "wsrep_last_committed"; it shows that wsrep_last_committed on one of the nodes is not advancing. Did the wsrep plugin in Galera hang? The h5 server is the one that is stuck. There is no stack trace in mysql.err.
The only way to "unbreak" it is to stop the hung node, kill mariadb, and start the mariadb service again. |
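The check described above (watching wsrep_last_committed per node and spotting the one that stops moving) can be sketched roughly as follows. This is only an illustration, not the reporter's actual logging: the function name `find_stalled_nodes`, the node names h4 and h6, and the sample values are hypothetical, and it assumes the counter values have already been collected per node (for example from `SHOW GLOBAL STATUS LIKE 'wsrep_last_committed'`).

```python
from typing import Dict, List


def find_stalled_nodes(samples: Dict[str, List[int]], min_samples: int = 3) -> List[str]:
    """Return nodes whose counter stayed flat while another node's counter advanced.

    `samples` maps a node name to its wsrep_last_committed readings, oldest first.
    """
    def window(values: List[int]) -> List[int]:
        # Only look at the most recent readings.
        return values[-min_samples:]

    usable = {n: window(v) for n, v in samples.items() if len(v) >= min_samples}

    # If no node advanced at all, the cluster may simply be idle,
    # so flat counters alone are not treated as evidence of a hang.
    if not any(w[-1] > w[0] for w in usable.values()):
        return []

    return sorted(n for n, w in usable.items() if w[-1] == w[0])


# Hypothetical readings: h5 is stuck while the other two nodes still advance.
samples = {
    "h4": [1000, 1012, 1030],
    "h5": [1000, 1000, 1000],
    "h6": [1000, 1011, 1029],
}
print(find_stalled_nodes(samples))
```

Comparing against the other nodes, rather than only checking that one counter is flat, is a deliberate choice here so that a quiet cluster with no writes is not reported as hung.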
| Comments |
| Comment by Daniel Black [ 2022-08-22 ] |
|
Can you: |
| Comment by Khai Ping [ 2022-08-23 ] |
|
@daniel, does installing the debug-info packages have any performance impact? |
| Comment by Daniel Black [ 2022-08-23 ] |
|
No, they contain information only and are used by gdb. They take a small bit of storage but have no impact on the running server and do not replace any code. |
| Comment by Khai Ping [ 2022-08-23 ] |
|
Thank you, I will come back with more information. |
| Comment by Khai Ping [ 2022-08-24 ] |
|
@daniel, does this mean I do not need to install the debuginfo packages, since my binary is not stripped?
When I tried to gdb attach <pid>, I got these lines. Does that mean I need to install the debuginfo?
|
| Comment by Daniel Black [ 2022-08-24 ] |
|
The binary is not technically stripped; however, the commonly used split-debug technique means that the debug info isn't in the binary but in separate files, hence the debuginfo packages are still needed. Missing debug information from the libraries mariadb uses isn't a large impediment, as the fault is unlikely to be in these libraries. If in doubt, just include the generated gdb information. If for some reason you feel uncomfortable with the detail in the gdb output, you can upload it privately to the ftp server. |
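To illustrate the "not stripped but still no debug info" situation, here is a rough sketch that classifies a binary from its ELF section names (for example as listed by `readelf -S`). It is an assumption-laden illustration, not a tool from this ticket: the function name `debug_info_location` and the section lists are hypothetical, and it only looks at the `.debug_info` and `.gnu_debuglink` sections, so a binary whose separate debug file is found purely through its build ID would be reported as "unknown".

```python
from typing import Iterable


def debug_info_location(section_names: Iterable[str]) -> str:
    """Classify where debug info lives, based only on ELF section names."""
    names = set(section_names)
    if ".debug_info" in names:
        # DWARF data is inside the binary itself.
        return "embedded"
    if ".gnu_debuglink" in names:
        # The binary points at a separate debug file (split debug).
        return "separate"
    return "unknown"


# Hypothetical section lists. A symbol table (.symtab) can remain in the binary
# even though the DWARF sections were moved to a separate file.
split_binary = [".text", ".data", ".symtab", ".strtab", ".gnu_debuglink"]
full_binary = [".text", ".data", ".symtab", ".debug_info", ".debug_line"]

print(debug_info_location(split_binary))
print(debug_info_location(full_binary))
```

When the result is "separate", gdb needs the matching debuginfo package installed to resolve source-level detail in backtraces.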
| Comment by Khai Ping [ 2022-09-06 ] |
|
@daniel, we are building our own mariadb using the spec file. However, the debuginfo rpm is not getting generated for 10.6.5, although it is getting generated for 10.6.9. Any idea what could be causing it? |
| Comment by Daniel Black [ 2022-09-26 ] |
|
> we are building our own mariadb using the spec file ,
Why? What is it?
> however the debuginfo rpm is not getting generated for 10.6.5 , however it is getting generated for 10.6.9
No. I could guess the cmake version is different. But I can't think of a code change that made this difference. |
| Comment by Khai Ping [ 2022-10-18 ] |
|
Hi Daniel, the command provided by the doc does not seem to work on my system.
The command above gives me output like this
However, this alternate command seems to be working
Sample output looks like this. Is this what you are looking for?
|
| Comment by Jan Lindström (Inactive) [ 2023-01-02 ] |
|
khaiping.loh Yes, that output would be more than useful. Please also provide the full error log. Can you try with a more recent version of MariaDB and the Galera library? |
| Comment by king [ 2023-01-30 ] |
|
10.6.8 has the same problem. |
| Comment by Daniel Black [ 2023-01-30 ] |
|
> Please provide also full error log
and the full output of the sudo gdb --batch .... |
| Comment by Khai Ping [ 2023-02-10 ] |
|
@daniel, I have attached mariadbd_full_bt_all_threads.txt. Is this issue resolved in mariadb 10.6.12? I am referencing this ticket https://jira.mariadb.org/browse/MDEV-29684; it seems like it is fixed? |
| Comment by Daniel Black [ 2023-02-10 ] |
|
Thank you. What analysis have you done that makes you think it is
This does have killed threads holding locks, so it is potentially the same, but a more complete look than what I have time for now is required to be more definite. |
| Comment by Khai Ping [ 2023-02-10 ] |
Appreciate your prompt response. I hope the bt thread logs are helpful. That log was retrieved from the node that hung. |
| Comment by Khai Ping [ 2023-02-11 ] |
|
Uploaded another log, mariadbd_full_bt_all_threads_11feb246.txt |
| Comment by Khai Ping [ 2023-02-15 ] |
|
@Julius Goryavsky, any idea if the logs were useful in helping to find out if it is related to |
| Comment by Julien Fritsch [ 2023-12-05 ] |
|
Automated message: |
| Comment by JiraAutomate [ 2023-12-05 ] |
|
Automated message: |