Details
Description
We run multiple Galera clusters in a multi-master setup and have noticed that a "sleeping" system thread can hang the whole cluster.
When this system thread hangs as shown in the screenshot, the whole Galera cluster comes to a standstill. Nothing can be written to the database.
We have a log that prints "wsrep_last_committed"; it shows that wsrep_last_committed on one of the nodes is not moving. Did the wsrep plugin in Galera hang?
The h5 server is the one that is stuck. There is nothing in mysql.err showing a stack trace.
2022-08-18 06:10:04,862 INFO galera_alert line:93 galerastats on node xxx-h4:
2022-08-18 06:10:04,861 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "150", "wsrep_last_committed": "21383020",
2022-08-18 06:10:04,862 INFO galera_alert line:93 galerastats on node xxx-h5:
2022-08-18 06:10:04,862 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "590", "wsrep_last_committed": "21382990",
2022-08-18 06:10:04,863 INFO galera_alert line:93 galerastats on node xxx-h6:
2022-08-18 06:10:04,863 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "204", "wsrep_last_committed": "21383020",
....
2022-08-18 06:30:04,996 INFO galera_alert line:93 galerastats on node xxx-h4:
2022-08-18 06:30:04,996 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "170", "wsrep_last_committed": "21383020",
2022-08-18 06:30:04,997 INFO galera_alert line:93 galerastats on node xxx-h5:
2022-08-18 06:30:04,997 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "643", "wsrep_last_committed": "21382990",
2022-08-18 06:30:04,997 INFO galera_alert line:93 galerastats on node xxx-h6:
2022-08-18 06:30:04,997 INFO galera_alert line:94 {'error': 0, 'payload': {'output': '{"Threads_connected": "228", "wsrep_last_committed": "21383020",
The only way to "unbreak" it is to stop the hung node, kill mariadb, and start the mariadb service again.
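To make the symptom in the log excerpt easier to spot automatically, here is a minimal sketch of a check that compares two samples of wsrep_last_committed per node. The function name `find_stalled_lagging_nodes` and the dict-based input shape are assumptions made for illustration; they are not part of the galera_alert script or of MariaDB. A node is reported only if its value did not advance between the samples and it is also behind the highest value seen in the cluster, which matches what xxx-h5 shows in the 06:10 and 06:30 entries.

```python
from typing import Dict, List


def find_stalled_lagging_nodes(
    earlier: Dict[str, int], later: Dict[str, int]
) -> List[str]:
    """Return nodes whose wsrep_last_committed did not advance between
    two samples and is behind the cluster-wide maximum in the later one.

    Hypothetical helper: the name and input shape are illustrative only.
    """
    cluster_max = max(later.values())
    stalled = []
    for node, seqno in sorted(later.items()):
        # A node counts as "not advancing" only if we saw it in both samples.
        did_not_advance = node in earlier and seqno <= earlier[node]
        if did_not_advance and seqno < cluster_max:
            stalled.append(node)
    return stalled


# Values copied from the 06:10 and 06:30 log samples in this report.
sample_0610 = {"xxx-h4": 21383020, "xxx-h5": 21382990, "xxx-h6": 21383020}
sample_0630 = {"xxx-h4": 21383020, "xxx-h5": 21382990, "xxx-h6": 21383020}

print(find_stalled_lagging_nodes(sample_0610, sample_0630))  # ['xxx-h5']
```

Note that in these samples the other two nodes are not advancing either, since the whole cluster is blocked; the "behind the maximum" condition is what singles out the node that appears to be holding things up. An idle but healthy cluster would report nothing, because all nodes would sit at the same value.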
Attachments
Issue Links
- is caused by
  - MDEV-29293 MariaDB stuck on starting commit state (waiting on commit order critical section) (Closed)
- relates to
  - MDEV-27689 Node hangs and complete galera cluster freezes (Closed)
  - MDEV-30718 Cluster hanging regularly on Update_rows_log_event (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Assignee | Jan Lindström [ jplindst ] |
Fix Version/s | 10.6 [ 24028 ] |
Status | Open [ 1 ] | Needs Feedback [ 10501 ] |
Link | This issue relates to |
Attachment | mariadbd_full_bt_all_threads.txt [ 68208 ] |
Assignee | Jan Lindström [ jplindst ] | Julius Goryavsky [ sysprg ] |
Status | Needs Feedback [ 10501 ] | Open [ 1 ] |
Attachment | mariadbd_full_bt_all_threads_11feb246.txt [ 68215 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Priority | Critical [ 2 ] | Major [ 3 ] |
Link | This issue relates to |
Assignee | Julius Goryavsky [ sysprg ] | Seppo Jaakola [ seppo ] |
Assignee | Seppo Jaakola [ seppo ] | Jan Lindström [ JIRAUSER53125 ] |
Status | Stalled [ 10000 ] | Needs Feedback [ 10501 ] |
Attachment | mariadb stacktrace.zip [ 73413 ] |
Attachment | mariadb stacktrace.zip [ 73413 ] |
Attachment | mariadbd_full_bt_all_threads-h14_1712676357.log [ 73414 ] | |
Attachment | mariadbd_full_bt_all_threads-h15_1712676357.log [ 73415 ] | |
Attachment | mariadbd_full_bt_all_threads-h12_1712676357.log [ 73416 ] |
Status | Needs Feedback [ 10501 ] | Open [ 1 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Fix Version/s | 10.6.15 [ 29013 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link | This issue is caused by |
Fix Version/s | 11.3.2 [ 29522 ] | |
Fix Version/s | 11.2.3 [ 29521 ] | |
Fix Version/s | 11.1.4 [ 29024 ] | |
Fix Version/s | 10.11.7 [ 29519 ] | |
Fix Version/s | 10.10.7 [ 29018 ] | |
Fix Version/s | 10.9.8 [ 29015 ] |
Can you: