Details
Description
In an environment running Galera Cluster with 6 MariaDB nodes, 1 arbitrator node, some replicas and a ProxySQL, a network issue triggered a state transfer on two nodes.
Afterwards, for reasons we could not determine, almost all transactions hung in:
- "starting" state on the commit statement or on "".
- "acquiring total order isolation" state on the "KILL CONNECTION" statement (the "KILL CONNECTION" was issued by ProxySQL).
We tried to restart the service, but it hung while stopping. ProxySQL detected the node as down and switched the traffic to another node.
The backtrace suggests a deadlock in pthread_cond_wait(), reached through lock.wait() in the enter() function of the commit monitor during the commit order critical section.
Unfortunately, we did not find a way to reproduce the problem.
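To make the suspected wait pattern easier to picture, here is a minimal sketch of an ordered monitor in Python. It is a toy model and not Galera's actual implementation: the class name `OrderedMonitor`, the `timeout` parameter and the sequence numbers are assumptions made for illustration. It only shows why a waiter inside enter() never wakes up if its predecessor never leaves the critical section.

```python
import threading


class OrderedMonitor:
    """Toy commit-order monitor: callers enter strictly in sequence-number order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._last_left = 0  # highest seqno that has called leave()

    def enter(self, seqno, timeout=None):
        """Block until every lower seqno has left; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(
                lambda: self._last_left == seqno - 1, timeout=timeout
            )

    def leave(self, seqno):
        """Mark seqno as done and wake the waiters so the next one can enter."""
        with self._cond:
            self._last_left = seqno
            self._cond.notify_all()


monitor = OrderedMonitor()

# seqno 1 enters immediately, but suppose it never calls leave().
assert monitor.enter(1)

# seqno 2 now waits on the condition variable. Without a timeout this would
# block forever, which is the shape of the hang described above.
stuck = not monitor.enter(2, timeout=0.1)
print("seqno 2 stuck:", stuck)

# Once the predecessor leaves, the waiter can proceed.
monitor.leave(1)
print("seqno 2 entered:", monitor.enter(2, timeout=0.1))
```

The timeout exists only so the sketch terminates; a plain condition wait, as in the backtrace, has no such escape and stays blocked until someone signals it.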
Issue Links
- blocks
  - MDEV-30963 Assertion failure !lock.was_chosen_as_deadlock_victim in trx0trx.h:1065 (Closed)
- causes
  - MDEV-29346 update_rows_log_event hung causing galera cluster failure (Closed)
  - MDEV-30372 Assertion `state() == s_executing || state() == s_preparing || state() == s_prepared || state() == s_must_abort || state() == s_aborting || state() == s_cert_failed || state() == s_must_replay' failed (Closed)
- includes
  - MDEV-31075 KILL QUERY maintains nodes data consistency but breaks GTID sequence (Closed)
- relates to
  - MDEV-28472 BF lock wait long for trx - Assertion `mode_ == m_local || transaction_.is_streaming()' failed (Closed)
  - MDEV-29323 Galera ha_abort_transaction is not honored if there are no InnoDB lock conflicts (Open)
Activity
Field | Original Value | New Value |
---|---|---|
Assignee | Jan Lindström [ jplindst ] |
Attachment | process-list-sample.txt [ 65246 ] |
Assignee | Jan Lindström [ jplindst ] | Seppo Jaakola [ seppo ] |
Attachment | processlist.png [ 65721 ] |
Fix Version/s | 10.5 [ 23123 ] |
Attachment | gdb.txt_test3_100insertQPS.gz [ 68071 ] | |
Attachment | gdb.txt_test1.gz [ 68072 ] | |
Attachment | gdb.txt_test2_200insertQPS.gz [ 68073 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Needs Feedback [ 10501 ] |
Attachment | gdb_010.txt.gz [ 68120 ] | |
Attachment | gdb_008.txt.gz [ 68121 ] | |
Attachment | gdb_007.txt.gz [ 68122 ] | |
Attachment | gdb_006.txt.gz [ 68123 ] |
Attachment | mariadb_003.err.gz [ 68164 ] | |
Attachment | mariadb_001.err.gz [ 68165 ] | |
Attachment | gdb.txt_003.gz [ 68166 ] | |
Attachment | gdb.txt_002.gz [ 68167 ] |
Status | Needs Feedback [ 10501 ] | Open [ 1 ] |
Status | Open [ 1 ] | Needs Feedback [ 10501 ] |
Attachment | oltp_insert_nba.lua.rtf [ 68189 ] |
Attachment | gdb_007.txt.gz [ 68122 ] |
Attachment | gdb_008.txt.gz [ 68121 ] |
Attachment | gdb_006.txt.gz [ 68123 ] |
Attachment | gdb_010.txt.gz [ 68120 ] |
Attachment | gdb.txt_002.gz [ 68167 ] |
Attachment | gdb.txt_003.gz [ 68166 ] |
Attachment | gdb.txt_test1.gz [ 68072 ] |
Attachment | gdb.txt_test2_200insertQPS.gz [ 68073 ] |
Attachment | gdb.txt_test3_100insertQPS.gz [ 68071 ] |
Attachment | mariadb_001.err.gz [ 68165 ] |
Attachment | mariadb_003.err.gz [ 68164 ] |
Attachment | oltp_insert_nba.lua.rtf [ 68189 ] |
Assignee | Seppo Jaakola [ seppo ] | Kwangbock Lee [ kb ] |
Assignee | Kwangbock Lee [ kb ] | Seppo Jaakola [ seppo ] |
Link | This issue includes MENT-1730 [ MENT-1730 ] |
Status | Needs Feedback [ 10501 ] | Open [ 1 ] |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Labels | galera | galera not-10.6+ |
Status | Confirmed [ 10101 ] | In Progress [ 3 ] |
Assignee | Seppo Jaakola [ seppo ] | Julien Fritsch [ julien.fritsch ] |
Assignee | Julien Fritsch [ julien.fritsch ] | Julius Goryavsky [ sysprg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Julius Goryavsky [ sysprg ] | Seppo Jaakola [ seppo ] |
Assignee | Julius Goryavsky [ sysprg ] | Julien Fritsch [ julien.fritsch ] |
Assignee | Julien Fritsch [ julien.fritsch ] | Seppo Jaakola [ seppo ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Needs Feedback [ 10501 ] |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Resolution | Incomplete [ 4 ] | |
Status | Needs Feedback [ 10501 ] | Closed [ 6 ] |
Resolution | Incomplete [ 4 ] | |
Status | Closed [ 6 ] | Stalled [ 10000 ] |
Assignee | Seppo Jaakola [ seppo ] | Teemu Ollakka [ teemu.ollakka ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
Assignee | Teemu Ollakka [ teemu.ollakka ] | Jan Lindström [ JIRAUSER53125 ] |
Assignee | Jan Lindström [ JIRAUSER53125 ] | Marko Mäkelä [ marko ] |
Assignee | Marko Mäkelä [ marko ] | Jan Lindström [ JIRAUSER53125 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Link | This issue blocks |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | N/A [ 14700 ] |
Link | This issue is blocked by |
Assignee | Jan Lindström [ JIRAUSER53125 ] | Marko Mäkelä [ marko ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
Link | This issue relates to MDEV-29323 [ MDEV-29323 ] |
Link | This issue is blocked by |
Link | This issue relates to |
Link | This issue includes |
Assignee | Marko Mäkelä [ marko ] | Jan Lindström [ JIRAUSER53125 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Link | This issue blocks MENT-1693 [ MENT-1693 ] |
Affects Version/s | 10.6.12 [ 28513 ] |
Assignee | Jan Lindström [ JIRAUSER53125 ] | Oleksandr Byelkin [ sanja ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
Assignee | Oleksandr Byelkin [ sanja ] | Julius Goryavsky [ sysprg ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Link | This issue causes |
issue.field.resolutiondate | 2023-05-22 02:02:46.0 | 2023-05-22 02:02:46.119 |
Fix Version/s | 11.0.2 [ 28706 ] | |
Fix Version/s | 10.4.30 [ 28912 ] | |
Fix Version/s | 10.5.21 [ 28913 ] | |
Fix Version/s | 10.6.14 [ 28914 ] | |
Fix Version/s | 10.9.7 [ 28916 ] | |
Fix Version/s | 10.10.5 [ 28917 ] | |
Fix Version/s | 10.11.4 [ 28918 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Fix Version/s | 10.4.31 [ 29010 ] | |
Fix Version/s | 10.5.22 [ 29011 ] | |
Fix Version/s | 10.6.15 [ 29013 ] | |
Fix Version/s | 10.9.8 [ 29015 ] | |
Fix Version/s | 10.10.6 [ 29017 ] | |
Fix Version/s | 10.11.5 [ 29019 ] | |
Fix Version/s | 11.0.3 [ 28920 ] | |
Fix Version/s | 11.1.2 [ 28921 ] | |
Fix Version/s | 11.0.2 [ 28706 ] | |
Fix Version/s | 10.4.30 [ 28912 ] | |
Fix Version/s | 10.5.21 [ 28913 ] | |
Fix Version/s | 10.6.14 [ 28914 ] | |
Fix Version/s | 10.9.7 [ 28916 ] | |
Fix Version/s | 10.10.5 [ 28917 ] | |
Fix Version/s | 10.11.4 [ 28918 ] |
Link | This issue relates to TODO-4011 [ TODO-4011 ] |
Labels | galera not-10.6+ | galera |
Labels | galera |
Link | This issue blocks MENT-1855 [ MENT-1855 ] |
Link | This issue causes |
Zendesk Related Tickets | 175526 143407 133829 136221 147010 |
I have just come across this issue while trying to move a DB cluster from a Percona cluster to MariaDB using logical backups.
After the applications had been running for a while, I ended up with hundreds of processes stuck in the "starting" state on commit. A redacted sample of the process list is attached: process-list-sample.txt.
I have restarted the cluster and enabled wsrep debug to try to get additional information about what is happening when it locks up in this state.
Version information is:
OS: Debian 11
Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1
MariaDB: 10.5.15-0+deb11u1
Galera: 26.4.11-0+deb11u1
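
For anyone triaging a similar process list, the following is a rough Python sketch that counts threads matching the two hang patterns reported in this issue. It assumes the rows have already been fetched as dicts with "State" and "Info" keys mirroring the SHOW PROCESSLIST columns; the function names, the labels and the sample rows are hypothetical and are not taken from the attached file.

```python
from collections import Counter

# Thread states named in this report; matching is case-insensitive.
STARTING = "starting"
TOI = "acquiring total order isolation"


def classify(row):
    """Return a label for a row that matches the reported hang, else None.

    `row` is a dict with "State" and "Info" keys; how the rows are fetched
    from the server is left out on purpose.
    """
    state = (row.get("State") or "").strip().lower()
    info = (row.get("Info") or "").strip().lower()
    if state == STARTING and info in ("", "commit"):
        return "commit stuck in starting"
    if state == TOI and info.startswith("kill"):
        return "kill waiting for TOI"
    return None


def summarize(rows):
    """Count rows per hang label, ignoring rows that match neither pattern."""
    return Counter(label for label in map(classify, rows) if label)


# Hypothetical sample rows, for illustration only.
sample = [
    {"Id": 11, "State": "starting", "Info": "COMMIT"},
    {"Id": 12, "State": "starting", "Info": None},
    {"Id": 13, "State": "acquiring total order isolation", "Info": "KILL CONNECTION 11"},
    {"Id": 14, "State": "Sending data", "Info": "SELECT 1"},
]
print(dict(summarize(sample)))
```

A count that keeps growing between two snapshots taken a few seconds apart would be consistent with the hang described here, whereas a short-lived "starting" state on its own is normal.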