Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.3.28, 10.2(EOL), 10.4(EOL), 10.5
Environment:
CentOS 7.9.2009
mysqld would have been started with the following arguments:
--datadir=/data/mariadb --socket=/var/lib/mysql/mysql.sock --user=mysql --symbolic-links=0 --max_allowed_packet=16M --thread_cache_size=8 --max_connections=1500 --slow_query_log=1 --log_error=/data/mariadb/mariadb.log --innodb_buffer_pool_size=49152M --innodb_log_file_size=2048M --innodb_log_buffer_size=16M --innodb_print_all_deadlocks=on --log-warnings=2 --plugin_load_add=auth_socket --default_storage_engine=InnoDB --innodb_file_per_table=1 --innodb_flush_log_at_trx_commit=0 --innodb_doublewrite=1 --log_slave_updates=1 --log_bin=bin-log --server_id=6666 --binlog_format=ROW --innodb_autoinc_lock_mode=2 --expire_logs_days=1 --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --wsrep_cluster_address=gcomm://[redacted] --wsrep_node_address=[redacted] --wsrep_sst_method=mariabackup --wsrep_sst_auth=[redacted] --wsrep_provider_options=evs.keepalive_period = PT1S; evs.inactive_check_period = PT1S; evs.suspect_timeout = PT5S; evs.inactive_timeout = PT15S; evs.install_timeout = PT15S; gcache.size=2G --wsrep_on=ON --wsrep_log_conflicts=1
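The --wsrep_provider_options value above packs several Galera settings into one semicolon-separated string, with the evs.* timeouts written as ISO 8601 periods. As a reading aid, here is a minimal Python sketch that splits that string and converts the periods to seconds. The helper names (parse_provider_options, period_to_seconds) are made up for this illustration and are not part of MariaDB or Galera; the converter only handles the simple hour/minute/second forms seen here.

```python
import re

# Value of --wsrep_provider_options copied from the environment above.
PROVIDER_OPTIONS = (
    "evs.keepalive_period = PT1S; evs.inactive_check_period = PT1S; "
    "evs.suspect_timeout = PT5S; evs.inactive_timeout = PT15S; "
    "evs.install_timeout = PT15S; gcache.size=2G"
)


def parse_provider_options(raw):
    """Split a 'key = value; key = value' string into a dict of stripped strings."""
    options = {}
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key.strip()] = value.strip()
    return options


def period_to_seconds(value):
    """Convert a simple ISO 8601 period such as PT15S or PT1M30S to seconds.

    Only hours, minutes and seconds are handled; anything else returns None.
    """
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?", value)
    if match is None or not any(match.groups()):
        return None
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


options = parse_provider_options(PROVIDER_OPTIONS)
# Keep only the evs.* entries, which are the ones expressed as periods.
timeouts = {k: period_to_seconds(v) for k, v in options.items() if k.startswith("evs.")}
```

Printing timeouts makes it easier to compare the configured intervals side by side; gcache.size is a size rather than a period and is left as a string in options.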
Description
About 29 hours after updating a previously very stable three-server MariaDB Galera cluster from 10.3.27 to 10.3.28, one of the nodes crashed with the following messages:
2021-03-10 18:22:48 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
	 at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():735
2021-03-10 18:22:48 0 [ERROR] WSREP: cancel commit bad exit: 7 33792346039
210310 18:22:48 [ERROR] mysqld got signal 6 ;
Attached is a bit more of the log (conflict logging happened to be enabled), but I don't think there's much more I can provide right now. I'm reporting this anyway because the cluster had been running very stably for months before the update to 10.3.28, and the crash could be related to MDEV-25111, which we also encountered for the first time right after updating to 10.3.28.
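For anyone checking whether another node hit the same sequence, the three messages above can be searched for in the error log. The following is a minimal Python sketch that assumes the log has already been read into a list of lines. The function names are hypothetical, the patterns are taken only from the excerpt in this report, and a match suggests the same symptom rather than proving the same root cause.

```python
import re

# Patterns taken from the log excerpt in this report.
SIGNATURES = {
    "invalid_state": re.compile(r"\[ERROR\] WSREP: invalid state (\w+) \(FATAL\)"),
    "cancel_commit": re.compile(r"\[ERROR\] WSREP: cancel commit bad exit: (\d+) (\d+)"),
    "signal": re.compile(r"\[ERROR\] mysqld got signal (\d+)"),
}


def find_crash_signature(lines):
    """Return {name: (line_number, captured_groups)} for the first match of each pattern."""
    hits = {}
    for number, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if name in hits:
                continue  # keep only the first occurrence of each message
            match = pattern.search(line)
            if match:
                hits[name] = (number, match.groups())
    return hits


def looks_like_this_crash(lines):
    """True if all three messages appear and the reported signal is 6 (SIGABRT)."""
    hits = find_crash_signature(lines)
    return set(hits) == set(SIGNATURES) and hits["signal"][1] == ("6",)
```

With the configuration in this report the lines would come from the file named by --log_error (/data/mariadb/mariadb.log), for example by passing an open file object to looks_like_this_crash.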
Attachments
Issue Links
- blocks
  - MDEV-25368 Galera cluster hangs on Freeing items (Closed)
- causes
  - MDEV-24294 MariaDB - Cluster freezes if node hangs (Closed)
  - MDEV-25992 Galera 3 Server crash with signal 6 after RBR event apply failed (Closed)
  - MDEV-26099 MariaDB 10.5.10/10.5.11 Galera assertion crash (Closed)
- is caused by
  - MDEV-23328 Server hang due to Galera lock conflict resolution (Closed)
- is duplicated by
  - MDEV-28604 MariaDB crashing very often (Closed)
- relates to
  - MDEV-24915 Galera conflict resolution is unnecessarily complex (Closed)
  - MDEV-25410 Assertion `state_ == s_exec' failed - mysqld got signal 6 (Closed)
  - MDEV-25609 Signal 11 on wsrep_mysqld.cc:2620 (Closed)
  - MDEV-24397 MariaDB Galera Server Crashes on Large DELETE (Closed)
  - MDEV-25111 Long semaphore wait (> 800 secs), server stops responding (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue relates to |
Environment | CentOS 7.9.2009 and the mysqld arguments, with the cluster host names and SST credentials in clear text | The same arguments with the values of --wsrep_cluster_address, --wsrep_node_address and --wsrep_sst_auth replaced by [redacted] |
Labels | regression | |
Priority | Major [ 3 ] | Critical [ 2 ] |
Assignee | Seppo Jaakola [ seppo ] |
Fix Version/s | 10.3 [ 22126 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Assignee | Seppo Jaakola [ seppo ] | Jan Lindström [ jplindst ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Assignee | Jan Lindström [ jplindst ] | Seppo Jaakola [ seppo ] |
Attachment | gdb.txt [ 57937 ] |
Link | This issue relates to |
Link | This issue relates to |
Link | This issue relates to |
Assignee | Seppo Jaakola [ seppo ] | Jan Lindström [ jplindst ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Jan Lindström [ jplindst ] | Marko Mäkelä [ marko ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Marko Mäkelä [ marko ] | Jan Lindström [ jplindst ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Assignee | Jan Lindström [ jplindst ] | Sergei Golubchik [ serg ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
Priority | Critical [ 2 ] | Blocker [ 1 ] |
Affects Version/s | 10.2 [ 14601 ] |
Affects Version/s | 10.4 [ 22408 ] |
Affects Version/s | 10.5 [ 23123 ] |
Fix Version/s | 10.2.39 [ 25731 ] | |
Fix Version/s | 10.3.30 [ 25732 ] | |
Fix Version/s | 10.4.20 [ 25733 ] | |
Fix Version/s | 10.5.11 [ 25734 ] | |
Fix Version/s | 10.3 [ 22126 ] |
Link | This issue is part of MENT-1202 [ MENT-1202 ] |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.2.39 [ 25731 ] | |
Fix Version/s | 10.3.30 [ 25732 ] | |
Fix Version/s | 10.4.20 [ 25733 ] | |
Fix Version/s | 10.5.11 [ 25734 ] |
Link | This issue is part of MENT-1202 [ MENT-1202 ] |
Link | This issue blocks MENT-1202 [ MENT-1202 ] |
Link | This issue blocks TODO-2984 [ TODO-2984 ] |
Link | This issue relates to |
Link | This issue causes |
Description | Report text with the log excerpt inline | The same text with the log excerpt wrapped in {noformat} markers |
Assignee | Sergei Golubchik [ serg ] | Jan Lindström [ jplindst ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Assignee | Jan Lindström [ jplindst ] | Seppo Jaakola [ seppo ] |
Assignee | Seppo Jaakola [ seppo ] | Jan Lindström [ jplindst ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Labels | regression | crash regression |
Labels | crash regression | crash hang regression |
Link | This issue blocks MENT-1284 [ MENT-1284 ] |
Priority | Blocker [ 1 ] | Critical [ 2 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Priority | Critical [ 2 ] | Blocker [ 1 ] |
Labels | crash hang regression | crash hang need_feedback regression |
Assignee | Jan Lindström [ jplindst ] | Sergei Golubchik [ serg ] |
Labels | crash hang need_feedback regression | crash hang regression |
Link | This issue blocks |
Link | This issue relates to |
Link | This issue blocks |
Link | This issue relates to |
Assignee | Sergei Golubchik [ serg ] | Jan Lindström [ jplindst ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Jan Lindström [ jplindst ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Link | This issue causes |
Assignee | Sergei Golubchik [ serg ] | Jan Lindström [ jplindst ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Jan Lindström [ jplindst ] | Ramesh Sivaraman [ JIRAUSER48189 ] |
Assignee | Ramesh Sivaraman [ JIRAUSER48189 ] | Jan Lindström [ jplindst ] |
Assignee | Jan Lindström [ jplindst ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Sergei Golubchik [ serg ] | Jan Lindström [ jplindst ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Link | This issue causes |
Assignee | Jan Lindström [ jplindst ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Link | This issue is caused by |
Assignee | Sergei Golubchik [ serg ] | Jan Lindström [ jplindst ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Jan Lindström [ jplindst ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Link | This issue blocks |
Labels | crash hang regression | crash hang not-10.6 not-10.7 regression |
Link | This issue blocks TODO-3199 [ TODO-3199 ] |
Assignee | Sergei Golubchik [ serg ] | Jan Lindström [ jplindst ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Resolution Date | 2021-10-30 06:28:15.0 | 2021-10-30 06:28:15.345 |
Fix Version/s | 10.2.41 [ 26032 ] | |
Fix Version/s | 10.3.32 [ 26029 ] | |
Fix Version/s | 10.4.22 [ 26031 ] | |
Fix Version/s | 10.5.13 [ 26026 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Link | This issue relates to |
Workflow | MariaDB v3 [ 120009 ] | MariaDB v4 [ 159016 ] |
Link | This issue relates to |
Link | This issue is duplicated by |
Zendesk Related Tickets | 116083 144997 139390 171032 120022 |