MariaDB Server / MDEV-25114

Crash: WSREP: invalid state ROLLED_BACK (FATAL)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 10.3.28, 10.2(EOL), 10.4(EOL), 10.5
    • Fix Version/s: 10.2.41, 10.3.32, 10.4.22, 10.5.13
    • Component/s: Galera
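Whether a given server release already includes the fix can be read off the fix releases listed above (10.2.41, 10.3.32, 10.4.22, 10.5.13). The following is a minimal Python sketch of that lookup, not something taken from the ticket: the names `FIXED_IN` and `contains_fix` are hypothetical, and the function only reports whether a version is at or past the fix release for its series, not whether an older version was actually affected.

```python
# First release in each series that carries the fix, as listed in this ticket.
FIXED_IN = {(10, 2): 41, (10, 3): 32, (10, 4): 22, (10, 5): 13}


def contains_fix(version):
    """Return True/False for series listed in the ticket, None otherwise.

    `version` is a string such as "10.3.28" or "10.3.28-MariaDB";
    any suffix after the first "-" is ignored.
    """
    major, minor, patch = (int(part) for part in version.split("-")[0].split("."))
    first_fixed = FIXED_IN.get((major, minor))
    if first_fixed is None:
        # Series not covered by this ticket's fix list; no claim is made.
        return None
    return patch >= first_fixed
```

For example, `contains_fix("10.3.28")` is false for the release the reporter was running, while `contains_fix("10.3.32")` is true.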

    Description

      About 29 hours after updating a previously very stable three-server MariaDB Galera cluster from 10.3.27 to 10.3.28, one of the nodes crashed with the following message:

      2021-03-10 18:22:48 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
               at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():735
      2021-03-10 18:22:48 0 [ERROR] WSREP: cancel commit bad exit: 7 33792346039
      210310 18:22:48 [ERROR] mysqld got signal 6 ;
      

      Attached is a longer log excerpt (the node happened to be running with conflict logging enabled), but I don't think there is much more I can provide right now. I'm reporting this anyway because the cluster had been running very stably for months before the update to 10.3.28, and the crash may be related to MDEV-25111, which we also first encountered right after updating to 10.3.28.
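To check other nodes for the same failure, the signature in the log excerpt above can be searched for mechanically. The following is a minimal Python sketch and not part of the original report: the function names `find_fatal_wsrep_events` and `crashed_with_signal` are hypothetical, and both patterns are assumed from the three log lines quoted above, so other Galera or server versions may word these messages differently.

```python
import re

# Patterns assumed from the log excerpt in this report; wording may vary by version.
FATAL_STATE = re.compile(r"\[ERROR\] WSREP: invalid state (\w+) \(FATAL\)")
SIGNAL = re.compile(r"\[ERROR\] mysqld got signal (\d+)")


def find_fatal_wsrep_events(lines):
    """Return (line_number, state) pairs for each fatal invalid-state error."""
    events = []
    for number, line in enumerate(lines, start=1):
        match = FATAL_STATE.search(line)
        if match:
            events.append((number, match.group(1)))
    return events


def crashed_with_signal(lines):
    """Return the signal number from the first 'mysqld got signal' line, or None."""
    for line in lines:
        match = SIGNAL.search(line)
        if match:
            return int(match.group(1))
    return None
```

Both functions accept any iterable of lines, so an open file handle for the error log (in the reporter's setup, the path given by `--log_error`) can be passed in directly.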

          Activity

            emaijala Ere Maijala created issue -
            emaijala Ere Maijala made changes -
            Field Original Value New Value
            emaijala Ere Maijala made changes -
            Environment CentOS 7.9.2009

            mysqld would have been started with the following arguments:
            --datadir=/data/mariadb --socket=/var/lib/mysql/mysql.sock --user=mysql --symbolic-links=0 --max_allowed_packet=16M --thread_cache_size=8 --max_connections=1500 --slow_query_log=1 --log_error=/data/mariadb/mariadb.log --innodb_buffer_pool_size=49152M --innodb_log_file_size=2048M --innodb_log_buffer_size=16M --innodb_print_all_deadlocks=on --log-warnings=2 --plugin_load_add=auth_socket --default_storage_engine=InnoDB --innodb_file_per_table=1 --innodb_flush_log_at_trx_commit=0 --innodb_doublewrite=1 --log_slave_updates=1 --log_bin=bin-log --server_id=6666 --binlog_format=ROW --innodb_autoinc_lock_mode=2 --expire_logs_days=1 --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --wsrep_cluster_address=gcomm://[redacted] --wsrep_node_address=[redacted] --wsrep_sst_method=mariabackup --wsrep_sst_auth=[redacted] --wsrep_provider_options=evs.keepalive_period = PT1S; evs.inactive_check_period = PT1S; evs.suspect_timeout = PT5S; evs.inactive_timeout = PT15S; evs.install_timeout = PT15S; gcache.size=2G --wsrep_on=ON --wsrep_log_conflicts=1
            valerii Valerii Kravchuk made changes -
            Labels regression
            Priority Major [ 3 ] Critical [ 2 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Seppo Jaakola [ seppo ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.3 [ 22126 ]
            seppo Seppo Jaakola made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Seppo Jaakola [ seppo ] Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Seppo Jaakola [ seppo ]
            jplindst Jan Lindström (Inactive) made changes -
            Attachment gdb.txt [ 57937 ]
            seppo Seppo Jaakola made changes -
            Assignee Seppo Jaakola [ seppo ] Jan Lindström [ jplindst ]
            Status Stalled [ 10000 ] In Review [ 10002 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Marko Mäkelä [ marko ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ jplindst ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Sergei Golubchik [ serg ]
            Status Stalled [ 10000 ] In Review [ 10002 ]
            jplindst Jan Lindström (Inactive) made changes -
            Priority Critical [ 2 ] Blocker [ 1 ]
            jplindst Jan Lindström (Inactive) made changes -
            Affects Version/s 10.2 [ 14601 ]
            jplindst Jan Lindström (Inactive) made changes -
            Affects Version/s 10.4 [ 22408 ]
            jplindst Jan Lindström (Inactive) made changes -
            Affects Version/s 10.5 [ 23123 ]
            jplindst Jan Lindström (Inactive) made changes -
            Fix Version/s 10.2.39 [ 25731 ]
            Fix Version/s 10.3.30 [ 25732 ]
            Fix Version/s 10.4.20 [ 25733 ]
            Fix Version/s 10.5.11 [ 25734 ]
            Fix Version/s 10.3 [ 22126 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.2.39 [ 25731 ]
            Fix Version/s 10.3.30 [ 25732 ]
            Fix Version/s 10.4.20 [ 25733 ]
            Fix Version/s 10.5.11 [ 25734 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Jan Lindström [ jplindst ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Seppo Jaakola [ seppo ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Seppo Jaakola [ seppo ] Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            Labels regression crash regression
            jplindst Jan Lindström (Inactive) made changes -
            Labels crash regression crash hang regression
            serg Sergei Golubchik made changes -
            Priority Blocker [ 1 ] Critical [ 2 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Priority Critical [ 2 ] Blocker [ 1 ]
            jplindst Jan Lindström (Inactive) made changes -
            Labels crash hang regression crash hang need_feedback regression
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Sergei Golubchik [ serg ]
            jplindst Jan Lindström (Inactive) made changes -
            Labels crash hang need_feedback regression crash hang regression
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Sergei Golubchik [ serg ] Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Sergei Golubchik [ serg ] Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            ramesh Ramesh Sivaraman made changes -
            Assignee Jan Lindström [ jplindst ] Ramesh Sivaraman [ JIRAUSER48189 ]
            ramesh Ramesh Sivaraman made changes -
            Assignee Ramesh Sivaraman [ JIRAUSER48189 ] Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Sergei Golubchik [ serg ] Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Sergei Golubchik [ serg ] Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            jplindst Jan Lindström (Inactive) made changes -
            Labels crash hang regression crash hang not-10.6 not-10.7 regression
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Sergei Golubchik [ serg ] Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            issue.field.resolutiondate 2021-10-30 06:28:15.0 2021-10-30 06:28:15.345
            jplindst Jan Lindström (Inactive) made changes -
            Fix Version/s 10.2.41 [ 26032 ]
            Fix Version/s 10.3.32 [ 26029 ]
            Fix Version/s 10.4.22 [ 26031 ]
            Fix Version/s 10.5.13 [ 26026 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 120009 ] MariaDB v4 [ 159016 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 116083 144997 139390 171032 120022

            People

              Assignee: jplindst Jan Lindström (Inactive)
              Reporter: emaijala Ere Maijala
              Votes: 16
              Watchers: 37
