Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25469

transactions blocked by system user rows_log_event process that never finish

    XMLWordPrintable

Details

    Description

      Hi.
      We are doing some testing with more load on our current mariadb multi-master (3) setup. However we've seen multiple times where our database stopped responding on all 3 nodes. When we checked closer it was always one node stuck with system user process like these which sleep there never-ending:

      |     1 | system user  |                    | NULL       | Sleep   |   531 | Write_rows_log_event::write_row(17936337) | insert into volume ... REST OF SQL QUERY |    0.000 |
      |      5 | system user |                    | NULL                | Sleep   |  3849 | Update_rows_log_event::find_row(17819732) | UPDATE `volume` SET ... REST OF SQL QUERY |    0.000 |
      

      Then LOCKS started to buildout and processes to hang. There was nothing we could do to unblock this only restart the affected node. After restart of affected node everything seems to start again. We've checked the queries and they differ. The primary keys are set on the table(s).
      Our current setup config

        mariadb.cnf: |
          [client]
          default-character-set = utf8
          [mysqld]
          core-file
          unix_socket = OFF
          performance_schema = ON
          character-set-server = utf8
          collation-server = utf8_general_ci
          ignore-db-dirs = lost+found 
          max_connections = 350
          interactive_timeout = 450 
          wait_timeout = 450
          join_buffer_size = 524288
          key_buffer_size = 600MB
          # InnoDB tuning
          innodb_log_file_size = 600MB
          innodb_buffer_pool_size = 5800MB
       galera.cnf: |
          [galera]
          user = mysql
          bind-address = 0.0.0.0
          default_storage_engine = InnoDB
          binlog_format = ROW
          innodb_autoinc_lock_mode = 2
          innodb_flush_log_at_trx_commit = 2
          query_cache_size = 0
          query_cache_type = 0
       
          # MariaDB Galera settings
          wsrep_on=ON
          wsrep_provider=/usr/lib/galera/libgalera_smm.so
          wsrep_sst_method=rsync
          wsrep_slave_threads=8
          wsrep_sync_wait=7
       
          # Cluster settings (automatically updated)
          wsrep_cluster_address=gcomm://
          wsrep_cluster_name=mysql
          wsrep_node_address=127.0.0.1
      

      Is there some specific case when this can happen or why we keep hitting this error?
      Are there maybe some specific options we can specify for this to not happen and get stuck?
      I've tried to dig deeper into this but not really sure why it happens and how to eliminate this. It's "random" but starts to happen quite often.

      UPDATE another case today:

      MariaDB [(none)]> show full processlist ;
      +-------+--------------+--------------------+---------------------+---------+-------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
      | Id    | User         | Host               | db                  | Command | Time  | State                                     | Info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | Progress |
      +-------+--------------+--------------------+---------------------+---------+-------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
      |     1 | system user  |                    |                     | Busy    | 76698 | wsrep aborter idle                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |
      |     2 | system user  |                    | NULL                | Sleep   | 34574 | Write_rows_log_event::write_row(18047602) | insert into snapshot |    0.000 |
      |     7 | system user  |                    | NULL                | Sleep   | 34602 | Write_rows_log_event::write_row(18047598) | insert into volume |    0.000 |
      |     6 | system user  |                    | NULL                | Sleep   | 34620 | wsrep applier committed                   | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |
      |    10 | system user  |                    | NULL                | Sleep   | 34573 | Write_rows_log_event::write_row(18047603) | insert into snapshot |    0.000 |
      |     8 | system user  |                    | NULL                | Sleep   | 34608 | wsrep applier committed                   | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |
      |     5 | system user  |                    | NULL                | Sleep   | 34602 | Write_rows_log_event::write_row(18047599) | insert into volume( |    0.000 |
      |     9 | system user  |                    | NULL                | Sleep   | 34620 | wsrep applier committed                   | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |
      |    11 | system user  |                    | NULL                | Sleep   | 34603 | Write_rows_log_event::write_row(18047596) | insert into volume|    0.000 |
      

      When I checked logs it seems there was signal 11 around that time

      2021-04-22 19:23:02 59446 [Warning] Aborted connection 59446 to db: 'storage' user: 'root' host: '10.255.5.166' (Got an error reading communication packets)
      2021-04-22 19:25:02 59566 [Warning] Aborted connection 59566 to db: 'storage' user: 'root' host: '10.255.5.166' (Got an error reading communication packets)
      2021-04-22 19:26:02 59614 [Warning] Aborted connection 59614 to db: 'storage' user: 'root' host: '10.255.5.166' (Got an error reading communication packets)
      2021-04-22 19:29:01 59804 [Warning] Aborted connection 59804 to db: 'storage' user: 'root' host: '10.255.5.166' (Got an error reading communication packets)
      2021-04-22 19:30:01 59868 [Warning] Aborted connection 59868 to db: 'storage' user: 'root' host: '10.255.5.166' (Got an error reading communication packets)
      210422 19:33:18 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.5.9-MariaDB-1:10.5.9+maria~focal
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=33
      max_threads=252
      thread_count=43
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 685804 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      mysqld(my_print_stacktrace+0x32)[0x55a4a3daeaf2]
      mysqld(handle_fatal_signal+0x485)[0x55a4a38046f5]
      

      However at that time it didn't create crashdump. I've send signal 11 to mysqld process which generated crash. Hopefully will help for this case https://file.io/yZfXkbIUYe0w

      Attachments

        1. error_20210702.log.gz
          645 kB
          Jan de Lalène
        2. system.log
          1.38 MB
          Jaroslav

        Activity

          People

            seppo Seppo Jaakola
            jaroslav Jaroslav
            Votes:
            3 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.