[MDEV-25469] transactions blocked by system user rows_log_event process that never finish - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Incomplete
Affects Version/s: 10.3.29, 10.5.9, 10.5
Fix Version/s: N/A
Component/s: Storage Engine - InnoDB, wsrep
Labels:
None

Description

Hi.
We are doing some testing with more load on our current mariadb multi-master (3) setup. However we've seen multiple times where our database stopped responding on all 3 nodes. When we checked closer it was always one node stuck with system user process like these which sleep there never-ending:

|     1 | system user  |                    | NULL       | Sleep   |   531 | Write_rows_log_event::write_row(17936337) | insert into volume ... REST OF SQL QUERY |    0.000 |

|      5 | system user |                    | NULL                | Sleep   |  3849 | Update_rows_log_event::find_row(17819732) | UPDATE `volume` SET ... REST OF SQL QUERY |    0.000 |

Then LOCKS started to buildout and processes to hang. There was nothing we could do to unblock this only restart the affected node. After restart of affected node everything seems to start again. We've checked the queries and they differ. The primary keys are set on the table(s).
Our current setup config

  mariadb.cnf: |

    [client]

    default-character-set = utf8

    [mysqld]

    core-file

    unix_socket = OFF

    performance_schema = ON

    character-set-server = utf8

    collation-server = utf8_general_ci

    ignore-db-dirs = lost+found

    max_connections = 350

    interactive_timeout = 450

    wait_timeout = 450

    join_buffer_size = 524288

    key_buffer_size = 600MB

    # InnoDB tuning

    innodb_log_file_size = 600MB

    innodb_buffer_pool_size = 5800MB

 galera.cnf: |

    [galera]

    user = mysql

    bind-address = 0.0.0.0

    default_storage_engine = InnoDB

    binlog_format = ROW

    innodb_autoinc_lock_mode = 2

    innodb_flush_log_at_trx_commit = 2

    query_cache_size = 0

    query_cache_type = 0

    # MariaDB Galera settings

    wsrep_on=ON

    wsrep_provider=/usr/lib/galera/libgalera_smm.so

    wsrep_sst_method=rsync

    wsrep_slave_threads=8

    wsrep_sync_wait=7

    # Cluster settings (automatically updated)

    wsrep_cluster_address=gcomm://

    wsrep_cluster_name=mysql

    wsrep_node_address=127.0.0.1

Is there some specific case when this can happen or why we keep hitting this error?
Are there maybe some specific options we can specify for this to not happen and get stuck?
I've tried to dig deeper into this but not really sure why it happens and how to eliminate this. It's "random" but starts to happen quite often.

UPDATE another case today:

MariaDB [(none)]> show full processlist ;

+-------+--------------+--------------------+---------------------+---------+-------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+

| Id    | User         | Host               | db                  | Command | Time  | State                                     | Info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | Progress |

+-------+--------------+--------------------+---------------------+---------+-------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+

|     1 | system user  |                    |                     | Busy    | 76698 | wsrep aborter idle                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |

|     2 | system user  |                    | NULL                | Sleep   | 34574 | Write_rows_log_event::write_row(18047602) | insert into snapshot |    0.000 |

|     7 | system user  |                    | NULL                | Sleep   | 34602 | Write_rows_log_event::write_row(18047598) | insert into volume |    0.000 |

|     6 | system user  |                    | NULL                | Sleep   | 34620 | wsrep applier committed                   | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |

|    10 | system user  |                    | NULL                | Sleep   | 34573 | Write_rows_log_event::write_row(18047603) | insert into snapshot |    0.000 |

|     8 | system user  |                    | NULL                | Sleep   | 34608 | wsrep applier committed                   | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |

|     5 | system user  |                    | NULL                | Sleep   | 34602 | Write_rows_log_event::write_row(18047599) | insert into volume( |    0.000 |

|     9 | system user  |                    | NULL                | Sleep   | 34620 | wsrep applier committed                   | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |

|    11 | system user  |                    | NULL                | Sleep   | 34603 | Write_rows_log_event::write_row(18047596) | insert into volume|    0.000 |

When I checked logs it seems there was signal 11 around that time

2021-04-22 19:23:02 59446 [Warning] Aborted connection 59446 to db: 'storage' user: 'root' host: '10.255.5.166' (Got an error reading communication packets)

2021-04-22 19:25:02 59566 [Warning] Aborted connection 59566 to db: 'storage' user: 'root' host: '10.255.5.166' (Got an error reading communication packets)

2021-04-22 19:26:02 59614 [Warning] Aborted connection 59614 to db: 'storage' user: 'root' host: '10.255.5.166' (Got an error reading communication packets)

2021-04-22 19:29:01 59804 [Warning] Aborted connection 59804 to db: 'storage' user: 'root' host: '10.255.5.166' (Got an error reading communication packets)

2021-04-22 19:30:01 59868 [Warning] Aborted connection 59868 to db: 'storage' user: 'root' host: '10.255.5.166' (Got an error reading communication packets)

210422 19:33:18 [ERROR] mysqld got signal 11 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.5.9-MariaDB-1:10.5.9+maria~focal

key_buffer_size=134217728

read_buffer_size=131072

max_used_connections=33

max_threads=252

thread_count=43

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 685804 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x0 thread_stack 0x49000

mysqld(my_print_stacktrace+0x32)[0x55a4a3daeaf2]

mysqld(handle_fatal_signal+0x485)[0x55a4a38046f5]

However at that time it didn't create crashdump. I've send signal 11 to mysqld process which generated crash. Hopefully will help for this case https://file.io/yZfXkbIUYe0w

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

error_20210702.log.gz
645 kB
2021-07-05 15:14
system.log
1.38 MB
2021-04-26 06:03

transactions blocked by system user rows_log_event process that never finish

Details

Description

Attachments

Attachments

Activity

People

Dates

Git Integration