[MDEV-25469] transactions blocked by system user rows_log_event process that never finish Created: 2021-04-21 Updated: 2022-02-01 Resolved: 2022-01-31 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB, wsrep |
| Affects Version/s: | 10.3.29, 10.5.9, 10.5 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Jaroslav | Assignee: | Seppo Jaakola |
| Resolution: | Incomplete | Votes: | 3 |
| Labels: | None | ||
| Attachments: |
|
| Description |
|
Hi.
Then locks started to build up and processes began to hang. There was nothing we could do to unblock this other than restarting the affected node. After the restart of the affected node, everything seemed to work again. We've checked the queries and they differ. Primary keys are set on the table(s).
Is there a specific case in which this can happen, or a reason why we keep hitting this error? UPDATE: another case today:
When I checked the logs, it seems there was a signal 11 around that time
However, at that time it didn't create a crash dump. I've sent signal 11 to the mysqld process, which generated a crash. Hopefully this will help with this case: https://file.io/yZfXkbIUYe0w |
| Comments |
| Comment by Gavin Pidgley [ 2021-04-23 ] |
|
Hi, We also see the same issue on MariaDB 10.3.28 (10.3.28+maria~bionic). Example of a problematic query:
| 2 | system user | | NULL | Sleep | 624 | Write_rows_log_event::write_row(21328484) | INSERT INTO `tm_options` (`option_name`, `option_value`, `autoload`) VALUES ('_transient_doing_cron', '1619186903.2794630527496337890625', 'yes') ON DUPLICATE KEY UPDATE `option_name` = VALUES(`option_name`), `option_value` = VALUES(`option_value`), `autoload` = VALUES(`autoload`) | 0.000 |
Same symptoms of queries backing up on all nodes until mysqld is killed on the node in question. |
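A stuck applier thread like the one quoted above can be looked for with a query against `information_schema.PROCESSLIST`. This is only a sketch: the 60-second threshold and the `LIKE` pattern on the state are assumptions chosen for illustration, not values taken from this report.

```sql
-- List replication applier threads ('system user') that have been in a
-- row-event state for longer than an assumed threshold of 60 seconds.
SELECT ID, USER, COMMAND, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE USER = 'system user'
  AND STATE LIKE '%rows_log_event%'   -- assumed pattern; matches Write_rows_log_event::write_row(...)
  AND TIME > 60                       -- hypothetical threshold
ORDER BY TIME DESC;
```

A row that stays in this result set while other sessions pile up behind it would match the symptom described in this ticket.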
| Comment by Jan de Lalène [ 2021-07-05 ] |
|
Hello, attached is the extract from /var/log/mysql/error.log: error_20210702.log.gz. Currently we cannot reproduce (force) the error, but we observe it 1-5 times a week. |
| Comment by Seppo Jaakola [ 2021-12-30 ] |
|
To troubleshoot this further, more information is needed about the execution state on the hanging node. In Jaroslav's scenario, the table definitions of the related tables 'volume' and 'snapshot' are of interest, as are the SQL statements executed in the hanging transactions. For all of the above occurrences of this type of issue, configuring wsrep_provider_options with 'cert.optimistic_pa=NO' would be an effective workaround. |
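The suggested workaround could be applied roughly as follows. This is a sketch under assumptions: it presumes that `cert.optimistic_pa` can be changed at runtime on the installed Galera provider version, which should be checked against the provider documentation. A runtime change would not survive a restart, so the same option would also need to go into the server configuration file.

```sql
-- Disable optimistic parallel applying on this node (assumes the option
-- is dynamic on the installed Galera provider version).
SET GLOBAL wsrep_provider_options = 'cert.optimistic_pa=NO';

-- Confirm the provider picked up the new value.
SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options';
```

The statement would have to be run on each node of the cluster, since provider options are per-node settings.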
| Comment by Jan de Lalène [ 2022-02-01 ] |
|
Just as a hint - we have "fixed" the issue (at least it has never occurred since): |
| Comment by Steve Baroti [ 2022-02-01 ] |
|
Hello Jan de Lalene, |