MariaDB Server
MDEV-26670

Unable to maintain replication since upgrading from 10.4 to 10.5

Details

    Description

      We're seeing the following error on a very regular basis when replicating a workload of XA transactions between a 10.5.12 primary and secondary:

      210923 09:27:20 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
      210923 09:34:52 mysqld_safe Starting mariadbd daemon with databases from /opt/mysql
      2021-09-23  9:34:52 0 [Note] /usr/sbin/mariadbd (mysqld 10.5.12-MariaDB-1:10.5.12+maria~focal-log) starting as process 3812964 ...
      2021-09-23  9:34:52 0 [Note] Using unique option prefix 'myisam_recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead.
      2021-09-23  9:34:52 0 [Warning] The parameter innodb_buffer_pool_instances is deprecated and has no effect.
      2021-09-23  9:34:52 0 [Warning] The parameter innodb_log_files_in_group is deprecated and has no effect.
      2021-09-23  9:34:52 0 [Note] InnoDB: Uses event mutexes
      2021-09-23  9:34:52 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2021-09-23  9:34:52 0 [Note] InnoDB: Number of pools: 1
      2021-09-23  9:34:52 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2021-09-23  9:34:52 0 [Note] mariadbd: O_TMPFILE is not supported on /opt/tmp (disabling future attempts)
      2021-09-23  9:34:52 0 [Note] InnoDB: Using Linux native AIO
      2021-09-23  9:34:52 0 [Note] InnoDB: Initializing buffer pool, total size = 5100273664, chunk size = 134217728
      2021-09-23  9:34:53 0 [Note] InnoDB: Completed initialization of buffer pool
      2021-09-23  9:34:53 0 [Note] InnoDB: Transaction 51592635 was in the XA prepared state.
      2021-09-23  9:34:53 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
      2021-09-23  9:34:53 0 [Note] InnoDB: Trx id counter is 51592647
      2021-09-23  9:34:53 0 [Note] InnoDB: 128 rollback segments are active.
      2021-09-23  9:34:53 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
      2021-09-23  9:34:53 0 [Note] InnoDB: Rollback of non-prepared transactions completed
      2021-09-23  9:34:53 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2021-09-23  9:34:53 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2021-09-23  9:34:53 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2021-09-23  9:34:53 0 [Note] InnoDB: 10.5.12 started; log sequence number 111796551705; transaction id 51592648
      2021-09-23  9:34:53 0 [Note] Plugin 'FEEDBACK' is disabled.
      2021-09-23  9:34:53 0 [Note] InnoDB: Loading buffer pool(s) from /opt/mysql/ib_buffer_pool
      2021-09-23  9:34:53 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-09-23  9:34:53 0 [Note] InnoDB: Transaction 51592635 in prepared state after recovery
      2021-09-23  9:34:53 0 [Note] InnoDB: Transaction contains changes to 3 rows
      2021-09-23  9:34:53 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-09-23  9:34:53 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-09-23  9:34:53 0 [Warning] Found 1 prepared XA transactions
      2021-09-23  9:34:55 0 [Note] Server socket created on IP: '0.0.0.0'.
      2021-09-23  9:34:55 0 [Note] Reading of all Master_info entries succeeded
      2021-09-23  9:34:55 0 [Note] Added new Master_info '' to hash table
      2021-09-23  9:34:55 0 [Note] /usr/sbin/mariadbd: ready for connections.
      Version: '10.5.12-MariaDB-1:10.5.12+maria~focal-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
      2021-09-23  9:34:58 8 [Note] Slave I/O thread: Start asynchronous replication to master 'repl@ras2:3306' in log 'mariadb-bin.013324' at position 33344260
      2021-09-23  9:34:58 9 [Note] Slave SQL thread initialized, starting replication in log 'mariadb-bin.013323' at position 253811069, relay log '/opt/mysql/mariadb-relay-bin.000110' position: 28258665
      2021-09-23  9:34:58 8 [Note] Slave I/O thread: connected to master 'repl@ras2:3306',replication started in log 'mariadb-bin.013324' at position 33344260
      2021-09-23  9:36:08 0 [Note] InnoDB: Buffer pool(s) load completed at 210923  9:36:08
      2021-09-23 10:23:27 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:24:18 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:25:10 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:26:03 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:26:57 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:27:52 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:28:48 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:29:44 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:30:40 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:31:36 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:32:32 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:32:32 9 [ERROR] Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
      2021-09-23 10:32:32 9 [ERROR] Slave SQL: Lock wait timeout exceeded; try restarting transaction, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:32:32 9 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      2021-09-23 10:32:32 9 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.013324' position 248100321
      2021-09-23 10:32:32 9 [Note] Slave SQL thread exiting, replication stopped in log 'mariadb-bin.013324' at position 248100321
      2021-09-23 10:32:32 9 [Note] master was ras2:3306
      2021-09-23 10:33:06 0 [Note] /usr/sbin/mariadbd (initiated by: unknown): Normal shutdown
      2021-09-23 10:33:06 0 [Note] Event Scheduler: Purging the queue. 4 events
      2021-09-23 10:33:06 8 [Note] Slave I/O thread exiting, read up to log 'mariadb-bin.013325', position 27431871
      2021-09-23 10:33:06 8 [Note] master was ras2:3306
      2021-09-23 10:33:06 0 [Note] InnoDB: FTS optimize thread exiting.
      2021-09-23 10:33:06 0 [Note] InnoDB: Starting shutdown...
      2021-09-23 10:33:06 0 [Note] InnoDB: Dumping buffer pool(s) to /opt/mysql/ib_buffer_pool
      2021-09-23 10:33:06 0 [Note] InnoDB: Restricted to 76617 pages due to innodb_buf_pool_dump_pct=25
      2021-09-23 10:33:06 0 [Note] InnoDB: Buffer pool(s) dump completed at 210923 10:33:06
      2021-09-23 10:34:01 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
      2021-09-23 10:34:01 0 [Note] InnoDB: Shutdown completed; log sequence number 111905493822; transaction id 51806218
      2021-09-23 10:34:01 0 [Note] /usr/sbin/mariadbd: Shutdown complete
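
      For reference, the prepared XA transaction behind the "Found 1 prepared XA transactions" warning above can be listed with XA RECOVER. A minimal sketch of inspecting it (and, only if it is confirmed orphaned, resolving it by hand; 'xid1' is a placeholder for the value XA RECOVER prints):

          XA RECOVER;           -- lists transactions still in the prepared state after recovery
          -- Only if the transaction is orphaned (its COMMIT/ROLLBACK will never arrive):
          XA COMMIT 'xid1';
          -- or
          XA ROLLBACK 'xid1';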
      

      The replication snapshot is set up with:

       mysqldump \
           -u"${REP_USER_NAME}" -p"${REP_USER_PASSWORD}" \
           --single-transaction \
           --routines --events \
           --master-data -h"${DB_MASTER_IP}" \
           DBNAME
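
      (For completeness, a rough sketch of the rest of the setup on the secondary; the host and user come from the log above, the password is a placeholder, and --master-data embeds the binlog coordinates in the dump itself:)

          -- On the secondary, before loading the dump, point it at the primary (placeholder credentials):
          CHANGE MASTER TO
            MASTER_HOST='ras2',
            MASTER_USER='repl',
            MASTER_PASSWORD='********';
          -- Load the dump (its embedded CHANGE MASTER TO sets MASTER_LOG_FILE/MASTER_LOG_POS), then:
          START SLAVE;
          SHOW SLAVE STATUS\G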
      

      We've tried changing slave_parallel_mode back to conservative and setting non-zero values for slave_parallel_threads, but nothing keeps replication alive consistently. We're seeing this across multiple installs.
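
      (The changes tried were along these lines; the thread count shown is illustrative:)

          STOP SLAVE;
          SET GLOBAL slave_parallel_mode = 'conservative';  -- back from the 10.5 default of 'optimistic'
          SET GLOBAL slave_parallel_threads = 4;            -- illustrative non-zero value
          START SLAVE;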

      What can we do to debug / analyze this problem?

          Activity

            jellisgwn Jon Ellis added a comment -

            Sergei,

            when you said:

            But there should be significantly less gap locks on the READ-COMMITTED isolation level, so trying that on the slave can show if your problem is caused by gap locks.

            I'm confused by what it means to try the READ-COMMITTED isolation level "on the slave". Our application is using READ-COMMITTED, but is there a replication configuration that changes the slave's isolation level? That seems strange...


            serg Sergei Golubchik added a comment -

            The default isolation level is REPEATABLE-READ, and I assume that's what most people use. I suggested that if you use REPEATABLE-READ, then you can set --transaction-isolation=READ-COMMITTED (only on the slave so that it'd have less effect on your application) and see if it'll help.

            If you're already using READ-COMMITTED, then, of course, this suggestion doesn't apply. You can still try 10.6 to see if you're affected by MDEV-26652.
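
            A minimal sketch of trying that only on the slave, either at runtime or via the startup option mentioned above:

                SET GLOBAL tx_isolation = 'READ-COMMITTED';
                -- applies to sessions started afterwards, so restart the slave threads
                -- (STOP SLAVE; START SLAVE) to pick it up; equivalently, start the
                -- slave server with --transaction-isolation=READ-COMMITTED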

            jellisgwn Jon Ellis added a comment -

            Sergei,

            Your response prompted some further investigation and experiments. By snapshotting a database and binlog and bringing them to a test environment, we were able to run the secondary with READ_COMMITTED as the default isolation... and this, in turn, made it possible to play the binlog back without issue.

            This was (is?) surprising. As previously noted, our workload is created via an XA connection at READ_COMMITTED, yet that causes problems when the binlog is processed on a secondary whose default is REPEATABLE_READ. My expectation was that the locking would be encapsulated within the binlog and that the default isolation of the secondary would have no impact... this is obviously not the case. Was this behavior something that changed between 10.4 and 10.5?

            With the above progress we were able to take the step of setting the default isolation to READ_COMMITTED on both the primary and the secondary. Since then, replication has returned to its normal, pre-10.5 state of stability.

            One other thing to note: using 10.6 had no impact on the issue; only changing the default isolation was required.

            Elkin Andrei Elkin added a comment -

            jellisgwn. It's good to hear you have sorted it out!

            Locks are not represented within binlog events, and apparently in your case the slave creates unnecessary ones. 10.5 changed XA binlogging to log the prepared part of the transaction separately from the following COMMIT|ROLLBACK part.
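
            For illustration, a minimal sketch of such a transaction; the xid and table are placeholders. In 10.5 the part up to and including XA PREPARE is written to the binlog (and replayed on the slave) as its own event group, and the XA COMMIT arrives later as a separate group:

                XA START 'trx1';
                INSERT INTO DBNAME.TBNAME VALUES (1);
                XA END 'trx1';
                XA PREPARE 'trx1';   -- binlogged as its own event group since 10.5
                XA COMMIT 'trx1';    -- binlogged later as a separate event group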


            serg Sergei Golubchik added a comment -

            I'll close it as a duplicate of MDEV-26682. It seems that we've established that this issue was caused by gap locks, and MDEV-26682 should solve gap lock related deadlocks on the slave side.


            People

              marko Marko Mäkelä
              jellisgwn Jon Ellis