Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26670

Unable to maintain replication since upgrading from 10.4 to 10.5

    XMLWordPrintable

    Details

      Description

      We're seeing the following error on a very regular basis when replicating a workload of XA transactions with a 10.5.12 primary / secondary:

      210923 09:27:20 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
      210923 09:34:52 mysqld_safe Starting mariadbd daemon with databases from /opt/mysql
      2021-09-23  9:34:52 0 [Note] /usr/sbin/mariadbd (mysqld 10.5.12-MariaDB-1:10.5.12+maria~focal-log) starting as process 3812964 ...
      2021-09-23  9:34:52 0 [Note] Using unique option prefix 'myisam_recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead.
      2021-09-23  9:34:52 0 [Warning] The parameter innodb_buffer_pool_instances is deprecated and has no effect.
      2021-09-23  9:34:52 0 [Warning] The parameter innodb_log_files_in_group is deprecated and has no effect.
      2021-09-23  9:34:52 0 [Note] InnoDB: Uses event mutexes
      2021-09-23  9:34:52 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2021-09-23  9:34:52 0 [Note] InnoDB: Number of pools: 1
      2021-09-23  9:34:52 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2021-09-23  9:34:52 0 [Note] mariadbd: O_TMPFILE is not supported on /opt/tmp (disabling future attempts)
      2021-09-23  9:34:52 0 [Note] InnoDB: Using Linux native AIO
      2021-09-23  9:34:52 0 [Note] InnoDB: Initializing buffer pool, total size = 5100273664, chunk size = 134217728
      2021-09-23  9:34:53 0 [Note] InnoDB: Completed initialization of buffer pool
      2021-09-23  9:34:53 0 [Note] InnoDB: Transaction 51592635 was in the XA prepared state.
      2021-09-23  9:34:53 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
      2021-09-23  9:34:53 0 [Note] InnoDB: Trx id counter is 51592647
      2021-09-23  9:34:53 0 [Note] InnoDB: 128 rollback segments are active.
      2021-09-23  9:34:53 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
      2021-09-23  9:34:53 0 [Note] InnoDB: Rollback of non-prepared transactions completed
      2021-09-23  9:34:53 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2021-09-23  9:34:53 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2021-09-23  9:34:53 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2021-09-23  9:34:53 0 [Note] InnoDB: 10.5.12 started; log sequence number 111796551705; transaction id 51592648
      2021-09-23  9:34:53 0 [Note] Plugin 'FEEDBACK' is disabled.
      2021-09-23  9:34:53 0 [Note] InnoDB: Loading buffer pool(s) from /opt/mysql/ib_buffer_pool
      2021-09-23  9:34:53 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-09-23  9:34:53 0 [Note] InnoDB: Transaction 51592635 in prepared state after recovery
      2021-09-23  9:34:53 0 [Note] InnoDB: Transaction contains changes to 3 rows
      2021-09-23  9:34:53 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-09-23  9:34:53 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-09-23  9:34:53 0 [Warning] Found 1 prepared XA transactions
      2021-09-23  9:34:55 0 [Note] Server socket created on IP: '0.0.0.0'.
      2021-09-23  9:34:55 0 [Note] Reading of all Master_info entries succeeded
      2021-09-23  9:34:55 0 [Note] Added new Master_info '' to hash table
      2021-09-23  9:34:55 0 [Note] /usr/sbin/mariadbd: ready for connections.
      Version: '10.5.12-MariaDB-1:10.5.12+maria~focal-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
      2021-09-23  9:34:58 8 [Note] Slave I/O thread: Start asynchronous replication to master 'repl@ras2:3306' in log 'mariadb-bin.013324' at position 33344260
      2021-09-23  9:34:58 9 [Note] Slave SQL thread initialized, starting replication in log 'mariadb-bin.013323' at position 253811069, relay log '/opt/mysql/mariadb-relay-bin.000110' position: 28258665
      2021-09-23  9:34:58 8 [Note] Slave I/O thread: connected to master 'repl@ras2:3306',replication started in log 'mariadb-bin.013324' at position 33344260
      2021-09-23  9:36:08 0 [Note] InnoDB: Buffer pool(s) load completed at 210923  9:36:08
      2021-09-23 10:23:27 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:24:18 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:25:10 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:26:03 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:26:57 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:27:52 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:28:48 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:29:44 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:30:40 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:31:36 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:32:32 9 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DBNAME.TBNAME; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mariadb-bin.013324, end_log_pos 248101224, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:32:32 9 [ERROR] Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
      2021-09-23 10:32:32 9 [ERROR] Slave SQL: Lock wait timeout exceeded; try restarting transaction, Gtid 0-15-2237601904, Internal MariaDB error code: 1205
      2021-09-23 10:32:32 9 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
      2021-09-23 10:32:32 9 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mariadb-bin.013324' position 248100321
      2021-09-23 10:32:32 9 [Note] Slave SQL thread exiting, replication stopped in log 'mariadb-bin.013324' at position 248100321
      2021-09-23 10:32:32 9 [Note] master was ras2:3306
      2021-09-23 10:33:06 0 [Note] /usr/sbin/mariadbd (initiated by: unknown): Normal shutdown
      2021-09-23 10:33:06 0 [Note] Event Scheduler: Purging the queue. 4 events
      2021-09-23 10:33:06 8 [Note] Slave I/O thread exiting, read up to log 'mariadb-bin.013325', position 27431871
      2021-09-23 10:33:06 8 [Note] master was ras2:3306
      2021-09-23 10:33:06 0 [Note] InnoDB: FTS optimize thread exiting.
      2021-09-23 10:33:06 0 [Note] InnoDB: Starting shutdown...
      2021-09-23 10:33:06 0 [Note] InnoDB: Dumping buffer pool(s) to /opt/mysql/ib_buffer_pool
      2021-09-23 10:33:06 0 [Note] InnoDB: Restricted to 76617 pages due to innodb_buf_pool_dump_pct=25
      2021-09-23 10:33:06 0 [Note] InnoDB: Buffer pool(s) dump completed at 210923 10:33:06
      2021-09-23 10:34:01 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
      2021-09-23 10:34:01 0 [Note] InnoDB: Shutdown completed; log sequence number 111905493822; transaction id 51806218
      2021-09-23 10:34:01 0 [Note] /usr/sbin/mariadbd: Shutdown complete
      

      The replication snapshot is setup with:

       mysqldump \
                      -u"${REP_USER_NAME}" -p"${REP_USER_PASSWORD}" \
                      --single-transaction \
                      --routines --events \
                      --master-data -h"${DB_MASTER_IP}" \
                      DBNAME
      

      We've tried changing slave_parallel_mode back to conservative and non-zero values for slave_parallel_threads but nothing keeps replication alive consistently. We're seeing this across multiple installs.

      What can we do to debug / analyze this problem?

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              jellisgwn Jon Ellis
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Git Integration