[MDEV-26126] Replication delay increasing infinitely from grandpa master when enable parallel replication and use temporary table? Created: 2021-07-12  Updated: 2021-08-09

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.4.18
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Kuijun Cui Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: replication
Environment:

Distributor ID: CentOS
Description: CentOS Linux release 7.8.2003 (Core)
Release: 7.8.2003
Codename: Core
Linux 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux


Issue Links:
Relates
relates to MDEV-10356 rpl.rpl_parallel_temptable failure du... Closed
relates to MDEV-25064 rpl.rpl_parallel_temptable failed in ... Closed

 Description   

Dear community:

we have a replcation topological structure like below

t1-node-master-dbmyd01
    |    |     A
    |    V     |
    |   t1-node-master-dbmyd02(direct slave, enable parallel replication)
    |
    V
   t1-node-master-dbmyd03(direct slave, enable parallel replication)
            |
            V
            t1-node-standby-dbmym01(indrect slave, enable parallel replication, issue happened about replication issue)

When Im testing the below case on t1-node-master-dbmyd01:

use sbtest;
DROP TABLE IF EXISTS real_table;
CREATE TABLE real_table (
  id int primary key,
  str varchar(10)
);
 
DROP TEMPORARY  TABLE IF EXISTS tmp_table;
CREATE TEMPORARY TABLE tmp_table (
  id int primary key,
  str varchar(10)
);
 
INSERT INTO real_table VALUES (1, 'str1');
INSERT INTO real_table VALUES (2, 'str2');
INSERT INTO tmp_table SELECT id, str FROM real_table;
UPDATE real_table SET str = concat(str, '***') WHERE id IN (SELECT id FROM tmp_table);

It will cause t1-node-standby-dbmym01 replication delay, and can not be disappeared.

Node t1-node-master-dbmyd02 and t1-node-master-dbmyd03 did not have this issue.

And when I'm checking the processlist, can find below prompt on t1-node-standby-dbmym01:

 
(dbausr@localhost) [(none)] 18:39:34> show processlist;
+--------+-------------+-----------+------+--------------+------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+----------+
| Id     | User        | Host      | db   | Command      | Time | State                                                                       | Info                                                                                    | Progress |
+--------+-------------+-----------+------+--------------+------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+----------+
|      1 | system user |           | NULL | Daemon       | NULL | InnoDB purge coordinator                                                    | NULL                                                                                    |    0.000 |
|      2 | system user |           | NULL | Daemon       | NULL | InnoDB purge worker                                                         | NULL                                                                                    |    0.000 |
|      4 | system user |           | NULL | Daemon       | NULL | InnoDB purge worker                                                         | NULL                                                                                    |    0.000 |
|      3 | system user |           | NULL | Daemon       | NULL | InnoDB purge worker                                                         | NULL                                                                                    |    0.000 |
|      5 | system user |           | NULL | Daemon       | NULL | InnoDB shutdown handler                                                     | NULL                                                                                    |    0.000 |
| 463111 | dbausr      | localhost | NULL | Query        |    0 | Init                                                                        | show processlist                                                                        |    0.000 |
| 463112 | system user |           | NULL | Slave_IO     |   42 | Waiting for master to send event                                            | NULL                                                                                    |    0.000 |
| 463114 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463115 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463116 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463117 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463118 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463120 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463119 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463121 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463122 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463123 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463124 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463125 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463126 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463127 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463128 | system user |           | NULL | Slave_worker |   42 | Update_rows_log_event::find_row(-1)                                         | UPDATE real_table SET str = concat(str, '@@@@@') WHERE id IN (SELECT id FROM tmp_table) |    0.000 |
| 463129 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463130 | system user |           | NULL | Slave_worker |   42 | Waiting for prior transaction to commit                                     | INSERT INTO real_table VALUES (2, 'str2')                                               |    0.000 |
| 463131 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463132 | system user |           | NULL | Slave_worker |   42 | Waiting for prior transaction to commit                                     | INSERT INTO real_table VALUES (1, 'str1')                                               |    0.000 |
| 463133 | system user |           | NULL | Slave_worker |   42 | Waiting for work from SQL thread                                            | NULL                                                                                    |    0.000 |
| 463113 | system user |           | NULL | Slave_SQL    |   42 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL                                                                                    |    0.000 |
+--------+-------------+-----------+------+--------------+------+-----------------------------------------------------------------------------+-----------------------------------------------------------------------------------------+----------+

after several times checking, the processlist promt will like below:

(dbausr@localhost) [(none)] 18:40:05> show processlist;
+--------+-------------+-----------+------+--------------+------+-----------------------------------------------------------------------------+-------------------------------------------+----------+
| Id     | User        | Host      | db   | Command      | Time | State                                                                       | Info                                      | Progress |
+--------+-------------+-----------+------+--------------+------+-----------------------------------------------------------------------------+-------------------------------------------+----------+
|      1 | system user |           | NULL | Daemon       | NULL | InnoDB purge coordinator                                                    | NULL                                      |    0.000 |
|      2 | system user |           | NULL | Daemon       | NULL | InnoDB purge worker                                                         | NULL                                      |    0.000 |
|      4 | system user |           | NULL | Daemon       | NULL | InnoDB purge worker                                                         | NULL                                      |    0.000 |
|      3 | system user |           | NULL | Daemon       | NULL | InnoDB purge worker                                                         | NULL                                      |    0.000 |
|      5 | system user |           | NULL | Daemon       | NULL | InnoDB shutdown handler                                                     | NULL                                      |    0.000 |
| 463111 | dbausr      | localhost | NULL | Query        |    0 | Init                                                                        | show processlist                          |    0.000 |
| 463112 | system user |           | NULL | Slave_IO     |   59 | Waiting for master to send event                                            | NULL                                      |    0.000 |
| 463114 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463115 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463116 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463117 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463118 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463120 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463119 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463121 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463122 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463123 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463124 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463125 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463126 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463127 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463128 | system user |           | NULL | Slave_worker |   59 | Waiting for prior transaction to commit                                     | NULL                                      |    0.000 |
| 463129 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463130 | system user |           | NULL | Slave_worker |   59 | Waiting for prior transaction to commit                                     | INSERT INTO real_table VALUES (2, 'str2') |    0.000 |
| 463131 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463132 | system user |           | NULL | Slave_worker |   59 | Waiting for prior transaction to commit                                     | INSERT INTO real_table VALUES (1, 'str1') |    0.000 |
| 463133 | system user |           | NULL | Slave_worker |   59 | Waiting for work from SQL thread                                            | NULL                                      |    0.000 |
| 463113 | system user |           | NULL | Slave_SQL    |   59 | Slave has read all relay log; waiting for the slave I/O thread to update it | NULL                                      |    0.000 |
+--------+-------------+-----------+------+--------------+------+-----------------------------------------------------------------------------+-------------------------------------------+----------+

After I disabled the parallel replication, this phenomenon disappeared, so is there any issue will cause the replication delay when enable parallel replication and use temporary table?

below is node t1-node-standby-dbmym01's configuration options:

[mysqld]
########## Server Level ##########
large_pages=ON
port=3306
basedir=/opt/maria10.1
datadir=/opt/maria10.1/data/db
tmpdir=/opt/maria10.1/data/tmp
socket=/opt/maria10.1/data/mysql.sock
pid-file=/opt/maria10.1/data/mysqld.pid
lc-messages-dir=/opt/maria10.1/share
lc-messages=en_US
lower-case-table-names=1
 
##### Character #####
character-set-client-handshake=FALSE
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
 
##### Security/Mode #####
skip_name_resolve=1
bind_address=0.0.0.0
explicit_defaults_for_timestamp=1
sql_mode=ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
skip-ssl
#ssl
#ssl_ca=/opt/maria10.1/data/ssl/ca-cert.pem
#ssl_cert=/opt/maria10.1/data/ssl/server-cert.pem
#ssl_key=/opt/maria10.1/data/ssl/server-key.pem
 
##### Connection #####
max_connections=3050
max_connect_errors=65536
max_user_connections=3050
max_allowed_packet=128M
connect_timeout=20
back_log=65535
net_read_timeout=30
net_write_timeout=60
net_buffer_length=2K # default 16K
max_prepared_stmt_count=1048576
 
##### per-thread Buffer #####
#read_buffer_size=128K
#read_rnd_buffer_size=256K
sort_buffer_size=4M
join_buffer_size=2M
binlog_cache_size=4M
thread_stack=292K
## MRR
optimizer_switch="index_merge_sort_intersection=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=on,optimize_join_buffer_size=on"
join_cache_level=8
join_buffer_space_limit=250M
mrr_buffer_size=4M
 
##### Global Buffer #####
key_buffer_size=10M
query_cache_type=0
query_cache_size=0
#query_cache_size=64M
#query_cache_limit=8M
#query_cache_min_res_unit=1M
 
##### THREAD POOL #####
thread_handling=pool-of-threads
thread_pool_max_threads=3050
#thread_cache_size=256 # If the thread pool is active, thread_cache_size is ignored.
 
##### TABLE CACHE #####
table_definition_cache=16500
table_open_cache=2000
table_open_cache_instances=8
 
##### IO #####
open_files_limit=65000
 
##### OTHERS #####
tmp_table_size=500M
max_heap_table_size=16M # Memory Engine
 
 
########## Server LOG ##########
##### general log #####
general_log=0
#general_log_file=/opt/maria10.1/log/general.log
 
##### error log #####
log_error=/opt/maria10.1/log/alert_TMRART01.log
log_warnings=2
 
##### binary log #####
server_id=16110
log_bin=/opt/maria10.1/binlog/BINLOG
binlog_format=ROW
binlog_row_image=FULL
sync_binlog=1
log_slave_updates=1
expire_logs_days=7
max_binlog_size=128M
max_binlog_cache_size=2G # equal to wsrep_max_ws_size
 
##### GTID #####
gtid_domain_id=111
gtid_strict_mode=1
gtid_ignore_duplicates=ON
 
##### slow log #####
slow_query_log=1
slow_query_log_file=/opt/maria10.1/log/slow.log
log_output='FILE'
long_query_time=0.5
log_slow_admin_statements=0
log_slow_rate_limit=1
log_slow_verbosity='query_plan,explain'
#log_queries_not_using_indexes=1
 
##### audit log #####
plugin-load=server_audit=server_audit.so
server_audit=FORCE_PLUS_PERMANENT
server_audit_logging=1
server_audit_file_path=/opt/maria10.1/log/server_audit.log
server_audit_events='CONNECT,QUERY'
server_audit_file_rotate_size=524288000 # 500M
server_audit_file_rotations=9
server_audit_output_type='file'
#server_audit_excl_users='lepus_monitor'
server_audit_query_log_limit=10240
 
##### sql error log #####
plugin_load_add=sql_errlog
sql_error_log_filename='/opt/maria10.1/log/sql_errors.log'
sql_error_log_rotate=1
sql_error_log_rotations=9
sql_error_log_size_limit=524288000 # 500M
 
########## INNODB ##########
default_storage_engine=InnoDB
##### Tablespace #####
innodb_file_per_table=1
 
##### Buffer Pool #####
#innodb_data_file_path=ibdata1:50M;ibdata2:50M:autoextend
innodb_buffer_pool_size=100G # innodb data cache. 70% of total RAM
innodb_buffer_pool_instances=8
innodb_log_buffer_size=200M
innodb_max_dirty_pages_pct=60
innodb_buffer_pool_dump_pct=60
 
##### Redo #####
innodb_log_file_size=1280M
innodb_log_files_in_group=2
 
##### Undo #####
innodb_undo_directory=/opt/maria10.1/data/db
innodb_undo_tablespaces=4
innodb_undo_logs=128
innodb_purge_threads=16
innodb_purge_batch_size=5000
 
##### Locking #####
innodb_print_all_deadlocks=1
innodb_lock_wait_timeout=50 # 50s
innodb_autoinc_lock_mode=2
innodb_rollback_on_timeout=1
 
##### Data Security #####
innodb_doublewrite=1
innodb_strict_mode=1
innodb_flush_log_at_trx_commit=1
transaction-isolation=READ-COMMITTED
 
##### CPU #####
innodb_sync_array_size=8
 
##### IO #####
innodb_flush_method=O_DIRECT
innodb_purge_threads=4
innodb_io_capacity=10000 # if SSD, set 4000
innodb_flush_neighbors=1 # if SSD, set 0
innodb_read_io_threads=16
innodb_write_io_threads=8
innodb_open_files=60000
innodb_use_native_aio=1
 
##### Turning #####
#innodb_defragment=1
 
########## Performance Schema ##########
performance_schema=ON
performance_schema_instrument='stage/%=ON'
performance_schema_instrument='%mdl%=ON'
performance_schema_consumer_events_stages_current=ON
performance_schema_consumer_events_stages_history_long=ON
performance_schema_consumer_events_statements_current=ON
performance_schema_consumer_events_statements_history_long=ON
performance_schema_consumer_events_waits_current=ON
performance_schema_consumer_events_waits_history_long=ON
 
########## Rpl_Slave ##########
##### only on slave node #####
relay_log=/opt/maria10.1/binlog/RELAY
max_relay_log_size=128M
relay_log_recovery=1
relay_log_purge=1
replicate_wild_ignore_table=sys.%
replicate_wild_ignore_table=test.%
replicate_wild_ignore_table=jnxdba.%
replicate_wild_ignore_table=mysql.slow_log
read_only=1
skip_slave_start=1
 
########## Rpl_Semi_Sync ##########
#rpl_semi_sync_master_wait_point=AFTER_SYNC
##### master #####
#rpl_semi_sync_master_enabled=0
#rpl_semi_sync_master_timeout=1000 # 1s
#rpl_semi_sync_master_timeout=18446744073709551615 # max
##### slave #####
#rpl_semi_sync_slave_enabled=1
slave_parallel_mode=optimistic
slave_parallel_threads=20
slave_parallel_max_queued=2147483647 #max value
 
[galera]
########## GALERA ##########
wsrep_on=OFF # switch of galera
wsrep_cluster_name="mariarac"
wsrep_cluster_address=gcomm://
#wsrep_cluster_address="gcomm://IPDB1,IPDB2,IPDB3"
#wsrep_node_name=hostname
#wsrep_node_address="IPDBLOCAL"
wsrep_provider=/opt/maria10.1/galera/libgalera_smm.so
wsrep_provider_options="gcache.size=10G; gcache.page_size=128M; gcache.keep_pages_size=5G; gmcast.listen_addr=tcp://IPDBLOCAL:4567; gcs.fc_limit=256; gcs.fc_factor=0.9; evs.keepalive_period=PT10S; evs.suspect_timeout=PT30S; evs.inactive_timeout=PT45S; evs.install_timeout=PT45S;"
wsrep_slave_threads=1
wsrep_max_ws_rows=0
wsrep_max_ws_size=2147483647 # 2G, default/max value
wsrep_log_conflicts=1
wsrep_sync_wait=1
##### GTID #####
wsrep_gtid_mode=ON
wsrep_gtid_domain_id=1
##### SST #####
#wsrep_sst_method=xtrabackup-v2 # if 10.2, xtrabackup
wsrep_sst_method=mariabackup # if 10.3, mariabackup
wsrep_sst_auth=bkpuser:bkpuser
##### DEBUG #####
#wsrep_debug=0
##### Galera-4 #####
#wsrep_ignore_apply_errors=7
 
# this is only for embedded server
[embedded]
 
# This group is only read by MariaDB-10.3 servers.
# If you use the same .cnf file for MariaDB of different versions,
# use this group for options that older servers don't understand
[mariadb-10.3]
#innodb_fast_shutdown=1
 
# MariaDB Plugin
#plugin_maturity=gamma
#plugin_load_add=metadata_lock_info
#plugin_load_add=query_cache_info
#plugin_load_add=wsrep_info
#plugin_load_add=ha_connect
#plugin_load_add=disks
 
[mariadb-10.4]
unix_socket=OFF
 
[xtrabackup]
user=bkpuser
password=bkpuser



 Comments   
Comment by Kuijun Cui [ 2021-07-12 ]

It can be reproducible.

Generated at Thu Feb 08 09:42:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.