[MDEV-21189] Dropping partition with 'wsrep_OSU_method=RSU' and 'SESSION sql_log_bin = 0' cases the galera node to hang - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.2.27
Fix Version/s: 10.2.31, 10.3.22, 10.4.12
Component/s: Galera
Labels:
None

Description

Suppose there is a 3 node mariadb galera cluster and an async replicated node connected to it. The galera cluster has a partitioned table that you wish to keep only for a few weeks, while the slave has several months of data in it. Once in a while you have to drop the partitions in the galera cluster without affecting the async slave.

On the load-balancer (haproxy): disable traffic to galera1 node.

galera1 node)
SET GLOBAL wsrep_OSU_method='RSU';
SET SESSION sql_log_bin = 0;

ALTER TABLE db.table DROP PARTITION 201909;
ALTER TABLE db.table DROP PARTITION 201910;
ALTER TABLE db.table DROP PARTITION 201911;

And so on for all galera nodes. This way the galera nodes will have the 201909-201911 partitions dropped, while the slave will have the table as before.

In some cases the DROP PARTITION statement causes the mariadb process to hang, the operation doesn't end and a new connection cannot be made to the server. Server version 10.2.27. There is nothing in the error log.

Galera nodes have the following configuration:
ignore_db_dirs = lost+found
innodb_buffer_pool_size = 120G
innodb_flush_log_at_trx_commit = 1
max_allowed_packet = 60M
table_open_cache = 800

wsrep_on=ON
binlog_format=ROW
default_storage_engine=innodb
innodb_autoinc_lock_mode=2
query_cache_size=0
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_provider_options="gcache.size=20G; gcache.page_size=20G; gcs.fc_limit=256; gcs.fc_factor=0.99; gcs.fc_master_slave=YES"
wsrep-node-name=node1
wsrep_node_address=1.1.1.1
wsrep_cluster_name=cluster
wsrep_cluster_address="gcomm://1.1.1.1,1.1.1.2,1.1.1.3"
wsrep_sst_method=mariabackup
wsrep_sst_auth=user:pass
wsrep_slave_threads=24

binlog_checksum = CRC32
innodb_checksum_algorithm = crc32
innodb_strict_mode = 0
log_slave_updates = 1
master_verify_checksum = 1
max_connect_errors = 500
max_connections = 1500
max_heap_table_size = 1G
slave_parallel_threads = 20
slave_sql_verify_checksum = 1
sql_mode = "NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
sync_binlog = 1
tmp_table_size = 1G

This also happened in version 10.1, but it does not happen all the time. If it happens, only sending kill -9 to mariadb works to stop it.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

galera_partition.cnf
2019-12-04 11:41
0.4 kB
Jan Lindström
galera_partition.test
2019-12-05 11:01
24 kB
Jan Lindström
mysqld.err
2019-12-03 08:11
24 kB
Tomáš Mózes
stack.txt
2019-12-05 11:05
106 kB
Jan Lindström
table.sql
2019-12-03 08:10
2 kB
Tomáš Mózes

Activity

People

Assignee:: Seppo Jaakola

Reporter:: Tomáš Mózes

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2019-12-02 11:24

Updated:: 2019-12-23 10:59

Resolved:: 2019-12-23 10:59

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.