[MDEV-28641] Query cache entries not invalidated on slave of a Galera cluster Created: 2022-05-21  Updated: 2023-05-25  Resolved: 2023-04-04

Status: Closed
Project: MariaDB Server
Component/s: Galera, Query Cache
Affects Version/s: 10.6.8, 10.5, 10.7, 10.8, 10.9
Fix Version/s: 11.1.0, 10.11.3, 10.5.20, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Critical
Reporter: Hartmut Holzgraefe Assignee: Julius Goryavsky
Resolution: Fixed Votes: 2
Labels: None


 Description   

When using async replication between two Galera clusters, query cache entries on the actual slave node in the slave cluster do not get invalidated on table row changes.

Setup: two two-node clusters, with nodes master-node-1, master-node-2, slave-node-1 and slave-node-2, and slave-node-1 being an async replication slave of master-node-1. Query cache enabled on all nodes.
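
The report does not show how the async replication link was configured. A minimal sketch of what it might look like on slave-node-1, assuming GTID-based replication and a hypothetical replication account named repl_user (both are assumptions, not taken from the original setup):

```sql
-- Run on slave-node-1. User name and password are hypothetical placeholders.
CHANGE MASTER TO
  MASTER_HOST='master-node-1',
  MASTER_USER='repl_user',
  MASTER_PASSWORD='replace_with_real_password',
  MASTER_USE_GTID=slave_pos;

START SLAVE;

-- Slave_IO_Running and Slave_SQL_Running should both report "Yes"
SHOW SLAVE STATUS\G
```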

Running a query on a large table that can't use an index is slow the first time, then fast the second time, as expected, since the result now comes from the query cache.

Now adding another row to the table on master-node-1 and then running the slow select on all four instances again, we see that it takes its time again on master-node-1, master-node-2 and slave-node-2, but still instantly returns the cached result on slave-node-1.



 Comments   
Comment by Hartmut Holzgraefe [ 2022-05-21 ]

Node configuration (server-id and host names of course being different per node):

[mysqld]
server-id=1
bind-address=0.0.0.0
 
log-bin
log-slave-updates
binlog-format=ROW
 
innodb_buffer_pool_size = 1G
 
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
 
wsrep_cluster_name=master_cluster
wsrep_cluster_address=gcomm://master-node-1,master-node-2,
wsrep_node_address=master-node-1
wsrep_node_name=master-node-1
 
wsrep_sst_method=mariabackup
wsrep_sst_auth=galera:...password...
 
query_cache_type=1
query_cache_size=10M

Test table created on master-node-1 as:

CREATE TABLE t1 (id serial primary key, msg varchar(100));
INSERT INTO t1 values(NULL, md5(rand()));
INSERT INTO t1 SELECT NULL, md5(rand()) from t1 LIMIT 1000000;
... repeat previous line until the table has a few million rows in it
INSERT INTO t1 values(NULL, 'foobar');

Run

SELECT * FROM t1 WHERE msg='foobar';

on all nodes to see that it takes a non-zero amount of time.

Run the same query once again to verify that the result now immediately comes back from the query cache.
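
One way to confirm that the second run really was served from the query cache, and not just from a warm buffer pool, is to compare the Qcache_hits status counter before and after the query. This is only a sketch using the standard query cache status variables; it was not part of the original test run, and on a server with other active sessions the counter may also be incremented by unrelated queries.

```sql
-- Number of queries answered from the query cache so far
SHOW GLOBAL STATUS LIKE 'Qcache_hits';

SELECT * FROM t1 WHERE msg='foobar';

-- Expected to be higher than before if the SELECT above was a cache hit
SHOW GLOBAL STATUS LIKE 'Qcache_hits';
```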

Now add another row with the same msg value on master-node-1:

INSERT INTO t1 values(NULL, 'foobar');

Now re-run the SELECT on all nodes: it takes its time and returns two rows, as expected, on master-node-1, master-node-2 and slave-node-2, but is still fast and returns only a single row on slave-node-1.

Rewrite the query slightly and run it again on slave-node-1 to verify that the change was applied correctly and that only the query cache entries for table t1 were not purged:

SELECT id, msg FROM t1 WHERE msg='foobar';

With "id, msg" instead of "*" it now takes its time on slave-node-1, too, and correctly returns two result rows now.
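
On affected versions, the stale result on slave-node-1 can presumably be avoided by bypassing or emptying the query cache by hand. The statements below are standard MariaDB syntax, but using them as a workaround for this bug is an assumption and was not part of the original report:

```sql
-- Bypass the query cache for this one statement
SELECT SQL_NO_CACHE * FROM t1 WHERE msg='foobar';

-- Or remove all cached results on this node, so that the original
-- query text is executed again instead of being served from the cache
RESET QUERY CACHE;
SELECT * FROM t1 WHERE msg='foobar';
```

Note that RESET QUERY CACHE drops cached results for all tables on the node, not just t1.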

Comment by Jan Lindström [ 2023-03-27 ]

https://github.com/MariaDB/server/pull/2575

Comment by Julius Goryavsky [ 2023-04-04 ]

Thanks, everything works and no regression found on tests of mtr suites related to Galera, commit merged as https://github.com/MariaDB/server/commit/afdf19cf3303bf3797fe47e5cef398227134cc32

Comment by Julius Goryavsky [ 2023-04-04 ]

Fix merged as https://github.com/MariaDB/server/commit/afdf19cf3303bf3797fe47e5cef398227134cc32

Generated at Thu Feb 08 10:02:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.