Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28641

Query cache entries not invalidated on slave of a Galera cluster

Details

    Description

      When using async replication to between two Galera clusters query cache entries on the actual slave node in the slave cluster do not get invalidated on table row changes.

      Setup: two two-node clusters, with nodes master-node-1, master-node-2, slave-node-1 and slave-node-2, and slave-node-1 being an async replicationslave to master-node-1. Query cache enabled on all nodes.

      Running a query on a large table that can't use an index is slow the first time, then fast the 2nd time as expected as it now comes from the query cache, as expected.

      Now adding another row to the table on master-node-1, then running the slow select on all four instances again we see that it takes its time again on master-node-1, master-node-2 and slave-node-2, but still instantly returns the cached result on slave-node-1

      Attachments

        Activity

          Node configuration (server-id and host names of course being different per node):

          [mysqld]
          server-id=1
          bind-address=0.0.0.0
           
          log-bin
          log-slave-updates
          binlog-format=ROW
           
          innodb_buffer_pool_size = 1G
           
          wsrep_on=ON
          wsrep_provider=/usr/lib/galera/libgalera_smm.so
           
          wsrep_cluster_name=master_cluster
          wsrep_cluster_address=gcomm://master-node-1,master-node-2,
          wsrep_node_address=master-node-1
          wsrep_node_name=master-node-1
           
          wsrep_sst_method=mariabackup
          wsrep_sst_auth=galera:...password...
           
          query_cache_type=1
          query_cache_size=10M
          

          Test table created on master-node-1 as:

          CREATE TABLE t1 (id serial primary key, msg varchar(100));
          INSERT INTO t1 values(NULL, md5(rand());
          INSERT INTO t1 SELECT NULL, md5(rand()) from t1 LIMIT 1000000;
          ... repeat previous line until the table has a few million rows in it
          INSERT INTO t1 values(NULL, 'foobar');
          

          Run

          SELECT * FROM t1 WHERE msg='foobar';
          

          on all nodes to see that it takes a non-zero amount of time.

          Run the same query once again to verify that the result now immediately comes back from the query cache.

          Now add another row with same msg value once more:

          INSERT INTO t1 values(NULL, 'foobar');
          

          And now re-run the SELECT on all nodes, see that it takes its time and returns two rows now, as expected, on master-node-1, master-node-2 and slave-node-2, but is still fast and returns a single row only on slave-node-1.

          Rewrite it slightly and run it again on slave-node-1 to verify that the change was correctly applied, just the query cache entries for table t1 not purged:

          SELECT id, msg FROM t1 WHERE msg='foobar';
          

          With "id, msg" instead of "*" it now takes its time on slave-node-1, too, and correctly returns two result rows now.

          hholzgra Hartmut Holzgraefe added a comment - Node configuration (server-id and host names of course being different per node): [mysqld] server-id=1 bind-address=0.0.0.0   log-bin log-slave-updates binlog-format=ROW   innodb_buffer_pool_size = 1G   wsrep_on=ON wsrep_provider=/usr/lib/galera/libgalera_smm.so   wsrep_cluster_name=master_cluster wsrep_cluster_address=gcomm://master-node-1,master-node-2, wsrep_node_address=master-node-1 wsrep_node_name=master-node-1   wsrep_sst_method=mariabackup wsrep_sst_auth=galera:...password...   query_cache_type=1 query_cache_size=10M Test table created on master-node-1 as: CREATE TABLE t1 (id serial primary key, msg varchar(100)); INSERT INTO t1 values(NULL, md5(rand()); INSERT INTO t1 SELECT NULL, md5(rand()) from t1 LIMIT 1000000; ... repeat previous line until the table has a few million rows in it INSERT INTO t1 values(NULL, 'foobar'); Run SELECT * FROM t1 WHERE msg='foobar'; on all nodes to see that it takes a non-zero amount of time. Run the same query once again to verify that the result now immediately comes back from the query cache. Now add another row with same msg value once more: INSERT INTO t1 values(NULL, 'foobar'); And now re-run the SELECT on all nodes, see that it takes its time and returns two rows now, as expected, on master-node-1, master-node-2 and slave-node-2, but is still fast and returns a single row only on slave-node-1. Rewrite it slightly and run it again on slave-node-1 to verify that the change was correctly applied, just the query cache entries for table t1 not purged: SELECT id, msg FROM t1 WHERE msg='foobar'; With "id, msg" instead of "*" it now takes its time on slave-node-1, too, and correctly returns two result rows now.
          janlindstrom Jan Lindström added a comment - https://github.com/MariaDB/server/pull/2575

          Thanks, everything works and no regression found on tests of mtr suites related to Galera, commit merged as https://github.com/MariaDB/server/commit/afdf19cf3303bf3797fe47e5cef398227134cc32

          sysprg Julius Goryavsky added a comment - Thanks, everything works and no regression found on tests of mtr suites related to Galera, commit merged as https://github.com/MariaDB/server/commit/afdf19cf3303bf3797fe47e5cef398227134cc32
          sysprg Julius Goryavsky added a comment - Fix merged as https://github.com/MariaDB/server/commit/afdf19cf3303bf3797fe47e5cef398227134cc32

          People

            sysprg Julius Goryavsky
            hholzgra Hartmut Holzgraefe
            Votes:
            2 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.