[MDEV-10406] IST does not occur after schema changes in RSU mode with specific delete query - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.1.14, 10.1.16
Fix Version/s: N/A
Component/s: Galera
Labels:
- galera

Description

If we put a node into RSU mode, temporarily take it out of the cluster by setting the gcomm string to "gcomm://", perform a schema change, and on the nodes that remain in the cluster execute a delete statement in the format:

delete from <table> where <pk> >= <n>;

then when the node rejoins the cluster (by restoring the gcomm string), IST is not performed and the node determines that it has the same state as the other cluster nodes, despite those nodes having had a bunch of rows deleted from a table. If we then try to delete the rows manually on the affected node, mysqld on the other nodes crashes immediately because it can't process the delete transaction for rows that don't exist.

If we do not alter the schema while the node is out of the cluster, the error does not occur and IST is performed correctly. We did not get any issues with insert or update statements, and also deletes in the format of:

delete from <table> where id = <b>;

also did not trigger a problem, so it seems to be a specific issue with:

delete from <table> where id >= <n>;

In our tests the table that had the schema changes was always a different table to the one that had rows deleted. The latter table had around 120 rows, and our delete statement deleted around ~20 of them.

Steps to reproduce:

1. Set up a 3-node cluster with two tables, one that has thousands of rows (table a) and another that has around ~120 (table b).

2. On node 0, enter RSU mode:

SET GLOBAL wsrep_OSU_method='RSU';

3. Disconnect the node from the cluster by resetting the gcomm string:

SET GLOBAL wsrep_cluster_address="gcomm://";

4. On node 0, add a single column to table a with a default value of null to ensure backward-compatibility.

alter table table_a add column foo int(11) unsigned null;

5. On node 1 or node 2, delete ~20 rows from table b using a query with a >= where condition, eg:

delete from table_b where id >= <n>;

6. Rejoin node 0 to the cluster by setting the gcomm string back to the correct gcomm string for the cluster that references all three nodes:

SET GLOBAL wsrep_cluster_address='node1,node2,node3';

EXPECTED RESULT:

The node will rejoin the cluster and perform IST to synchronise the changes. The 20 rows from table_b will be deleted on node 0.

ACTUAL RESULT:

The node joins the cluster without performing IST. All three nodes are deemed to be in sync but the deleted rows remain on node 0.

Attachments

Activity

Ascending order - Click to sort in descending order

Nirbhay Choubey (Inactive) added a comment - 2016-07-20 16:29

Some related scenarios :
https://gist.github.com/nirbhayc/5f84cc9a243fd1a58d287b191313f5a6

Nirbhay Choubey (Inactive) added a comment - 2016-07-20 16:29 Some related scenarios : https://gist.github.com/nirbhayc/5f84cc9a243fd1a58d287b191313f5a6

Nirbhay Choubey (Inactive) added a comment - 2016-07-20 16:48

2. On node 0, enter RSU mode:
SET GLOBAL wsrep_OSU_method='RSU';
...
alter table table_a add column foo int(11) unsigned null;

You should update wsrep_osu_method's SESSION value (instead of GLOBAL), in case you are
performing the subsequent ALTER in the same session.

Nirbhay Choubey (Inactive) added a comment - 2016-07-20 16:48 2. On node 0, enter RSU mode: SET GLOBAL wsrep_OSU_method='RSU'; ... alter table table_a add column foo int(11) unsigned null; You should update wsrep_osu_method's SESSION value (instead of GLOBAL), in case you are performing the subsequent ALTER in the same session.

Nirbhay Choubey (Inactive) added a comment - 2016-07-20 17:02

10.0-galera HEAD shows similar behavior.

Nirbhay Choubey (Inactive) added a comment - 2016-07-20 17:02 10.0-galera HEAD shows similar behavior.

Jan Lindström (Inactive) added a comment - 2019-12-12 12:06

Support for 10.0-galera has ended.

Jan Lindström (Inactive) added a comment - 2019-12-12 12:06 Support for 10.0-galera has ended.

People

Assignee:: Jan Lindström (Inactive)

Reporter:: Mark Wadham

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2016-07-20 16:11

Updated:: 2019-12-12 12:06

Resolved:: 2019-12-12 12:06

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server