Details
Type: Bug
Status: Confirmed
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.4 (EOL)
Fix Version/s: None
Description
Consider first a normal master-slave topology with gtid_strict_mode=0 where the user stops the slave and sets:
SET GLOBAL gtid_slave_pos= '1-2-3,2-4-6';
Yes, this position could be completely incorrect, i.e. there might not even be any node with domain_id 1 or 2. This command is executed like this:
rpl_slave_state::record_gtid (this=0x55fb9f8f6c90, thd=0x7f2e80031480, gtid=0x7f2e89efa710, sub_id=2, in_transaction=false,
    in_statement=true, out_hton=0x7f2e89efa6f8) at /home/jan/work/mariadb/10.4/sql/rpl_gtid.cc:690
#1 0x000055fb9b9ff12e in rpl_slave_state::load (this=0x55fb9f8f6c90, thd=0x7f2e80031480, state_from_master=0x7f2e8003e053 "", len=11,
    reset=true, in_statement=true) at /home/jan/work/mariadb/10.4/sql/rpl_gtid.cc:1409
#2 0x000055fb9b81d972 in rpl_gtid_pos_update (thd=0x7f2e80031480, str=0x7f2e8003e048 "1-2-3,2-4-6", len=11)
    at /home/jan/work/mariadb/10.4/sql/sql_repl.cc:4728
#3 0x000055fb9b99469a in Sys_var_gtid_slave_pos::global_update (this=0x55fb9d1fde20 <Sys_gtid_slave_pos>, thd=0x7f2e80031480,
    var=0x7f2e8003dff8) at /home/jan/work/mariadb/10.4/sql/sys_vars.cc:1858
#4 0x000055fb9b6a8c5e in sys_var::update (this=0x55fb9d1fde20 <Sys_gtid_slave_pos>, thd=0x7f2e80031480, var=0x7f2e8003dff8)
    at /home/jan/work/mariadb/10.4/sql/set_var.cc:208
#5 0x000055fb9b6aab8e in set_var::update (this=0x7f2e8003dff8, thd=0x7f2e80031480) at /home/jan/work/mariadb/10.4/sql/set_var.cc:837
#6 0x000055fb9b6aa7f0 in sql_set_variables (thd=0x7f2e80031480, var_list=0x7f2e80036360, free=true)
    at /home/jan/work/mariadb/10.4/sql/set_var.cc:740
#7 0x000055fb9b7db3f1 in mysql_execute_command (thd=0x7f2e80031480) at /home/jan/work/mariadb/10.4/sql/sql_parse.cc:5047
#8 0x000055fb9b7e5303 in mysql_parse (thd=0x7f2e80031480, rawbuf=0x7f2e8003de68 "SET GLOBAL gtid_slave_pos= '1-2-3,2-4-6'", length=40,
    parser_state=0x7f2e89efb300, is_com_multi=false, is_next_command=false) at /home/jan/work/mariadb/10.4/sql/sql_parse.cc:8012
#9 0x000055fb9b7e499d in wsrep_mysql_parse (thd=0x7f2e80031480, rawbuf=0x7f2e8003de68 "SET GLOBAL gtid_slave_pos= '1-2-3,2-4-6'", length=40,
    parser_state=0x7f2e89efb300, is_com_multi=false, is_next_command=false) at /home/jan/work/mariadb/10.4/sql/sql_parse.cc:7814
#10 0x000055fb9b7d0979 in dispatch_command (command=COM_QUERY, thd=0x7f2e80031480,
    packet=0x7f2e8004fa01 "SET GLOBAL gtid_slave_pos= '1-2-3,2-4-6'", packet_length=40, is_com_multi=false, is_next_command=false)
    at /home/jan/work/mariadb/10.4/sql/sql_parse.cc:1843
#11 0x000055fb9b7cf2ce in do_command (thd=0x7f2e80031480) at /home/jan/work/mariadb/10.4/sql/sql_parse.cc:1378
#12 0x000055fb9b975bbe in do_handle_one_connection (connect=0x55fb9fded720) at /home/jan/work/mariadb/10.4/sql/sql_connect.cc:1420
#13 0x000055fb9b97591a in handle_one_connection (arg=0x55fb9fded720) at /home/jan/work/mariadb/10.4/sql/sql_connect.cc:1324
#14 0x000055fb9bf1154b in pfs_spawn_thread (arg=0x55fb9f972430) at /home/jan/work/mariadb/10.4/storage/perfschema/pfs.cc:1869
#15 0x00007f2e97c97ada in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:444
#16 0x00007f2e97d282e4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
|
Anyway, this is not atomic, because rpl_slave_state::load contains a loop: it takes one gtid at a time and calls rpl_slave_state::record_gtid, where we have
if (err || (err= ha_commit_trans(thd, FALSE)))
  ha_rollback_trans(thd, FALSE);
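To illustrate, here is a rough SQL-level sketch of how the command above effectively becomes two independently committed updates of mysql.gtid_slave_pos. This is an approximation only: record_gtid works through handler calls, not literal SQL, and the sub_id values are made up for the example.

-- Approximation only; the server does not run these literal statements.
-- GTID '1-2-3' = domain_id 1, server_id 2, seq_no 3.
BEGIN;
INSERT INTO mysql.gtid_slave_pos (domain_id, sub_id, server_id, seq_no) VALUES (1, 1, 2, 3);
COMMIT;   -- the first GTID is durable on its own at this point
-- GTID '2-4-6' = domain_id 2, server_id 4, seq_no 6.
BEGIN;
INSERT INTO mysql.gtid_slave_pos (domain_id, sub_id, server_id, seq_no) VALUES (2, 2, 4, 6);
COMMIT;   -- a crash before this commit leaves only '1-2-3' stored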
The fact that storing these gtids is not atomic can cause problems in the following cases:
- In Galera we replicate the gtid_slave_pos table to other nodes and assume it is InnoDB. This replication is required at least in the case where a slave node is configured with skip_slave_start=0 and the node goes down and then starts again.
- What happens if we have stored the first gtid position and committed that transaction, and then the node crashes? (See the sketch after this list.)
- For Galera we need a Galera transaction, so we start one in rpl_slave_state::record_gtid, but we lose that transaction because of the ha_commit_trans/ha_rollback_trans calls above. We might be able to fix this by cleaning up the Galera transaction context and starting a new transaction, but it is not optimal because the gtid position update is still not atomic.
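As a hypothetical illustration of the crash case above (assuming the node crashed after the first per-GTID commit but before the second), the state observed after restart would look roughly like this:

-- Hypothetical post-crash state, continuing the sketch above:
SELECT @@GLOBAL.gtid_slave_pos;
-- returns '1-2-3' instead of the intended '1-2-3,2-4-6'
SELECT domain_id, server_id, seq_no FROM mysql.gtid_slave_pos;
-- returns only the row for domain 1; the row for domain 2 was never committed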
Attachments
Issue Links
- causes
  - MDEV-32193 Assertion `state() == s_executing || state() == s_prepared || state() == s_committing || state() == s_must_abort || state() == s_replaying' failed. (Stalled)
  - MDEV-33129 Crash in wsrep::wsrep_provider_v26::replay when setting gtid_slave_pos (Needs Feedback)