[MDEV-11323] Master feeds corrupt binary log events to slave if it does not have the key to decrypt the binlog Created: 2016-11-21  Updated: 2019-04-30  Resolved: 2019-04-30

Status: Closed
Project: MariaDB Server
Component/s: Encryption, Replication
Affects Version/s: 10.1, 10.2, 10.3
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Golubchik
Resolution: Not a Bug Votes: 0
Labels: None


 Description   
  • start server with binlog encryption and file_key_management plugin/keyfile;
  • create some events;
  • restart server with a different keyfile (different value for key 1);
  • configure replication from this server as a master to a slave;
  • start slave IO thread
    => it reads all events from master without visible errors;
  • start slave SQL thread
    => it produces

                   Last_SQL_Errno: 1594
                   Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
    

I don't think it's specified anywhere how it's supposed to work; logically, I would expect that master would refuse to feed the logs which it cannot decrypt, or that IO thread would produce an error getting the rubbish from the master. However, only SQL thread notices that something is wrong.

If it's by design, please feel free to close as not a bug.



 Comments   
Comment by Elena Stepanova [ 2016-11-21 ]

The scenario is implemented in the MTR test binlog_encryption.encrypted_master_lost_key, which will be partially disabled until this bug report is handled. Whatever happens to it, please assign it to me before closing, to finalize the test.

Comment by Sergei Golubchik [ 2016-12-03 ]

Yes, this is not a bug. Incorrectly decrypted even looks no difference from a corrupted event — it's filled with garbage just the same. And neither the master, when sending events, nor the slave IO thread do any even validation. Only the slave SQL thread, when it tries actually to execute the event, notices that the event is, in fact, garbage .

You can enable binlog checksums, them the garbage event will be detected earlier.

Comment by Elena Stepanova [ 2016-12-04 ]

I've amended the test to reflect this behavior for SQL thread, but I also need to know what's the expected behavior for the IO thread, as I see different outcomes.

Test case

# To be run with --mysqld=--plugin-load-add=file_key_management --mysqld=--file-key-management-filename=`pwd`/std_data/keys.txt --mysqld=--encrypt-binlog --repeat=10
 
--source include/master-slave.inc
 
--source include/have_binlog_format_mixed.inc
 
--connection slave
 
--disable_connect_log
--source include/stop_slave.inc
 
--connection master
 
CREATE TABLE table1_to_encrypt (i INT);
INSERT INTO table1_to_encrypt VALUES (1);
INSERT INTO table1_to_encrypt VALUES (2);
SET binlog_format=ROW;
INSERT INTO table1_to_encrypt VALUES (3);
INSERT INTO table1_to_encrypt VALUES (4);
 
--write_file $MYSQL_TMP_DIR/master_lose_key.key
1;00000AAAAAAAAAAAAAAAAAAAAAA00000
EOF
 
--connection master
--enable_reconnect
 
--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
shutdown_server 10;
--source include/wait_until_disconnected.inc
--exec echo "restart:--file-key-management-filename=$MYSQL_TMP_DIR/master_lose_key.key" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--source include/wait_until_connected_again.inc
 
CREATE TABLE table2_to_encrypt (i INT);
INSERT INTO table2_to_encrypt VALUES (1);
INSERT INTO table2_to_encrypt VALUES (2);
SET binlog_format=ROW;
INSERT INTO table2_to_encrypt VALUES (3);
INSERT INTO table2_to_encrypt VALUES (4);
 
--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
shutdown_server 10;
--source include/wait_until_disconnected.inc
--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
--source include/wait_until_connected_again.inc
 
CREATE TABLE table3_to_encrypt (i INT);
INSERT INTO table3_to_encrypt VALUES (1);
INSERT INTO table3_to_encrypt VALUES (2);
SET binlog_format=ROW;
INSERT INTO table3_to_encrypt VALUES (3);
INSERT INTO table3_to_encrypt VALUES (4);
FLUSH BINARY LOGS;
INSERT INTO table3_to_encrypt VALUES (5);
 
--source include/save_master_pos.inc
 
--connection slave
 
START SLAVE IO_THREAD;
#--let $slave_io_errno= 1595
#--let $show_slave_io_error= 1
--let $slave_timeout= 20
--source include/sync_io_with_master.inc
 
--connection master
DROP TABLE table1_to_encrypt, table2_to_encrypt, table3_to_encrypt;
 
--connection slave
stop slave;
--remove_file $MYSQL_TMP_DIR/master_lose_key.key

Sometimes the test passes:

Master_Log_File	master-bin.000004
Read_Master_Log_Pos	601
Relay_Log_File	slave-relay-bin.000002
Relay_Log_Pos	639
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes
Slave_SQL_Running	No

Sometimes it causes garbage in the slave position, but oddly enough the status of IO thread is fine:

Master_Log_File	master-bin.000002
Read_Master_Log_Pos	2913064400
Relay_Log_File	slave-relay-bin.000002
Relay_Log_Pos	639
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes

Sometimes position looks okay, but the IO thread fails with an error:

Master_Log_File	master-bin.000002
Read_Master_Log_Pos	249
Relay_Log_File	slave-relay-bin.000002
Relay_Log_Pos	639
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	No
...
Last_IO_Errno	1595
Last_IO_Error	Relay log write failure: could not queue event from master

Sometimes these two conditions come together:

Master_Log_File	�l�G��}�rjA���uM5,m>�Z��iwt��C�CC���F
                                                         �ϫ���N�7�R��1,�m�:qc܏,yq
Read_Master_Log_Pos	4885226175613774293
Relay_Log_File	slave-relay-bin.000002
Relay_Log_Pos	639
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	No
...
Last_IO_Errno	1595
Last_IO_Error	Relay log write failure: could not queue event from master

Also, once (but only once so far) I got an assertion failure on master:

mysqld: /data/src/bb-10.1-mdev9038/sql/net_serv.cc:1075: ulong my_real_read(NET*, size_t*, my_bool): Assertion 
`0' failed.
161204 20:34:57 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f31d259a312 in __GI___assert_fail (assertion=0x7f31d56a45c6 "0", file=0x7f31d56a42e0 "/data/src/bb-10.1-mdev9038/sql/net_serv.cc", line=1075, function=0x7f31d56a4620 <my_real_read(st_net*, unsigned long*, char)::__PRETTY_FUNCTION__> "ulong my_real_read(NET*, size_t*, my_bool)") at assert.c:101
#8  0x00007f31d4ce5489 in my_real_read (net=0x7f31cb4d2330, complen=0x7f31d5c2e150, header=0 '\000') at /data/src/bb-10.1-mdev9038/sql/net_serv.cc:1075
#9  0x00007f31d4ce5793 in my_net_read_packet (net=0x7f31cb4d2330, read_from_server=0 '\000') at /data/src/bb-10.1-mdev9038/sql/net_serv.cc:1138
#10 0x00007f31d4dd2543 in send_file (thd=0x7f31cb4d2070) at /data/src/bb-10.1-mdev9038/sql/sql_repl.cc:348
#11 0x00007f31d4dd590f in send_event_to_slave (info=0x7f31d5c2fb80, event_type=LOAD_EVENT, log=0x7f31d5c2f730, ev_offset=1, error_gtid=0x7f31d5c30228) at /data/src/bb-10.1-mdev9038/sql/sql_repl.cc:2001
#12 0x00007f31d4dd74b3 in send_events (info=0x7f31d5c2fb80, log=0x7f31d5c2f730, linfo=0x7f31d5c2f8b0, end_pos=1336) at /data/src/bb-10.1-mdev9038/sql/sql_repl.cc:2680
#13 0x00007f31d4dd7812 in send_one_binlog_file (info=0x7f31d5c2fb80, log=0x7f31d5c2f730, linfo=0x7f31d5c2f8b0, start_pos=4) at /data/src/bb-10.1-mdev9038/sql/sql_repl.cc:2760
#14 0x00007f31d4dd7c17 in mysql_binlog_send (thd=0x7f31cb4d2070, log_ident=0x7f31ca022088 "master-bin.000001", pos=4, flags=0) at /data/src/bb-10.1-mdev9038/sql/sql_repl.cc:2883
#15 0x00007f31d4da04fa in dispatch_command (command=COM_BINLOG_DUMP, thd=0x7f31cb4d2070, packet=0x7f31cb4d8071 "", packet_length=27) at /data/src/bb-10.1-mdev9038/sql/sql_parse.cc:1724
#16 0x00007f31d4d9e610 in do_command (thd=0x7f31cb4d2070) at /data/src/bb-10.1-mdev9038/sql/sql_parse.cc:1108
#17 0x00007f31d4ed46f5 in do_handle_one_connection (thd_arg=0x7f31cb4d2070) at /data/src/bb-10.1-mdev9038/sql/sql_connect.cc:1350
#18 0x00007f31d4ed4459 in handle_one_connection (arg=0x7f31cb4d2070) at /data/src/bb-10.1-mdev9038/sql/sql_connect.cc:1262
#19 0x00007f31d51b75bc in pfs_spawn_thread (arg=0x7f31d1c283f0) at /data/src/bb-10.1-mdev9038/storage/perfschema/pfs.cc:1860
#20 0x00007f31d449c0a4 in start_thread (arg=0x7f31d5c31b00) at pthread_create.c:309
#21 0x00007f31d265487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Comment by Sergei Golubchik [ 2019-04-30 ]

I couldn't get the assert. I was mostly getting "Relay log write failure: could not queue event from master" and it turned out to be not a bug. The event is completely encrypted, only the event length is kept plain-text.

In particular, event type is encrypted too. I/O thread ignores most binlog events, but not all, it tries to process STOP_EVENT, ROTATE_EVENT, FORMAT_DESCRIPTION_EVENT, HEARTBEAT_LOG_EVENT, GTID_LIST_EVENT, GTID_EVENT, and XID_EVENT. If the encrypted event type (essentially, a random value) will match any of those events, I/O thread will try to interpret the event and will fail.

Generated at Thu Feb 08 07:49:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.