[MDEV-30985] Replica stops with error on ALTER ONLINE with Geometry Types Created: 2023-04-02  Updated: 2023-08-16  Resolved: 2023-08-16

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Replication, Storage Engine - InnoDB, Storage Engine - MyISAM
Affects Version/s: None
Fix Version/s: 11.2.1

Type: Bug Priority: Critical
Reporter: Angelique Sklavounos (Inactive) Assignee: Nikita Malyavin
Resolution: Fixed Votes: 0
Labels: None

Attachments: File 1048.test    
Issue Links:
Blocks
blocks MDEV-16329 Engine-independent online ALTER TABLE Closed
Relates
relates to MDEV-16329 Engine-independent online ALTER TABLE Closed

 Description   

The following is non-deterministic - usually I can get a failure with --repeat=100.

--source include/master-slave.inc
--source include/have_innodb.inc
--source include/have_binlog_format_row.inc
 
--connection slave
call mtr.add_suppression("Can't find record in 'DD'");
 
--connection master
USE test;
 
CREATE TABLE `DD` (geocol7 GEOMETRYCOLLECTION NULL DEFAULT ST_GEOMFROMTEXT('Point(1 1)'), scol4 BINARY NOT NULL DEFAULT '', vcol9 POINT AS ( geocol7 ) );
INSERT INTO `DD` () VALUES (),(),();
 
# not sure if this is necessary, but failure seems to occur with or without
UNLOCK TABLES;
 
# FLUSH TABLES seems necessary for replication to error
FLUSH TABLES;
 
ALTER ONLINE TABLE `DD` ADD COLUMN ( geocol3 GEOMETRY NULL DEFAULT ST_GEOMFROMTEXT('Point(1 1)'), ncol4 DECIMAL UNSIGNED NOT NULL ), ALGORITHM=COPY, LOCK=NONE;
DELETE FROM `DD` LIMIT 7;
 
DROP TABLE DD;
--source include/rpl_end.inc

CURRENT_TEST: rpl.1048_small
mysqltest: In included file "./include/check_slave_param.inc":
included from ./include/check_slave_no_error.inc at line 24:
included from ./include/rpl_end.inc at line 74:
included from /home/angelique/dev/build-11.1-MDEV-16329-e599b8b4586/build1/mysql-test/suite/rpl/t/1048_small.test at line 25:
At line 31: Wrong value for slave parameter Last_SQL_Errno

**** SHOW SLAVE STATUS on server_2 ****
SHOW SLAVE STATUS;
Slave_IO_State	Waiting for master to send event
Master_Host	127.0.0.1
Master_User	root
Master_Port	16000
Connect_Retry	1
Master_Log_File	master-bin.000001
Read_Master_Log_Pos	2125
Relay_Log_File	slave-relay-bin.000002
Relay_Log_Pos	1733
Relay_Master_Log_File	master-bin.000001
Slave_IO_Running	Yes
Slave_SQL_Running	No
Replicate_Rewrite_DB
Replicate_Do_DB
Replicate_Ignore_DB
Replicate_Do_Table
Replicate_Ignore_Table
Replicate_Wild_Do_Table
Replicate_Wild_Ignore_Table
Last_Errno	1032
Last_Error	Could not execute Delete_rows_v1 event on table test.DD; Can't find record in 'DD', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 1893
Skip_Counter	0
Exec_Master_Log_Pos	1433
Relay_Log_Space	2734
Until_Condition	None
Until_Log_File
Until_Log_Pos	0
Master_SSL_Allowed	No
Master_SSL_CA_File
Master_SSL_CA_Path
Master_SSL_Cert
Master_SSL_Cipher
Master_SSL_Key
Seconds_Behind_Master	NULL
Master_SSL_Verify_Server_Cert	No
Last_IO_Errno	0
Last_IO_Error
Last_SQL_Errno	1032
Last_SQL_Error	Could not execute Delete_rows_v1 event on table test.DD; Can't find record in 'DD', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log master-bin.000001, end_log_pos 1893
Replicate_Ignore_Server_Ids
Master_Server_Id	1
Master_SSL_Crl
Master_SSL_Crlpath
Using_Gtid	Slave_Pos
Gtid_IO_Pos	0-1-6
Replicate_Do_Domain_Ids
Replicate_Ignore_Domain_Ids
Parallel_Mode	optimistic
SQL_Delay	0
SQL_Remaining_Delay	NULL
Slave_SQL_Running_State
Slave_DDL_Groups	11
Slave_Non_Transactional_Groups	6
Slave_Transactional_Groups	0

Replication failure also occurs with InnoDB, but doesn't seem to occur if the table is defined with the Aria engine.

A much longer test that always seems to fail is attached (1048.test).



 Comments   
Comment by Nikita Malyavin [ 2023-04-03 ]

The ASAN use-after-free failure happens with even simpler test, without ALTER TABLE, but with virtual columns on GEOMETRY field:

--source include/master-slave.inc
--source include/have_innodb.inc
--source include/have_binlog_format_row.inc
 
--connection master
CREATE TABLE DD(geocol7 GEOMETRYCOLLECTION NULL DEFAULT ST_GEOMFROMTEXT('Point(1 1)'),
                vcol9 POINT AS ( geocol7 ));
INSERT INTO DD () VALUES (),(),();
DELETE FROM DD;
 
DROP TABLE DD;
--source include/rpl_end.inc

Comment by Nikita Malyavin [ 2023-04-03 ]

The bug is reproducible on stock, but in a different form.

--source include/master-slave.inc
--source include/have_innodb.inc
--source include/have_binlog_format_row.inc
 
--connection master
CREATE TABLE DD(geocol7 GEOMETRYCOLLECTION NULL DEFAULT ST_GEOMFROMTEXT('Point(1 1)'));
INSERT INTO DD () VALUES (),(),();
--sync_slave_with_master
ALTER TABLE DD add vcol9 POINT AS ( geocol7 ), add key(vcol9);
connection master;
DELETE FROM DD;
 
DROP TABLE DD;
--source include/rpl_end.inc

da114c70 fix for --view-protocol

2023-04-03 23:28:47 5 [Note] Deleted Master_info file '/home/nik/mariadb/bld/mysql-test/var/mysqld.2/data/master.info'.
2023-04-03 23:28:47 5 [Note] Deleted Master_info file '/home/nik/mariadb/bld/mysql-test/var/mysqld.2/data/relay-log.info'.
2023-04-03 23:28:47 5 [Note] Master connection name: ''  Master_info_file: 'master.info'  Relay_info_file: 'relay-log.info'
2023-04-03 23:28:47 5 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='16020', master_log_file='', master_log_pos='4'.
2023-04-03 23:28:47 5 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=Slave_Pos
2023-04-03 23:28:47 7 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16020' in log '' at position 4
2023-04-03 23:28:47 8 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './slave-relay-bin.000001' position: 4; GTID position ''
2023-04-03 23:28:47 7 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16020',replication starts at GTID position ''
2023-04-03 23:28:47 8 [ERROR] mariadbd: Can't find record in 'DD'
2023-04-03 23:28:47 8 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table test.DD; Can't find record in 'DD', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log master-bin.000001, end_log_pos 1166, Gtid 0-1-3, Internal MariaDB error code: 1032
2023-04-03 23:28:47 8 [Warning] Slave: Can't find record in 'DD' Error_code: 1032
2023-04-03 23:28:47 8 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 918; GTID position '0-1-2'
2023-04-03 23:28:47 8 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 918; GTID position '0-1-2', master: 127.0.0.1:16020
2023-04-03 23:28:48 0 [Note] /home/nik/mariadb/bld/sql/mariadbd (initiated by: unknown): Normal shutdown
2023-04-03 23:28:48 7 [Note] Slave: received end packet from server, apparent master shutdown: 
2023-04-03 23:28:48 7 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 1398; GTID position 0-1-4, master 127.0.0.1:16020
2023-04-03 23:28:48 0 [Note] InnoDB: FTS optimize thread exiting.
2023-04-03 23:28:48 0 [Note] InnoDB: Starting shutdown...
2023-04-03 23:28:48 0 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/var_auto_3JdT/mysqld.2/data/ib_buffer_pool
2023-04-03 23:28:48 0 [Note] InnoDB: Restricted to 124 pages due to innodb_buf_pool_dump_pct=25
2023-04-03 23:28:48 0 [Note] InnoDB: Buffer pool(s) dump completed at 230403 23:28:48
2023-04-03 23:28:48 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2023-04-03 23:28:48 0 [Note] InnoDB: Shutdown completed; log sequence number 58659; transaction id 16
2023-04-03 23:28:48 0 [Note] Debug sync points hit:                   4961
2023-04-03 23:28:48 0 [Note] Debug sync points executed:              0
2023-04-03 23:28:48 0 [Note] Debug sync points max active per thread: 0
2023-04-03 23:28:48 0 [Note] /home/nik/mariadb/bld/sql/mariadbd: Shutdown complete

Comment by Nikita Malyavin [ 2023-04-04 ]

Also reproducible with blobs of different types

--source include/master-slave.inc
--source include/have_binlog_format_row.inc
--connection master
CREATE TABLE DD(t text default '111111111', i longblob as (t));
INSERT INTO DD () VALUES (),(),();
DELETE FROM DD;
 
DROP TABLE DD;
--source include/rpl_end.inc

Comment by Nikita Malyavin [ 2023-04-05 ]

Please review commit 6f768d54

Comment by Oleksandr Byelkin [ 2023-04-11 ]

OK to push

Generated at Thu Feb 08 10:20:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.