[MDEV-13724] ALTER TABLE…ALGORITHM=INPLACE is modifying the source table Created: 2017-09-04  Updated: 2017-09-06  Resolved: 2017-09-06

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.2
Fix Version/s: 10.3.2

Type: Bug Priority: Major
Reporter: Alice Sherepa Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-13654 Various crashes due to DB_TRX_ID mism... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest/builds/10072/steps/test_2/logs/stdio

sys_vars.log_slow_admin_statements_func 'innodb' w3 [ fail ]
        Test ended at 2017-09-02 15:21:43
 
CURRENT_TEST: sys_vars.log_slow_admin_statements_func
--- /mnt/buildbot/build/mariadb-10.3.2/mysql-test/suite/sys_vars/r/log_slow_admin_statements_func.result	2017-09-02 12:56:48.000000000 -0400
+++ /mnt/buildbot/build/mariadb-10.3.2/mysql-test/suite/sys_vars/r/log_slow_admin_statements_func.reject	2017-09-02 15:21:43.124060071 -0400
@@ -27,10 +27,14 @@
 OPTIMIZE TABLE log_slow_admin_statements;
 Table	Op	Msg_type	Msg_text
 test.log_slow_admin_statements	optimize	note	Table does not support optimize, doing recreate + analyze instead
-test.log_slow_admin_statements	optimize	status	OK
+test.log_slow_admin_statements	optimize	error	Got error 192 'Table encrypted but decryption failed. This could be because correct encryption management plugin is not loaded, used encryption key is not available or encryption method does not match.' from InnoDB
+test.log_slow_admin_statements	optimize	status	Operation failed
+Warnings:
+Error	1296	Got error 192 'Table encrypted but decryption failed. This could be because correct encryption management plugin is not loaded, used encryption key is not available or encryption method does not match.' from InnoDB
 CHECK TABLE log_slow_admin_statements EXTENDED;
 Table	Op	Msg_type	Msg_text
-test.log_slow_admin_statements	check	status	OK
+test.log_slow_admin_statements	check	Warning	InnoDB: Index PRIMARY is marked as corrupted
+test.log_slow_admin_statements	check	error	Corrupt
 DROP TABLE log_slow_admin_statements;
 SELECT sql_text FROM mysql.slow_log WHERE sql_text LIKE '%TABLE log_slow_admin_statements%';
 sql_text
 
mysqltest: Result length mismatch



 Comments   
Comment by Elena Stepanova [ 2017-09-06 ]

Various problems with this test are reproducible on 10.3 non-debug build running with --repeat=N --mem. For me usually N=10 is enough, but it might vary on different machines.

With this test (an excerpt from the original test case) i'm also quite often getting the assertion failure:

-- source include/have_innodb.inc
 
CREATE TABLE log_slow_admin_statements (
        i INT PRIMARY KEY AUTO_INCREMENT,
        j VARCHAR(255)
) ENGINE=InnoDB;
 
INSERT INTO log_slow_admin_statements(j) values (REPEAT('A', 255));
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
INSERT INTO log_slow_admin_statements(j) SELECT j FROM log_slow_admin_statements;
 
OPTIMIZE TABLE log_slow_admin_statements;
 
# Cleanup
DROP TABLE log_slow_admin_statements;

10.3 non-debug 83f9422f0555b6

2017-09-06 16:19:12 0 [ERROR] InnoDB: Database page corruption on disk or a failed file read of tablespace test/log_slow_admin_statements page [page id: space=8, page number=240]. You may have to recover from a backup.
2017-09-06 16:19:12 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
...
InnoDB: End of page dump
2017-09-06 16:19:12 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 2074857521, calculated checksums for field1: crc32 3550863345/3589883547, innodb 
2491050503,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 2074857521, calculated checksums for field2: crc32 3550863345/3589883547, innodb 8
5042567, none 3735928559,  page LSN 0 16959928, low 4 bytes of LSN at page end 16959928, page number (if stored to page already) 244, space id (if created with >= 
MySQL-4.1.1 and stored already) 8
2017-09-06 16:19:12 0 [Note] InnoDB: Page may be an index page where index id is 25
2017-09-06 16:19:12 0 [Note] InnoDB: Index 25 is `PRIMARY` in table `test`.`log_slow_admin_statements`
2017-09-06 16:19:12 0 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error.
 If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to s
can your table for corruption. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
2017-09-06 16:19:12 0x7f4334cbe700  InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/buf/buf0lru.cc line 2148
InnoDB: Failing assertion: bpage->buf_fix_count == 0
 
#4  0x00007f43417013fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x0000564cc8e55287 in ut_dbg_assertion_failed (expr=expr@entry=0x564cc9721a98 "bpage->buf_fix_count == 0", file=file@entry=0x564cc9721548 "/data/src/10.3/storage/innobase/buf/buf0lru.cc", line=line@entry=2148) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:61
#6  0x0000564cc9479710 in buf_LRU_block_remove_hashed (bpage=bpage@entry=0x7f4336bb0358, zip=zip@entry=true) at /data/src/10.3/storage/innobase/buf/buf0lru.cc:2148
#7  0x0000564cc947e650 in buf_LRU_free_one_page (bpage=bpage@entry=0x7f4336bb0358) at /data/src/10.3/storage/innobase/buf/buf0lru.cc:2412
#8  0x0000564cc9458e66 in buf_mark_space_corrupt (bpage=0x7f4336bb0358) at /data/src/10.3/storage/innobase/buf/buf0buf.cc:5761
#9  buf_page_io_complete (bpage=0x7f4336bb0358, evict=evict@entry=false) at /data/src/10.3/storage/innobase/buf/buf0buf.cc:6006
#10 0x0000564cc94bcc5c in fil_aio_wait (segment=segment@entry=3) at /data/src/10.3/storage/innobase/fil/fil0fil.cc:5500
#11 0x0000564cc93d60c8 in io_handler_thread (arg=<optimized out>) at /data/src/10.3/storage/innobase/srv/srv0start.cc:340
#12 0x00007f434363d494 in start_thread (arg=0x7f4334cbe700) at pthread_create.c:333
#13 0x00007f43417b593f in clone () from /lib/x86_64-linux-gnu/libc.so.6

The problem was apparently introduced (or revealed) by this commit:

commit ff81faf670e083e1da4f3e7bce33fe9104410b2c
Author: Marko Mäkelä <marko.makela@mariadb.com>
Date:   Thu Aug 31 18:03:50 2017 +0300
 
    MDEV-13654 Various crashes due to DB_TRX_ID mismatch in table-rebuilding ALTER TABLE…LOCK=NONE

Comment by Marko Mäkelä [ 2017-09-06 ]

The OPTIMIZE TABLE can be replaced with

ALTER TABLE log_sow_admin_statements FORCE, ALGORITHM=INPLACE;

and the failure can still be repeated. With ALGORITHM=COPY, there is no corruption.

Comment by Marko Mäkelä [ 2017-09-06 ]

MDEV-13654 is actually modifying the contents of a buffer pool page, instead of modifying a copy of the record. Removing the offending writes makes the crash go away:

diff --git a/storage/innobase/row/row0merge.cc b/storage/innobase/row/row0merge.cc
index 63b1e2a4105..27752cf4c0f 100644
--- a/storage/innobase/row/row0merge.cc
+++ b/storage/innobase/row/row0merge.cc
@@ -2302,13 +2302,6 @@ row_merge_read_clustered_index(
 			for which history is not going to be
 			available after the rebuild operation.
 			This essentially mimics row_purge_reset_trx_id(). */
-			byte* ptr = static_cast<byte*>(
-				row->fields[new_trx_id_col].data);
-
-			memset(ptr, 0, DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN);
-			ptr[DATA_TRX_ID_LEN] = 1U
-				<< (ROLL_PTR_INSERT_FLAG_POS - CHAR_BIT
-				    * (DATA_ROLL_PTR_LEN - 1));
 		}
 
 		if (add_autoinc != ULINT_UNDEFINED) {

This change would make the original table appear corrupted while the table is being copied. I guess that the read request that is noticing the corruption would be from a purge thread.

Generated at Thu Feb 08 08:07:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.