[MDEV-13534] InnoDB STATS_PERSISTENT fails to ignore garbage delete-mark flag on node pointer pages Created: 2017-08-15  Updated: 2019-06-17  Resolved: 2017-08-24

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.7
Fix Version/s: 10.2.9

Type: Bug Priority: Major
Reporter: Tim Westervoorde Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: assertion, innodb
Environment:

CentOS 7, Linux 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux


Issue Links:
Duplicate
is duplicated by MDEV-13582 Mariadb Slave - mysqld got signal 6 ... Closed
Problem/Incident
causes MDEV-13667 Connection stuck in Unlocking tables ... Closed
is caused by MDEV-12698 innodb.innodb_stats_del_mark test fai... Closed

 Description   

MariaDB crashed with an assertion failure. At the time, a large transaction was running with (according to the rollback log) approximately 4191650 row operations.

2017-08-15 15:33:20 0x7fc018037700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.2.7/storage/innobase/dict/dict0stats.cc line 1572
InnoDB: Failing assertion: offsets_rec != NULL
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
170815 15:33:20 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.2.7-MariaDB
key_buffer_size=16777216
read_buffer_size=2097152
max_used_connections=101
max_threads=102
thread_count=26
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 436307 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7fc13e53d37e]
/usr/sbin/mysqld(handle_fatal_signal+0x30d)[0x7fc13df82fad]
/lib64/libpthread.so.0(+0xf370)[0x7fc13d4f0370]
/lib64/libc.so.6(gsignal+0x37)[0x7fc13ba791d7]
/lib64/libc.so.6(abort+0x148)[0x7fc13ba7a8c8]
/usr/sbin/mysqld(+0x422d82)[0x7fc13dd42d82]
/usr/sbin/mysqld(+0xa727c0)[0x7fc13e3927c0]
/usr/sbin/mysqld(+0xa749c7)[0x7fc13e3949c7]
/usr/sbin/mysqld(+0xa781ed)[0x7fc13e3981ed]
/usr/sbin/mysqld(+0xa7a283)[0x7fc13e39a283]
/lib64/libpthread.so.0(+0x7dc5)[0x7fc13d4e8dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fc13bb3b76d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.



 Comments   
Comment by Elena Stepanova [ 2017-08-15 ]

Please paste or attach your cnf file(s). Also, any information you retrieved from the log or from monitoring tools might be helpful.
Were there any errors/warnings in the error log before the assertion failure?

Comment by Tim Westervoorde [ 2017-08-16 ]

The config is as follows. There weren't any log messages before the crash. Unfortunately, this server wasn't in our monitoring environment yet, so I have no other information.

[server]
 
skip-name-resolve
 
key_buffer_size = 16M
max_allowed_packet = 16M
table_open_cache = 512
join_buffer_size = 2M
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 16M
 
max_connections = 100
thread_cache_size = 20
tmp_table_size = 256M
max_heap_table_size = 256M
 
query_cache_type = 0
query_cache_size = 0M
 
#log_slow_queries
#log_queries_not_using_indexes
 
innodb_data_home_dir = /var/lib/mysql/
innodb_data_file_path = ibdata1:1G:autoextend
innodb_log_group_home_dir = /var/lib/mysql/
 
innodb_buffer_pool_size = 4G
innodb_buffer_pool_instances = 4
innodb_log_file_size = 1G
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50
innodb_file_per_table
 
[client]
default-character-set=utf8mb4
 
[mysql]
default-character-set=utf8mb4
 
[mysqld]
collation-server = utf8mb4_unicode_ci
init-connect='SET NAMES utf8mb4'
character-set-server = utf8mb4
sql-mode = ''

Comment by Tim Westervoorde [ 2017-08-22 ]

Hi Elena,

This happened again today. It happens in a large transaction.

2017-08-22 10:14:36 140105706969216 [Note] InnoDB: To recover: 20810 pages from log
2017-08-22 10:16:16 140105706969216 [Note] InnoDB: 2 transaction(s) which must be rolled back or cleaned up in total 4610727 row operations to undo
2017-08-22 10:16:16 140105706969216 [Note] InnoDB: Trx id counter is 3804286720
2017-08-22 10:16:16 140105706969216 [Note] InnoDB: Starting final batch to recover 20507 pages from redo log.
2017-08-22 10:16:16 140100691932928 [Note] InnoDB: To recover: 20506 pages from log
2017-08-22 10:16:27 140105706969216 [Note] InnoDB: 128 out of 128 rollback segments are active.
2017-08-22 10:16:27 140100591220480 [Note] InnoDB: Starting in background the rollback of recovered transactions
2017-08-22 10:16:27 140100591220480 [Note] InnoDB: Rolling back trx with id 3804285856, 50 rows to undo
2017-08-22 10:16:27 140105706969216 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2017-08-22 10:16:27 140105706969216 [Note] InnoDB: Creating shared tablespace for temporary tables
2017-08-22 10:16:27 140105706969216 [Note] InnoDB: Setting file '/var/lib/mysql/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2017-08-22 10:16:27 140100591220480 [Note] InnoDB: Rollback of trx with id 3804285856 completed
2017-08-22 10:16:27 140100591220480 [Note] InnoDB: Rolling back trx with id 3804278128, 4610677 rows to undo

Is there anything I can do to help pinpoint this issue? We have an idea of when this gets triggered, so we could trigger it by hand.

Comment by Marko Mäkelä [ 2017-08-23 ]

dicode, this assertion failure occurs when the InnoDB persistent statistics are updated.
I believe this code was already introduced in MySQL 5.6 (and thus MariaDB 10.0). What changed in MySQL 5.7 and MariaDB 10.2.2 is that some defaults were changed, and possibly also that some logic was added to trigger the recomputation of persistent statistics when a large part of the table has changed.

On the surface, the assertion failure looks like a sign of corruption: an index page is empty even though the table is not empty. An empty index page is only allowed when it is the root page and the whole index (and table) is empty.

		offsets_rec = dict_stats_scan_page(
			&rec, offsets1, offsets2, index, page, n_prefix,
			QUIT_ON_FIRST_NON_BORING, n_diff, NULL);
 
		/* pages on level > 0 are not allowed to be empty */
		ut_a(offsets_rec != NULL);

However, if we look deeper, a change to the logic of dict_stats_scan_page() seems to be behind this:

	if (page_rec_is_supremum(rec)) {
		/* the page is empty or contains only delete-marked records */
		*n_diff = 0;
		*out_rec = NULL;
		return(NULL);
	}

It is entirely possible and OK for a page to consist only of delete-marked records.
Does this occur with a massive DELETE, or with a massive UPDATE of an indexed column?
It looks like this bug was introduced already in 2012 in MySQL 5.6.6.

We should tolerate a NULL return value if !page_is_empty(page).

Comment by Marko Mäkelä [ 2017-08-24 ]

I debugged this a little with the following test:

--source include/have_innodb.inc
 
SET @saved_include_delete_marked = @@GLOBAL.innodb_stats_include_delete_marked;
SET @saved_innodb_stats_traditional = @@GLOBAL.innodb_stats_traditional;
SET @saved_innodb_stats_modified_counter = @@GLOBAL.innodb_stats_modified_counter;
SET GLOBAL innodb_stats_include_delete_marked = OFF;
SET GLOBAL innodb_stats_traditional=false;
SET GLOBAL innodb_stats_modified_counter=1;
 
CREATE TABLE t(a SERIAL)ENGINE=InnoDB STATS_PERSISTENT=1 STATS_SAMPLE_PAGES=1;
 
BEGIN;
INSERT t VALUES(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),(),();
INSERT t SELECT NULL FROM t;
INSERT t SELECT NULL FROM t;
INSERT t SELECT NULL FROM t;
INSERT t SELECT NULL FROM t;
INSERT t SELECT NULL FROM t;
INSERT t SELECT NULL FROM t;
COMMIT;
 
connect(disable_purge, localhost, root,,);
START TRANSACTION WITH CONSISTENT SNAPSHOT;
 
connection default;
SELECT COUNT(*) FROM t;
DELETE FROM t;
ANALYZE TABLE t;
EXPLAIN SELECT COUNT(*) FROM t;
DROP TABLE t;
 
disconnect disable_purge;
 
SET GLOBAL innodb_stats_include_delete_marked = @saved_include_delete_marked;
SET GLOBAL innodb_stats_traditional = @saved_innodb_stats_traditional;
SET GLOBAL innodb_stats_modified_counter = @saved_innodb_stats_modified_counter;

During this test, persistent statistics were calculated twice: first during an INSERT…SELECT, then during the ANALYZE TABLE, which was what I was interested in. During the ANALYZE TABLE, the execution got this far, on a leaf page that consisted entirely of delete-marked records:

		if (page_is_leaf(page)) {
			/* leaf level */
			break;
		}
		/* else */
 
		/* search for the first non-boring record on the page */
		offsets_rec = dict_stats_scan_page(
			&rec, offsets1, offsets2, index, page, n_prefix,
			QUIT_ON_FIRST_NON_BORING, n_diff, NULL);
 
		/* pages on level > 0 are not allowed to be empty */
		ut_a(offsets_rec != NULL);

Unfortunately, because this is a leaf page, we would not get to the dict_stats_scan_page() call. But this is also good news: I think I have found the bug.

While I cannot repeat this yet, I do know that on non-leaf pages, the delete-mark flag in the node pointer records is basically garbage. (Delete-marking only makes sense at the leaf level anyway. The purpose of the delete-mark is to tell MVCC, locking and purge that a leaf-level record does not exist in the READ UNCOMMITTED view, but it used to exist.)
Node pointer records and non-leaf pages are glue that attaches multiple leaf pages to an index. This glue is supposed to be transparent to the transactional layer.

When a page is split, InnoDB creates a node pointer record out of the child page record that the cursor is positioned on. The node pointer record for the parent page will be a copy of the child page record, amended with the child page number. If the child page record happened to carry the delete-mark flag, then the node pointer record would also carry this flag (even though the flag makes no sense outside child pages). (On a related note, for the first node pointer record in the first node pointer page of each tree level, if the MIN_REC_FLAG is set, the rest of the record contents (except the child page number) are basically garbage. From this garbage you could deduce at which point the child was originally split.)

I believe that to repeat this bug, you would need a node pointer page that is full of node pointer records that happen to have the delete-mark flag set. This is possible when a page split is triggered on a leaf page that is full of delete-marked records.

Next, I will try to revise my test case so that it will first construct a single leaf page full of delete-marked records, and then cause an INSERT that makes each node pointer record carry a delete-mark flag, and finally execute ANALYZE TABLE to hit the bug.

The fix is obvious: dict_stats_scan_page() should ignore the garbage delete-mark flags on non-leaf pages. But I want to repeat this first.

Comment by Tim Westervoorde [ 2017-08-24 ]

This seems valid for my case. A table with ~450k rows is completely rebuilt, along with a second table with 1.7m rows that references it. So there are a lot of INSERT/DELETE statements in a short period of time, within a transaction.

If there is anything I can debug for you, let me know; I can trigger this to happen by running the same update script over and over again.

Kind regards,
Tim

Comment by Marko Mäkelä [ 2017-08-24 ]

A prerequisite of this bug is that a non-root node pointer page is filled with records that carry the (garbage) delete-mark flag. There is no problem if all root page records are ‘delete-marked’ or if all leaf page records are delete-marked. So, the minimum required index tree height is 3, that is, 2 levels of node pointers above the leaf level.

Because the maximum PRIMARY KEY length is so small (768 bytes on innodb_page_size=4k; 3072 bytes on innodb_page_size=16k), I thought it infeasible to create an ’organic’ test for this. Yes, I could pad the leaf page record sizes with non-key columns, but if each node pointer page must hold at least 4 records, I’d still need quite a few rows before a node pointer page is split.

I was able to reproduce this by using debug instrumentation, adapting the great innodb.innodb_bug14676111 test painstakingly created by Yasufumi Kinoshita:

--source include/have_innodb.inc
--source include/have_debug.inc
 
CREATE TABLE t(a INT UNSIGNED PRIMARY KEY)
ENGINE=InnoDB STATS_PERSISTENT=1 STATS_SAMPLE_PAGES=1;
 
BEGIN;
# Create an index tree of height 3.
# This is adapted from innodb.innodb_bug14676111.
 
SET @save_debug = @@GLOBAL.innodb_limit_optimistic_insert_debug;
SET GLOBAL innodb_limit_optimistic_insert_debug=2;
 
INSERT t VALUES(1),(5);
DELETE FROM t;
INSERT t VALUES(4);
DELETE FROM t;
INSERT t VALUES(3);
DELETE FROM t;
SET GLOBAL innodb_limit_optimistic_insert_debug = @save_debug;
 
connect(con1, localhost, root,,);
ANALYZE TABLE t;
disconnect con1;
 
connection default;
DROP TABLE t;

Comment by Marko Mäkelä [ 2017-08-24 ]

I introduced this bug in MDEV-12698 in MariaDB 10.2.7 with the following change:

@@ -1392,13 +1388,10 @@ dict_stats_scan_page(
 	Because offsets1,offsets2 should be big enough,
 	this memory heap should never be used. */
 	mem_heap_t*	heap			= NULL;
-	const rec_t*	(*get_next)(const rec_t*);
-
-	if (scan_method == COUNT_ALL_NON_BORING_AND_SKIP_DEL_MARKED) {
-		get_next = page_rec_get_next_non_del_marked;
-	} else {
-		get_next = page_rec_get_next_const;
-	}
+	const rec_t*	(*get_next)(const rec_t*)
+		= srv_stats_include_delete_marked
+		? page_rec_get_next_const
+		: page_rec_get_next_non_del_marked;
 
 	const bool	should_count_external_pages = n_external_pages != NULL;
 

This change wrongly causes the bogus (garbage) delete-mark flags to be considered on node pointer pages.

The only MariaDB Server releases that are affected by this bug are 10.2.7 and 10.2.8.

Comment by Andrii Nikitin (Inactive) [ 2017-09-08 ]

To work around the problem (besides upgrading once 10.2.9 is released), one may try to disable persistent statistics.

Put the following line into the [mysqld] section of the .cnf file:
innodb_stats_persistent=0

To disable persistent statistics on the running server without a restart, execute this SQL command:

set global innodb_stats_persistent=0;

Comment by Guangpu Feng [ 2019-06-17 ]

set global innodb_stats_persistent=0;

will not work around this bug, verified in 10.2.8.

Generated at Thu Feb 08 08:06:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.