|
The stored page number on an index root page is wrong. Looks like page corruption.
|
|
reproducible on 10.4, datadir attached in dt.7z
|
10.4 4447a02cf13a49876001a40c
|
mysqld: 10.4/src/storage/innobase/btr/btr0pcur.cc:139: void btr_pcur_store_position(btr_pcur_t*, mtr_t*): Assertion `page_get_page_no(page) == index->page' failed.
|
181126 14:42:31 [ERROR] mysqld got signal 6 ;
|
|
stdlib/abort.c:91(__GI_abort)[0x7f1d26e0b02a]
|
assert/assert.c:92(__assert_fail_base)[0x7f1d26e01bd7]
|
/lib/x86_64-linux-gnu/libc.so.6(+0x2dc82)[0x7f1d26e01c82]
|
btr/btr0pcur.cc:141(btr_pcur_store_position(btr_pcur_t*, mtr_t*))[0x55f4ce6ec7f8]
|
row/row0sel.cc:5568(row_search_mvcc(unsigned char*, page_cur_mode_t, row_prebuilt_t*, unsigned long, unsigned long))[0x55f4ce58addc]
|
handler/ha_innodb.cc:9265(ha_innobase::index_read(unsigned char*, unsigned char const*, unsigned int, ha_rkey_function))[0x55f4ce272ef1]
|
handler/ha_innodb.cc:9642(ha_innobase::index_first(unsigned char*))[0x55f4ce275999]
|
handler/ha_innodb.cc:9735(ha_innobase::rnd_next(unsigned char*))[0x55f4ce275f78]
|
sql/handler.cc:2765(handler::ha_rnd_next(unsigned char*))[0x55f4cda4b886]
|
sql/ha_partition.cc:5064(ha_partition::rnd_next(unsigned char*))[0x55f4cea3e6bb]
|
sql/handler.cc:2765(handler::ha_rnd_next(unsigned char*))[0x55f4cda4b886]
|
sql/records.cc:481(rr_sequential(READ_RECORD*))[0x55f4cdde3bb5]
|
sql/records.h:73(READ_RECORD::read_record())[0x55f4cd1d7fa6]
|
sql/sql_select.cc:20323(join_init_read_record(st_join_table*))[0x55f4cd45e167]
|
sql/sql_select.cc:19387(sub_select(JOIN*, st_join_table*, bool))[0x55f4cd457b6d]
|
sql/sql_select.cc:18930(do_select(JOIN*, Procedure*))[0x55f4cd4561c5]
|
sql/sql_select.cc:4094(JOIN::exec_inner())[0x55f4cd3f4bbd]
|
sql/sql_select.cc:3889(JOIN::exec())[0x55f4cd3f2862]
|
sql/sql_select.cc:4295(mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x55f4cd3f5c8d]
|
sql/sql_select.cc:385(handle_select(THD*, LEX*, select_result*, unsigned long))[0x55f4cd3cff30]
|
sql/sql_parse.cc:6554(execute_sqlcom_select(THD*, TABLE_LIST*))[0x55f4cd35604a]
|
sql/sql_parse.cc:3773(mysql_execute_command(THD*))[0x55f4cd3447a4]
|
sql/sql_parse.cc:8091(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55f4cd35e5ef]
|
sql/sql_parse.cc:1853(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55f4cd3394eb]
|
sql/sql_parse.cc:1396(do_command(THD*))[0x55f4cd336683]
|
sql/sql_connect.cc:1402(do_handle_one_connection(CONNECT*))[0x55f4cd68dc2d]
|
sql/sql_connect.cc:1309(handle_one_connection)[0x55f4cd68d60a]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f1d27a466ba]
|
x86_64/clone.S:111(clone)[0x7f1d26edb41d]
|
|
Query (0x62b00007e288): select * from `table10_innodb_compressed_key_pk_parts_2_int_autoinc` LIMIT 5
|
|
|
marko, see alice's comment above. It's reproducible with the attached datadir – start server on it with all defaults and run select from the crash report:
select * from `table10_innodb_compressed_key_pk_parts_2_int_autoinc` LIMIT 5
|
Hopefully it will be enough for fixing.
|
|
Already as part of the innodb_change_buffering (implementing buffering of delete-mark and purge operations) we added various assertions that ensure that a page cannot be empty except if the whole table is empty, and the page is the root page of an index. The assertion is failing, because the empty page is not the root page.
In partition p0 of that table, the PRIMARY KEY index consists of two pages:
- page 3: only one node pointer to page 42
- page 42: empty page (which trips the assertion)
The index ID and everything else on the pages looks valid to me.
I am afraid that this dataset is not enough for fixing the bug. I would need a reproducible test case.
If the last record of a page is being deleted, we should execute btr_discard_page(), which in this case should also have executed btr_discard_only_page_on_level(). I would need a reproducible test case or a reduced grammar to debug and fix this.
I think that this bug (or possibly bogus debug assertion) should affect all versions since 5.5.
|
|
In [^threads], no thread is executing btr_discard_page(), so this does not look like a race condition.
Also, no purge thread is executing on this PRIMARY KEY index. Apparently, some delete operation (ROLLBACK of INSERT, or purge) fails to invoke btr_discard_page() when the page becomes empty.
|
|
marko,
The test case is attached as mdev14126.test . It contains only a few statements (creates 2 tables, inserts some data, then does DELETE and UPDATE), but it's ugly because of the data it inserts. Simplifying it further seems to reduce the probability of the failure, so I'm keeping it as is.
To run it, copy the attached earth*.jpg files to the /tmp dir and run the test with --mysqld=--secure-file-priv=/tmp --repeat=N. It usually fails for me on the first attempt, but sometimes it misses the mark, so there is some non-determinism somewhere.
The earth* files are used on numerous occasions by LOAD_FILE function in the test. The same files can also be found in <rqg>/data/ folder.
If you want to change the location from which the files are loaded, you'll need to modify the test.
You can also feed the test to a running server through MySQL client, there is nothing MTR-specific there; just don't forget --secure-file-priv=/tmp (or keep it empty).
|
|
I cannot repeat this in 10.3 778192454eba32f7b1f041026b3124d3de9c61c3 or 10.4 4edb29380c98058a28e49c826bacee9c83473579.
I suspect that this could be explained by MDEV-17821, but given that I was unable to repeat the failure, I cannot confirm that.
|
|
I cannot repeat this with the mentioned revision 4447a02cf13a49876001a40ca7db8fdedb731fd5 either. I tried CMAKE_BUILD_TYPE=Debug both without optimization and with -O2, built using clang 7.0.1.
|
|
It still fails for me as readily as before on the current 10.3 and 10.4:
|
10.3 4e75bfcb214
|
mysqld: /data/src/10.3/storage/innobase/btr/btr0pcur.cc:139: void btr_pcur_store_position(btr_pcur_t*, mtr_t*): Assertion `page_get_page_no(page) == index->page' failed.
|
190121 14:23:31 [ERROR] mysqld got signal 6 ;
|
|
#7 0x00007f81f4fa5ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
|
#8 0x000055e2fddd03e3 in btr_pcur_store_position (cursor=0x7f8198173bd0, mtr=0x7f81f00f08d0) at /data/src/10.3/storage/innobase/btr/btr0pcur.cc:139
|
#9 0x000055e2fdcfa134 in row_search_mvcc (buf=0x7f8198006a10 "\377\377\377\377\377\377\377\377", mode=PAGE_CUR_G, prebuilt=0x7f8198173a08, match_mode=0, direction=0) at /data/src/10.3/storage/innobase/row/row0sel.cc:5585
|
#10 0x000055e2fdb66271 in ha_innobase::index_read (this=0x7f81980732a8, buf=0x7f8198006a10 "\377\377\377\377\377\377\377\377", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY) at /data/src/10.3/storage/innobase/handler/ha_innodb.cc:9269
|
#11 0x000055e2fdb67098 in ha_innobase::index_first (this=0x7f81980732a8, buf=0x7f8198006a10 "\377\377\377\377\377\377\377\377") at /data/src/10.3/storage/innobase/handler/ha_innodb.cc:9643
|
#12 0x000055e2fd845962 in handler::ha_index_first (this=0x7f81980732a8, buf=0x7f8198006a10 "\377\377\377\377\377\377\377\377") at /data/src/10.3/sql/handler.cc:2914
|
#13 0x000055e2fd5bdc8b in join_read_first (tab=0x7f8198018220) at /data/src/10.3/sql/sql_select.cc:20395
|
#14 0x000055e2fd5bb502 in sub_select (join=0x7f8198015e80, join_tab=0x7f8198018220, end_of_records=false) at /data/src/10.3/sql/sql_select.cc:19393
|
#15 0x000055e2fd5bbc84 in evaluate_join_record (join=0x7f8198015e80, join_tab=0x7f8198017e70, error=0) at /data/src/10.3/sql/sql_select.cc:19616
|
#16 0x000055e2fd5bb720 in sub_select (join=0x7f8198015e80, join_tab=0x7f8198017e70, end_of_records=false) at /data/src/10.3/sql/sql_select.cc:19435
|
#17 0x000055e2fd5baa9e in do_select (join=0x7f8198015e80, procedure=0x0) at /data/src/10.3/sql/sql_select.cc:18936
|
#18 0x000055e2fd593695 in JOIN::exec_inner (this=0x7f8198015e80) at /data/src/10.3/sql/sql_select.cc:4040
|
#19 0x000055e2fd592ad6 in JOIN::exec (this=0x7f8198015e80) at /data/src/10.3/sql/sql_select.cc:3834
|
#20 0x000055e2fd593d76 in mysql_select (thd=0x7f8198000b00, tables=0x7f8198014e00, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=1342177408, result=0x7f8198015da8, unit=0x7f81980049b0, select_lex=0x7f8198005120) at /data/src/10.3/sql/sql_select.cc:4239
|
#21 0x000055e2fd643ebd in mysql_multi_update (thd=0x7f8198000b00, table_list=0x7f8198014e00, fields=0x7f8198005248, values=0x7f8198005750, conds=0x0, options=0, handle_duplicates=DUP_ERROR, ignore=false, unit=0x7f81980049b0, select_lex=0x7f8198005120, result=0x7f81f00f17b0) at /data/src/10.3/sql/sql_update.cc:1764
|
#22 0x000055e2fd54957f in mysql_execute_command (thd=0x7f8198000b00) at /data/src/10.3/sql/sql_parse.cc:4649
|
#23 0x000055e2fd55468d in mysql_parse (thd=0x7f8198000b00, rawbuf=0x7f8198014cd8 "UPDATE t2, t1 SET t2.`col_text_utf8_not_null_unique` = '1993'", length=61, parser_state=0x7f81f00f25f0, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:8092
|
#24 0x000055e2fd541841 in dispatch_command (command=COM_QUERY, thd=0x7f8198000b00, packet=0x7f8198190d21 "UPDATE t2, t1 SET t2.`col_text_utf8_not_null_unique` = '1993'", packet_length=61, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1851
|
#25 0x000055e2fd540265 in do_command (thd=0x7f8198000b00) at /data/src/10.3/sql/sql_parse.cc:1396
|
#26 0x000055e2fd6a827a in do_handle_one_connection (connect=0x55e300deaee0) at /data/src/10.3/sql/sql_connect.cc:1402
|
#27 0x000055e2fd6a7ffe in handle_one_connection (arg=0x55e300deaee0) at /data/src/10.3/sql/sql_connect.cc:1308
|
#28 0x000055e2fdb42901 in pfs_spawn_thread (arg=0x55e300e2c880) at /data/src/10.3/storage/perfschema/pfs.cc:1862
|
#29 0x00007f81f6c7c494 in start_thread (arg=0x7f81f00f3700) at pthread_create.c:333
|
#30 0x00007f81f506293f in clone () from /lib/x86_64-linux-gnu/libc.so.6
|
An easy way to fail to reproduce it is, for example, not have all earth* files in the right place. If you're sure that everything is in order, and both secure-file-priv and all paths in the test point at the location where all 8 earth* files exist, then it might be something machine- or build-dependent. I'll set it up somewhere else, then.
|
|
I can reproduce the crash. The clustered index is in a strange state. The root page contains only one node pointer record, which is pointing to the empty leaf page.
Empty leaf pages are not allowed, except when the entire table is empty and the empty leaf page is also the root page. Somehow, we failed to shrink the index tree to a single empty root page when the last record was deleted from the leaf page.
|
|
I cannot reproduce the crash on an ASAN-instrumented executable, when setting a conditional breakpoint in gdb, or when disabling purge by setting --innodb-force-recovery=2. This supports the hypothesis that a race condition between purge and DML is involved.
|
|
This is a very elusive bug. The test case involves ROW_FORMAT=COMPRESSED and a lot of BLOB page creation and deletion. The uncompressed page frames of the compressed pages are being evicted. The empty non-leaf page (page number 69 or 70, depending on run) is several times used for writing BLOBs. If I add too much instrumentation, the bug will stop repeating.
|
|
Finally, I nailed down the cause. It turns out that the predicate page_is_root() is holding for all pages that have no siblings. InnoDB does sometimes leave behind a B-tree where a parent page only has 1 child page. With the following patch, the assertion will fail at the time when the corruption is about to occur:
diff --git a/storage/innobase/btr/btr0cur.cc b/storage/innobase/btr/btr0cur.cc
|
index c8951f7e01c..71e3d0f2494 100644
|
--- a/storage/innobase/btr/btr0cur.cc
|
+++ b/storage/innobase/btr/btr0cur.cc
|
@@ -5460,6 +5460,7 @@ btr_cur_optimistic_delete_func(
|
&& page_get_n_recs(block->frame) == 1
|
+ (cursor->index->is_instant()
|
&& !rec_is_metadata(rec, cursor->index)))) {
|
+ ut_ad(block->page.id.page_no() == cursor->index->page);
|
/* The whole index (and table) becomes logically empty.
|
Empty the whole page. That is, if we are deleting the
|
only user record, also delete the metadata record
|
The predicate page_is_root() was introduced by me in MySQL 5.7.6 and merged to MariaDB 10.2.2.
I think that independently of fixing this bug, we must find out which operation caused the leaf page to lose its siblings. Because such degenerate non-branching B-trees can theoretically exist in any InnoDB data files, we must remove the predicate page_is_root() and instead compare the page number to the index root page number.
|
|
I pushed a fix and additional assertions to bb-10.3-marko for testing. The assertions once caused a crash in the test innodb.innodb-change-buffer-recovery, because a change buffer merge was emptying a leaf page. So, there is hope for finding more InnoDB corruption bugs. If these assertions are failing too often, then we can for now take the HEAD^ version (and HEAD^^2 instead of HEAD^2 for the 10.2 version).
Additional assertions will be needed for catching MDEV-19022, using the test case from this report. We should assert that we never leave a leaf page that only carries 1 record and has no sibling pages, except if the leaf page is the root page.
|
|
The test innodb.innodb_bug14676111 demonstrates that we can indeed have internal B-tree pages that only have 1 child page pointer. In the following case the predicate page_is_root() would still work, because all pages except the root do have siblings:
#current tree form
|
# (1, 5)
|
# (1, 4) (5)
|
#(1, 3) (4) (5)
|
However, in the following case, page_is_root() would wrongly hold on the useless second-level page:
#deleting 1 record of 2 records don't cause merge artificially.
|
#current tree form
|
# (1)
|
# (1)
|
# (1, 3) <- lift up this level next, when deleting node ptr
|
#(1, 2) (3) <- merged next
|
In MDEV-19022 we shall fix the logic so that the useless page will be discarded from the tree.
|