Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Cannot Reproduce
-
10.2.9, 10.2.10, 10.2.12
-
RHEL 6.5 x64 on physical server.
Last error on 10.2.12.
-
10.2.14
Description
Problem started after an upgrade 10.0.22 -> 10.2.9 and continues with 10.2.12
error log for 10.2.12 |
2018-01-10 0:15:13 139882323265280 [ERROR] [FATAL] InnoDB: Trying to read page number 376886 in space 4294967294, space name innodb_temporary, which is outside the tablespace bounds. Byte offset 0, len 16384
|
180110 0:15:13 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.2.12-MariaDB-log
|
key_buffer_size=33554432
|
read_buffer_size=2097152
|
max_used_connections=421
|
max_threads=401
|
thread_count=401
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1700897 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x7f36d0f4d878
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7f38e4303cc8 thread_stack 0x30000
|
(my_addr_resolve failure: fork)
|
/usr/sbin/mysqld(my_print_stacktrace+0x2b) [0x7f4373fb2a2b]
|
/usr/sbin/mysqld(handle_fatal_signal+0x535) [0x7f4373a7f915]
|
/lib64/libpthread.so.0(+0x3e5260f7e0) [0x7f4372fa67e0]
|
/lib64/libc.so.6(gsignal+0x35) [0x7f43716005e5]
|
/lib64/libc.so.6(abort+0x175) [0x7f4371601dc5]
|
/usr/sbin/mysqld(+0x9addca) [0x7f4373d83dca]
|
/usr/sbin/mysqld(+0xa56b0d) [0x7f4373e2cb0d]
|
/usr/sbin/mysqld(+0xa5f3bc) [0x7f4373e353bc]
|
/usr/sbin/mysqld(+0xa12b3b) [0x7f4373de8b3b]
|
/usr/sbin/mysqld(+0xa141e6) [0x7f4373dea1e6]
|
/usr/sbin/mysqld(+0x9f1b2b) [0x7f4373dc7b2b]
|
/usr/sbin/mysqld(+0x98ae6c) [0x7f4373d60e6c]
|
/usr/sbin/mysqld(+0x947263) [0x7f4373d1d263]
|
/usr/sbin/mysqld(+0x8ea2a7) [0x7f4373cc02a7]
|
/usr/sbin/mysqld(+0x98bde5) [0x7f4373d61de5]
|
/usr/sbin/mysqld(+0x98c80b) [0x7f4373d6280b]
|
/usr/sbin/mysqld(+0x98cec0) [0x7f4373d62ec0]
|
/usr/sbin/mysqld(+0x857c44) [0x7f4373c2dc44]
|
/usr/sbin/mysqld(ha_rollback_trans(THD*, bool)+0xe2) [0x7f4373a825c2]
|
/usr/sbin/mysqld(trans_rollback(THD*)+0x5a) [0x7f43739d3c3a]
|
/usr/sbin/mysqld(mysql_execute_command(THD*)+0x127c) [0x7f43738fee9c]
|
/usr/sbin/mysqld(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool)+0x29a) [0x7f437390617a]
|
/usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool)+0x1cde) [0x7f43739084ee]
|
/usr/sbin/mysqld(do_command(THD*)+0x14e) [0x7f4373908a9e]
|
/usr/sbin/mysqld(tp_callback(TP_connection*)+0x67) [0x7f43739ff917]
|
/usr/sbin/mysqld(+0x682b20) [0x7f4373a58b20]
|
/lib64/libpthread.so.0(+0x3e52607aa1) [0x7f4372f9eaa1]
|
/lib64/libc.so.6(clone+0x6d) [0x7f43716b6aad]
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x7f36d05987f0): rollback
|
Connection ID (thread ID): 64393
|
Status: NOT_KILLED
|
|
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
|
similar error from 10.2.9 |
:
|
2017-11-19 17:30:46 139866819159808 [ERROR] [FATAL] InnoDB: Trying to read page number 1755915 in space 4294967294, space name innodb_temporary, which is outside the tablespace bounds. Byte offset 0, len 16384
|
171119 17:30:46 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.2.9-MariaDB-log
|
key_buffer_size=33554432
|
read_buffer_size=2097152
|
max_used_connections=263
|
max_threads=401
|
thread_count=125
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1700682 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x7f28bd04ba58
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7f3548125cc8 thread_stack 0x30000
|
(my_addr_resolve failure: fork)
|
/usr/sbin/mysqld(my_print_stacktrace+0x2b) [0x7f35507b3d8b]
|
/usr/sbin/mysqld(handle_fatal_signal+0x4d5) [0x7f355027e425]
|
/lib64/libpthread.so.0(+0x3e5260f7e0) [0x7f354f7a67e0]
|
/lib64/libc.so.6(gsignal+0x35) [0x7f354de005e5]
|
/lib64/libc.so.6(abort+0x175) [0x7f354de01dc5]
|
/usr/sbin/mysqld(+0x9a7dfa) [0x7f355057ddfa]
|
/usr/sbin/mysqld(+0xa5771d) [0x7f355062d71d]
|
/usr/sbin/mysqld(+0xa5c96f) [0x7f355063296f]
|
/usr/sbin/mysqld(+0xa1231b) [0x7f35505e831b]
|
/usr/sbin/mysqld(+0xa139c6) [0x7f35505e99c6]
|
/usr/sbin/mysqld(+0x9ebb71) [0x7f35505c1b71]
|
/usr/sbin/mysqld(+0x984f3c) [0x7f355055af3c]
|
/usr/sbin/mysqld(+0x9425bb) [0x7f35505185bb]
|
/usr/sbin/mysqld(+0x8e5837) [0x7f35504bb837]
|
/usr/sbin/mysqld(+0x985f85) [0x7f355055bf85]
|
/usr/sbin/mysqld(+0x98627b) [0x7f355055c27b]
|
/usr/sbin/mysqld(+0x91276a) [0x7f35504e876a]
|
/usr/sbin/mysqld(+0x914931) [0x7f35504ea931]
|
/usr/sbin/mysqld(+0x8490b0) [0x7f355041f0b0]
|
/usr/sbin/mysqld(+0x8583f0) [0x7f355042e3f0]
|
/usr/sbin/mysqld(handler::ha_write_row(unsigned char*)+0x177) [0x7f3550287d77]
|
/usr/sbin/mysqld(write_record(THD*, TABLE*, st_copy_info*)+0x71) [0x7f35500e5f21]
|
/usr/sbin/mysqld(select_insert::send_data(List<Item>&)+0xd2) [0x7f35500e6882]
|
/usr/sbin/mysqld(+0x5684bf) [0x7f355013e4bf]
|
/usr/sbin/mysqld(+0x555381) [0x7f355012b381]
|
/usr/sbin/mysqld(sub_select(JOIN*, st_join_table*, bool)+0x17e) [0x7f35501336fe]
|
/usr/sbin/mysqld(JOIN::exec_inner()+0x9f9) [0x7f35501516c9]
|
/usr/sbin/mysqld(JOIN::exec()+0x3c) [0x7f35501518fc]
|
/usr/sbin/mysqld(mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*)+0x12a) [0x7f355015080a]
|
/usr/sbin/mysqld(handle_select(THD*, LEX*, select_result*, unsigned long)+0x1d4) [0x7f3550151b14]
|
/usr/sbin/mysqld(mysql_execute_command(THD*)+0x78d6) [0x7f3550105076]
|
/usr/sbin/mysqld(Prepared_statement::execute(String*, bool)+0x4de) [0x7f355011677e]
|
/usr/sbin/mysqld(+0x543dbb) [0x7f3550119dbb]
|
/usr/sbin/mysqld(+0x544a32) [0x7f355011aa32]
|
/usr/sbin/mysqld(mysqld_stmt_execute(THD*, char*, unsigned int)+0x27) [0x7f355011ab77]
|
/usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool)+0xefa) [0x7f355010739a]
|
/usr/sbin/mysqld(do_command(THD*)+0x14e) [0x7f35501087be]
|
/usr/sbin/mysqld(tp_callback(TP_connection*)+0x67) [0x7f35501fe967]
|
/usr/sbin/mysqld(+0x6816c0) [0x7f35502576c0]
|
/lib64/libpthread.so.0(+0x3e52607aa1) [0x7f354f79eaa1]
|
/lib64/libc.so.6(clone+0x6d) [0x7f354deb6aad]
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x7f28c8014d08): insert ignore into asset_lite select * from asset_lite_temp order by asset_id
|
|
Connection ID (thread ID): 327798
|
Status: NOT_KILLED
|
|
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on
|
Attachments
Issue Links
- relates to
-
MDEV-15524 Do not write garbage or disable page checksums for temporary tables
-
- Closed
-
The crash happens when executing a rollback statement.
In InnoDB, all tables and records are exclusively locked and changes written upfront. Rollback should replay the undo log backwards, decrementing the trx->undo_no all the way to 0. Finally, the transaction with no net changes to tables will be committed.
MariaDB 10.2 (and MySQL 5.7) introduced a separate tablespace for temporary tables and temporary undo logs. Until then, temporary tables were treated as persistent.
It turns out that the function trx_roll_pop_top_rec_of_trx() is actually prioritizing the undo logs of persistent tables. Only after all persistent changes have been rolled back, InnoDB would move to processing any temporary undo log records that are past the rollback savepoint. While I have refactored this code in MariaDB 10.2 and 10.3, this logic of prioritizing persisted undo logs in rollback remains, just like it is in MySQL 5.7.
The cause of the crash is a wrong page reference. In the core dump, I successfully looked up two undo pages: one in the persistent tablespace (system tablespace; innodb_undo_tablespaces was not used) and another in the temporary tablespace. The one in the temporary tablespace appears to be empty (containing only garbage after the start=end=0x110 byte offset). The one in the persistent tablespace is almost full with records, but there is a record at offset 0xb4c, not 0xb4b = 2891. Here is the wrong pointer:
(gdb) thread 1
(gdb) frame 13
(gdb) p *undo
$1 = {id = 0, type = 1, state = 1, del_marks = 0, trx_id = 19350037632, xid = {formatID = 1,
gtrid_length = 24, bqual_length = 0,
data = "MySQLXid\346\001\000\000\000\000\000\000Y%\343\v", '\000' <repeats 107 times>},
dict_operation = 0, table_id = 6197768787689408977, rseg = 0x7f43b90c6360, space = 4294967294,
hdr_page_no = 90544, hdr_offset = 86, last_page_no = 90544, size = 1, empty = 0,
top_page_no = 376886, top_offset = 2891, top_undo_no = 13, guess_block = 0x7f3fed2ac008,
withdraw_clock = 0, undo_list = {prev = 0x0, next = 0x0}}
…
The temporary tablespace does not have page number 376886, because it is much smaller:
(gdb) frame 7
(gdb) p *space
$3 = {name = 0x7f43b4a2b9e0 "innodb_temporary", id = 4294967294, max_lsn = 0, stop_ios = false,
stop_new_ops = false, is_being_truncated = false, purpose = FIL_TYPE_TEMPORARY, chain = {count = 1,
start = 0x7f43bcb24090, end = 0x7f43bcb24090, node = &fil_node_t::chain}, size = 98304,
size_in_header = 98304, free_len = 950, free_limit = 95872, recv_size = 0, flags = 0,
…
In the 3 undo logs of the transaction, the last_page_no should be equal to top_page_no:
(gdb) p trx->rsegs.m_redo.insert_undo.last_page_no
$4 = 4653060
(gdb) p trx->rsegs.m_redo.insert_undo.top_page_no
$5 = 4653060
(gdb) p trx->rsegs.m_redo.update_undo.last_page_no
$6 = 3964969
(gdb) p trx->rsegs.m_redo.update_undo.top_page_no
$7 = 3964969
That is not the case for the temporary undo log:
(gdb) p trx->rsegs.m_noredo.undo.top_page_no
$8 = 376886
(gdb) p trx->rsegs.m_noredo.undo.last_page_no
$9 = 90544
Page 90544 exists in the temporary tablespace, but it is empty apart from an undo log header, which contains a MySQLXid that does not match the trx->xid.
We also have trx->rsegs.m_noredo.undo.guess_block which matches much of the temporary undo log data structure in memory:
(gdb) p *trx->rsegs.m_noredo.undo
$10 = {id = 0, type = 1, state = 1, del_marks = 0, trx_id = 19350037632, xid = {
formatID = 1, gtrid_length = 24, bqual_length = 0,
data = "MySQLXid\346\001\000\000\000\000\000\000Y%\343\v", '\000' <repeats 107 times>}, dict_operation = 0, table_id = 6197768787689408977,
rseg = 0x7f43b90c6360, space = 4294967294, hdr_page_no = 90544,
hdr_offset = 86, last_page_no = 90544, size = 1, empty = 0,
top_page_no = 376886, top_offset = 2891, top_undo_no = 13,
guess_block = 0x7f3fed2ac008, withdraw_clock = 0, undo_list = {prev = 0x0,
next = 0x0}}
The guess_block contains only inserts to a single temporary table, in ascending order of a BIGINT AUTO_INCREMENT PRIMARY KEY column, with no gaps in the AUTO_INCREMENT values or the undo_no. The first undo record is at byte offset 0x38 and the last one at 0x2f8d. Each record is 21 bytes. The last undo_no is 0x88610 or 558,608. If this was the very last record written by the transaction, this would mean that there were 558,608 rows modified by the transaction. In addition to at least this one temporary table, the transaction modified 25 persistent tables (the size of trx->mod_tables). This temporary table had the PRIMARY KEY and one UNIQUE INDEX on 3 columns.
The top_offset is pointing to a undo record in the middle of the guess_block. The guess_block should be the very last temporary undo log block that was written in the course of the transaction before rollback.
But if this block were valid for the transaction, the top_undo_no should be much bigger (558,608 instead of 13).
In trx->rsegs.m_redo.insert_undo.guess_block.frame the last undo_no is 0x88611, or 558,609 (one larger than the last written temporary undo log record).
In trx->rsegs.m_redo.update_undo.guess_block.frame the last undo_no is 0x88330 (much smaller than the first record in the last written temporary undo log page, 0x883cf).
(gdb) p *trx->rsegs.m_redo.insert_undo
$11 = {id = 1, type = 1, state = 1, del_marks = 0, trx_id = 19350037632, xid = {
formatID = -1, gtrid_length = 0, bqual_length = 0,
data = '\000' <repeats 127 times>}, dict_operation = 0,
table_id = 228954336, rseg = 0x7f43b317e710, space = 0,
hdr_page_no = 4653060, hdr_offset = 86, last_page_no = 4653060, size = 1,
empty = 1, top_page_no = 4653060, top_offset = 272, top_undo_no = 0,
guess_block = 0x7f39e4aa8240, withdraw_clock = 0, undo_list = {prev = 0x0,
next = 0x0}}
(gdb) p *trx->rsegs.m_redo.update_undo
$12 = {id = 0, type = 2, state = 1, del_marks = 1, trx_id = 19350037632, xid = {
formatID = 1, gtrid_length = 24, bqual_length = 0,
data = "MySQLXid\346\001\000\000\000\000\000\000Y%\343\v", '\000' <repeats 107 times>}, dict_operation = 0, table_id = 4832885630348296191,
rseg = 0x7f43b317e710, space = 0, hdr_page_no = 3964969, hdr_offset = 7083,
last_page_no = 3964969, size = 1, empty = 1, top_page_no = 3964969,
top_offset = 7269, top_undo_no = 323531, guess_block = 0x7f3d22574710,
withdraw_clock = 0, undo_list = {prev = 0x0, next = 0x0}}
It remains a mystery how and why exactly the trx->rsegs.m_noredo.undo was corrupted. Currently my best guess is that a stray write from some other thread corrupted the memory while the explicit ROLLBACK from the client connection was being executed. There are not many assignments to top_page_no, and the FIL_PAGE_OFFSET in the undo pages looks sensible.
While painstakingly analyzing the core dump, I made a few observations that should eventually be addressed. But these should not explain the failure.