[MDEV-14132] innodb.innodb-64k failed in buildbot, lost connection to server Created: 2017-10-26  Updated: 2017-11-06  Resolved: 2017-10-27

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1.29, 10.2.10
Fix Version/s: 10.2.10, 10.3.3

Type: Bug Priority: Blocker
Reporter: Alice Sherepa Assignee: Vladislav Vaintroub
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
Relates
relates to MDEV-12235 sys_vars.log_slow_admin_statements_fu... Closed
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed
relates to MDEV-12235 sys_vars.log_slow_admin_statements_fu... Closed
relates to MDEV-14244 MariaDB 10.2.10 fails to run on Debia... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/10141/steps/test_2/logs/stdio

innodb.innodb-64k 'innodb'               w2 [ fail ]
        Test ended at 2017-10-26 03:11:14
 
CURRENT_TEST: innodb.innodb-64k
mysqltest: At line 635: query 'update t2 set col150=@a' failed: 2013: Lost connection to MySQL server during query
 
The result from queries just before the failure was:
< snip >
  `COL181` text DEFAULT NULL,
  `COL182` text DEFAULT NULL,
  `COL183` text DEFAULT NULL,
  `COL184` text DEFAULT NULL,
  `COL185` text DEFAULT NULL,
  `COL186` text DEFAULT NULL,
  `COL187` text DEFAULT NULL,
  `COL188` text DEFAULT NULL,
  `COL189` text DEFAULT NULL,
  `COL190` text DEFAULT NULL,
  `COL191` text DEFAULT NULL,
  `COL192` text DEFAULT NULL,
  `COL193` text DEFAULT NULL,
  `COL194` text DEFAULT NULL,
  `COL195` text DEFAULT NULL,
  `COL196` text DEFAULT NULL,
  `COL197` text DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
BEGIN;
update t2 set col150=@a;
 
More results from queries before failure can be found in /mnt/buildbot/build/mariadb-10.2.10/mysql-test/var/2/log/innodb-64

2017-10-26  3:11:12 2932595520 [Note] InnoDB: Uncompressed page, stored checksum in field1 2609818021, calculated checksums for field1: crc32 1225725904/2230021968, innodb 4240219472,  page type 10 == BLOB.none 3735928559, stored checksum in field2 2609818021, calculated checksums for field2: crc32 1225725904/2230021968, innodb 2875794546, none 3735928559,  page LSN 0 14755692, low 4 bytes of LSN at page end 14755692, page number (if stored to page already) 629, space id (if created with >= MySQL-4.1.1 and stored already) 10
InnoDB: Page may be a BLOB page
2017-10-26  3:11:12 2932595520 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
171026  3:11:12 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.10-MariaDB-debug-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=7
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 61885 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0xa6305c38
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0xaecbd2a0 thread_stack 0x49000
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(my_print_stacktrace+0x32)[0x8cf1db2]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(handle_fatal_signal+0x3c7)[0x85a44c3]
[0xb7768400]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x889f60d]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x88cdabe]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x8af68fb]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x8af7910]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x8af812a]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x8af8397]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x8af84ab]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x8a1f7bd]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x8a1feaf]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x8a24b91]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x88b5219]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x88b57fa]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(_ZN7handler11ha_rnd_nextEPh+0x15e)[0x85aad1a]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(_Z13rr_sequentialP11READ_RECORD+0x6e)[0x8708346]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_ordery15enum_duplicatesbPySB_+0x2132)[0x841edcc]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(_Z21mysql_execute_commandP3THD+0x3f07)[0x8340c80]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x290)[0x834ca70]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0xfae)[0x8339eff]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(_Z10do_commandP3THD+0x72c)[0x83388b3]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1bf)[0x8478abc]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld(handle_one_connection+0x38)[0x847884a]
/mnt/buildbot/build/mariadb-10.2.10/sql/mysqld[0x87b57d8]
/lib/i386-linux-gnu/libpthread.so.0(+0x6d4c)[0xb74f0d4c]
/lib/i386-linux-gnu/libc.so.6(clone+0x5e)[0xb7300ace]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0xa6314380): update t2 set col150=@a
Connection ID (thread ID): 8
Status: NOT_KILLED



 Comments   
Comment by Elena Stepanova [ 2017-10-26 ]

It happened 4 times in buildbot, all times on 10.2 main tree, all different builders (p8, deb, bintar, fulltest2), 2017-10-12 , 2017-10-14, 2017-10-25, 2017-10-26. This is the first occurrence, but not necessarily the push that caused it: http://buildbot.askmonty.org/buildbot/builders/p8-rhel6-bintar/builds/1203 . It also has a decent stack trace:

10.2 a4fa940bad7f3a2c4afff776977ad5e46dfe95d3

#3  <signal handler called>
#4  ._ZL20btr_copy_blob_prefixPhmmmm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/innobase/btr/btr0cur.cc:7443
#5  0x000000003b7deb1c in ._Z32btr_copy_externally_stored_fieldPmPKhRK11page_size_tmP16mem_block_info_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/innobase/btr/btr0cur.cc:7651
#6  0x000000003b0f8578 in ._ZL30row_sel_store_mysql_field_funcPhP14row_prebuilt_tPKhPKmmPK17mysql_row_templ_t () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/innobase/row/row0sel.cc:3025
#7  0x000000003b0f88fc in ._ZL23row_sel_store_mysql_recPhP14row_prebuilt_tPKhPK8dtuple_tbPK12dict_index_tPKm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/innobase/row/row0sel.cc:3230
#8  0x000000003b728218 in ._Z15row_search_mvccPh15page_cur_mode_tP14row_prebuilt_tmm () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/innobase/row/row0sel.cc:5435
#9  0x000000003b6245f8 in ._ZN11ha_innobase13general_fetchEPhjj () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/innobase/handler/ha_innodb.cc:10152
#10 0x000000003b624b04 in ._ZN11ha_innobase8rnd_nextEPh () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/storage/innobase/handler/ha_innodb.cc:10365
#11 0x000000003b3e57a8 in ._ZN7handler11ha_rnd_nextEPh () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/handler.cc:2576
#12 0x000000003b57c550 in ._Z13rr_sequentialP11READ_RECORD ()
#13 0x000000003b29480c in ._Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_ordery15enum_duplicatesbPySB_ () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/sql_update.cc:735
#14 0x000000003b1e0168 in ._Z21mysql_execute_commandP3THD () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/sql_parse.cc:4268
#15 0x000000003b1e4d2c in ._Z11mysql_parseP3THDPcjP12Parser_statebb () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/sql_parse.cc:7868
#16 0x000000003b1e8348 in ._Z16dispatch_command19enum_server_commandP3THDPcjbb () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/sql_parse.cc:1812
#17 0x000000003b1e8e24 in ._Z10do_commandP3THD () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/sql_parse.cc:1360
#18 0x000000003b2d9694 in ._Z24do_handle_one_connectionP7CONNECT () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/sql_connect.cc:1354
#19 0x000000003b2d9844 in .handle_one_connection () at /home/buildbot/maria-slave/p8-rhel6-bintar/build/sql/sql_connect.cc:1260
#20 0x000000003ba6b34c in .pfs_spawn_thread ()
#21 0x00000fffa3a1c548 in .start_thread () from /opt/at7.0/lib64/power7/libpthread.so.0
#22 0x00000fffa3308454 in .__clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:104

Comment by Elena Stepanova [ 2017-10-26 ]

Alternatively, here is an unresolved stack trace, but for it we have a binary:
Binaries: http://hasky.askmonty.org/archive/10.2/build-16788/kvm-bintar-centos5-amd64/mariadb-10.2.10-linux-x86_64.tar.gz
The stack trace was produced on vm-centos6-amd64-install.qcow2.

10.2 a6a4c25bf7d943ff5a429fbf32fe04fdaca25b16

171025  7:35:06 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.10-MariaDB-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=7
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 62996 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7fab240009a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fab782b6e18 thread_stack 0x49000
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(my_print_stacktrace+0x2e)[0xde488e]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(handle_fatal_signal+0x444)[0x7dd3a4]
/lib64/libpthread.so.0(+0xf4a0)[0x7fab7ef964a0]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld[0xbd2cf6]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld[0xbd8e9e]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld[0xb293bb]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld[0xb29743]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld[0xb30f2e]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld[0xa3480d]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld[0xa34b5a]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(_ZN7handler11ha_rnd_nextEPh+0x84)[0x7e9724]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(_Z13rr_sequentialP11READ_RECORD+0x36)[0x95ef96]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_ordery15enum_duplicatesbPySB_+0xbd1)[0x69c7a1]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(_Z21mysql_execute_commandP3THD+0x7610)[0x5f0650]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x3a2)[0x5f3eb2]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x26ef)[0x5f6dcf]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(_Z10do_commandP3THD+0x15d)[0x5f771d]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x235)[0x6dc155]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld(handle_one_connection+0x3f)[0x6dc31f]
/usr/local/mariadb-10.2.10-linux-x86_64/bin/mysqld[0xa03bc9]
/lib64/libpthread.so.0(+0x77f1)[0x7fab7ef8e7f1]
/lib64/libc.so.6(clone+0x6d)[0x7fab7df0eccd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fab2400f040): update t2 set col150=@a
Connection ID (thread ID): 8
Status: NOT_KILLED

Comment by Marko Mäkelä [ 2017-10-26 ]

Resolved symbols:

btr/btr0cur.cc:7761 btr_copy_blob_prefix()
row/row0sel.cc:3029 btr_copy_externally_stored_field()
row/row0sel.cc:3231 row_sel_store_mysql_rec()
row/row0sel.cc:5432 row_search_mvcc()
ha_innodb.cc:10154 ha_innobase::general_fetch()
ha_innodb.cc:10365 ha_innobase::rnd_next()

The SIGSEGV occurs here:

   0x0000000000bd2cf1 <+353>:	
    callq  0xc01080 <buf_page_get_gen(page_id_t const&, page_size_t const&, unsigned long, buf_block_t*, unsigned long, char const*, unsigned int, mtr_t*, dberr_t*)>
   0x0000000000bd2cf6 <+358>:	mov    0xa0(%rax),%rbx

The corresponding source code line are:

		block = buf_page_get(page_id_t(space_id, page_no),
				     univ_page_size, RW_S_LATCH, &mtr);
		buf_block_dbg_add_level(block, SYNC_EXTERN_STORAGE);
		page = buf_block_get_frame(block);

That is, we are accessing block->frame, even though block=NULL was returned (%rax is the return value register in the AMD64 ABI). The offset 0xa0 indeed is the one of buf_block_t::frame:

(gdb) p ((buf_block_t*)0)->frame
Cannot access memory at address 0xa0

Comment by Marko Mäkelä [ 2017-10-26 ]

According to the server error log of one failure, block=NULL was returned because of a failed page read.
In my opinion, the corruption is more severe than our inability to deal with it. (MDEV-13542 has already been filed to improve the handling.)
We must definitely try to find out what causes the corruption. It appears to be tied to operations on larger InnoDB tables. See also MDEV-12235, which reports a crash somewhere else due to a corrupted page, in the test sys_vars.log_slow_admin_statements_func.

Comment by Elena Stepanova [ 2017-10-26 ]

The similar failure on sys_vars.log_slow_admin_statements_func from comments to MDEV-12235 is reproducible rather easily on a non-debug build on centos6-amd64 VM. The one with innodb.innodb-64k happens with less probability, but also happens on that VM with high enough --repeat.

The failures appeared in 10.2 tree with this commit:

commit b731a5bcf2a0c86b6e31e4a99e3c632bb39a9c53
Author: Vladislav Vaintroub <wlad@mariadb.com>
Date:   Sun Oct 8 00:13:05 2017 +0000
 
    Innodb : Refactor os_file_set_size() to be compatible 10.1
    
    The last parameter to this function is now,"bool is_sparse", like in 10.1
    rather than the  unused/useless "bool is_readonly", merged from MySQL 5.7
    
    Like in 10.1, this function now supports sparse files, and efficient
    platform specific mechanisms for file extension
    
    os_file_set_size() is now consistenly used in all places where
    innodb files are extended.

Per marko's suggestion, I tried to revert commits 18a979df6ffea027ae39da9a0923fde7ae552464 ff2d9e125f7f2a70f67bc4e7ad40990139a1cf66 b731a5bcf2a0c86b6e31e4a99e3c632bb39a9c53
together, it seems to make the failures go away.

Comment by Vladislav Vaintroub [ 2017-10-27 ]

https://github.com/MariaDB/server/commit/057a6cf768e8308b79bf571c30e62ca81d31d68d

Comment by Elena Stepanova [ 2017-10-29 ]

For the record, innodb.innodb-64k-crash was also failing in the similar way.

Generated at Thu Feb 08 08:11:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.