[MDEV-11075] encryption tests fail on Power in buildbot Created: 2016-10-17  Updated: 2016-12-20  Resolved: 2016-12-09

Status: Closed
Project: MariaDB Server
Component/s: Encryption, Platform Power
Affects Version/s: 10.2.2
Fix Version/s: 10.2.3

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Sprint: 10.2.4-4

 Description   

http://buildbot.askmonty.org/buildbot/builders/p8-xenial-bintar/builds/264/steps/test/logs/stdio

Failing test(s): encryption.innochecksum encryption.innodb-bad-key-change encryption.innodb-bad-key-change5 innodb.innochecksum encryption.innodb-bad-key-change2 encryption.innodb-bad-key-change3 encryption.innodb-bad-key-change4 encryption.innodb-bad-key-shutdown

encryption.innochecksum 'cbc,xtradb'     w1 [ fail ]
        Test ended at 2016-10-17 14:35:59
 
CURRENT_TEST: encryption.innochecksum
Segmentation fault (core dumped)
mysqltest: At line 73: exec of '/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/extra/innochecksum /var/lib/buildbot/maria-slave/p8-xenial-bintar/build/mysql-test/var/1/mysqld.1/data//test/t2.ibd' failed, error: 35584, status: 139, errno: 32
Output from before failure:
# Run innochecksum on t2

encryption.innodb-bad-key-change 'cbc,xtradb' w1 [ fail ]
        Test ended at 2016-10-17 14:36:17
 
CURRENT_TEST: encryption.innodb-bad-key-change
mysqltest: At line 65: query 'DROP TABLE t1' failed: 2013: Lost connection to MySQL server during query
...
2016-10-17 14:36:15 70366569397184 [ERROR] InnoDB: Block in space_id 8 in file test/t1 encrypted.
2016-10-17 14:36:15 70366569397184 [ERROR] InnoDB: However key management plugin or used key_id 1 is not found or used encryption algorithm or method does not match.
2016-10-17 14:36:15 70366569397184 [ERROR] InnoDB: Marking tablespace as missing. You may drop this table or install correct key management plugin and key file.
2016-10-17 14:36:15 70366569397184 [ERROR] InnoDB: Block in space_id 8 in file test/t1 encrypted.
2016-10-17 14:36:15 70366569397184 [ERROR] InnoDB: However key management plugin or used key_id 1 is not found or used encryption algorithm or method does not match.
2016-10-17 14:36:15 70366569397184 [ERROR] InnoDB: Marking tablespace as missing. You may drop this table or install correct key management plugin and key file.
...
161017 14:36:15 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.3-MariaDB-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 62977 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x3fff5d624008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x3fff7e5f7078 thread_stack 0x48c00
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(my_print_stacktrace+0x68)[0x3e5e0088]
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(handle_fatal_signal+0x3d0)[0x3def7e20]
[0x3fff7f4804d8]
[0x0]
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(+0xacbdc4)[0x3e30bdc4]
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(+0xab7530)[0x3e2f7530]
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(+0xabcc8c)[0x3e2fcc8c]
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(+0xaf31cc)[0x3e3331cc]
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(+0xabe0f8)[0x3e2fe0f8]
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(+0xa82b9c)[0x3e2c2b9c]
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(+0xafc5d8)[0x3e33c5d8]
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(+0x9ef7e0)[0x3e22f7e0]
mysys/stacktrace.c:267(my_print_stacktrace)[0x3e231e98]
sql/signal_handler.cc:166(handle_fatal_signal)[0x3e23226c]
buf/buf0buf.cc:1259(buf_page_print(unsigned char const*, page_size_t const&, unsigned long))[0x3e1b5934]
buf/buf0rea.cc:217(buf_read_page_low)[0x3e1b67fc]
btr/btr0btr.cc:1112(btr_free_root_check)[0x3e1f8e60]
dict/dict0crea.cc:1165(dict_drop_index_tree(unsigned char*, btr_pcur_t*, mtr_t*))[0x3e107324]
que/que0que.cc:1038(que_thr_step)[0x3df02fe4]
/var/lib/buildbot/maria-slave/p8-xenial-bintar/build/sql/mysqld(_Z15ha_delete_tableP3THDP10handlertonPKcS4_S4_b+0x164)[0x3df03174]
handler/ha_innodb.cc:14942(ha_innobase::delete_table(char const*))[0x3dd86bc8]
sql/sql_table.cc:2468(mysql_rm_table_no_locks(THD*, TABLE_LIST*, bool, bool, bool, bool, bool))[0x3dd87578]
sql/sql_parse.cc:4647(mysql_execute_command(THD*))[0x3dcf4234]
sql/sql_parse.cc:7796(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x3dcf972c]
sql/sql_parse.cc:1806(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x3dcfc5ec]
sql/sql_parse.cc:1366(do_command(THD*))[0x3dcfcf68]
sql/sql_connect.cc:1354(do_handle_one_connection(CONNECT*))[0x3dde9304]
sql/sql_connect.cc:1260(handle_one_connection)[0x3dde9564]
perfschema/pfs.cc:1862(pfs_spawn_thread)[0x3e4660c8]
/lib/powerpc64le-linux-gnu/libpthread.so.0(+0x84a0)[0x3fff7efa84a0]
/lib/powerpc64le-linux-gnu/libc.so.6(clone+0xe4)[0x3fff7eb27e74]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x3fff59d16020): DROP TABLE t1
Connection ID (thread ID): 2
Status: NOT_KILLED

etc.



 Comments   
Comment by Elena Stepanova [ 2016-11-23 ]

A failure in a non-encryption test, not sure if it's related:
http://buildbot.askmonty.org/buildbot/builders/p8-trusty-bintar-debug/builds/332/steps/test/logs/stdio

innodb.innochecksum 'xtradb'             w3 [ fail ]
        Test ended at 2016-11-23 00:32:53
 
CURRENT_TEST: innodb.innochecksum
Segmentation fault (core dumped)
mysqltest: At line 61: exec of '/var/lib/buildbot/maria-slave/p8-trusty-bintar-debug/build/extra/innochecksum /var/lib/buildbot/maria-slave/p8-trusty-bintar-debug/build/mysql-test/var/3/mysqld.1/data//test/t2.ibd' failed, error: 35584, status: 139, errno: 32
Output from before failure:
# Run innochecksum on t2
 
 
 
The result from queries just before the failure was:
< snip >
INSERT INTO t1 (b) VALUES ('corrupt me');
INSERT INTO t1 (b) VALUES ('corrupt me');
CREATE TABLE t2 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT)
ROW_FORMAT=COMPRESSED ENGINE=InnoDB ;
INSERT INTO t2(b) SELECT b from t1;
CREATE TABLE t3 (a INT AUTO_INCREMENT PRIMARY KEY, b TEXT)
ROW_FORMAT=COMPRESSED ENGINE=InnoDB KEY_BLOCK_SIZE=16;
INSERT INTO t3(b) SELECT b from t1;
# Write file to make mysql-test-run.pl expect the "crash", but don't
# start it until it's told to
# We give 30 seconds to do a clean shutdown because we do not want
# to redo apply the pages of t1.ibd at the time of recovery.
# We want SQL to initiate the first access to t1.ibd.
# Wait until disconnected.
# Run innochecksum on t1
# Run innochecksum on t2

Comment by Marko Mäkelä [ 2016-11-30 ]

There are also failures with no core dump, but errno 32 (EPIPE), broken pipe:
http://buildbot.askmonty.org/buildbot/builders/p8-rhel71-bintar/builds/1539/steps/test/logs/stdio

encryption.innochecksum 'cbc,xtradb'     w1 [ fail ]
        Test ended at 2016-11-30 01:13:01
 
CURRENT_TEST: encryption.innochecksum
mysqltest: At line 73: exec of '/home/buildbot/maria-slave/power8-vlp06-bintar/build/extra/innochecksum /home/buildbot/maria-slave/power8-vlp06-bintar/build/mysql-test/var/1/mysqld.1/data//test/t2.ibd' failed, error: 11, status: 0, errno: 32

Comment by Marko Mäkelä [ 2016-12-08 ]

Recently, I have only seen failures where the innochecksum executable fails to run, like in the very first report. I have not observed mysqld server crashes, like in the second report (Lost connection to MySQL server).
The tests encryption.innochecksum and innodb.innochecksum do not look like their outcome could depend too much on what the server previously did. The tests invoke CREATE TABLE and INSERT to populate some .ibd files, and then invoke shutdown_server with a somewhat short-looking timeout of 30 seconds. Because there are no secondary indexes and no UPDATE or DELETE, the files would not be affected by InnoDB change buffering or purge.
The only nondeterminism should be whether the file pages were flushed to disk before innochecksum was invoked. If not, that should result in different output from innochecksum, but not in the failure to start up innochecksum.

I wonder if there is something unstable about the machine that would explain why an exec sometimes fails with EPIPE or in other strange ways. When I run these tests alone, they do not fail for me on the machine. And when I once tried to run the full test suite inside GNU Emacs, the Emacs process froze at some point (something that I never have experienced before). This would hint at some kind of hardware failure or memory corruption.

Could a reboot improve the situation?

Comment by Daniel Black [ 2016-12-09 ]

mysqlcippc64le:/mysql/mariadb-upstream-Release-10.2/mysql-test$ gdb --args  /mysql/mariadb-upstream-Release-10.2/bin/innochecksum var/mysqld.1/data/test/t2.ibd
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.04) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64le-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /mysql/mariadb-upstream-Release-10.2/bin/innochecksum...done.
(gdb) run
Starting program: /mysql/mariadb-upstream-Release-10.2/bin/innochecksum var/mysqld.1/data/test/t2.ibd
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
 
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000020027ed4 in page_zip_calc_checksum (data=0x203b1d00, size=8192, algo=<optimized out>, use_legacy_big_endian=true) at /source/storage/innobase/page/page0zip.cc:4981
#2  0x000000002002829c in page_zip_verify_checksum (data=0x203b1d00, size=8192, page_no=0, strict_check=<optimized out>, is_log_enabled=<optimized out>, log_file=0x0)
    at /source/storage/innobase/page/page0zip.cc:5247
#3  0x0000000020027400 in buf_page_is_corrupted (check_lsn=<optimized out>, read_buf=0x203b1d00 "\257\261\207\220", page_size=..., skip_checksum=<optimized out>, page_no=0, strict_check=<optimized out>, 
    is_log_enabled=<optimized out>, log_file=0x0) at /source/storage/innobase/buf/buf0buf.cc:867
#4  0x00000000200212d8 in is_page_corrupted (page_size=..., buf=<optimized out>) at /source/extra/innochecksum.cc:567
#5  main (argc=1, argv=0x3ffffffff4a0) at /source/extra/innochecksum.cc:1837
(gdb) up
#1  0x0000000020027ed4 in page_zip_calc_checksum (data=0x203b1d00, size=8192, algo=<optimized out>, use_legacy_big_endian=true) at /source/storage/innobase/page/page0zip.cc:4981
 
(gdb) up
#2  0x000000002002829c in page_zip_verify_checksum (data=0x203b1d00, size=8192, page_no=0, strict_check=<optimized out>, is_log_enabled=<optimized out>, log_file=0x0)
    at /source/storage/innobase/page/page0zip.cc:5247
5247    in /source/storage/innobase/page/page0zip.cc
(gdb) p curr_algo
$1 = SRV_CHECKSUM_ALGORITHM_INNODB
(gdb) up
#3  0x0000000020027400 in buf_page_is_corrupted (check_lsn=<optimized out>, read_buf=0x203b1d00 "\257\261\207\220", page_size=..., skip_checksum=<optimized out>, page_no=0, strict_check=<optimized out>,
    is_log_enabled=<optimized out>, log_file=0x0) at /source/storage/innobase/buf/buf0buf.cc:867
warning: Source file is more recent than executable.
867                     return(!page_zip_verify_checksum(read_buf,
 
(gdb) p ut_crc32_legacy_big_endian
$3 = (ut_crc32_func_t) 0x0

ut_crc32_legacy_big_endian isn't initialised on Power (it wasn't part of 10.1).

By partial coincidence this is fixed in PR 212 (MDEV-9872)

https://github.com/MariaDB/server/pull/212/files#diff-0e6aff492e82acc8267ae3c804a57f70R704

Comment by Marko Mäkelä [ 2016-12-09 ]

Thank you very much, Daniel! I will merge PR 212.
I wonder why the test sometimes passed. If a core file was generated (like I think it should be by default on SIGSEGV if permitted by ulimit -c), mysql-test-run did not reveal its existence in the output.

Comment by Marko Mäkelä [ 2016-12-09 ]

It turns out that I was testing the 10.1 branch, not 10.2. That is why it did not fail for me. Sorry for the stupid mistake!
There is no legacy_big_endian in the MariaDB 10.1 or MySQL 5.6 code bases, so it seems to me that these versions will write non-portable files when using innodb_checksum_algorithm=crc32 on big-endian systems that use the original software implementation of CRC-32C which was intended for little-endian systems only.
This crash was introduced into MariaDB 10.2.2:

commit fec844aca88e1c6b9c36bb0b811e92d9d023ffb9
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date:   Tue Sep 6 09:43:16 2016 +0300
 
    Merge InnoDB 5.7 from mysql-5.7.14.

Generated at Thu Feb 08 07:47:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.