[MDEV-9549] Trying to decrypt a not encrypted page Created: 2016-02-11  Updated: 2016-03-12  Resolved: 2016-03-12

Status: Closed
Project: MariaDB Server
Component/s: Encryption, Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1.10
Fix Version/s: 10.1.13

Type: Bug Priority: Major
Reporter: René Cannaò Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 1
Labels: None
Environment:

Ubuntu trusty



 Description   

Crashing bug, the server is trying to decrypt a page that is not encrypted.

/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f0fe2763a6e]
/usr/sbin/mysqld(handle_fatal_signal+0x38d)[0x7f0fe229216d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f0fe07ec340]
/usr/sbin/mysqld(_Z8do_cryptPKhjPhPjP20st_encryption_schemejjjyi+0x48)[0x7f0fe2246e48]
/usr/sbin/mysqld(encryption_scheme_decrypt+0x2b)[0x7f0fe22470bb]
/usr/sbin/mysqld(+0x9ac476)[0x7f0fe2691476]
/usr/sbin/mysqld(+0x948d33)[0x7f0fe262dd33]
/usr/sbin/mysqld(+0x949370)[0x7f0fe262e370]
/usr/sbin/mysqld(+0x9627c6)[0x7f0fe26477c6]
/usr/sbin/mysqld(+0x940acd)[0x7f0fe2625acd]
/usr/sbin/mysqld(+0x9c2760)[0x7f0fe26a7760]
/usr/sbin/mysqld(+0x90f547)[0x7f0fe25f4547]
/usr/sbin/mysqld(+0x9121e0)[0x7f0fe25f71e0]
/usr/sbin/mysqld(+0x9208f1)[0x7f0fe26058f1]
/usr/sbin/mysqld(+0x816c5c)[0x7f0fe24fbc5c]
/usr/sbin/mysqld(+0x819f54)[0x7f0fe24fef54]
/usr/sbin/mysqld(+0x91d59a)[0x7f0fe260259a]
/usr/sbin/mysqld(+0x8b4dd6)[0x7f0fe2599dd6]
/usr/sbin/mysqld(+0x8b1bc7)[0x7f0fe2596bc7]
/usr/sbin/mysqld(+0x8b2aca)[0x7f0fe2597aca]
/usr/sbin/mysqld(+0x8b33e8)[0x7f0fe25983e8]
/usr/sbin/mysqld(+0x87a067)[0x7f0fe255f067]
/usr/sbin/mysqld(+0x8e129e)[0x7f0fe25c629e]
/usr/sbin/mysqld(+0x8cefb6)[0x7f0fe25b3fb6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f0fe07e4182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f0fdff0747d]

Recompiled MariaDB 10.1.10 from source and debug symbols , the backtrace is:

(gdb) bt
#0  0x00005555560890b2 in fil_space_decrypt (crypt_data=0x0, tmp_frame=0x7fee38008000 "", page_size=16384, src_frame=0x7ff7778a8000 "i13\222", err=0x7feeb49b494c) at /root/mariadb/mariadb-10.1.10/storage/xtradb/fil/fil0crypt.cc:722
#1  0x000055555608930c in fil_space_decrypt (space=0, tmp_frame=0x7fee38008000 "", page_size=16384, src_frame=0x7ff7778a8000 "i13\222") at /root/mariadb/mariadb-10.1.10/storage/xtradb/fil/fil0crypt.cc:808
#2  0x000055555600f64c in buf_page_decrypt_after_read (bpage=0x7ff577c46700) at /root/mariadb/mariadb-10.1.10/storage/xtradb/buf/buf0buf.cc:6397
#3  0x000055555600b956 in buf_page_io_complete (bpage=0x7ff577c46700) at /root/mariadb/mariadb-10.1.10/storage/xtradb/buf/buf0buf.cc:4630
#4  0x0000555556029d7f in buf_read_page_low (err=0x7feeb49b4b5c, sync=true, mode=132, space=0, zip_size=0, unzip=0, tablespace_version=1, offset=768, trx=0x0, rbpage=0x7feeb49b4bd8) at /root/mariadb/mariadb-10.1.10/storage/xtradb/buf/buf0rea.cc:262
#5  0x000055555602a201 in buf_read_page (space=0, zip_size=0, offset=768, trx=0x0, bpage=0x7feeb49b4bd8) at /root/mariadb/mariadb-10.1.10/storage/xtradb/buf/buf0rea.cc:474
#6  0x0000555556007a3d in buf_page_get_gen (space=0, zip_size=0, offset=768, rw_latch=2, guess=0x0, mode=10, file=0x555556557020 "/root/mariadb/mariadb-10.1.10/storage/xtradb/include/fut0fut.ic", line=51, mtr=0x7feeb49b4d30, err=0x0)
    at /root/mariadb/mariadb-10.1.10/storage/xtradb/buf/buf0buf.cc:2942
#7  0x000055555609b1db in fut_get_ptr (space=0, zip_size=0, addr=..., rw_latch=2, mtr=0x7feeb49b4d30) at /root/mariadb/mariadb-10.1.10/storage/xtradb/include/fut0fut.ic:51
#8  0x000055555609ca5f in flst_validate (base=0x7fffe3fd004a "", mtr1=0x7feeb49b58b0) at /root/mariadb/mariadb-10.1.10/storage/xtradb/fut/fut0lst.cc:479
#9  0x0000555555fc84fe in btr_page_free_for_ibuf (index=0x55560c2fe058, block=0x7ffbc8f0c200, mtr=0x7feeb49b58b0) at /root/mariadb/mariadb-10.1.10/storage/xtradb/btr/btr0btr.cc:1336
#10 0x0000555555fc8741 in btr_page_free_low (index=0x55560c2fe058, block=0x7ffbc8f0c200, level=0, blob=false, mtr=0x7feeb49b58b0) at /root/mariadb/mariadb-10.1.10/storage/xtradb/btr/btr0btr.cc:1436
#11 0x0000555555fc88c1 in btr_page_free (index=0x55560c2fe058, block=0x7ffbc8f0c200, mtr=0x7feeb49b58b0) at /root/mariadb/mariadb-10.1.10/storage/xtradb/btr/btr0btr.cc:1486
#12 0x0000555555fce957 in btr_compress (cursor=0x7feeb49b57d0, adjust=0, mtr=0x7feeb49b58b0) at /root/mariadb/mariadb-10.1.10/storage/xtradb/btr/btr0btr.cc:4123
#13 0x0000555555fe0fde in btr_cur_compress_if_useful (cursor=0x7feeb49b57d0, adjust=0, mtr=0x7feeb49b58b0) at /root/mariadb/mariadb-10.1.10/storage/xtradb/btr/btr0cur.cc:3446
#14 0x0000555555fe1961 in btr_cur_pessimistic_delete (err=0x7feeb49b568c, has_reserved_extents=1, cursor=0x7feeb49b57d0, flags=0, rb_ctx=RB_NONE, mtr=0x7feeb49b58b0) at /root/mariadb/mariadb-10.1.10/storage/xtradb/btr/btr0cur.cc:3710
#15 0x0000555555e64fc2 in ibuf_delete_rec (space=92, page_no=490819, pcur=0x7feeb49b57d0, search_tuple=0x7fee38000ab8, mtr=0x7feeb49b58b0) at /root/mariadb/mariadb-10.1.10/storage/xtradb/ibuf/ibuf0ibuf.cc:4580
#16 0x0000555555e65e7c in ibuf_merge_or_delete_for_page (block=0x7ffbc8f0bec0, space=92, page_no=490819, zip_size=0, update_ibuf_bitmap=1) at /root/mariadb/mariadb-10.1.10/storage/xtradb/ibuf/ibuf0ibuf.cc:4947
#17 0x000055555600c03b in buf_page_io_complete (bpage=0x7ffbc8f0bec0) at /root/mariadb/mariadb-10.1.10/storage/xtradb/buf/buf0buf.cc:4840
#18 0x0000555556082410 in fil_aio_wait (segment=7) at /root/mariadb/mariadb-10.1.10/storage/xtradb/fil/fil0fil.cc:6034
#19 0x0000555555f73a86 in io_handler_thread (arg=0x555556d93d98 <n+56>) at /root/mariadb/mariadb-10.1.10/storage/xtradb/srv/srv0start.cc:542
#20 0x00007ffff7787182 in start_thread (arg=0x7feeb49b6700) at pthread_create.c:312
#21 0x00007ffff6eaa47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

key_version seems to be not zero:

(gdb) l
715
716             *err = DB_SUCCESS;
717
718             if (key_version == ENCRYPTION_KEY_NOT_ENCRYPTED) {
719                     return false;
720             }
721
722             ut_ad(crypt_data->encryption != FIL_SPACE_ENCRYPTION_OFF);
723
724             /* read space & offset & lsn */
(gdb) p key_version
$5 = 713

Although:
a) encryption was never enabled on this server
b) examining the page, the content is obviously not encrypted



 Comments   
Comment by Elena Stepanova [ 2016-02-12 ]

rcannao,

Are you sure this backtrace is of the crashed thread? I don't see the signal handler there. Could you please attach all threads' stack trace?

Please also paste and attach your cnf file(s) and specify which build you were using.

Thanks.

Comment by Valerie Parham-Thompson [ 2016-02-22 ]

Rene, Chris, Did you pinpoint a workload and setup that reproduced this error?

I upgraded a test server from 10.0.21 to 10.1.11 and ran a large amount of inserts and selects against it, with no issues. I also set up replication between 10.0.21 and 10.1.11 and ran concurrent inserts and selects, with no issues. This is on Centos7.

My tests were a very simple workload, though. I would like to continue to try to reproduce the error. What are the 10.0 and 10.1 versions you're using? OS and configs?

Comment by René Cannaò [ 2016-03-02 ]

I don't have a scientific reproducible test case for this, but I can share some interesting patterns I found.
a) if the server that crashes because of this bug is a slave, when started InnoDB will perform a crash recovery without crashing again. But as soon as replication is resumed, the server will crash immediately
b) if the server that crashes because of this bug is a master, when started InnoDB will perform a crash recovery without crashing again. The server can receive again writes without crashing (go to point #c for exceptions)

Based on #a and #b , it seems that what caused the crash is not present anymore after a crash recovery. But #a points that maybe a specific write (that is stored in the relay log) causes the server to crash again.

c) the master crashes when executing large transactions involving updating several hundreds of thousands of rows and deleting a similar amount
d) reducing the size of the transactions to tens of thousands of rows (instead of hundreds of thousands of rows) won't crash the server

I want to highlight again that I don't have a reproducible test case to share (no core dump, no specific queries), but #c and #d were quite common patterns.

As the crashes seem somehow related to the size of the transactions, my hypothesis is:
Is it possible that the crash is caused by FIL_PAGE_FLUSH_LSN being written to undo pages, and incorrectly interpreted as an encryption key during read?
According to https://dev.mysql.com/worklog/task/?id=7990 , FIL_PAGE_FLUSH_LSN "was also being written to InnoDB undo tablespace files, even though the fields are not going to be consulted on crash recovery." (and in fact, no crash during crash recovery).

Comment by Chris Calender (Inactive) [ 2016-03-02 ]

Similar to René, I don't have a reproducible test case for this either.

I did find out some extra details, which will hopefully help to isolate this.

1. This occurred after an in-place upgrade from MariaDB 10.0 to MariaDB 10.1.10

2. There had been a global tablespace error that also existed in the MariaDB 10.0 instance (but didn't cause problems in 10.0, so it went unnoticed). But when this tablespace error was encountered in 10.1.10 by mysqldump (note a mysqldump was in progress at the time of the crash), it triggered the crash. Unfortunately, I don't have further specifics regarding the exact tablespace error.

3. Here is the error log snippet of the crash & backtrace:

2016-02-25 19:42:46 140508033857280 [Warning] View 'test'.'vuser': there is unknown charset/collation names (client: ''; connection: '').
2016-02-25 19:42:46 140508033857280 [Warning] View 'test'.'vuser': there is unknown charset/collation names (client: ''; connection: '').
2016-02-25 19:42:46 140508033857280 [Warning] View 'test'.'vuser': there is unknown charset/collation names (client: ''; connection: '').
2016-02-25 19:43:40 140508084012800 [Warning] View 'test'.'vuser': there is unknown charset/collation names (client: ''; connection: '').
2016-02-25 19:43:40 140508084012800 [Warning] View 'test'.'vuser': there is unknown charset/collation names (client: ''; connection: '').
2016-02-25 19:43:40 140508084012800 [Warning] View 'test'.'vuser': there is unknown charset/collation names (client: ''; connection: '').
160225 19:51:53 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.10-MariaDB-enterprise-log
key_buffer_size=268435456
read_buffer_size=131072
max_used_connections=10
max_threads=502
thread_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1364731 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7fca36f5d008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fca966f7e00 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0x7fca9723ae4b]
sql/signal_handler.cc:160(handle_fatal_signal)[0x7fca96d9bea5]
/lib64/libpthread.so.0(+0xf850)[0x7fca963b4850]
sql/encryption.cc:108(scheme_get_key)[0x7fca96d4fb33]
sql/encryption.cc:215(encryption_scheme_decrypt)[0x7fca96d4fdab]
fil/fil0crypt.cc:753(fil_space_decrypt(fil_space_crypt_struct*, unsigned char*, unsigned long, unsigned char*, dberr_t*))[0x7fca971ba134]
fil/fil0crypt.cc:810(fil_space_decrypt(unsigned long, unsigned char*, unsigned long, unsigned char*))[0x7fca971ba2ae]
buf/buf0buf.cc:6397(buf_page_decrypt_after_read(buf_page_t*))[0x7fca971586c3]
buf/buf0buf.cc:4630(buf_page_io_complete(buf_page_t*))[0x7fca97158a80]
buf/buf0rea.cc:262(buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, unsigned long, unsigned long, long, unsigned long, trx_t*, buf_page_t**))[0x7fca97170fd8]
buf/buf0rea.cc:474(buf_read_page(unsigned long, unsigned long, unsigned long, trx_t*, buf_page_t**))[0x7fca97171aef]
buf/buf0buf.cc:2942(buf_page_get_gen(unsigned long, unsigned long, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned long, mtr_t*, dberr_t*))[0x7fca97150ddf]
include/btr0btr.ic:62(btr_block_get_func)[0x7fca9713b154]
include/btr0pcur.ic:368(btr_pcur_move_to_next)[0x7fca970c98a8]
row/row0sel.cc:5129(row_search_for_mysql(unsigned char*, unsigned long, row_prebuilt_t*, unsigned long, unsigned long))[0x7fca970cd51e]
handler/ha_innodb.cc:9914(ha_innobase::general_fetch(unsigned char*, unsigned int, unsigned int))[0x7fca9700be0f]
sql/handler.cc:2577(handler::ha_rnd_next(unsigned char*))[0x7fca96da066f]
sql/records.cc:470(rr_sequential(READ_RECORD*))[0x7fca96e92480]
sql/sql_select.cc:18302(sub_select(JOIN*, st_join_table*, bool))[0x7fca96c613e2]
sql/sql_select.cc:17937(do_select)[0x7fca96c6de2d]
sql/sql_select.cc:3228(JOIN::exec_inner())[0x7fca96c7e9b0]
sql/sql_select.cc:2519(JOIN::exec())[0x7fca96c8092d]
sql/sql_select.cc:3453(mysql_select(THD*, Item**, TABLE_LIST, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x7fca96c7d2ea]
sql/sql_select.cc:384(handle_select(THD*, LEX*, select_result*, unsigned long))[0x7fca96c80c0d]
sql/sql_parse.cc:5903(execute_sqlcom_select)[0x7fca96c23b52]
sql/sql_parse.cc:2961(mysql_execute_command(THD*))[0x7fca96c2fdc3]
sql/sql_parse.cc:7302(mysql_parse)[0x7fca96c3313d]
sql/sql_parse.cc:1487(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x7fca96c35bf3]
sql/sql_parse.cc:1111(do_command(THD*))[0x7fca96c3618b]
sql/sql_connect.cc:1349(do_handle_one_connection(THD*))[0x7fca96cf150f]
sql/sql_connect.cc:1263(handle_one_connection)[0x7fca96cf1667]
/lib64/libpthread.so.0(+0x7806)[0x7fca963ac806]
/lib64/libc.so.6(clone+0x6d)[0x7fca94b3264d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fca48ff2020): is an invalid pointer
Connection ID (thread ID): 3927
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
160225 19:51:54 mysqld_safe Number of processes running now: 0
160225 19:51:54 mysqld_safe mysqld restarted

Comment by Elena Stepanova [ 2016-03-09 ]

rcannao,

Did you also have any kind of a global tablespace error before or upon the upgrade, similar to what ccalender mentioned in the previous comment? I suppose if you had it, it would show in the error log, could you maybe check it for errors?
Also, when you are saying

The master crashes when executing large transactions involving updating several hundreds of thousands of rows and deleting a similar amount

is it about concurrent updates/deletes, or sequential actions on the same table? Is it limited to the same table(s)? If so, can you paste SHOW CREATE and SHOW INDEX? Do you remember whether these tables were created in 10.0 or in previous versions?

Thanks.

Comment by Jan Lindström (Inactive) [ 2016-03-12 ]

commit f341d94423daa37bf4bee4d9b96ba8e8d93484c6
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Sat Mar 12 13:43:33 2016 +0200

MDEV-9549: Trying to decrypt a not encrypted page

Make sure that on decrypt we do not try to reference
NULL pointer and if page contains undefined
FIL_PAGE_FILE_FLUSH_LSN field on when page is not
the first page or page is not in system tablespace,
clear it.

Generated at Thu Feb 08 07:35:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.